Can your program survive reruns?

One of the key advantages of a computer is that it performs repetitive tasks in exactly the same way every time. Programs are written to behave identically on every run. Yet sometimes, they don’t.

So the question is: do you write code such that your program can survive being rerun?

What I mean is, if your program somehow fails, can it be rerun with minimal fuss on your part? The database equivalent is the transaction. Within a transaction, the insert, update and delete operations either all complete or all fail. If any operation fails, everything is rolled back, and nothing is done. There’s zero fuss on your part in this case, although you might still have to check why it failed.
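To make the all-or-nothing behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `items` table and the `load_rows` function are made up for illustration; they are not from any program I maintain.

```python
import sqlite3

def load_rows(conn, rows):
    """Insert all rows or none of them (hypothetical helper)."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()

ok = load_rows(conn, [(1, "a"), (2, "b")])
# The second row reuses primary key 1, so the whole batch is rolled back,
# including the row with id 3 that was inserted just before the failure.
failed = load_rows(conn, [(3, "c"), (1, "duplicate")])
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

After the failed call the table still holds only the two rows from the first batch, so the load can simply be rerun once the bad input is corrected.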

Some programs, when rerun after a failure, require a lot of intervention. I deal with batch programs that typically do the following:

  • download files via FTP
  • validate file content
  • upload file content into database

In between those steps, there’s opening of file pointers to read file content and write logs. There’s updating of database tables of process status and timestamps. There’s retrieval of data, and use of temporary tables. There’s writing a bunch of output files.

The point is, there’s no transactional control present. If something failed somewhere, I had to dig through the log files, check the database table data, and check any file output, just to see where it went wrong. Any operations prior to the error had already been done. Any correction I made meant either undoing all those prior operations, or setting things up such that the program could continue from where it left off (assuming it could do that at all).
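One way to let a batch program continue from where it left off is to record each completed step in a small checkpoint file and skip those steps on the next run. The sketch below is only an illustration under my own assumptions: the step names mirror the list above, while `run_steps`, the checkpoint file name and the simulated validation failure are all hypothetical.

```python
import json
import os
import tempfile

def run_steps(steps, checkpoint_path):
    """Run named steps in order, skipping any recorded as done by an earlier run."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run
        func()  # if this raises, the step is not recorded as done
        done.append(name)
        # Write to a temporary file first so a crash cannot leave a
        # half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)
    os.remove(checkpoint_path)  # whole batch succeeded; the next run starts fresh

# Demonstration with stand-in steps (no real FTP or database involved).
calls = []
state = {"fail_validate": True}

def download():
    calls.append("download")

def validate():
    if state["fail_validate"]:
        raise ValueError("bad file content")
    calls.append("validate")

def upload():
    calls.append("upload")

steps = [("download", download), ("validate", validate), ("upload", upload)]
checkpoint = os.path.join(tempfile.mkdtemp(), "batch.checkpoint.json")

try:
    run_steps(steps, checkpoint)
except ValueError:
    pass  # first run fails at validation

state["fail_validate"] = False  # stand-in for fixing the bad input
run_steps(steps, checkpoint)    # rerun: download is skipped
```

This only works if each step is itself safe to repeat, or cleans up after itself when it fails. A step that dies halfway through an upload still needs a transaction or a cleanup routine of its own.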

Sometimes, it could take hours to find out where the error was, and reset everything back to the state before the program was run. I had to make sure. Because if the second run was a failure again, it could be a result of my undoing of things. Or it could be a result of the same error again. Or even a different error. I wouldn’t know.

These are usually legacy programs. I guess the previous programmers were very optimistic that the programs would always run as intended. I’ve seen my colleague lose an entire day just to patch things up so the program could be run again.

Sometimes, you have to change something, like modifying the file sequence number stored in the database so that on the next run, the correct file is retrieved. That’s fine. What you should concentrate on is minimal fuss.

Simple instructions commented in the program code are a way of alleviating future agonies. Something like “update this, delete that and run again with this parameter”. They benefit both you and the future programmers maintaining the code.
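As a rough illustration of what such instructions might look like, here is a sketch of a rerun note sitting right next to the parameter it mentions. Everything in it is hypothetical: the table names, the `--start-from` option and the step names are invented for the example.

```python
# RERUN AFTER FAILURE (hypothetical example)
#   1. Check the log for the last step that completed.
#   2. If the failure was during upload, delete the partial rows for that
#      run from the staging table (here called staging_rows).
#   3. If the wrong file was picked up, correct the file sequence number
#      stored in the database (here in a table called file_sequence).
#   4. Run again with: --start-from <first step that did not complete>

import argparse

STEPS = ["download", "validate", "upload"]

def parse_args(argv=None):
    """Parse the command line; --start-from is the rerun parameter."""
    parser = argparse.ArgumentParser(description="Nightly file load (example)")
    parser.add_argument(
        "--start-from",
        choices=STEPS,
        default=STEPS[0],
        help="step to resume from after a failure",
    )
    return parser.parse_args(argv)

def steps_to_run(start_from):
    """Return the named step and every step after it."""
    return STEPS[STEPS.index(start_from):]

args = parse_args(["--start-from", "validate"])
remaining = steps_to_run(args.start_from)
```

Keeping the note in the same file as the option it refers to means whoever changes one is more likely to update the other.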

Do you have suggestions for writing code that survives reruns? Leave your comments!