Heaps of by-products

Do you maintain many programs? Do you maintain many programs, each producing an inordinate number of log files, output files, intermediate files and database records? Do your programs create heaps of digital by-products?

Hansel and Gretel

Are you paranoid? No? Then stop spawning unnecessary log files and other output. I’ve seen a simple requirement mangled into several programs, each of which absolutely had to create a log file and an output file for the next program to consume as input. There were database tables to check for sequencing purposes, holding tables for data and tables to store the progress of the programs. All for one simple requirement.

The original programmer might have wanted to modularise the requirement. Fine. But an entire program to take the place of a function?

Perhaps the programmer designed it that way so it’s easy to debug and trace errors. Hence the copious littering of log files to keep track of where the program executed. Like Hansel and Gretel leaving breadcrumbs. Then why did the log files not contain enough useful information?

Stop creating digital waste by-products for the sake of creating them. If that intermediate step doesn’t require a log file, don’t write one “just in case”, or just so you can “keep track”.

Disk space may be getting cheaper, but that’s not the point. The next programmer has to wade through heaps of rubbish just to find out what went wrong. And that’s without looking at the source code.

Databases do more than store data

“But what if I need intermediate output files?” you ask.

For?

“The Unix shell script can sort data records in a text file.”

So can a database. That’s what the “order by” clause is for.
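As a rough sketch of what that looks like, here is the sort done with Python's built-in sqlite3 module and an in-memory database. The table name, column names and sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical records that might otherwise sit unsorted in a text file.
rows = [("C003", "Carol", 250.0), ("A001", "Alice", 100.0), ("B002", "Bob", 75.5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT, name TEXT, amount REAL)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

# The database does the sorting; no intermediate sorted file is written.
sorted_ids = [r[0] for r in conn.execute("SELECT id FROM records ORDER BY id")]
by_amount = [r[0] for r in conn.execute("SELECT name FROM records ORDER BY amount DESC")]

print(sorted_ids)
print(by_amount)
```

Sorting on a different column means changing the “order by” clause, not writing another script and another output file.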

“I need to concatenate records from multiple files.”

So can a database. That’s what tables are for, inserting and storing data.
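A minimal sketch of the same idea, again assuming Python's sqlite3 module. The two "files" are in-memory stand-ins and the items table is hypothetical; real code would open the actual input files instead.

```python
import csv
import io
import sqlite3

# Stand-ins for two input files; real code would open them from disk.
file_a = io.StringIO("1,apple\n2,banana\n")
file_b = io.StringIO("3,cherry\n4,damson\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")

# Every source is loaded into the same table; that is the concatenation.
for source in (file_a, file_b):
    conn.executemany("INSERT INTO items VALUES (?, ?)", csv.reader(source))

total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
names = [r[0] for r in conn.execute("SELECT name FROM items ORDER BY id")]

print(total, names)
```

Once the records are in one table, there is no combined output file to create, name, track and eventually delete.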

“I need to check for duplicates, and the duplicates can be stored in a text file and I can use shell scripts to sort and…”

So can a database! In fact, a database can probably do a better job than whatever crazy algorithm you can come up with. Primary keys can prevent duplicates. If you still want to store and keep track of the duplicate records, a simple “group by” clause with a “having” filter on the count can retrieve the duplicates easily.
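Here is one way both approaches might look, sketched with Python's sqlite3 module. The staging and customers tables and the sample rows are invented for the example, and "INSERT OR IGNORE" is SQLite's spelling; other database engines have their own syntax for skipping rows that violate a key.

```python
import sqlite3

# Hypothetical incoming records; A001 and B002 each arrive twice.
incoming = [
    ("A001", "Alice"),
    ("B002", "Bob"),
    ("A001", "Alice"),
    ("C003", "Carol"),
    ("B002", "Bob"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id TEXT, name TEXT)")
conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", incoming)

# Report the duplicates: group by the key and keep groups seen more than once.
duplicates = conn.execute(
    "SELECT id, COUNT(*) FROM staging GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
).fetchall()

# Prevent the duplicates: the primary key rejects repeats, and
# INSERT OR IGNORE (SQLite-specific) skips them instead of raising an error.
conn.executemany("INSERT OR IGNORE INTO customers VALUES (?, ?)", incoming)
unique_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

print(duplicates)
print(unique_count)
```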

People seem to forget that databases can be used to do more than just store data. You can perform a good many different operations in a database environment.

“But wouldn’t a C program that runs through a file and sums up the numbers in a certain column, based on certain criteria, execute faster on the Unix machine?”

Perhaps. But I don’t understand the preoccupation with performance and memory management, or the belief that Unix C programs and shell scripts are better than anything done on Windows (no, this isn’t a *nix/Windows comparison/war) and better than anything done by a database engine. Use the right tools.
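For comparison, the column sum from that question is a single query in a database. This is only a sketch using Python's sqlite3 module; the sales table, its columns and the "north" criterion are made-up examples, and it makes no claim about which approach runs faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5), ("east", 7.0)],
)

# Sum one column for the rows that match a criterion, in one statement.
north_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("north",)
).fetchone()[0]

print(north_total)
```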

It’s hard to parse a text file. Yes, I know all about the magical grep and other commands of the Unix environment. And there are many text files, with little to help narrow down searches.

In the database environment, I can sort records, do simple math calculations and run complicated update statements involving several tables. I can pump all bad records to another table in a single statement. I can remove bad records from a table in a single statement.
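To make the "single statement" part concrete, here is a small sketch with Python's sqlite3 module. The orders and bad_orders tables are hypothetical, and so is the rule for what counts as a bad record (a missing or non-positive quantity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, quantity INTEGER)")
conn.execute("CREATE TABLE bad_orders (id INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 5), (2, -3), (3, 0), (4, 12), (5, None)],
)

# Hypothetical rule for a "bad" record: missing or non-positive quantity.
bad_rule = "quantity IS NULL OR quantity <= 0"

# One statement pumps the bad records into another table...
conn.execute(f"INSERT INTO bad_orders SELECT * FROM orders WHERE {bad_rule}")
# ...and one statement removes them from the original table.
conn.execute(f"DELETE FROM orders WHERE {bad_rule}")

good_ids = [r[0] for r in conn.execute("SELECT id FROM orders ORDER BY id")]
bad_ids = [r[0] for r in conn.execute("SELECT id FROM bad_orders ORDER BY id")]

print(good_ids, bad_ids)
```

In real use the two statements would normally run inside one transaction, so a failure between them cannot leave a record in both tables or in neither.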

And I can search records, even by a specific column. How do you do that with a normal input text file? Use the right tools. I seem to have said this before…

Landfills and the Delete button

Physical waste is traditionally buried in landfills. We’re starting to run out of land. Digital waste is handled by the ubiquitous Delete button. And we seem to have no shortage of disk space.

Do not abuse this freedom in how you treat “stuff”. Just because you can create stuff in the digital landscape doesn’t mean you have to create trash.