Deciphering column types in design documents

When I first started working, I’ve never even heard of design specifications. The few sentences of a programming question for a university assignment barely made it as design requirements. I think the longest description went slightly over half a page, and that’s because it was explaining some scientific logic behind the question.

So when I was first handed the design documents of an existing application system, my eyes kind of glazed over the arcane language… The first few pages were usually full of important-sounding sentences but really means very little to the programmer. Well, most of it anyway. They’re about how this application was to do X, because Y happened and Z wasn’t very happy about it, and application A could almost do the same thing except for condition B.

It wasn’t a critical period when I joined the team, so things were a bit quiet and I had time to learn. Have I told you I didn’t know a single thing about SQL at the time? I was picking that up too.

Flipping through the pages, I found a table describing column information. There were input files, and that table described the columns in the file. This was a few years ago, so the input files were what was termed “flat files”.

Each line in those files were of a fixed length, and each column occupies a specific position and a specific length in a line. The usual line types were the header, trailer, and data. The header and trailer lines were usually shorter than the data lines.

The header probably contains information such as

  • Timestamp of file (usually just the date it was generated)
  • Name of file
  • Application code (not our kind of code. Short acronym identifier of program)

The trailer probably contains information such as

  • Number of data lines (for reconciliation purposes)
  • Sum totals of stuff (monetary amount, duration and so on)
  • … you know, I think it’s usually just the above 2

Now the data lines were more interesting. They were loaded into the database, so the columns in the file usually match closely to that of the database table. Here’s where I both learned to read design documents and file formats, and picked up SQL all at one go…

Here are 2 examples:

9(8) with comment “ccyymmdd”. It means “8 numerals”, and the comment hints at … ? Century, last 2 digits of the year, the month, and the day.

9(6) with comment “ccyymm”. It means “6 numerals”, and I’m sure you can figure out what the comment means.

The “9” is a notation used to denote digits or numerals only. The number within brackets denote the number of digits. Let’s try…

X(9) which means 9 alphanumeric characters.
X(57) with comment “filler”. It means … ? 57 alphanumeric characters, probably just spaces because this column is a filler.

I have no idea why “X” denotes alphanumeric… For that matter, I don’t know why “9” is used to represent digits too. As for the filler column, remember the header and trailer lines? They are shorter than the data lines, so a column is specially made so that each line, whether it’s a header, data or trailer line, can fit snugly into one line. No, XML wasn’t invented yet… I think.

Now for some obscure ones…

9(7)v99 which means there are 7 digits, followed by 2 digits.
9v9(5) means 1 digit, followed by 5 digits.

If they are all digits, what’s with the weird notation? The “v” means there’s an implied decimal point. So “9(7)v99” means a number which is up to 7 digits long, followed by 2 digits representing a number (below one) up to 2 decimal places.

Confused? “9(7)v99” is equivalent to numeric(9,2) in SQL-speak. 1234567.89 is an example.

So what’s the implied decimal point for? If I understand it correctly, the notations came from programming practices in COBOL, and the banking industry was making use of flat files to transfer data around. Since transmitting data was expensive (they didn’t have 500 gigabytes of hard disk space then…), every single byte counted.

Since it was understood that the figure in that particular column was a money value, the decimal point was taken out to save space. Tada! Instant saving of, I don’t know, tens and hundreds of kilobytes. And that practice flowed to other industries.

It’s a good thing my current team uses notations such as “char(8)”, “numeric(15,2)” and “int” to define column types. Hey wait, those look familiar…

  1. Ben Barden - Blog Tips

    Hey Vincent! Been a while. I didn’t realise you’d changed your theme. I like it. Love the favicon!

    In my last job, I had to deal with flat files too. Sure, I had an input file with fixed-width fields, field delimeters and some filler to make all the lines the same length.

    But I reckon I can outdo you on this one. On the very first job I did as a trainee programmer, the senior programmer was clearly unsure what I needed in order to define the field structure for the document production software I was in charge of. As a new programmer, all I knew was to ask for the field lengths, as that’s what they said on the training course. I couldn’t explain it any other way, but it did make sense.

    The senior programmer loaded a sample data file into a standard text editor in OpenVMS. She then did a screen dump, and pasted it into Word. Next, she printed out the Word document and gave it to me. Of course, the screen had automatically wrapped the lines so they didn’t go off the screen (because that would just be silly).

    The result was that I could see the order of the fields, which was nice, but I couldn’t figure out how long the fields were. There was no design document except for the contents of the letter that needed to go out. No field definitions anywhere, because they were fields that already existed in the system, and the programmers already knew the lengths… so why should they have to restate them in the document? But nobody showed me how to find out the field lengths. I didn’t stand a chance.

    So, I started trying to count the blank spaces between the fillers on the piece of paper. I remember measuring the distance of the fixed-width characters and drawing boxes of a similar size until I could figure out the field size. It was ridiculous. I couldn’t put my requirements into words, the senior programmer couldn’t figure out what I needed, and nobody explained things to me very well.

    I don’t know how I got it done, but I managed it somehow! Thankfully, I learned faster after that, although I did hit plenty of other confusing problems that took forever to figure out.

    Ben Barden – Blog Tipss last blog post..10 overused and abused HTML tags

Comments are closed.