Deciphering column types in design documents

When I first started working, I’ve never even heard of design specifications. The few sentences of a programming question for a university assignment barely made it as design requirements. I think the longest description went slightly over half a page, and that’s because it was explaining some scientific logic behind the question.

So when I was first handed the design documents of an existing application system, my eyes kind of glazed over the arcane language… The first few pages were usually full of important-sounding sentences but really means very little to the programmer. Well, most of it anyway. They’re about how this application was to do X, because Y happened and Z wasn’t very happy about it, and application A could almost do the same thing except for condition B.

It wasn’t a critical period when I joined the team, so things were a bit quiet and I had time to learn. Have I told you I didn’t know a single thing about SQL at the time? I was picking that up too.

Flipping through the pages, I found a table describing column information. There were input files, and that table described the columns in the file. This was a few years ago, so the input files were what was termed “flat files”.

Each line in those files were of a fixed length, and each column occupies a specific position and a specific length in a line. The usual line types were the header, trailer, and data. The header and trailer lines were usually shorter than the data lines.

The header probably contains information such as

  • Timestamp of file (usually just the date it was generated)
  • Name of file
  • Application code (not our kind of code. Short acronym identifier of program)

The trailer probably contains information such as

  • Number of data lines (for reconciliation purposes)
  • Sum totals of stuff (monetary amount, duration and so on)
  • … you know, I think it’s usually just the above 2

Now the data lines were more interesting. They were loaded into the database, so the columns in the file usually match closely to that of the database table. Here’s where I both learned to read design documents and file formats, and picked up SQL all at one go…

Here are 2 examples:

9(8) with comment “ccyymmdd”. It means “8 numerals”, and the comment hints at … ? Century, last 2 digits of the year, the month, and the day.

9(6) with comment “ccyymm”. It means “6 numerals”, and I’m sure you can figure out what the comment means.

The “9” is a notation used to denote digits or numerals only. The number within brackets denote the number of digits. Let’s try…

X(9) which means 9 alphanumeric characters.
X(57) with comment “filler”. It means … ? 57 alphanumeric characters, probably just spaces because this column is a filler.

I have no idea why “X” denotes alphanumeric… For that matter, I don’t know why “9” is used to represent digits too. As for the filler column, remember the header and trailer lines? They are shorter than the data lines, so a column is specially made so that each line, whether it’s a header, data or trailer line, can fit snugly into one line. No, XML wasn’t invented yet… I think.

Now for some obscure ones…

9(7)v99 which means there are 7 digits, followed by 2 digits.
9v9(5) means 1 digit, followed by 5 digits.

If they are all digits, what’s with the weird notation? The “v” means there’s an implied decimal point. So “9(7)v99” means a number which is up to 7 digits long, followed by 2 digits representing a number (below one) up to 2 decimal places.

Confused? “9(7)v99” is equivalent to numeric(9,2) in SQL-speak. 1234567.89 is an example.

So what’s the implied decimal point for? If I understand it correctly, the notations came from programming practices in COBOL, and the banking industry was making use of flat files to transfer data around. Since transmitting data was expensive (they didn’t have 500 gigabytes of hard disk space then…), every single byte counted.

Since it was understood that the figure in that particular column was a money value, the decimal point was taken out to save space. Tada! Instant saving of, I don’t know, tens and hundreds of kilobytes. And that practice flowed to other industries.

It’s a good thing my current team uses notations such as “char(8)”, “numeric(15,2)” and “int” to define column types. Hey wait, those look familiar…

Beginning C# – Variables and operations

So you’re sick with Hello World program code, and want to start doing something. Before you go writing your NASA-approved quantum space engine or the next blockbuster financial application, you have to know about how programs store and manipulate data.

Program data (such as rocket speed or stock prices) are stored in variables, which can be thought of as boxes holding information. We remember information differently from a computer. For example, the information “15” can be a number to us. Or part of an address such as “street 15”. Or 15 dollars. We can pretty much move this data around in our heads.

Here’s the thing: computers need structure. They must know that 15 is a number, and will only store 15 as a number. If 15 is to be part of an address, it must be stored as an address. Moving data from one form to another usually requires telling the computer explicitly to do so, which means you have to write specific code to do that. It’s easier to talk to a computer in its own terms, so we’ll learn about …

Variable types
There are a few variable types, and the most commonly used are those storing numbers and text. There are 2 kinds of number variables; whole numbers and numbers with decimal points (also known as real numbers). In C#, the whole number variables are byte, short and int. The other kind has float, double and decimal. There are more, but these are the common ones.

Why is there a difference? Because computers think, store and manipulate them differently. To the computer, a 7 and a 7.00 are two very different pieces of information. This is very important, especially if you’re doing mathematical calculations.

So we’ll go through them a little bit

  • byte – a whole number between 0 and 255 inclusive
  • short – a whole number between -32768 and 32767 inclusive
  • int – a whole number between -2147483648 and 2147483647 inclusive
  • float – a real number approximately between ±10-45 and ±1038
  • double – a real number approximately between ±10-323 and ±10308
  • decimal – a real number approximately between ±10-28 and ±1028

Why are the real numbers approximations? Again it’s due to the way computers store data. float and double variable types implement the IEEE standard. For now, we’ll simply learn more about these variable types through practical use. Which brings us to the sample code.

using System;
using System.Collections.Generic;
using System.Text;

namespace VariablesAndOperations
    class Program
        static void Main(string[] args)
            int number;
            double anothernumber, yetanothernumber;

            number = 5;
            anothernumber = 3.14159;
            yetanothernumber = 2 * anothernumber;

            number = 3 + 8;

            // This indicates the end of the program

            Console.WriteLine("End of program");

A closer look
The first two lines inside the Main function is

int number;
double anothernumber, yetanothernumber;

This is how we declare our variables. In this case, we tell the computer we want a variable of type int and we name it “number”. We also declare two variables of type double, named (unimaginatively) “anothernumber” and “yetanothernumber”. Variables of the same type can be declared on the same line by separating them with a comma.

The next three lines are

number = 5;
anothernumber = 3.14159;
yetanothernumber = 2 * anothernumber;

This is where we assign values to our variables. So we want the integer 5 to be stored in “number”. Then we want 3.14159 to be stored in “anothernumber”. The next line is more interesting. We want two times the value stored in “anothernumber” to be stored in “yetanothernumber”. This means we can assign any appropriate value to a variable, even if the value is stored in another variable.

Manipulating data basically involves mathematical operations (although they are others). They are addition, subtraction, multiplication and division. Their respective operations in code are +, -, * and /.

Skipping a couple lines down, we encounter the double forward slash //. This simply tells the computer that whatever follows after the // on the same line is to be ignored, because it’s meant to be read by a human.

Additional note
If you’ve tried running the previous Hello World program straight from the directory by double clicking, you’ll find that it comes up and then disappears. Which is what the following lines will prevent:

Console.WriteLine("End of program");

By printing “End of program”, you’ll know that the program has indeed reached the end. The Console.ReadLine(); tells the computer to wait for input till the Enter/Return key is hit. This automatically solves the disappearing problem.

Here’s the source code for you to play with.

Try assigning

number = 1 / 4;
anothernumber = 1.0 / 4.0;

and print out the answer! Can you figure out why the values aren’t what you think they should be?