If not using the database, please disconnect

I’m maintaining some Windows programs built with PowerBuilder. The original developer didn’t plan for the programs to be used by many people. So the instant one of the programs was run, a connection to the Sybase database was opened. And left there.

As more programs were created in this manner (and added to the suite of programs my team is in charge of), the total number of users also increased. Since the connections were held open, table locks between users became a real problem, because a user could be done with an operation but still hold onto the table. This also meant the database became clogged up with connections, most of them idle.

The better solution is to open the connection when you’re going to do a database operation, and then close it once you’re done. But the original programs were developed, like, eons ago. If I understand it correctly, client programs back then assumed they had total control over the database. Contrast that with the web applications of today, and let me just say that I have my work cut out for me…
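To make the open-late, close-early idea concrete, here is a minimal sketch in Python. It uses the built-in sqlite3 module as a stand-in for Sybase, and the helper name run_query and the throwaway demo.db file are made up for illustration; the PowerBuilder programs themselves would look different.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

def run_query(db_path, sql, params=()):
    """Connect, run one statement, commit, and disconnect."""
    # closing() makes sure conn.close() runs even if the statement raises.
    with closing(sqlite3.connect(db_path)) as conn:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            return conn.execute(sql, params).fetchall()

# Demo against a throwaway database file (hypothetical sample data).
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
run_query(db_path, "create table product (product_id char(8) not null)")
run_query(db_path, "insert into product values (?)", ("PROD0001",))
rows = run_query(db_path, "select product_id from product")
print(rows)
```

Each call holds a connection only for as long as its one statement takes, so nothing is left lying around between operations. A real application would more likely borrow connections from a pool than reconnect every time, but the habit is the same.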

I decided to write something on this after reading Raymond’s article on cookie licking. So if you’re not using any database functions, please disconnect.

If you depend on order, use an order by

It was an uneventful morning. All of us were at our computers, softly typing in the noiseless office. Tippity tap, tippity tap… (Except for me, since I was cranking out code to complete a project by a deadline, and was furiously testing the physical limits of the keyboard. But I digress…)

[Image: Dramatically lit office phone, by tysmith]

A phone rang, breaking the silence.

“Hello? Yes… Uh huh… REALLY?!”, answered my colleague.

What followed was a whirlwind of exclamations and activity back and forth between two of my colleagues as they discussed the situation. Some description of something in a report wasn’t correct, as reported by a user, and they were tracing the origins of the error. I didn’t find out what the problem was, since I had my own problems to deal with (tap-tap-tap).

Even with my earphones on, I could still vaguely figure out what was going on. It’s not that they were loud (although there’s that…), just that I’m aware of my surroundings. The problem boiled down to a select statement.

Let’s say the database table looked like this:

create table product
(
product_id char(8) not null,
effective_date datetime not null,
product_description varchar(50) not null
)
alter table product add primary key (product_id, effective_date)

Based on the narrative I gave, and the structure of the table, you should reasonably be able to figure out the issue. No?

Alright, so the product description in a report wasn’t correct. There’s an effective date column, so I would think the record with the latest effective date was more relevant. What if the select statement had a where clause only on the product_id?

select product_id, product_description
from product
where product_id = 'PROD0001'

Based on the table structure and the primary key, there’s every reason to believe that there were multiple records with the same product ID. Why the original programmer failed to take note of this is beyond me…

In the case where there were 2 or more records with the product ID “PROD0001”, the rows happened to come back ordered by product ID ascending, then effective date ascending (following the primary key). What happened to the description based on the latest effective date? It’s right at the bottom of the result set. What was required? The description based on the latest effective date.

Since the default was to use the record at the top of the result set, the description based on the earliest effective date was used. Hence the error.

There had always been multiple records for the same product IDs. The reason this problem didn’t occur earlier was that the description was the same for all records of each ID. Until now.

My colleagues ordered the results by effective date in descending order, and all was well.
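I never saw the actual fix, but a sketch of that kind of query could look like this. It runs against SQLite through Python's built-in sqlite3 module, the two rows are invented sample data, and "limit 1" is SQLite's spelling (Sybase and SQL Server would use "top 1" instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table product
(
    product_id char(8) not null,
    effective_date datetime not null,
    product_description varchar(50) not null,
    primary key (product_id, effective_date)
);
-- Invented sample rows: same product, two effective dates.
insert into product values ('PROD0001', '2008-01-01', 'Old description');
insert into product values ('PROD0001', '2009-06-01', 'New description');
""")

def latest_description(conn, product_id):
    """Return the description with the latest effective date, or None."""
    row = conn.execute(
        """
        select product_description
        from product
        where product_id = ?
        order by effective_date desc
        limit 1
        """,
        (product_id,),
    ).fetchone()
    return row[0] if row else None

print(latest_description(conn, "PROD0001"))
```

The explicit order by is what makes "the first row" mean "the latest row", no matter how the database decides to store or fetch the records.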

There were other instances where the result set of an unordered select statement came out fine, until the order was different. The data could have, by coincidence, been inserted in the correct order, hence the retrieval automatically had that order. The data could have, because of a primary key, been automatically ordered when retrieved. But that’s no excuse to depend on the default ordering.

If you depend on the order in your results, use the order by clause.

Chop off their heads

He looked cautiously around, examining every little detail in the room. Each step he took was slow and calculating. His eyes stared at the space above the floor, as if he could see the passing of his quarry through that very space. Raising his right arm, he negligently rested the head of a blood-stained axe upon his shoulder.

He stood still. A drop of blood dripped from the axe for what seemed an eternity, and splattered the floor. He turned, and light glinted off the axe where it wasn’t bloody. A gasp escaped from the closet. He grinned, and shrugged the axe onto both his hands.

He’s cold-hearted. He’s cruel. He’s a murderer.

[Image: Axe on chopping block, by Geoffery Holman]

No, I’m not writing a horror story. That wasn’t quite scary enough. Although what I’m going to tell you is frightening enough… It was a dark and stormy night… uh…

It was some data patching task. I was to delete some data from a table. I entered the SQL statement

delete from ImportantTable

and promptly executed that statement without providing the where clause!

Oh in the name of all that is good! My heart was pounding like I just finished running a marathon. My hands started sweating. I felt a heat spreading from my neck to my head. “What have I done!”

Luckily, I was using TOAD, a user interface for accessing Oracle databases. And in Oracle, as much as I hate it, changes to table data are not committed until you say so. There’s a commit button in TOAD. You can also type in “commit” and execute that.

So what I did was roll the changes back, with a handy “rollback” button. Whew…
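The same rescue can be sketched in Python with the built-in sqlite3 module, which (in its default mode) also holds a delete in an open transaction until you commit. This is a stand-in for the Oracle and TOAD setup, not a reproduction of it, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ImportantTable (code text, description text)")
conn.execute("insert into ImportantTable values ('CODE0001', 'Very important code')")
conn.commit()

# The accidental statement: no where clause, so every row is deleted.
conn.execute("delete from ImportantTable")
count_before_rollback = conn.execute(
    "select count(*) from ImportantTable"
).fetchone()[0]

# Nothing has been committed yet, so the delete can still be undone.
conn.rollback()
count_after_rollback = conn.execute(
    "select count(*) from ImportantTable"
).fetchone()[0]

print(count_before_rollback, count_after_rollback)
```

The rollback only works because the delete was never committed. On a connection that commits every statement automatically, there would be nothing left to roll back.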

Like I said before, I prefer any SQL statement I execute to be, you know, really executed. I’ve had experiences where I was debugging my web application and was wondering why the data wasn’t refreshed. The select query using TOAD returned the correct set of data. Why wasn’t the web application doing so too? Because the changes in the database weren’t committed. A waste of 2 hours of my life…

But that’s with Oracle. The other databases aren’t so forgiving. But I like that. Anyway, from then on, I’m very careful about executing update, insert and delete statements.

Still, that wasn’t enough for my paranoid mind, oh no no no. What if I need to have several statements on screen, and a few of them are updates and deletes? Perhaps you would suggest commenting them out. Well, in Query Analyzer (of SQL Server), all you need to do is highlight the statement and you can execute only the highlighted section. If you don’t highlight the commenting syntax (2 dashes, or a /* */ pair), the statement gets executed.

Well, this won’t do at all. So I came up with a fail-safe method; I chopped off the heads of any SQL statement performing “dangerous” operations. So all my “dangerous” statements look like this:

nsert into ImportantTable
values('CODE0001','Very important code')

pdate AnotherImportantTable
set id_desc='An alternative description'
where id_code='IMPT0001'

elete from SuperImportantTable
where price < 500

I add the missing first letter when I'm going to execute the statement. After execution, I lop the first letter off again. This leaves the statement still on screen so I know what I did. And in the unfortunate event that I accidentally execute the entire statement when I wasn't supposed to, the execution will fail, because the statement isn't properly formed.
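If you want to see the fail-safe fail safely, here is a small sketch using Python's built-in sqlite3 module in place of Query Analyzer. The table contents and the try_execute helper are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table SuperImportantTable (name text, price integer)")
conn.execute("insert into SuperImportantTable values ('widget', 100)")
conn.commit()

def try_execute(conn, sql):
    """Return True if the statement ran, False if it was rejected as malformed."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.OperationalError:
        # sqlite3 reports SQL syntax errors as OperationalError.
        return False

headless = "elete from SuperImportantTable where price < 500"

# Accidentally running the headless statement does nothing.
headless_ran = try_execute(conn, headless)
remaining_before = conn.execute(
    "select count(*) from SuperImportantTable"
).fetchone()[0]

# Putting the head back on makes it a real delete again.
restored_ran = try_execute(conn, "d" + headless)
remaining_after = conn.execute(
    "select count(*) from SuperImportantTable"
).fetchone()[0]

print(headless_ran, remaining_before, restored_ran, remaining_after)
```

The headless statement never gets past the parser, so the table is untouched until the missing letter goes back.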

So that's my method of handling SQL statements. When in doubt, chop off their heads first.

Deciphering column types in design documents

When I first started working, I’d never even heard of design specifications. The few sentences of a programming question for a university assignment barely made it as design requirements. I think the longest description went slightly over half a page, and that’s because it was explaining some scientific logic behind the question.

So when I was first handed the design documents of an existing application system, my eyes kind of glazed over at the arcane language… The first few pages were usually full of important-sounding sentences that really meant very little to the programmer. Well, most of it anyway. They were about how this application was to do X, because Y happened and Z wasn’t very happy about it, and application A could almost do the same thing except for condition B.

It wasn’t a critical period when I joined the team, so things were a bit quiet and I had time to learn. Have I told you I didn’t know a single thing about SQL at the time? I was picking that up too.

Flipping through the pages, I found a table describing column information. There were input files, and that table described the columns in the file. This was a few years ago, so the input files were what was termed “flat files”.

Each line in those files was of a fixed length, and each column occupied a specific position and a specific length in a line. The usual line types were the header, trailer, and data. The header and trailer lines usually held less information than the data lines.

The header probably contains information such as

  • Timestamp of file (usually just the date it was generated)
  • Name of file
  • Application code (not our kind of code. Short acronym identifier of program)

The trailer probably contains information such as

  • Number of data lines (for reconciliation purposes)
  • Sum totals of stuff (monetary amount, duration and so on)
  • … you know, I think it’s usually just the above 2
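For a feel of what that reconciliation might look like, here is a small Python sketch. The file layout is entirely hypothetical (a one-character line type, then a 6-digit record count in the trailer), as are the sample lines; real files would follow whatever the design document says.

```python
def reconcile(lines, count_start, count_length):
    """Check the trailer's declared record count against the data lines present."""
    # First line is the header, last line is the trailer, the rest is data.
    header, *data, trailer = lines
    declared = int(trailer[count_start:count_start + count_length])
    return declared == len(data)

# Hypothetical sample file: H = header, D = data, T = trailer.
sample = [
    "HSALES   20090131",
    "DPROD0001000012345",
    "DPROD0002000000999",
    "T000002",
]

print(reconcile(sample, count_start=1, count_length=6))
```

If a data line goes missing in transit, the count in the trailer no longer matches and the load can be stopped before anything reaches the database.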

Now the data lines were more interesting. They were loaded into the database, so the columns in the file usually matched those of the database table closely. Here’s where I both learned to read design documents and file formats, and picked up SQL all in one go…

Here are 2 examples:

9(8) with comment “ccyymmdd”. It means “8 numerals”, and the comment hints at … ? Century, last 2 digits of the year, the month, and the day.

9(6) with comment “ccyymm”. It means “6 numerals”, and I’m sure you can figure out what the comment means.

The “9” is a notation used to denote digits or numerals only. The number within brackets denotes the number of digits. Let’s try…

X(9) which means 9 alphanumeric characters.
X(57) with comment “filler”. It means … ? 57 alphanumeric characters, probably just spaces because this column is a filler.
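As a rough illustration of how such a column table drives the reading of a line, here is a Python sketch. It only understands the simple 9(n) and X(n) forms shown so far, and the function names and the sample line are made up.

```python
import re

# Matches the simple forms only: 9(8), X(9), X(57), and so on.
PICTURE = re.compile(r"^([9X])\((\d+)\)$")

def picture_width(picture):
    """Return (kind, width) for a simple picture such as 9(8) or X(57)."""
    match = PICTURE.match(picture)
    if not match:
        raise ValueError(f"unsupported picture: {picture}")
    kind = "digits" if match.group(1) == "9" else "alphanumeric"
    return kind, int(match.group(2))

def split_line(line, pictures):
    """Cut a fixed-length line into fields, one per picture, left to right."""
    fields, position = [], 0
    for picture in pictures:
        kind, width = picture_width(picture)
        value = line[position:position + width]
        if kind == "digits" and not value.isdigit():
            raise ValueError(f"expected {width} digits, got {value!r}")
        fields.append(value)
        position += width
    return fields

# Hypothetical data line: a ccyymmdd date, a 9-character code, then filler.
line = "20090131" + "PROD0001 " + " " * 57
fields = split_line(line, ["9(8)", "X(9)", "X(57)"])
print(fields[0], repr(fields[1]), len(fields[2]))
```

Since every column has a fixed width, reading a line is just slicing at the right positions; the notation tells you where to cut and what to expect in each piece.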

I have no idea why “X” denotes alphanumeric… For that matter, I don’t know why “9” is used to represent digits too. As for the filler column, remember the header and trailer lines? They are shorter than the data lines, so a column is specially made so that each line, whether it’s a header, data or trailer line, can fit snugly into one line. No, XML wasn’t invented yet… I think.

Now for some obscure ones…

9(7)v99 which means there are 7 digits, followed by 2 digits.
9v9(5) means 1 digit, followed by 5 digits.

If they are all digits, what’s with the weird notation? The “v” means there’s an implied decimal point. So “9(7)v99” means a number which is up to 7 digits long, followed by 2 digits for the fractional part (up to 2 decimal places).

Confused? “9(7)v99” is equivalent to numeric(9,2) in SQL-speak. 1234567.89 is an example, which would appear in the file as 123456789.
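Putting the decimal point back is a one-liner in most languages. Here is a minimal Python sketch; the function name implied_decimal is made up, and it assumes the raw field holds digits only (no sign character).

```python
from decimal import Decimal

def implied_decimal(raw, decimal_places):
    """Turn a digits-only field with an implied decimal point into a Decimal."""
    if not raw.isdigit():
        raise ValueError(f"expected digits only, got {raw!r}")
    # scaleb(-n) shifts the decimal point n places to the left, exactly.
    return Decimal(raw).scaleb(-decimal_places)

amount = implied_decimal("123456789", 2)  # a 9(7)v99 field
ratio = implied_decimal("123456", 5)      # a 9v9(5) field
print(amount, ratio)
```

Decimal is used instead of float so that money values keep their exact digits, the same reason SQL offers numeric(9,2).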

So what’s the implied decimal point for? If I understand it correctly, the notations came from programming practices in COBOL, and the banking industry was making use of flat files to transfer data around. Since storing and transmitting data was expensive (they didn’t have 500 gigabytes of hard disk space then…), every single byte counted.

Since it was understood that the figure in that particular column was a money value, the decimal point was taken out to save space. Tada! Instant saving of, I don’t know, tens and hundreds of kilobytes. And that practice flowed to other industries.

It’s a good thing my current team uses notations such as “char(8)”, “numeric(15,2)” and “int” to define column types. Hey wait, those look familiar…

You are debugging with the wrong database

I feel an urge to tell a story first. So here goes…

Once upon a time, in a far away land, a young prince lived in a shining castle. Although he had everything his heart desired, he was spoilt, selfish and unkind.

But then, one winter’s night, an old beggar woman came to the castle and offered him a single rose, in return for shelter from the bitter cold. Repulsed by her haggard appearance, the prince sneered at the gift, and turned the old woman away. But she warned him, not to be deceived by appearances, for beauty is found within…

Oops, wrong story. Let me try again.

Once upon a time…

In the Far East, there was an adventurer by the name of Wen Sen* (wuhn suhn). He wandered many lands, climbed many hills and even walked on glaciers. But he’s a scholar at heart, and so he set out in search of knowledge.

[Image: Mayan ruins explorer, by Steve Geer]

Wen Sen wanted to find out more about a particular village with strange inhabitants. After travelling many days on foot, he finally reached the village’s gates. Despite his raging thirst, his thirst for knowledge was greater. So he accosted the first villager with

Do you think it’s a coincidence that the first 3 prime numbers are also part of the Fibonacci sequence**?

Getting a blank stare, he leapt to the next villager with

What are your thoughts on the first 3 odd prime numbers forming 2 pairs of twin primes***?

Despite his fervour, Wen Sen didn’t get anything out of the villagers. He even tried indecent questions such as “Are you divisible by 17?”. Dejected, he slumped at a corner of a building, thoroughly miserable.

An elderly woman approached him and asked,
“Are you alright, young man?”
“I’m fine. I just haven’t found what I am looking for,” Wen Sen sighed.
“Well, what are you looking for?” she asked.
“Your village is supposed to hold the key to unlocking the secrets of prime numbers,” Wen Sen breathed. “But everyone seems confused. What am I doing wrong?”

“Oh,” the elderly woman’s eyebrows lifted. “You must be referring to our neighbouring Village 2357. This is Village 4680.”

The real story…

Well, I can’t remember the details. All I remember is that I was testing my code, and the results on the web page didn’t match the results in the database.

I triple checked my code. I retrieved the results from the database to verify the data. Everything was in order. But why wasn’t the web page showing the correct set of data?

I forgot what triggered it, but I suddenly realised that I was connecting to the wrong database. I was working with databases in development, testing and production environments then. And I forgot to change a configuration setting.

From then on, I’ve been careful to make sure that I’m in the correct database before I do anything else.
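One way to turn that carefulness into a habit is to have the code refuse to connect when the configuration names an unexpected environment. This is only a sketch: the settings dictionary, its keys and the connect_checked helper are all hypothetical, and sqlite3 stands in for the real database.

```python
import sqlite3

def connect_checked(settings, expected_environment):
    """Connect only if the configured environment is the one we expect."""
    actual = settings["environment"]
    if actual != expected_environment:
        raise RuntimeError(
            f"configured for {actual!r}, but expected {expected_environment!r}"
        )
    return sqlite3.connect(settings["database"])

# Hypothetical configuration that was left pointing at development.
settings = {"environment": "development", "database": ":memory:"}

try:
    connect_checked(settings, "testing")
    refused = False
except RuntimeError as error:
    refused = True
    print(error)

# With the matching environment, the connection is handed over as usual.
conn = connect_checked(settings, "development")
conn.close()
```

It won’t stop you from editing the wrong configuration file, but it does turn a silent mismatch into a loud one.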

* Wen Sen are the closest Chinese characters to my name Vincent. It means “knowledge forest” or “culture forest”, depending on context. And depending on the Chinese characters used, of course. And no, my actual Chinese name isn’t Wen Sen.

** The first 3 primes are 2, 3 and 5. The Fibonacci sequence is 1, 1, 2, 3, 5, 8 and so on.

*** 3 and 5 form one pair of twin primes. 5 and 7 form another.