If you depend on order, use an order by

It was an uneventful morning. All of us were at our computers, softly typing in the noiseless office. Tippity tap, tippity tap… (Except for me, since I was cranking out code to complete a project by a deadline, and was furiously testing the physical limits of the keyboard. But I digress…)

Dramatically lit office phone
[image by tysmith]

A phone rang, breaking the silence.

“Hello? Yes… Uh huh… REALLY?!”, answered my colleague.

What followed was a whirlwind of exclamations and activity back and forth between two of my colleagues as they discussed the situation. Some description of something in a report wasn’t correct, as reported by a user, and they were tracing the origins of the error. I didn’t find out what the problem was, since I had my own problems to deal with (tap-tap-tap).

Even with my earphones on, I could still vaguely figure out what’s going on. It’s not that they’re loud (although there’s that…), just that I’m aware of my surroundings. The problem boiled down to a select statement.

Let’s say the database table looked like this:

create table product
product_id char(8) not null,
effective_date datetime not null,
product_description varchar(50) not null
alter table product primary key (product_id, effective_date)

Based on the narrative I gave, and the structure of the table, you should reasonably be able to figure out the issue. No?

Alright, so the product description in a report wasn’t correct. There’s an effective date column, so I would think the record with the latest effective date was more relevant. What if the select statement had a where clause only on the product_id?

select product_id, product_description
from product
where product_id = 'PROD0001'

Based on the table structure and the primary key, there’s every reason to believe that there were multiple records with the same product ID. Why the original programmer failed to take note of this is beyond me…

In the case where there were 2 or more records with product ID as “PROD0001”, the default order was product ID ascending, then effective date ascending (according to the primary key). What happened to the description based on the latest effective date? It’s right at the bottom of the result set. What was required? The description based on the latest effective date.

Since the default was to use the record at the top of the result set, the description based on the earliest effective date was used. Hence the error.

There were multiple records for the same product IDs. The reason this problem didn’t occur was that the description was the same for the respective IDs. Until now.

My colleagues ordered the results by effective date in descending order, and all was well.

There were other instances where the result set of an unordered select statement came out fine, until the order was different. The data could have, by coincidence, been inserted in the correct order, hence the retrieval automatically had that order. The data could have, because of a primary key, been automatically ordered when retrieved. But that’s no excuse to depend on the default ordering.

If you depend on the order in your results, use the order by clause.

  1. Ben Barden - Blog Tips

    I have learned a few lessons with ORDER BY – namely, don’t always order by unique ID, as some data can be backdated or postdated. Not just on blogs, but system processes might “fake” the date in some cases.

    Maybe a follow-up post is needed to explain what happens if you don’t use LIMIT on a big table? 😀 (I love SQL!)

    Ben Barden – Blog Tipss last blog post..10 underused and neglected HTML tags

  2. Vincent Tan

    I think the lesson is to order by the information that’s relevant. As you said, the unique ID may be a bad choice to be ordered by.

Comments are closed.