Parsing date formats with dd/MM/yyyy or MM/dd/yyyy?

I was angry and frustrated. The developer tested his code, and it failed. When he sent it to me for help, I tested it, and it worked. Isn’t it supposed to be the other way around? Someone writes the code and it works for them, but when it’s reviewed by you, it fails?

That developer couldn’t get the date selection right. I wrote a web control library which includes a date selection widget. When I designed it, to simplify things, I hardcoded the date parsing to be dd/MM/yyyy format (I’ll explain my choice later on).

What happened was, the developer either got the yellow screen of death (if you’re working in ASP.NET, you know what I’m talking about), or the day and month were swapped.

Finally, I asked him if his computer settings had MM/dd/yyyy as the default short date format. And if it was, to change it to dd/MM/yyyy. It was, and he changed it, and the code worked.

The default date setting on computers running on Windows is MM/dd/yyyy format. I change it to dd/MM/yyyy format because Singapore uses that format. And my users’ main business involves a big company residing in United Kingdom, which also uses the dd/MM/yyyy format.

Here’s how you change the settings. Go to Control Panel, and click on “Regional and Language Options” or something named similarly. You’ll get something like:

Regional and Language Options

Click on the “Customize” button to get:

Customize Regional Options

In the “Short date format” dropdownlist, enter “dd/MM/yyyy”. You may not find it under the dropdownlist options. This is what some people don’t know: You can type “dd/MM/yyyy” in the dropdownlist. Click “Apply” button, then “OK” button, and you’re set.

Here’s some code to test your changes:

System.Globalization.CultureInfo c = new System.Globalization.CultureInfo("en-GB", true);
DateTime dt1 = DateTime.ParseExact("05/08/2009", "dd/MM/yyyy", null);
DateTime dt2 = DateTime.Parse("05/08/2009");
DateTime dt3 = DateTime.ParseExact("05/08/2009", "dd/MM/yyyy", c);
DateTime dt4 = DateTime.Parse("05/08/2009", c);
Console.WriteLine(dt1.ToString("D").PadLeft(30) + " | " + dt1.ToString(c));
Console.WriteLine(dt2.ToString("D").PadLeft(30) + " | " + dt2.ToString("dd/MM/yyyy"));
Console.WriteLine(dt3.ToString("D").PadLeft(30) + " | " + dt3.ToString("MM/dd/yyyy"));
Console.WriteLine(dt4.ToString("D").PadLeft(30) + " | " + dt4.ToString());

Did you get the values you expected?

Here’s the comparison output from the American MM/dd/yyyy and British dd/MM/yyyy format:
Regional settings test output

Note that the code remained the same. Only the regional settings were changed.

So what I found out was that, as long as you explicitly state the date format (or use a culture setting with your desired date format), it works whether you’re parsing in the date string, or printing out the date.

Despite changing the settings, I still had a date format issue which I asked on StackOverflow. It was some time ago, but I didn’t get a satisfactory answer. Then again, it was my colleague’s problem and he’s fine with the weird behaviour… The problem could be because he didn’t explicitly set the output format, which displayed the wrong date on the screen, even though the regional settings are correct on the web server.

Deciphering column types in design documents

When I first started working, I’ve never even heard of design specifications. The few sentences of a programming question for a university assignment barely made it as design requirements. I think the longest description went slightly over half a page, and that’s because it was explaining some scientific logic behind the question.

So when I was first handed the design documents of an existing application system, my eyes kind of glazed over the arcane language… The first few pages were usually full of important-sounding sentences but really means very little to the programmer. Well, most of it anyway. They’re about how this application was to do X, because Y happened and Z wasn’t very happy about it, and application A could almost do the same thing except for condition B.

It wasn’t a critical period when I joined the team, so things were a bit quiet and I had time to learn. Have I told you I didn’t know a single thing about SQL at the time? I was picking that up too.

Flipping through the pages, I found a table describing column information. There were input files, and that table described the columns in the file. This was a few years ago, so the input files were what was termed “flat files”.

Each line in those files were of a fixed length, and each column occupies a specific position and a specific length in a line. The usual line types were the header, trailer, and data. The header and trailer lines were usually shorter than the data lines.

The header probably contains information such as

  • Timestamp of file (usually just the date it was generated)
  • Name of file
  • Application code (not our kind of code. Short acronym identifier of program)

The trailer probably contains information such as

  • Number of data lines (for reconciliation purposes)
  • Sum totals of stuff (monetary amount, duration and so on)
  • … you know, I think it’s usually just the above 2

Now the data lines were more interesting. They were loaded into the database, so the columns in the file usually match closely to that of the database table. Here’s where I both learned to read design documents and file formats, and picked up SQL all at one go…

Here are 2 examples:

9(8) with comment “ccyymmdd”. It means “8 numerals”, and the comment hints at … ? Century, last 2 digits of the year, the month, and the day.

9(6) with comment “ccyymm”. It means “6 numerals”, and I’m sure you can figure out what the comment means.

The “9” is a notation used to denote digits or numerals only. The number within brackets denote the number of digits. Let’s try…

X(9) which means 9 alphanumeric characters.
X(57) with comment “filler”. It means … ? 57 alphanumeric characters, probably just spaces because this column is a filler.

I have no idea why “X” denotes alphanumeric… For that matter, I don’t know why “9” is used to represent digits too. As for the filler column, remember the header and trailer lines? They are shorter than the data lines, so a column is specially made so that each line, whether it’s a header, data or trailer line, can fit snugly into one line. No, XML wasn’t invented yet… I think.

Now for some obscure ones…

9(7)v99 which means there are 7 digits, followed by 2 digits.
9v9(5) means 1 digit, followed by 5 digits.

If they are all digits, what’s with the weird notation? The “v” means there’s an implied decimal point. So “9(7)v99” means a number which is up to 7 digits long, followed by 2 digits representing a number (below one) up to 2 decimal places.

Confused? “9(7)v99” is equivalent to numeric(9,2) in SQL-speak. 1234567.89 is an example.

So what’s the implied decimal point for? If I understand it correctly, the notations came from programming practices in COBOL, and the banking industry was making use of flat files to transfer data around. Since transmitting data was expensive (they didn’t have 500 gigabytes of hard disk space then…), every single byte counted.

Since it was understood that the figure in that particular column was a money value, the decimal point was taken out to save space. Tada! Instant saving of, I don’t know, tens and hundreds of kilobytes. And that practice flowed to other industries.

It’s a good thing my current team uses notations such as “char(8)”, “numeric(15,2)” and “int” to define column types. Hey wait, those look familiar…

Date and time format mistakes in .NET and SQL

I’ve written about how you can manipulate dates and times in .NET before. Here, I’m going to highlight a few avoidable mistakes when manipulating them in .NET and in SQL for database queries.

Case matters

I’m not going to list down all the format strings you can use. Refer to this list instead. You might find this article by Microsoft on best practices useful. I am going to highlight these two letters, h and m.

The small “h” gives you the hour, in 12-hour notation without a leading zero (for less than 10 values). “hh” gives you the 12-hour representation with a leading zero.

The capital “H” gives you the hour in 24-hour notation (or military time) without a leading zero, and “HH” gives you the 24-hour representation with a leading zero.

It’s good user-friendly practice to include the “t” or “tt” notation if you’re using the small “h” to represent the hour (for “A” / “P” or “AM” / “PM” respectively). This way, you know if it’s in the morning or night.

The letter H isn’t so bad. At least you’re still referring to the hour. When you get to M, oh you better watch out.

The small “m” and “mm” gives you the minute (of the time) without and with a leading zero respectively.

The capital “M” and “MM” gives you the month without and with a leading zero respectively. For example, September would be 9 and 09 respectively. See, totally different thing from its small lettered counterpart. There’s the “MMM” and the “MMMM” format string, and I’ll leave it to you to experiment with it.

Now to a common mistake I see: “dd/mm/yyyy“. See any problems?

I’m going to give you a starter custom format string: “dd/MM/yyyy HH:mm:ss“. Burn that into your brain. You can swap “dd” and “MM” if you want. Use “-” instead of “/” as the separator if you want.

The TO_CHAR() function in Oracle PL/SQL

For the equivalent SQL statement to format dates and times in Oracle, here it is:

TO_CHAR(sysdate, 'DD/MM/YYYY HH24:MI:SS')

Note that the function parameter is case insensitive.

Note the difference between “MM” for month and “MI” for minute. For a 12-hour representation, use “HH” or “HH12”.

Tricky formats in Sybase and SQL Server

I don’t know why Sybase and SQL Server don’t just give the ability to easily customise date and time formats. Instead of the string formats in .NET and PL/SQL, they use number codes. Number codes! *urgh*

I actually have a table of the codes and the resulting formats printed out and placed beside my desk, somewhere buried in the chaotic mess of papers. I really just remember 3 number codes: 103, 112 and 120.

select convert(char(10), getdate(), 103)
-- gives something like 17/09/2008 in dd/MM/yyyy format
select convert(char(8), getdate(), 112)
-- gives something like 20080917 in yyyyMMdd format
select convert(char(19), getdate(), 120)
-- gives something like 2008-09-17 05:04:03 in yyyy-MM-dd HH:mm:ss format

I’m using the .NET format string notation in the comments. Note that number code 120 is only available in SQL Server. The other 2 codes are available in both Sybase and SQL Server. There are other codes (which you can explore here). I just frequently use those 3, particularly 112.

If you’re used to the American date display, then this might be useful:

select convert(char(10), getdate(), 101)
-- gives something like 09/17/2008 in MM/dd/yyyy format

Note that you can do something like

select convert(char(6), getdate(), 112)
-- gives something like 200809 in yyyyMM format

Note the char(6) part (versus char(8) originally). I do this quite often too.

Now for the mistake in T-SQL. What do you think is wrong with this?

select convert(char(10), EFF_DATE, 103) EFF_DATE
from Customers
order by EFF_DATE desc

Yes, I’m being deliberately vague. Use your powers of deduction to fill in the blanks.