The leap year 1900 “bug” in Excel

No it’s not really a bug in Microsoft Excel. What happens is that Excel will accept 29 Feb 1900 as a valid date.

“Wait, the year 1900 is not a leap year. 29 Feb 1900 is invalid!”

Yes, I agree. I wrote something about leap years before. From what I understand, it’s a historical issue, that

There are two kinds of Excel worksheets: those where the epoch for dates is 1/1/1900 (with a leap-year bug deliberately created for 1-2-3 compatibility that is too boring to describe here), and those where the epoch for dates is 1/1/1904.

[emphasis mine]

The “1-2-3” refers to Lotus 1-2-3, a spreadsheet program. To get the Lotus users to come over to Excel, Excel had to be able to import files in the Lotus format. Unfortunately, leap years weren’t well understood then, so Excel used the wrong leap year calculation, as did other spreadsheet software, which Microsoft acknowledges.

I’m talking about this because I’m creating Excel files in the Open XML format (for reporting purposes). In particular, I work with dates. For example, I have to create reports for call detail records that pass through satellites for my users. The date value is one of the most looked at piece of information.

Anyway, to study how dates are stored in Excel, I created an Excel file with date information such as “26/10/2009 12:34” and saved it. I used to save it in the (Excel 2003) XML format, but with Open XML and the Microsoft Office Compatibility Pack (for Office 2003 and earlier versions), I tried the .xlsx format.

I renamed the .xlsx to .zip (because Open XML files are just zip files), then unzipped the package. I looked at /xl/worksheets/sheet1.xml and looked for my date data. It disappeared! They’re just floating point numbers!

Actually, those floating point numbers represent the number of days since the epoch 1 Jan 1900.

How did I discover that? I can’t remember the details. I think I printed consecutive days such as 1 Jan 2009, 2 Jan 2009 and so on, and then checked the floating point number in the resulting XML file. I found that they differed by a difference of 1.

Then I made the brilliant mother of all light bulbs and postulated that perhaps the floating point numbers started counting from an epoch. The only epoch worth my while was 1 Jan 1900.

So I amended my test program to start generating date values
and so on.

I looked at the resulting XML file and saw 1, 2, 3, 4, 5 and so on. My next brilliant bit of deduction was that the trailing decimal values must be fractional values of days. Pleased with myself, I proceeded to test that theory.

It worked fine. Till I hit 29 Feb 1900. My instincts warned me, “That doesn’t look right.” Generally, I look out for edge cases, and in the case of dates, 29 Feb. I did some calculations and woah, 29 Feb 1900 isn’t valid! Yet there it was in the Excel file, displayed and formatted correctly by Excel.

That’s when I dug around and found Joel’s article.

So how do you fix it?

DateTime dtEpoch = new DateTime(1900, 1, 1, 0, 0, 0, 0);
DateTime dt = DateTime.ParseExact("05 Mar 1900", "dd MMM yyyy", null);
TimeSpan ts = dt - dtEpoch;
double fExcelDateTime;
// Excel has "bug" of treating 29 Feb 1900 as valid
// 29 Feb 1900 is 59 days after 1 Jan 1900, so just skip to 1 Mar 1900
if (ts.Days >= 59)
	fExcelDateTime = ts.TotalDays + 2.0;
	fExcelDateTime = ts.TotalDays + 1.0;

If the date in question is on or after 1 Mar 1900, simply add one to the floating point value.

You might wonder why we added one more to the value in each section of the if-else portion. If the date is on or after 1 Mar 1900, shouldn’t it be +1.0 instead of +2.0?

Because it’s a counting problem. We excluded the epoch date itself when we calculated the TimeSpan ts, so we’re adding it back in.

You might wonder why you couldn’t have used 31 Dec 1899 as the base. You can. I just think 31 Dec 1899 doesn’t quite ring a bell. 1 Jan 1900 is more striking.

“But I don’t use the year 1900 at all! I mean, even the year 2000 is, like, so last century. The year 1900 is like, Babylonian!”

Hmm… ok. *shrugs*

There’s also updated material and source code, together with more information on how to work with Open XML. Click here to find out more.

  1. SHamburg

    Your fix is wrong. Feb 29 is day 60. Not 59.
    It should say “if (ts.Days >= 60)” because, despite documentation, the base is Dec 31, 1899 (which has an Excel date of 0). 1 is Jan 1, 1900 and so on.


  2. Vincent

    SHamburg, the fix is based on the “epoch” I chose. In this case, I picked 1 Jan 1900, instead of 31 Dec 1899. As I mentioned in the article, 1 Jan 1900 made more sense and is easier to remember.

    If the “epoch” was 31 Dec 1899 (or 0 Jan 1900 according to the Wikipedia article), then yes you are right, the fix should be that Feb 29 is day 60.

    Thanks for pointing it out!

Comments are closed.