How to create a stylesheet in Excel Open XML

Today, I’ll show you how to create a stylesheet in Excel Open XML with the minimum required. The styles I need are:

  • Forced text format for long consecutive string of digits
  • Date format
  • Decimal format

We’ll be using the Open XML SDK 2.0 from Microsoft. As of this writing, it’s still in Community Technical Preview state (August 2009), so I’ll just let you search online, in case the final product is released by the time you read this article.

The stylesheet is represented by the DocumentFormat.OpenXml.Spreadsheet.Stylesheet class. Through my hours (and hours and hours…) of playing around with the code, I still had to use almost the same default stylesheet when I unzip a blank Excel file in Open XML format. In case you don’t know, an Excel Open XML file (or any of the Microsoft Office products in Open XML format such as Word and PowerPoint) is basically a zip file of folders and XML files (and perhaps some media resources).

This is the stylesheet XML file produced by the code which you’ll see in a bit. Download ExcelOpenXmlStyles.xml (which is actually named styles.xml in the original zip file, but I renamed it to avoid clashing with my other files).

You could write an XML file directly with that content (say, using the StreamWriter class). You just have to be careful of the XML structure, such as opening and closing of tags, and taking care of child tags.

Or you could use the SDK.

I haven’t found anyone writing on how to create styles to format the content of the Excel file. Maybe it’s because just creating an Excel file is already an awesome accomplishment… Unfortunately, I can’t remember where are the one or two articles I read to create a basic Excel file…

Well, I figured out how to use the SDK to create the file. What I did was look at a tag in the XML file (downloadable from above), search for the corresponding class entry in the Open XML SDK help file, and use that class in the code. It’s a tedious process, and some tags have a different-looking name as the class (such as the cellStyleXfs tag and its CellStyleFormats class, or the unintuitive xf tag and its CellFormat class.)

Anyway, here’s the code:

private static Stylesheet CreateStylesheet()
{
	Stylesheet ss = new Stylesheet();

	Fonts fts = new Fonts();
	DocumentFormat.OpenXml.Spreadsheet.Font ft = new DocumentFormat.OpenXml.Spreadsheet.Font();
	FontName ftn = new FontName();
	ftn.Val = "Calibri";
	FontSize ftsz = new FontSize();
	ftsz.Val = 11;
	ft.FontName = ftn;
	ft.FontSize = ftsz;
	fts.Append(ft);
	fts.Count = (uint)fts.ChildElements.Count;

	Fills fills = new Fills();
	Fill fill;
	PatternFill patternFill;
	fill = new Fill();
	patternFill = new PatternFill();
	patternFill.PatternType = PatternValues.None;
	fill.PatternFill = patternFill;
	fills.Append(fill);
	fill = new Fill();
	patternFill = new PatternFill();
	patternFill.PatternType = PatternValues.Gray125;
	fill.PatternFill = patternFill;
	fills.Append(fill);
	fills.Count = (uint)fills.ChildElements.Count;

	Borders borders = new Borders();
	Border border = new Border();
	border.LeftBorder = new LeftBorder();
	border.RightBorder = new RightBorder();
	border.TopBorder = new TopBorder();
	border.BottomBorder = new BottomBorder();
	border.DiagonalBorder = new DiagonalBorder();
	borders.Append(border);
	borders.Count = (uint)borders.ChildElements.Count;

	CellStyleFormats csfs = new CellStyleFormats();
	CellFormat cf = new CellFormat();
	cf.NumberFormatId = 0;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	csfs.Append(cf);
	csfs.Count = (uint)csfs.ChildElements.Count;

	uint iExcelIndex = 164;
	NumberFormats nfs = new NumberFormats();
	CellFormats cfs = new CellFormats();

	cf = new CellFormat();
	cf.NumberFormatId = 0;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cfs.Append(cf);

	NumberFormat nf;
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "dd/mm/yyyy hh:mm:ss";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "#,##0.0000";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	// #,##0.00 is also Excel style index 4
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "#,##0.00";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	// @ is also Excel style index 49
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "@";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	nfs.Count = (uint)nfs.ChildElements.Count;
	cfs.Count = (uint)cfs.ChildElements.Count;

	ss.Append(nfs);
	ss.Append(fts);
	ss.Append(fills);
	ss.Append(borders);
	ss.Append(csfs);
	ss.Append(cfs);

	CellStyles css = new CellStyles();
	CellStyle cs = new CellStyle();
	cs.Name = "Normal";
	cs.FormatId = 0;
	cs.BuiltinId = 0;
	css.Append(cs);
	css.Count = (uint)css.ChildElements.Count;
	ss.Append(css);

	DifferentialFormats dfs = new DifferentialFormats();
	dfs.Count = 0;
	ss.Append(dfs);

	TableStyles tss = new TableStyles();
	tss.Count = 0;
	tss.DefaultTableStyle = "TableStyleMedium9";
	tss.DefaultPivotStyle = "PivotStyleLight16";
	ss.Append(tss);

	return ss;
}

The whole thing is actually very simple. There’s a SpreadsheetDocument class, which has as a child a WorkbookPart class, which has as a child a WorkbookStylesPart class, which has a Stylesheet property which you assign with the result of that function you see in the code above. *whew*

It’s logically structured. It’s just that I can’t find anything online or in the documentation about which classes I needed to use… hence the hours (and hours and hours…) of research and testing.

A few points to note:

  • The Font class used is different from System.Drawing.Font
  • System-defined style numbers are less than 164 (based on my experiments on custom styles). Hence the magic number. So custom style index numbers are 164 and above.
  • The style “#,##0.0000” is typically used by me for representing Internet traffic, as in 1,234.5670 MB.
  • The style “#,##0.00” is a standard format in Excel, with the style number 4
  • Forced text format is “@”, which is also a standard format in Excel, with the style number 49
  • The function is static because I’m using it in a console program.

There are a couple of classes used that I have no idea what they are used for. For example, the DifferentialFormats class and the TableStyles class. I just know that if I don’t create them as a child of the Stylesheet class, the Excel file will fail to open. This is the major time-drain of my research and experiments: determining the classes used to write the minimum code (or XML file).

Next time, I’ll show you how to insert an image into the Excel file. That one takes up a whole lot of code when compared to inserting an image file in HTML. I’ll conclude the whole Excel Open XML creation with the full code on generating a working Excel file. I’m setting all the pieces here piecemeal so I don’t have to explain everything in one shot.

There’s updated material and source code, together with more information on how to work with Open XML. Click here to find out more.

The leap year 1900 “bug” in Excel

No it’s not really a bug in Microsoft Excel. What happens is that Excel will accept 29 Feb 1900 as a valid date.

“Wait, the year 1900 is not a leap year. 29 Feb 1900 is invalid!”

Yes, I agree. I wrote something about leap years before. From what I understand, it’s a historical issue, that

There are two kinds of Excel worksheets: those where the epoch for dates is 1/1/1900 (with a leap-year bug deliberately created for 1-2-3 compatibility that is too boring to describe here), and those where the epoch for dates is 1/1/1904.

[emphasis mine]

The “1-2-3” refers to Lotus 1-2-3, a spreadsheet program. To get the Lotus users to come over to Excel, Excel had to be able to import files in the Lotus format. Unfortunately, leap years weren’t well understood then, so Excel used the wrong leap year calculation, as did other spreadsheet software, which Microsoft acknowledges.

I’m talking about this because I’m creating Excel files in the Open XML format (for reporting purposes). In particular, I work with dates. For example, I have to create reports for call detail records that pass through satellites for my users. The date value is one of the most looked at piece of information.

Anyway, to study how dates are stored in Excel, I created an Excel file with date information such as “26/10/2009 12:34” and saved it. I used to save it in the (Excel 2003) XML format, but with Open XML and the Microsoft Office Compatibility Pack (for Office 2003 and earlier versions), I tried the .xlsx format.

I renamed the .xlsx to .zip (because Open XML files are just zip files), then unzipped the package. I looked at /xl/worksheets/sheet1.xml and looked for my date data. It disappeared! They’re just floating point numbers!

Actually, those floating point numbers represent the number of days since the epoch 1 Jan 1900.

How did I discover that? I can’t remember the details. I think I printed consecutive days such as 1 Jan 2009, 2 Jan 2009 and so on, and then checked the floating point number in the resulting XML file. I found that they differed by a difference of 1.

Then I made the brilliant mother of all light bulbs and postulated that perhaps the floating point numbers started counting from an epoch. The only epoch worth my while was 1 Jan 1900.

So I amended my test program to start generating date values
01/01/1900
02/01/1900
03/01/1900
04/01/1900
05/01/1900
and so on.

I looked at the resulting XML file and saw 1, 2, 3, 4, 5 and so on. My next brilliant bit of deduction was that the trailing decimal values must be fractional values of days. Pleased with myself, I proceeded to test that theory.

It worked fine. Till I hit 29 Feb 1900. My instincts warned me, “That doesn’t look right.” Generally, I look out for edge cases, and in the case of dates, 29 Feb. I did some calculations and woah, 29 Feb 1900 isn’t valid! Yet there it was in the Excel file, displayed and formatted correctly by Excel.

That’s when I dug around and found Joel’s article.

So how do you fix it?

DateTime dtEpoch = new DateTime(1900, 1, 1, 0, 0, 0, 0);
DateTime dt = DateTime.ParseExact("05 Mar 1900", "dd MMM yyyy", null);
TimeSpan ts = dt - dtEpoch;
double fExcelDateTime;
// Excel has "bug" of treating 29 Feb 1900 as valid
// 29 Feb 1900 is 59 days after 1 Jan 1900, so just skip to 1 Mar 1900
if (ts.Days >= 59)
{
	fExcelDateTime = ts.TotalDays + 2.0;
}
else
{
	fExcelDateTime = ts.TotalDays + 1.0;
}

If the date in question is on or after 1 Mar 1900, simply add one to the floating point value.

You might wonder why we added one more to the value in each section of the if-else portion. If the date is on or after 1 Mar 1900, shouldn’t it be +1.0 instead of +2.0?

Because it’s a counting problem. We excluded the epoch date itself when we calculated the TimeSpan ts, so we’re adding it back in.

You might wonder why you couldn’t have used 31 Dec 1899 as the base. You can. I just think 31 Dec 1899 doesn’t quite ring a bell. 1 Jan 1900 is more striking.

“But I don’t use the year 1900 at all! I mean, even the year 2000 is, like, so last century. The year 1900 is like, Babylonian!”

Hmm… ok. *shrugs*

There’s also updated material and source code, together with more information on how to work with Open XML. Click here to find out more.

English Metric Units and Open XML

Emu
[image by blmurch]

In this article, you’ll find out how English Metric Units (or EMUs) are related to the Open XML format. So what are English Metric Units? They are … actually, I have no idea what they are. Here’s the best answer I could find on EMUs. And there’s the Open XML explanation on EMUs in Wikipedia.

I stumbled upon EMUs because I was trying to create an Excel spreadsheet using the Open XML format. The task require the addition of an image in one of the Excel sheets. You have no idea how much code I needed to write just to include one image. (I’ll tell you about that in another article. Let’s focus on EMUs here.)

Basically, there’s no unifying unit, no “one ring to bind them all”: centimetres, inches and points. So they used a new unit of measurement to represent the dimensions of an image. *sigh*

Anyway, while searching for how to convert pixels to EMUs, I stumbled upon 2 articles on StackOverflow: Pixel to Centimetre and Pixel to Point.

From the Wikipedia article, there are 914400 EMUs per inch, and there are 96 pixels to an inch. Therefore, I figured the pixel to EMU formula is

EMU = pixel * 914400 / 96

There’s a flaw, in that the dots per inch (DPI) may be different for different monitors. For example, there could be 72 pixels in an inch. The best I could do is to assume that, the image created with one monitor, will be used on another monitor with the same DPI. Thus:

EMU = pixel * 914400 / Resolution

Here’s some code to visualise that:

Bitmap bm = new Bitmap("yourimage.jpg");
DocumentFormat.OpenXml.Drawing.Extents extents = new DocumentFormat.OpenXml.Drawing.Extents();
extents.Cx = (long)bm.Width * (long)((float)914400 / bm.HorizontalResolution);
extents.Cy = (long)bm.Height * (long)((float)914400 / bm.VerticalResolution);

There, now you know how English Metric Units are related to Open XML formats.

There’s updated material and source code, together with more information on how to work with Open XML. Click here to find out more.

P.S. English Metric Units have no relation to emus. Other than their (unfortunate?) acronym being exactly the same as the name of the Dromaius novaehollandiae