Working on spreadsheet software library

As a natural and logical extension from my Open XML spreadsheet guide, I’m writing a software library to create and manipulate Open XML spreadsheets. (Never mind that decompiler project I was working on… 2 months of coding… sunk cost… moving on…).

I did some research (ok, an inordinate amount of research…) on the available spreadsheet software libraries out there, both free and commercial, both supporting Open XML (or .xlsx in any case) and the old .xls (Microsoft Excel in binary). I have 2 observations.

First, there’s a plethora of classes in the library. It’s sort of expected. There’s support for a lot of functionality, and it just burgeoned into many classes. Personally, I hate it when I have to learn a new library. There’s a whole bunch of documentation and classes I have to read up on and experiment to just do a simple thing (printing a string of characters is the first thing I try). When I first encountered the .NET Framework, I was crushed. It’s redeeming feature was its extensive documentation, which made learning easier.

Second, even though there’s support for a lot of functionality, it still takes quite a bit of code to accomplish what you want done (granted, much less than if you wrote low level code). Hey I wrote a guide on Open XML spreadsheets, I know how many lines of code needed to just create an empty Excel file, ok?

But these are spreadsheet software libraries!. They’re supposed to make your life easier. In fact, much easier.

I read that when the iPhone was designed, the engineers told Steve Jobs that it needed to have 4 or 5 buttons. Steve Jobs said no. One button (to rule them all). The iPhone now only has the 1 button.

So I took inspiration from that and designed my library to have that quality. Alright, alright, here’s a code sample:

SLDocument sl = new SLDocument();
sl.Save();

That will save an empty Excel file named “Book1.xlsx”. What, not Hello World enough for you?

SLDocument sl = new SLDocument();
sl.SaveAs("HelloWorld.xlsx");

There. Now the file is named “HelloWorld.xlsx”. What, sheet name? Most (if not all) of the libraries I researched required you to add a new worksheet to an empty file. All spreadsheets have at least one worksheet. Why force the programmer to do it anyway? You don’t see Microsoft Excel forcing the user to add worksheets in a newly created spreadsheet file, right? (Excel even has 3 worksheets added by default).

Alright, fine. The first worksheet’s name is by default “Sheet1”. You can rename it.

SLDocument sl = new SLDocument();
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SaveAs("HelloWorld.xlsx");

There, happy? So, how do we set cell values?

SLDocument sl = new SLDocument();
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SetCellValue(2, 3, 3.14159);
sl.SetCellValue(2, 4, "This is PI");
sl.SaveAs("HelloWorld.xlsx");

The cell with row 2, column 3 will have the value of PI. The cell with row 2, column 4 will have the string “This is PI”. Yes, the library supports cell references such as “C2” and “D2”. My opinion? They make better sense to a user with visual interface to the spreadsheet. It’s much harder to use when you’re programming with a non-visual interface to the spreadsheet. Good luck iterating through rows 2 to 500,000, with columns 1 to 1000 (financial reports, I’m looking at you…).

Want to add a new worksheet?

SLDocument sl = new SLDocument();
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SetCellValue(2, 3, 3.14159);
sl.SetCellValue(2, 4, "This is PI");
sl.AddWorksheet("SecondWorksheet");
sl.SetCellValue(5, 5, "Why am I not first?");
sl.SaveAs("HelloWorld.xlsx");

Hey, a software library is supposed to make your life easy. The second worksheet’s name is *drum roll*, “SecondWorksheet”. The string “Why am I not first?” is in row 5, column 5 of the newly added worksheet. How does the library know which worksheet to add which cell value? By magic. Ok, fine, it automatically keeps track of worksheets.

When a user enters a cell value in Excel, does the user need to know which worksheet? No, because that information is implied. The user knows which worksheet because the user chose it already. And so does this software library.

Oh yeah, I even have basic theme support!

SLDocument sl = new SLDocument(SLThemeTypeValues.Flow);
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SetCellValue(2, 3, 3.14159);
sl.SetCellValue(2, 4, "This is PI");
sl.SaveAs("HelloWorld.xlsx");

That gives you the Flow theme, one of the built-in themes in Microsoft Excel (note: only the fonts and font colours are supported). You can even design your own custom theme.

System.Drawing.Color[] clrs = new System.Drawing.Color[12];
clrs[0] = System.Drawing.Color.White;
clrs[1] = System.Drawing.Color.Black;
clrs[2] = System.Drawing.Color.WhiteSmoke;
clrs[3] = System.Drawing.Color.DarkSlateGray;
clrs[4] = System.Drawing.Color.DarkRed;
clrs[5] = System.Drawing.Color.OrangeRed;
clrs[6] = System.Drawing.Color.DarkGoldenrod;
clrs[7] = System.Drawing.Color.DarkOliveGreen;
clrs[8] = System.Drawing.Color.Navy;
clrs[9] = System.Drawing.Color.Indigo;
clrs[10] = System.Drawing.Color.SkyBlue;
clrs[11] = System.Drawing.Color.MediumPurple;

SLDocument sl = new SLDocument("ColourWheel", "Castellar", "Harrington", clrs);
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SetCellValue(2, 3, 3.14159);
sl.SetCellValue(2, 4, "This is PI");
sl.SaveAs("HelloWorld.xlsx");

There are 12 colours you need to define. These correspond to the 2 light colours, 2 dark colours, 6 accent colours, the hyperlink colour and the followed hyperlink colour. “ColourWheel” is the theme name, “Castellar” is the major Latin font and “Harrington” is the minor Latin font. The major Latin font is used when you apply the Title named cell style. The minor Latin font is basically the body font.

What named cell style? A customer suggested supporting the feature.

Named cell styles

So how do you apply it?

sl.ApplyNamedCellStyle(2, 3, SLNamedCellStyleValues.Good);
sl.ApplyNamedCellStyle(2, 4, SLNamedCellStyleValues.Accent1);

I am finishing up version 1 of the library, and it will soon be available. I’m targeting a launch in January 2012. The software library will be called SpreadsheetLight. The primary idea is for it to be simple and clean. Simple for you to use, and you write clean code when you use it.

My internal tests show that SpreadsheetLight runs faster than 2 other free libraries. I won’t tell you which 2, because it’s not relevant, and because I’m automatically biased, and because it’s just not nice to the other programmers who contributed to those 2 libraries. The point is that it runs fast and is effective, which I’m happy about.