Rich strings and inline strings in spreadsheets

Quite a while ago, I was mucking around in Excel and I discovered you can set the text in a cell to different fonts! Even different colours! (Ok, you’re probably bored of me going on about spreadsheets and Open XML, but it’s all I’m thinking about right now…) Granted, it’s a limited set of font style manipulations, but I’ve always thought the text in a cell was completely subjected to the cell style. I never thought you could change anything within a cell. I’m not an expert Excel user, ok?

The term used is an “inline string”. At least that’s what it’s referred to in the Open XML SDK, as the InlineString class.

This gave me a problem. How do I implement this in my spreadsheet library? It’s not as easy as just setting a cell value. You’d have to set up all the fonts and colours and bolds and italics and underlines, and then dump that bunch of stuff into a cell.

If you do it by hand, you’ll run (haha, foretelling a pun) into the DocumentFormat.OpenXml.Spreadsheet.Run class. Basically, you’re appending style runs. Here’s how you do it in Excel:

Inline string

You select the text in the formula box (not within the cell). Then you apply any font styles you want.

Aaannd… here’s where I tell you how my spreadsheet library is going to make your life easier. Here’s a sample screenshot of a result:

SpreadsheetLight inline string

Let’s look at the source code to generate that.

SLDocument sl = new SLDocument(SLThemeTypeValues.Metro);

SLFont font;
SLRstType rst;

font = new SLFont();
font.FontColor = System.Drawing.Color.Red;
rst = new SLRstType();
rst.AppendText("Roses are ");
rst.AppendText("red", font);
sl.SetCellValue(2, 2, rst.ToInlineString());

font = new SLFont();
font.FontColor = System.Drawing.Color.Blue;
rst = new SLRstType();
rst.AppendText("And violets are ");
rst.AppendText("blue", font);
sl.SetCellValue(3, 2, rst.ToInlineString());

font = new SLFont();
font.Bold = true;
font.Italic = true;
font.Underline = UnderlineValues.Double;
font.SetFont(FontSchemeValues.Major, 11);
font.SetFontThemeColor(SLThemeColorIndexValues.Accent2Color);
rst = new SLRstType();
rst.AppendText("But seriously...", font);
sl.SetCellValue(4, 2, rst.ToInlineString());

font = new SLFont();
font.SetFont(FontSchemeValues.Minor, 15);
font.SetFontThemeColor(SLThemeColorIndexValues.Accent1Color);
rst = new SLRstType();
rst.AppendText("you don't ", font);

font = new SLFont();
font.Italic = true;
rst.AppendText("have ", font);

rst.AppendText("to ");

font = new SLFont();
font.Underline = UnderlineValues.Single;
font.FontColor = System.Drawing.Color.OrangeRed;
rst.AppendText("emphasise ", font);

rst.AppendText("it ");

font = new SLFont();
font.Bold = true;
font.SetFontThemeColor(SLThemeColorIndexValues.Accent3Color);
rst.AppendText("so ", font);

rst.AppendText("much...");

sl.SetCellValue(5, 2, rst.ToInlineString());

SLStyle style = new SLStyle();
style.Fill.SetPatternForegroundColor(SLThemeColorIndexValues.Accent1Color);
style.Fill.SetPatternType(PatternValues.Solid);
sl.SetCellStyle(7, 2, style);
sl.SetCellValue(7, 3, "Accent 1");

style = new SLStyle();
style.Fill.SetPatternForegroundColor(SLThemeColorIndexValues.Accent2Color);
style.Fill.SetPatternType(PatternValues.Solid);
sl.SetCellStyle(8, 2, style);
sl.SetCellValue(8, 3, "Accent 2");

style = new SLStyle();
style.Fill.SetPatternForegroundColor(SLThemeColorIndexValues.Accent3Color);
style.Fill.SetPatternType(PatternValues.Solid);
sl.SetCellStyle(9, 2, style);
sl.SetCellValue(9, 3, "Accent 3");

style = new SLStyle();
style.Fill.SetPatternForegroundColor(SLThemeColorIndexValues.Accent4Color);
style.Fill.SetPatternType(PatternValues.Solid);
sl.SetCellStyle(10, 2, style);
sl.SetCellValue(10, 3, "Accent 4");

style = new SLStyle();
style.Fill.SetPatternForegroundColor(SLThemeColorIndexValues.Accent5Color);
style.Fill.SetPatternType(PatternValues.Solid);
sl.SetCellStyle(11, 2, style);
sl.SetCellValue(11, 3, "Accent 5");

style = new SLStyle();
style.Fill.SetPatternForegroundColor(SLThemeColorIndexValues.Accent6Color);
style.Fill.SetPatternType(PatternValues.Solid);
sl.SetCellStyle(12, 2, style);
sl.SetCellValue(12, 3, "Accent 6");

sl.SaveAs("InlineString.xlsx");

I’m using the Metro theme, which means the major Latin font is Consolas, and the minor Latin font is Corbel. The body text is in minor Latin font.

You’ll notice the 2 new classes, SLFont and SLRstType classes. The SLRstType models after the Open XML SDK (abstract) class RstType. I think it stands for “rich string type” (r + st + type).

I have filled in 6 cells with the accent colours, just so you can see how the colours are used. The accent colours are tied to the theme used, as is the major and minor Latin fonts. So if you use these colours and fonts, the text is automatically formatted against the current theme.

This is an advantage if you set the font as the minor Latin font, instead of directly as “Corbel”. If the user changes the theme of your resulting spreadsheet, the text changes to the new theme’s minor Latin font. Of course, if you want the text to stay as “Corbel”, regardless of the theme, then set it directly and explicitly as “Corbel”. The SLFont class has overloaded functions for this.

P.S. Can you tell I’m excited about this? I’m going to launch this baby. Soon. I will limit such “promotional” articles, but I really think Excel gives me surprises, so I thought you might want to know what your user thinks is normal Excel activity.

SpreadsheetLight gradient fill function

I’m fascinated by gradient fills in a spreadsheet. More specifically, why would anyone want to have a cell with gradient colours? Is a standard block colour fill not enough? Is a texture image fill not enough? I guess this comes down to the visual aspect. Humans like to look at pretty colours. Especially if you have to stare at financial figures in a spreadsheet for hours.

So, that spreadsheet library I’m working on? It can do this:
Gradient fills in SpreadsheetLight

The code to do that is

SLDocument sl = new SLDocument(SLThemeTypeValues.Flow);

SLStyle style = new SLStyle();
style.Fill.SetCustomGradient(GradientValues.Linear, 45, null, null, null, null);
style.Fill.AppendGradientStop(0, SLThemeColorIndexValues.Light2Color);
style.Fill.AppendGradientStop(0.2, System.Drawing.Color.Red);
style.Fill.AppendGradientStop(0.4, System.Drawing.Color.Green);
style.Fill.AppendGradientStop(0.6, System.Drawing.Color.Blue);
style.Fill.AppendGradientStop(0.8, System.Drawing.Color.Yellow);
style.Fill.AppendGradientStop(1, SLThemeColorIndexValues.Accent1Color, 0.5);

sl.SetCellValue(2, 3, "Custom gradient function");
sl.SetCellStyle(2, 2, style);

style = new SLStyle();
style.Fill.SetGradient(SLGradientShadingStyleValues.DiagonalDown2, SLThemeColorIndexValues.Accent2Color, SLThemeColorIndexValues.Accent6Color);

sl.SetCellValue(4, 3, "Built-in gradient function");
sl.SetCellStyle(4, 2, style);

sl.SetColumnWidth(2, 24);
sl.SetRowHeight(2, 108);
sl.SetRowHeight(4, 108);

sl.SaveAs("GradientFill.xlsx");

The gradient stops are positioned from 0.0 to 1.0. The “built-in” functions (simulating Excel) allow you to specify only 2 colours, even though you can have more.

You will notice that the library allows you to use both theme colours and System.Drawing.Color’s. You can even specify a tint modifier (as seen in the last gradient stop), which range from -1.0 to 1.0 (-1.0 being completely dark, and 1.0 being completely white).

You might also notice that you don’t need to declare many variables from the library. For most of your work, you just need to know SLDocument class (which handles most of the spreadsheet’s functions), and the SLStyle class (which handles all your styling needs). Most of the functions are overloaded, which is why the functions are squeezed into fewer classes.

Here’s my rationale: I walk into a party. I don’t really know anyone. I find one person that I recognise. Probably the host. Then I let the host introduce me to everything. Who the interesting people are. Where’s the food. Where’s the washroom. Look, I don’t mind meeting people in the party, but I’m not really into that particular party. I just want to mingle a little so I can tell my friend that yes, I was at the party. Mission accomplished…

Then I go to that other party that I really wanted to go. (no offense to the host of the first party. “None taken.” Aww, isn’t he a nice guy?).

I don’t want to burden you with yet another software library to learn. So I’ve made it easy. 2 classes for most of your needs. If you’re using one of them intelligent code editing software, you’d get auto-completion too. Exploring what else a class can do for you is just a “.” away.

Yes, I’m finishing the library. It’ll be ready soon, ok? Just a couple of features more, and some testing, and I’ll launch version 1 of the product. I don’t give a flying fishball about eternal software betas. (Just launch already, dammit! Stupid software startups…)

Working on spreadsheet software library

As a natural and logical extension from my Open XML spreadsheet guide, I’m writing a software library to create and manipulate Open XML spreadsheets. (Never mind that decompiler project I was working on… 2 months of coding… sunk cost… moving on…).

I did some research (ok, an inordinate amount of research…) on the available spreadsheet software libraries out there, both free and commercial, both supporting Open XML (or .xlsx in any case) and the old .xls (Microsoft Excel in binary). I have 2 observations.

First, there’s a plethora of classes in the library. It’s sort of expected. There’s support for a lot of functionality, and it just burgeoned into many classes. Personally, I hate it when I have to learn a new library. There’s a whole bunch of documentation and classes I have to read up on and experiment to just do a simple thing (printing a string of characters is the first thing I try). When I first encountered the .NET Framework, I was crushed. It’s redeeming feature was its extensive documentation, which made learning easier.

Second, even though there’s support for a lot of functionality, it still takes quite a bit of code to accomplish what you want done (granted, much less than if you wrote low level code). Hey I wrote a guide on Open XML spreadsheets, I know how many lines of code needed to just create an empty Excel file, ok?

But these are spreadsheet software libraries!. They’re supposed to make your life easier. In fact, much easier.

I read that when the iPhone was designed, the engineers told Steve Jobs that it needed to have 4 or 5 buttons. Steve Jobs said no. One button (to rule them all). The iPhone now only has the 1 button.

So I took inspiration from that and designed my library to have that quality. Alright, alright, here’s a code sample:

SLDocument sl = new SLDocument();
sl.Save();

That will save an empty Excel file named “Book1.xlsx”. What, not Hello World enough for you?

SLDocument sl = new SLDocument();
sl.SaveAs("HelloWorld.xlsx");

There. Now the file is named “HelloWorld.xlsx”. What, sheet name? Most (if not all) of the libraries I researched required you to add a new worksheet to an empty file. All spreadsheets have at least one worksheet. Why force the programmer to do it anyway? You don’t see Microsoft Excel forcing the user to add worksheets in a newly created spreadsheet file, right? (Excel even has 3 worksheets added by default).

Alright, fine. The first worksheet’s name is by default “Sheet1″. You can rename it.

SLDocument sl = new SLDocument();
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SaveAs("HelloWorld.xlsx");

There, happy? So, how do we set cell values?

SLDocument sl = new SLDocument();
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SetCellValue(2, 3, 3.14159);
sl.SetCellValue(2, 4, "This is PI");
sl.SaveAs("HelloWorld.xlsx");

The cell with row 2, column 3 will have the value of PI. The cell with row 2, column 4 will have the string “This is PI”. Yes, the library supports cell references such as “C2″ and “D2″. My opinion? They make better sense to a user with visual interface to the spreadsheet. It’s much harder to use when you’re programming with a non-visual interface to the spreadsheet. Good luck iterating through rows 2 to 500,000, with columns 1 to 1000 (financial reports, I’m looking at you…).

Want to add a new worksheet?

SLDocument sl = new SLDocument();
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SetCellValue(2, 3, 3.14159);
sl.SetCellValue(2, 4, "This is PI");
sl.AddWorksheet("SecondWorksheet");
sl.SetCellValue(5, 5, "Why am I not first?");
sl.SaveAs("HelloWorld.xlsx");

Hey, a software library is supposed to make your life easy. The second worksheet’s name is *drum roll*, “SecondWorksheet”. The string “Why am I not first?” is in row 5, column 5 of the newly added worksheet. How does the library know which worksheet to add which cell value? By magic. Ok, fine, it automatically keeps track of worksheets.

When a user enters a cell value in Excel, does the user need to know which worksheet? No, because that information is implied. The user knows which worksheet because the user chose it already. And so does this software library.

Oh yeah, I even have basic theme support!

SLDocument sl = new SLDocument(SLThemeTypeValues.Flow);
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SetCellValue(2, 3, 3.14159);
sl.SetCellValue(2, 4, "This is PI");
sl.SaveAs("HelloWorld.xlsx");

That gives you the Flow theme, one of the built-in themes in Microsoft Excel (note: only the fonts and font colours are supported). You can even design your own custom theme.

System.Drawing.Color[] clrs = new System.Drawing.Color[12];
clrs[0] = System.Drawing.Color.White;
clrs[1] = System.Drawing.Color.Black;
clrs[2] = System.Drawing.Color.WhiteSmoke;
clrs[3] = System.Drawing.Color.DarkSlateGray;
clrs[4] = System.Drawing.Color.DarkRed;
clrs[5] = System.Drawing.Color.OrangeRed;
clrs[6] = System.Drawing.Color.DarkGoldenrod;
clrs[7] = System.Drawing.Color.DarkOliveGreen;
clrs[8] = System.Drawing.Color.Navy;
clrs[9] = System.Drawing.Color.Indigo;
clrs[10] = System.Drawing.Color.SkyBlue;
clrs[11] = System.Drawing.Color.MediumPurple;

SLDocument sl = new SLDocument("ColourWheel", "Castellar", "Harrington", clrs);
sl.RenameWorksheet(sl.DefaultFirstSheetName, "Hello");
sl.SetCellValue(2, 3, 3.14159);
sl.SetCellValue(2, 4, "This is PI");
sl.SaveAs("HelloWorld.xlsx");

There are 12 colours you need to define. These correspond to the 2 light colours, 2 dark colours, 6 accent colours, the hyperlink colour and the followed hyperlink colour. “ColourWheel” is the theme name, “Castellar” is the major Latin font and “Harrington” is the minor Latin font. The major Latin font is used when you apply the Title named cell style. The minor Latin font is basically the body font.

What named cell style? A customer suggested supporting the feature.

Named cell styles

So how do you apply it?

sl.ApplyNamedCellStyle(2, 3, SLNamedCellStyleValues.Good);
sl.ApplyNamedCellStyle(2, 4, SLNamedCellStyleValues.Accent1);

I am finishing up version 1 of the library, and it will soon be available. I’m targeting a launch in January 2012. The software library will be called SpreadsheetLight. The primary idea is for it to be simple and clean. Simple for you to use, and you write clean code when you use it.

My internal tests show that SpreadsheetLight runs faster than 2 other free libraries. I won’t tell you which 2, because it’s not relevant, and because I’m automatically biased, and because it’s just not nice to the other programmers who contributed to those 2 libraries. The point is that it runs fast and is effective, which I’m happy about.

Calculating Excel spreadsheet column names

I’ve been working with Open XML spreadsheets for the past, I don’t know how long… A year? I just realised that getting that Excel column header name is a frequent task. You know, given that it’s the 4th column, it’s “D”. I don’t work frequently with spreadsheets with lots of columns. So it was interesting that the 26th column is “Z” and the 27th column becomes “AA”. Basically, base-26 arithmetic, using the 26 letters of the English alphabet as tokens.

There are probably lots of code snippets out there showing you how to calculate a column name given the column index. Here’s mine:

string[] saExcelColumnHeaderNames = new string[16384];
string[] sa = new string[] { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z" };
string s = string.Empty;
int i, j, k, l;
i = j = k = -1;
for (l = 0; l < 16384; ++l)
{
    s = string.Empty;
    ++k;
    if (k == 26)
    {
        k = 0;
        ++j;
        if (j == 26)
        {
            j = 0;
            ++i;
        }
    }
    if (i >= 0) s += sa[i];
    if (j >= 0) s += sa[j];
    if (k >= 0) s += sa[k];
    saExcelColumnHeaderNames[l] = s;
}

That gives you a zero-based indexing version. So to get the 30th column name, you use saExcelColumnHeaderNames[29].

In case you’re wondering, 16384 is the maximum number of columns supported by Excel 2010.

You will notice that it’s not a function given the column index. I find that not as useful. Look, typically when you need the column name, you probably also need to get it frequently, usually with different parameters.

What I did was to store all the calculation results into a string array. Then you reference it with an index. The calculation function typically is a O(n) operation. With you needing to use the function multiple times, your whole algorithm probably goes up to O(n^2).

My method is also an O(n) operation. But referencing a string array is I think an O(1), meaning it’s a constant. I’ve never been good with big O notation…

This style of solving the problem is called pre-calculation. Pre-calculation is especially useful in the games development region, where speed is important. For example, selected values of sine and cosine were pre-calculated and stored in arrays, for use in the numerous 3D/2D calculations in games. Calculating sine’s and cosine’s in real-time were detrimental to a speedy game.

That’s not as useful now because you need a fuller range of floating point values as input. But the concept is still useful.

I think I read somewhere (while I was doing hobbyist game development) this quote:

Pre-calculate everything!

Maybe computers are now much faster. I don’t care. That doesn’t give you an excuse to be sloppy. It’s an optimisation that doesn’t take much effort.

If you need to calculate it, see if you can calculate it just once.

Decompiling Open XML spreadsheets

Ok, I’m going to reveal the big secret project that I’ve been working on for the last 2 months. I’m writing a software program that will decompile Open XML spreadsheets into C# and VB.NET source code.

Now I know what you’re thinking. “But Vincent, there’s that SDK Productivity Tool that does that already!”

Frankly, when I started the project, I didn’t even think about the SDK tool. But, when I looked at the generated source code from the SDK tool, I found it… hideous. There were 2 things I found annoying:

  • New classes were created willy-nilly
  • Properties were dumped into class instantiation using object initialisers

The first point meant that most of the classes were created one-off. It didn’t matter if you needed a class of type SomeClass multiple times. The SDK tool simply created another class of type SomeClass. If that class type was used multiple times, you’ll see variables named someClass1, someClass2 all the way to someClass21. It’s why I wrote about multiple use variables versus multiple variables.

The second point meant that if a class has many properties, you might end up with something like:

CellFormat cellFormat3 = new CellFormat(){ NumberFormatId = (UInt32Value)0U, FontId = (UInt32Value)10U, FillId = (UInt32Value)9U, BorderId = (UInt32Value)0U, ApplyNumberFormat = false, ApplyBorder = false, ApplyAlignment = false, ApplyProtection = false };

That’s one line of code.

The problem I have with object initialisers is when you need to comment something in between. Commenting in C# and VB.NET means an entire line is commented, although C# offers the /* comment */ variant. There’s just no easy way to do so. Compare with this:

cellFormat = new CellFormat();
cellFormat.NumberFormatId = 0U;
cellFormat.FontId = 11U;
cellFormat.FillId = 10U;
cellFormat.BorderId = 0U;
cellFormat.ApplyNumberFormat = false;
cellFormat.ApplyBorder = false;
cellFormat.ApplyAlignment = false;
cellFormat.ApplyProtection = false;

I just find that easier to pick and choose stuff I don’t want.

Now the big advantage (my differentiation or unique selling proposition) is that I offer VB.NET too. The SDK tool doesn’t. Here’s a snippet:

run = New Run()
run.RunProperties = New RunProperties()
run.RunProperties.Append(New FontSize() With {.Val = 11R})
clr = New Color()
clr.Theme = 1UI
run.RunProperties.Append(clr)
run.RunProperties.Append(New RunFont() With {.Val = "Calibri"})
run.RunProperties.Append(New FontFamily() With {.Val = 2})
run.RunProperties.Append(New FontScheme() With {.Val = FontSchemeValues.Minor})

You will notice that I do use object initialisers. “That’s hypocritical of you!”. Perhaps, but I use them when the number of properties is small. I’ve kept it to 3 for now. Object initialisers in my case also made it easier that I don’t have to declare and instantiate new classes with actual variable names.

I understand why the SDK tool generates source code the way it does. It has to do with completely iterating through every single part and class of the root class SpreadsheetDocument. If you’ve ever written code to traverse a tree structure, you’ll know how tedious it can be.

The one thing the SDK tool lacks about the source code it generates is context. It runs through the entire Open XML document structure like a squirrel looking for every single acorn on a tree. It doesn’t stop to check any acorn for size, defects or even if it’s an acorn. Look, winter’s coming soon, and the squirrel doesn’t have all day telling you that this particular acorn is related to that particular acorn, and no it doesn’t care how big the acorn is, it’s got the teeth to eat it, ok?

Why are we talking about squirrels again?

So, after about 20 thousand lines of code, I’m just barely getting my software into beta mode. Halfway through that, my heart sank with the enormity of the task. In order to generate more readable code, I cannot iterate through the XML tree structure like the SDK tool. I had to stop and make sense of what the class was.

That made me look at the SDK help file and the ECMA-376 specification file way too much… Did you know the ECMA spec is like over 5000 pages long? And that’s part 1. Parts 2, 3 and 4 are smaller, but still heavyweights in their own right. And there are so many classes and child classes and grandchildren classes and properties and…

I’m going to at least make a valiant effort to have the software self-complete on a subset of the Excel functionality (and thus a subset of the SDK). If you’re interested, I present to you SoxDecompiler. As of this writing, I’m just trying to see if people are interested in the software, so it’s just a page to collect email addresses of the people interested in the software. I think I wrote “interested” way too many times…

For some reason, the name conjures an image of a thread slowly unravelling a sock. But I like it. It stands for “Spreadsheet Open XML Decompiler”.