Setting it free

It was the hardest, most painful decision I had to make in a long while.

I had worked hard on the project. I downloaded a few open-source versions of similar projects for comparison. I went through the use cases, on how to do certain tasks with those open-source projects.

Then I went to a couple of commercial projects. I couldn’t afford to buy them, but I went through sample source code to compare the use cases as well.

I did all that for research. Then I came up with a list of features that would be useful. Then I implemented them in a way that makes it easy for programmers to use.

It was also supposed to be a “big ticket” item, something I could sell for a higher price. I’m biased of course, but I believe my software is easier to use than either the open-source projects or the commercial projects.

The open-source projects targeted people who wanted free software, or supported open source projects (vehemently sometimes), or small to medium businesses. The commercial projects targeted the big enterprises. I targeted the small to medium businesses, appealing to the programmers or the IT managers/directors of those businesses and companies.

I launched my project. It didn’t do very well.

I added more features. I wrote detailed documentation of the software. I tweaked the price. I changed the sales copy. I did some advertising. I marketed the software project as best as I could. I created a whole website around it. Months of work went into the project. Still it floundered.

In the end, I dug into the core of why the project existed, and it was because I wanted to make the lives of programmers easier. And putting the software behind a payment wall might have put a dent in that.

And so I set it free.

It was heart-wrenching. I cried. Not so much for the lack of sales, but more because no one wanted to use the software. Ok, fine, it was equal measure of “no one using it” and “I need to eat”.

So if you have a few minutes, I’d appreciate it if you’d check out my spreadsheet software library. It’s called SpreadsheetLight, and it’s free for download, and has source code available. Tell someone who might find the software useful. Thanks.

Translating user requirements to code can be hard

Back when I was working as a professional programmer (my job title was “Systems Analyst”), I would attend meetings with my supervisor to gather user requirements. Sometimes, I’m not required at the meeting, but typically, my supervisor needed to know if something was technically feasible. That’s where I came in.

At those meetings, which could be 3 to 4 hours long, I would gather notes. (Once I had to attend a whole-day event. I had to look at it as “I learnt more about my users’ business” rather than “I wasted an entire day”. I’ll tell you about it some other time…) Sometimes, these notes weren’t written because my users specifically told me about it. It’s just that I took note of potential technical limitations and problems that my users didn’t see.

For example, they might draw a sketch of the user interface on the whiteboard and I’d copy it if necessary. Then I took notes on the possible internal code structure if necessary.

The hardest kinds of user requirement gathering are the super-really-obvious ones. Even to you.

The hardest user requirement I’ve ever been given was an Excel spreadsheet. That’s it. The user basically said, “I want you to give me a report that looks like this Excel file.” Implicit requirement was that the file data reflect current database data.

I had to hunt for the source of the data (which database?). Other than that, I had to figure out how to create the Excel spreadsheet layout based on the given sample file.

You can’t really ask what settings the particular spreadsheet cell has, because the user would say “It’s there! In the file! What do you mean how did I set it?” All the styling/layout requirements was already in the Excel file.

I’m starting a course!

All this is really a long way to tell you that I’m starting a course teaching about Open XML spreadsheets, using the Open XML SDK as part of the tool kit.

“But there are lots of resources online about Open XML! And they’re free!”

I know. They tell you how to do a specific task, such as setting a cell value or “This is how to read an Open XML file” and so on. But they don’t tell you why you should do it.

I’ve got a couple of customers (who bought my Open XML spreadsheet reference) asking me, “How do you set this style?”. Then they show me a screenshot or the actual Excel file, and it’s like this complicated (or elegantly professional, depending on your perspective) mess of a jumble of cells with background colours and borders and OMG is that a merged cell?! (One programmer sent me the actual file he’s to simulate, and I felt really sorry for him…)

So in addition to teaching you about the Open XML parts and code and stuff, I’ll also teach you how to translate a given Open XML spreadsheet into code that generates that spreadsheet.

Tentative price is USD 30 for the whole course. The course is about 8 to 10 lessons, with each lesson given weekly. I’ve a rough outline of the course curriculum, but I’m open to suggestions at this point (contact me or leave a comment). It should be up in the next couple of weeks.

SpreadsheetLight Relaunch

Just a quick note to let you know I’ve moved my spreadsheet software library, SpreadsheetLight, to its own website. Check it out.

Since the first launch in January, I’ve added support for tables, conditional formatting, basic charts and lots of small additional functions to make a spreadsheet programmer’s life easier.

To do this, I read books on how to use Excel. You read that right, I didn’t read up on how to write code libraries, I read up on how normal people use Excel. Because I want you to write spreadsheet code as easily as a person using Excel.

If an Excel guru recommends you to use a particular tip on your charts, I want to take that into account when I design the library functions.

Anyway, if you’re looking for a spreadsheet library, consider SpreadsheetLight. Or tell your manager. Tell your friends. I appreciate it. Thanks!

Converting HSL to RGB

There seems to be 5 flavours of the “HSL”:

  • HSL: Hue, Saturation, Lightness
  • HSL: Hue, Saturation, Luminance
  • HSV: Hue, Saturation, Value
  • HSB: Hue, Saturation, Brightness
  • HSI: Hue, Saturation, Intensity

I don’t understand why there are so many variants. I like RGB.

I needed a way to convert HSL (the luminance version) to RGB. Now .NET doesn’t have a way to convert any of the variants to its built-in System.Drawing.Color structure. Although the structure offers the GetHue(), GetSaturation() and GetBrightness() functions.

Generally speaking, the hue is between 0 and 360 degrees of a colour wheel. Saturation determines “how much” of that colour is displayed. And luminance determines “how bright/dark” that colour is.

So here’s the code to convert HSL to RGB (in the form of a System.Drawing.Color struct):

public System.Drawing.Color RgbFromHsl(double Hue, double Saturation, double Luminance)
{
    double fChroma = (1.0 - Math.Abs(2.0 * Luminance - 1.0)) * Saturation;
    double fHue = Hue / 60.0;
    double fHueMod2 = fHue;
    while (fHueMod2 >= 2.0) fHueMod2 -= 2.0;
    double fTemp = fChroma * (1.0 - Math.Abs(fHueMod2 - 1.0));

    double fRed = 0, fGreen = 0, fBlue = 0;
    if (fHue < 1.0)
    {
        fRed = fChroma;
        fGreen = fTemp;
        fBlue = 0;
    }
    else if (fHue < 2.0)
    {
        fRed = fTemp;
        fGreen = fChroma;
        fBlue = 0;
    }
    else if (fHue < 3.0)
    {
        fRed = 0;
        fGreen = fChroma;
        fBlue = fTemp;
    }
    else if (fHue < 4.0)
    {
        fRed = 0;
        fGreen = fTemp;
        fBlue = fChroma;
    }
    else if (fHue < 5.0)
    {
        fRed = fTemp;
        fGreen = 0;
        fBlue = fChroma;
    }
    else if (fHue < 6.0)
    {
        fRed = fChroma;
        fGreen = 0;
        fBlue = fTemp;
    }
    else
    {
        fRed = 0;
        fGreen = 0;
        fBlue = 0;
    }

    double fMin = Luminance - 0.5 * fChroma;
    fRed += fMin;
    fGreen += fMin;
    fBlue += fMin;

    fRed *= 255.0;
    fGreen *= 255.0;
    fBlue *= 255.0;

    int iRed = 0, iGreen = 0, iBlue = 0;
    // the default seems to be to truncate rather than round
    iRed = Convert.ToInt32(Math.Truncate(fRed));
    iGreen = Convert.ToInt32(Math.Truncate(fGreen));
    iBlue = Convert.ToInt32(Math.Truncate(fBlue));
    if (iRed < 0) iRed = 0;
    if (iRed > 255) iRed = 255;
    if (iGreen < 0) iGreen = 0;
    if (iGreen > 255) iGreen = 255;
    if (iBlue < 0) iBlue = 0;
    if (iBlue > 255) iBlue = 255;

    return System.Drawing.Color.FromArgb(iRed, iGreen, iBlue);
}

The algorithm is taken from the Wikipedia page on HSL and HSV. I added some checks at the end to make sure the RGB values are within 0 to 255.

About the only problem I had was the mod function. It’s not “integer modulus”, it’s “floating point modulus” (is that a thing?). So 5.5 mod 2 is 1.5 (because the remainder of 5.5/2 is 1.5). That took me a couple of hours to figure out. This is one of the few cases where thinking in programming terms instead of the mathematical terms cost me.

Design philosophy of a software library

Recently, I received an email from a customer. “Where are the Workbook and Worksheet class variables?” (I’m paraphrasing). The background is that I sell a spreadsheet software library (check it out here).

My answer is “I don’t want you to worry about them.” (I’m paraphrasing my reply).

From hours of looking at source code from other spreadsheet libraries, I’ve come to the conclusion that spreadsheets are kind of hard to create programmatically (read: super flying fishball noodly tedious). From hours of looking at hundreds of Excel spreadsheets, I’ve come to the conclusion that Excel makes spreadsheets ridiculously easy for users.

Why are programmers working harder than Excel users?

So here’s my story. Back in my last job, I dealt with lots of financial data. Millions of rows of data in databases, with information on how long a satellite call was, how much data in megabytes that email was and how many instant messages were sent. And each row had a price attached to it.

The marketing department wanted to know what products and services were up. The sales people wanted to know their commissions. The customer service officers wanted to know if the customer exceeded the usage limit. The directors wanted to know the quarterly results.

I maintained internal websites for the staff to get and update all that information. I also maintained public websites for customers to get information about their usage. Business logic and requirements were flying at me all over the place.

Now ASP.NET allows me to throw a bunch of data at a DataGrid (or GridView, or whatever it’s called now) and it’s nicely displayed on a web page.

Then people wanted to download all that tabular data into an Excel spreadsheet. Oh the horror…

I’m a programmer. I can write text files, schedule emails, update information into Sybase or SQL Server or Oracle databases, but I haven’t a clue about Excel spreadsheets.

The most complicated thing I’ve had to do in Excel was create a rectangular grid with numeric data and do a sum. There’s a SUM function in Excel. I know that much.

I survived that by doing a ton of research and testing. And what eventually resulted was an Open XML reference manual for spreadsheets, but that’s a different story.

I’m not much of an Excel user. I’m guessing you’re not too.

So when I did research on what an actual spreadsheet library can do, I was appalled at the amount of code I still had to write and figure out. “What do you mean I have to write 20 lines of code? I can do that in Excel with a few clicks on the mouse!”.

Those spreadsheet libraries carefully exposed classes and functions and interfaces that Excel uses internally. But you know what? The Excel user sees none of that.

A common part is that all the libraries require the programmer to keep track of worksheet classes. Such as workbook.worksheets[0] or workbook.worksheets[“Sheet1”] or something similar.

Do you see Excel forcing the user to keep track? No. The user sees a bunch of tabs that represent the existing worksheets, but she always only see the actively selected worksheet, because that’s the one she’s concerned about.

So while internally, SpreadsheetLight (my library) has a Workbook class and a Worksheet class, the programmer doesn’t have to worry about it. My design philosophy is that you should be able to do whatever you need in the spreadsheet as easily as you can in Excel.

Here’s how you create a table with Open XML SDK:

Table table1 = new Table(){ Id = (UInt32Value)2U, Name = "Table2", DisplayName = "Table2", Reference = "I2:O13", TotalsRowCount = (UInt32Value)1U };
table1.AddNamespaceDeclaration("x", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
AutoFilter autoFilter1 = new AutoFilter(){ Reference = "I2:O12" };

SortState sortState1 = new SortState(){ Reference = "I3:O12" };
SortCondition sortCondition1 = new SortCondition(){ Descending = true, Reference = "K3:K12" };

sortState1.Append(sortCondition1);

TableColumns tableColumns1 = new TableColumns(){ Count = (UInt32Value)7U };
TableColumn tableColumn1 = new TableColumn(){ Id = (UInt32Value)1U, Name = "Col9", TotalsRowLabel = "Totals" };
TableColumn tableColumn2 = new TableColumn(){ Id = (UInt32Value)2U, Name = "Col10" };
TableColumn tableColumn3 = new TableColumn(){ Id = (UInt32Value)3U, Name = "Col11" };
TableColumn tableColumn4 = new TableColumn(){ Id = (UInt32Value)4U, Name = "Col12" };
TableColumn tableColumn5 = new TableColumn(){ Id = (UInt32Value)5U, Name = "Col13" };
TableColumn tableColumn6 = new TableColumn(){ Id = (UInt32Value)6U, Name = "Col14" };
TableColumn tableColumn7 = new TableColumn(){ Id = (UInt32Value)7U, Name = "Col15", TotalsRowFunction = TotalsRowFunctionValues.Sum };

tableColumns1.Append(tableColumn1);
tableColumns1.Append(tableColumn2);
tableColumns1.Append(tableColumn3);
tableColumns1.Append(tableColumn4);
tableColumns1.Append(tableColumn5);
tableColumns1.Append(tableColumn6);
tableColumns1.Append(tableColumn7);
TableStyleInfo tableStyleInfo1 = new TableStyleInfo(){ Name = "TableStyleDark4", ShowFirstColumn = true, ShowLastColumn = true, ShowRowStripes = true, ShowColumnStripes = true };

table1.Append(autoFilter1);
table1.Append(sortState1);
table1.Append(tableColumns1);
table1.Append(tableStyleInfo1);
return table1;

Here’s how you do it with my library:

SLTable tbl = new SLTable("I2", "O12");

tbl.HasTotalRow = true;
// 1st table column, column I
tbl.SetTotalRowLabel(1, "Totals");
// 7th table column, column O
tbl.SetTotalRowFunction(7, SLTotalsRowFunctionValues.Sum);
tbl.SetTableStyle(SLTableStyleTypeValues.Dark4);

tbl.HasBandedColumns = true;
tbl.HasBandedRows = true;
tbl.HasFirstColumnStyled = true;
tbl.HasLastColumnStyled = true;

// sort by the 3rd table column (column K) in descending order
tbl.Sort(3, false);

“Wait, where do I set the table name?” Don’t worry about it.

“Shouldn’t I need to set the table column IDs and values? Where do I set ‘Col9’ and ‘Col10’ and the others?” Don’t worry about it.

“How do I know what string value to use for the table style?” Don’t worry about it. Use one of the built-in enumerations.

“You know, the section on sorting doesn’t include the header row. Shouldn’t I…” Don’t. Worry. About. It.

You know what happens in Excel? You select a bunch of cells, decide to make it a table, select a table style and POOF! A table appears. I want the programmer to feel just as awesome.

Frankly speaking, this library of mine is what I wish I had when I was doing all that coding in my last job. The last thing I needed was figuring out how to database dump tabular data onto an Excel spreadsheet. The company I worked for was a telecommunications company, working with satellite providers, service providers, content producers. The core business doesn’t involve making Excel spreadsheets, but sharing information inevitably involves Excel spreadsheets being passed around.

If you’re designing a software library, consider going beyond “allow the programmer to do X”. Consider “allow the programmer to do X in like, one line”, or at least very easily. Because the programmer might not care about X as much as you do.

Unless X is “make spreadsheet library” of course.

Dot notation dropdown hell

Have you written something like this:
body.LeftArm.ForeArm.Hand.Pinky.Twitch()

Would this be easier?
body.Twitch(BodyPart.LeftPinky)

I’ve been thinking about this a lot, because I’m writing a software library. You know what’s hard? Deciding what classes, functions and properties to expose to the programmer.

My software library deals with spreadsheets, and for uhm, research, I downloaded 2 free open-source libraries for comparison. Then I looked at sample code for 2 commercial libraries too.

You know what I found?

For the most part, the libraries just expose the underlying class structures to the programmer.

While this gives the programmer ultimate coding power, I personally find this exhausting. It’s like the first time I encountered the .NET Framework. There’s an overwhelming number of classes with their own functions and enumerations… “I just want to write one single sentence to a file!” Even that took me a few minutes to get used to. Luckily there’s extensive documentation, or I’d just collapse under the weight.

Do you remember Nokia? It’s a telecommunications company, but I know it as a mobile phone maker. My experience with Nokia phones were that they were probably designed by engineers and programmers. There were a lot of dropdown menus.

We programmers can think in hierarchies. But users don’t usually think in hierarchies (I think there’s research showing dropdown menus on web sites confuse users).

Consider the basic task in programming: declaring a variable. Here’s how you do this in VB:
Dim asdf as Int32

Here’s how you do this in C# (and any C-family):
int asdf;

When I want a variable, I already know what type it should be. The type matters to me and the compiler. The name matters a little to me, and practically none to the compiler. By the time I type “Dim asdf as” I already forgot what type I wanted, because I was so busy coming up with a variable name.

I feel this is backwards. Yes, I kinda have a thing about VB in this case…

Remember the pinky twitching example?

body.LeftArm.ForeArm.Hand.Pinky.Twitch()
This forces the programmer to go all the way back to the root class, and then traverse the properties down again to reach the Twitch() function.

body.Twitch(BodyPart.LeftPinky)
This shortcuts a lot of the traversing.

I have a lot of respect for the programmers working on open-source projects. They give their time and effort to improving software without pay. I just feel the design sometimes leave a little to be desired.

Have you looked at an iPhone? The interface allows the user to reach something within a couple of taps.

Do you know Google recommends that website links be no more than 3 levels deep? This means every page should be available from every other page via no more than 3 clicks.

So why did the first twitching example need 5 levels to reach my Twitch() function?

Collision detection in merging Excel cells

You can merge cells in Excel. “Duh!” you say. So how does Excel ensure that the cells you want to merge don’t overlap any existing merged cells? How do you check programmatically that the cells you want to merge won’t overlap any existing merged cells?

Well, that was the problem I faced while writing my spreadsheet library software (check out SpreadsheetLight here!). The inkling of an idea was formed when I visited StackOverflow. Can’t remember what I was there for, but the answer page wasn’t about Excel or spreadsheets. It was about collision detection.

Yes, the type of collision detection used in games so you don’t walk through walls and stuff.

And thus I learnt a new mathematical theorem: the separating axis theorem (or hyperplane separation theorem). From what I understand, given two N-dimensional thingies, if you can find one (N-1)-dimensional hyperplane that separates those two thingies, then those two thingies are separate (or not overlapping).

Yes, “thingies” is a technical term.

For example, given two 3D objects, if you can find one 2D plane that separates them, then the 3D objects are not overlapping. Another example is when you’re given two 2D objects, and if you can find one line (which is 1D) that separates them, then the 2D objects are not overlapping.

There are some conditions to be fulfilled, such as the objects being convex and disjoint and whatever. (yes I’m a mathematician…) I’ll leave it to you to do your own reading.

But for our purposes, how do we check if that rectangle of Excel cells we want to merge won’t overlap with any existing merged cells? We have these conditions that make our lives easier:

  • The merged cells are rectangular
  • The 4 end points (top-left, top-right, bottom-left, bottom-right) of a merged cell are in whole numbers
  • The (merged) cells map strictly to a 2D rectangular grid

Since the merged cells are rectangular, they’re also convex (I’m not going to explain why, just trust me). Since the 4 end points are in whole numbers, line boundaries can be easily calculated (it’s easy to check if something is <7 or >=8 for example). And since they map strictly to a 2D rectangular grid, the separating axis is a line. And even better, you can always find a subset of solutions of those separating axes that are either horizontal or vertical.

Ok, diagrams are in order.

Collision detection in merged cells

So in our case, as long as you can find one horizontal line or one vertical line that separates the 2 merged cells we’re checking, then the 2 merged cells are separate.

Let’s try an example. Excel uses cell references that are in the A1 format, which means row 3 column 4 becomes “D3”. The column index uses alphabets and is in base-26. Read more here.

We’re going to simplify that. Let’s say our existing set of merged cells only has one entry. This merged cell has the top-left corner at row 1 column 1, and the bottom-right corner at row 3 column 4. Suppose we want to check if this merge cell overlaps: top-left corner at row 7 column 8, bottom-right corner at row 10 column 10.

The horizontal line of 5 fits the bill, or the line y=5 if you want to be mathematical about it (but y goes from negative to positive downwards instead of the usual Cartesian y). Or y=6. Or even y=7 (note that the line can “touch” one of the merged cells, but not both. This is where the “whole number” condition comes in).

The vertical lines x=5, x=6 or x=8 also fit the bill.

Thus, our 2 merged cells don’t overlap.

So what’s the programmatic way to check? You’d be surprised at the simplicity. To make it easier, the variables with “StartRowIndex” mean the top row of the merged cell, “EndRowIndex” mean the bottom row of the merged cell. And “StartColumnIndex” mean the left-most column of the merged cell, and “EndColumnIndex” mean the right-most column of the merged cell.

if (!(iEndRowIndex < mc.StartRowIndex || iStartRowIndex > mc.EndRowIndex || iEndColumnIndex < mc.StartColumnIndex || iStartColumnIndex > mc.EndColumnIndex))
{
    // there's an overlap!
}

So the merge cell we want to check has top-left corner at (iStartRowIndex, iStartColumnIndex) and bottom-right corner at (iEndRowIndex, iEndColumnIndex).

The variable “mc” refers to a custom class I use to represent merged cells. Obviously, you’d run that condition in a loop through all your existing merged cells. If you can’t find an overlap after running through that loop, then the merge cell you’re checking is good to go.

Let’s run through the individual conditions.

  • (iEndRowIndex < mc.StartRowIndex) means our merged cell is completely above the existing merged cell
  • (iStartRowIndex > mc.EndRowIndex) means our merged cell is completely below the existing merged cell
  • (iEndColumnIndex < mc.StartColumnIndex) means our merged cell is completely to the left of the existing merged cell
  • (iStartColumnIndex > mc.EndColumnIndex) means our merged cell is completely to the right of the existing merged cell

The first 2 conditions check for the existence of a horizontal separating axis. The next 2 conditions check for the existence of a vertical separating axis.

Note the negation of the entire boolean condition. Those 4 conditions check for existence of solutions. Negation means checking for overlaps.

Note also that we use strictly greater than or strictly less than checks. If the merge cells share a row or column, then they overlap, right?

Did you know my first draft of writing the code had me checking 6 different conditions? Each condition had its own if statements. This one just had one. I’m so lucky to have learnt the separating axis theorem. I was checking if a point was above, within or below the merge cell, then I had to check if the point was to the left, within or to the right of the merge cell, and … urgh, it was horrible…

Open XML SDK class structure

I’ve gotten a few questions on the class structure of the Open XML SDK. There are articles on Open XML itself, where you work directly with XML files and tags, and zip them up yourself. Basically you work with WordprocessingML (Word), SpreadsheetML (Excel), PresentationML (PowerPoint) and possibly DrawingML (for images and stuff). Eric White did a lot of stuff on this.

There are also articles on the use of Open XML SDK itself. However, the articles I’ve found tend to give you code samples and some explanation of why you do those things, but didn’t really explain the deep “why”.

The fundamental question I was asked was “How are the Open XML SDK classes related to each other?“. A related question is “Why do I have to use that particular class?”.

While I’m familiar with the spreadsheet portion of the SDK, I believe the general class structure applies to the WordprocessingML and PresentationML parts too. So I’ll use the SpreadsheetML part as the example.

I also didn’t find any article giving names to the class categories I’m going to tell you, so I’m going to make up my own. If there’s an official Microsoft article on this, let me know. Generally speaking, there are 4 types of SDK classes:

  • Relationship Part Classes (henceforth referred to as RPCs)
  • Root Classes (henceforth referred to as RCs)
  • Content Classes (henceforth referred to as CCs)
  • Special Classes

Before I continue, keep in mind that Open XML documents are basically a bunch of XML files zipped together. Just like a relational database, certain XML files are related to each other (just like certain database tables are related to each other). Which brings us to the first type of class.

Relationship Part Classes

RPCs are the glue that holds certain SDK classes together. They are most easily recognisable by their class type name. For example, WorkbookPart and WorksheetPart.

However, not all SDK classes with names that end with “Part” are RPCs. For example, TablePart is not an RPC (it’s actually a Content Class). The corresponding RPC is actually TableDefinitionPart.

The most important part of an RPC is that it carries a relationship ID, and this relationship ID is used to tie relevant classes together. An RPC is also different in that it has as a property, a Root Class.

Root Classes

Remember that Open XML documents are a bunch of XML files zipped together? Well, an RC represents one of those XML files.

For example, the RPC WorksheetPart has as its RC, the Worksheet class. The Worksheet class holds information that basically translates into an XML file, typically sheet1.xml (and sheet2.xml and so on). The Worksheet class contains your worksheet cell data information.

Content Classes

If RCs represent an XML file, then CCs are basically XML tags.

For example, the Worksheet class contains the SheetData class, which contains the Row class(es), which in turn contains the Cell class(es). The corresponding XML tags are “worksheet”, “sheetData”, “row” and “c”.

Yes, an RC represents an XML file, and also translates to be the first XML tag of that XML file. That’s why it’s called a Root Class, because it also represents the root element (of the underlying XML structure/document).

Special Classes

These aren’t really that special. As far as I know, there are only 3 classes under this category: WordprocessingDocument, SpreadsheetDocument and PresentationDocument.

Those 3 classes form the starting point of any code relying on the Open XML SDK. You can consider them as Super Relationship Part Classes, because their properties are mainly RPCs.

An illustration

You might still be confused at this point (I don’t blame you…). Here’s a diagram for a simple Open XML spreadsheet:
Open XML SDK class structure

In green, we have the Special Class SpreadsheetDocument as the ultimate root.

In blue, we have the RPCs, 1 WorkbookPart class and 2 WorksheetPart classes. The SpreadsheetDocument class has the WorkbookPart class as a property. The WorkbookPart class contains a collection of WorksheetPart classes.

In grey, we have the RCs, 1 Workbook class and 2 Worksheet classes. The Workbook class is the RC of WorkbookPart class. The Worksheet classes are RCs of corresponding WorksheetPart classes. The Workbook class represents the workbook.xml file and the Worksheet classes (typically) represent sheet1.xml and sheet2.xml files.

In orange, we have the CCs. The Workbook class contains the Sheets class, which in turn contains 2 Sheet classes. The Sheet classes have a property holding the relationship ID of the corresponding WorksheetPart classes, which is how they’re tied to the Worksheet classes.

One of the most confusing parts…

After working with the Open XML SDK for a while, you might find yourself asking these questions:

  • Why are there so many classes?
  • Why are some of these classes devoid of any meaningful functionality?
  • Why are some of these classes duplicates of each other?

When I was first using the SDK, I felt the same way when I first used the .NET Framework: Being overwhelmed. There were many namespaces, with many classes in them, and I didn’t know which class to use for a specific purpose until I looked it up and wrote a small test program for it. Having a comprehensive help database/file for the .NET Framework was a really good idea.

And so it was with the Open XML SDK. I mean, it’s a spreadsheet. I can see a couple dozen of classes. Maybe. It turns out to be a lot of classes. That’s why there are so many code samples out there, but you don’t really know why you need to use that particular class.

And there are classes without any meaningful properties or functions. They inherit from a base Open XML class, and that’s it. For example, the Sheets class.

Then there are classes with identical properties. For example, the InlineString class, the SharedStringItem class and CommentText class. Or the Color, BackgroundColor, ForegroundColor and TabColor classes.

The answer is the same for all the above questions. The Open XML SDK is meant to abstract away the XML file structure.

There are duplicate classes because each class eventually translates into an XML tag (if it’s a CC or RC). XML requires a different tag for different purposes, hence the Open XML SDK has different classes. Even though the classes are identical in programming functionality, they become rendered as different XML tags.

There are classes without any seemingly meaningful properties or functions because their sole purpose is to have children. (Ooh Open XML SDK joke!) The Sheets class has as children, the Sheet classes. In XML, they’re correspondingly the “sheets” tag with “sheet” tags as children. The final XML tags have no XML attributes, hence the corresponding SDK classes also have no properties. Tada!

And finally, there are so many classes, because frankly speaking, you need one SDK class corresponding to each individually different XML tag. There are a lot of XML tags used in Open XML, hence so many classes. And that’s before we add in the Relationship Part Classes.

If you have any questions, leave them in a comment or contact me. And I’ll see if I can answer them in an article.

A Sandbox In The Cloud

I am honoured and excited to bring you an article written by a Rackspace staff, Joseph Palumbo. My thoughts will be at the end of this article. Thanks Joseph! Disclaimer: I’m not paid by Rackspace.

As a founding member of Rackspace’s Managed Cloud support team, Joseph spends half of his time teaching customers about the Cloud and the other half learning about the Cloud from them. Follow him on Twitter.

Solid, high performing websites and web applications don’t happen by accident. From imagining an idea, creating code and developing an intuitive user experience, there are many behind the scenes tasks to ensure everything works smoothly.

Despite how simple a website or web app might appear, the reality is that even the simplest looking sites can have powerful and complex code behind them. The complexity means that one small change can take down the entire site. However, both business needs and technologies evolve, necessitating changes to your site. The choice, however, is how you implement these changes.

You can choose to make code changes to your live, production environment, but this is a dangerous proposition. By doing so, you assume the risk of making a mistake that can be visible to users, or creating an error that can make your entire site go dark for an extended period of time. The better alternative is for businesses to create a test and dev sandbox that mirrors the live environment, but in the recent past, this was expensive to do. The high cost presented a difficult decision: do you spend the money to create a test and development environment or do you assume the risk of introducing a bug or error into the live environment?

With the advances in cloud computing, you no longer have to choose. Businesses can easily clone their production environment and create a test and dev sandbox. In the cloned site, developers can replicate the ratio of usage rather than purchase all of the horsepower; this means that you can have a more cost effective version of your site because you aren’t serving up production traffic.

This cloned site can be created on demand for testing code changes and will literally cost just pennies per hour. Not only can businesses create a cloned site for temporary testing, the cloud presents a cost effective solution for a long-term test and dev sandbox.

Furthermore, the test dev sandbox allows experimentation to happen behind the scenes without anyone outside the company (or the IT department) ever knowing. While you are making changes to your test and dev sandbox, the production site is humming along, bringing in revenue, collecting customer data and maintaining your online presence.

Once your developers have perfected the changes and are ready to move the mirror site into production, it can be uploaded directly. If there is a load balancer in front of your configuration, you have the ability to make the test environment the new production environment. You simply make a change on the load balancer, redirecting traffic from the old production site to the new production site in the middle of the night. This is easily done from a systems administration point of view and can result in little or no downtime to your configuration.

In the past, I would receive frantic phone calls from people who didn’t have a test and dev site and didn’t know their code very well. They only had a production site and were trying to make changes, but they were concerned about the potential of bringing down their site, or even worse, making an irreparable error such as erasing part of their database.

The cloud lowers the cost of having a test and dev site, allowing businesses to prove out their code changes without adversely impacting their production site. You can have peace of mind that you won’t make visible mistakes to your users or delete any of their data. Peace of mind is worth every penny – and with the cloud, it won’t cost very many pennies to have.

Post-article thoughts

I have personally maintained development, test and production web servers, along with the corresponding web sites. It can get exhausting, especially when you have to coordinate the efforts of other developers and testers (from dev and test sites), and juggle inquiries from customers and customer service officers (from live sites).

I’ve also personally done server maintenance. There was this one time when there was a change in some wiring structure in the data centre, and I had to be there personally (because there’s no one else) to make sure my servers were still operational after the change. I’m really not a hardware kind of guy…

DNS propagation, IP address settings, SSL certificates, server upgrades. If there was an easy way to enclose all that into a standalone testing environment, my life would have been so much easier.

Partial fractions in SQL queries

I never thought my maths training would come in handy again. I was working on a financial report, and one of the requirements was to have a particular calculated value show up. The formula didn’t make sense to me, but it was a business/financial logic requirement, so I just dealt with it.

So here’s the core of the problem (specific values had been changed):

select sum(A)/sum(B) - 0.7 from sometable

where “A” and “B” are columns of the database table “sometable”.

So what’s the problem? That select query won’t run. Or at least it didn’t run from a Sybase database (or was it an SQL Server database?). I’m not saying it ran but the value was wrong. I mean it didn’t even execute. Just in case you asked, “A” and “B” are numeric data columns so the sum function will work.

I don’t know how I came up with the idea of using partial fractions. Given that only 0.08%* of staff in the entire office building had maths background, and I probably made up the entire 0.08%, I didn’t have anyone to bounce ideas off of and be told “How about you try using partial fractions on that, Vincent?”

(* a completely made up statistic)

Anyway, I tried using partial fractions, and it worked. Now in partial fractions, you typically deal with decomposing a fraction into 2 or more fractions. Here, we’re combining fractions into 1 fraction. Let me show you.

sum(A)/sum(B) – 0.7
= sum(A)/sum(B) – 7/10
= ( 10*sum(A) – 7*sum(B) ) / 10*sum(B)

If I remember correctly, this (equivalent) SQL query will work:

select ( 10*sum(A) - 7*sum(B) ) / 10*sum(B) from sometable

I’m not a database expert. If you know why that works but not the original (and more direct) version, leave a comment.

[UPDATE: A commenter told me that complicated maths functions don’t work on aggregates. The sum(A) result is an aggregated result. Apparently sum(A)/sum(B) is too complicated. Oh well…]

As part of that same programming task, I had to deal with another similar problem:

select 50 * (sum(A)/sum(B) - 0.7) from sometable

That SQL query also didn’t run. So here’s the partial fraction combining process:

50 * (sum(A)/sum(B) – 0.7)
= 50*sum(A)/sum(B) – 35
= ( 50*sum(A) – 35*sum(B) ) / sum(B)

Alternatives

Now I know there’s another option. I could get sum(A) and sum(B) individually, and then do the required calculation in code (C# code as opposed to database SQL code. I was dealing with ASP.NET then).

After considering my options, I decided to leave all the calculations at the database side. This makes the ASP.NET code “cleaner”. Then I only have to deal with one return value (instead of 2, sum(A) and sum(B)), and I can bind it directly to my database objects for display on the web browser.

Also, there were where (and group-by? Can’t remember…) clauses in the SQL query. I didn’t know if I obtained sum(A) and sum(B) individually (even if they were in the same query) that that will affect their values. I decided to play it safe, and just get it all in one resulting value from the same query.

I didn’t check for efficiency. It wasn’t an oft-used report, so the code execution won’t be run frequently enough to matter.

But if you’re curious enough to do some tests, go ahead. If you then want to share your results, I’d very much appreciate it too.