Open XML SDK class structure

I’ve gotten a few questions on the class structure of the Open XML SDK. There are articles on Open XML itself, where you work directly with XML files and tags, and zip them up yourself. Basically you work with WordprocessingML (Word), SpreadsheetML (Excel), PresentationML (PowerPoint) and possibly DrawingML (for images and stuff). Eric White did a lot of stuff on this.

There are also articles on the use of Open XML SDK itself. However, the articles I’ve found tend to give you code samples and some explanation of why you do those things, but didn’t really explain the deep “why”.

The fundamental question I was asked was “How are the Open XML SDK classes related to each other?“. A related question is “Why do I have to use that particular class?”.

While I’m familiar with the spreadsheet portion of the SDK, I believe the general class structure applies to the WordprocessingML and PresentationML parts too. So I’ll use the SpreadsheetML part as the example.

I also didn’t find any article giving names to the class categories I’m going to tell you, so I’m going to make up my own. If there’s an official Microsoft article on this, let me know. Generally speaking, there are 4 types of SDK classes:

  • Relationship Part Classes (henceforth referred to as RPCs)
  • Root Classes (henceforth referred to as RCs)
  • Content Classes (henceforth referred to as CCs)
  • Special Classes

Before I continue, keep in mind that Open XML documents are basically a bunch of XML files zipped together. Just like a relational database, certain XML files are related to each other (just like certain database tables are related to each other). Which brings us to the first type of class.

Relationship Part Classes

RPCs are the glue that holds certain SDK classes together. They are most easily recognisable by their class type name. For example, WorkbookPart and WorksheetPart.

However, not all SDK classes with names that end with “Part” are RPCs. For example, TablePart is not an RPC (it’s actually a Content Class). The corresponding RPC is actually TableDefinitionPart.

The most important part of an RPC is that it carries a relationship ID, and this relationship ID is used to tie relevant classes together. An RPC is also different in that it has as a property, a Root Class.

Root Classes

Remember that Open XML documents are a bunch of XML files zipped together? Well, an RC represents one of those XML files.

For example, the RPC WorksheetPart has as its RC, the Worksheet class. The Worksheet class holds information that basically translates into an XML file, typically sheet1.xml (and sheet2.xml and so on). The Worksheet class contains your worksheet cell data information.

Content Classes

If RCs represent an XML file, then CCs are basically XML tags.

For example, the Worksheet class contains the SheetData class, which contains the Row class(es), which in turn contains the Cell class(es). The corresponding XML tags are “worksheet”, “sheetData”, “row” and “c”.

Yes, an RC represents an XML file, and also translates to be the first XML tag of that XML file. That’s why it’s called a Root Class, because it also represents the root element (of the underlying XML structure/document).

Special Classes

These aren’t really that special. As far as I know, there are only 3 classes under this category: WordprocessingDocument, SpreadsheetDocument and PresentationDocument.

Those 3 classes form the starting point of any code relying on the Open XML SDK. You can consider them as Super Relationship Part Classes, because their properties are mainly RPCs.

An illustration

You might still be confused at this point (I don’t blame you…). Here’s a diagram for a simple Open XML spreadsheet:
Open XML SDK class structure

In green, we have the Special Class SpreadsheetDocument as the ultimate root.

In blue, we have the RPCs, 1 WorkbookPart class and 2 WorksheetPart classes. The SpreadsheetDocument class has the WorkbookPart class as a property. The WorkbookPart class contains a collection of WorksheetPart classes.

In grey, we have the RCs, 1 Workbook class and 2 Worksheet classes. The Workbook class is the RC of WorkbookPart class. The Worksheet classes are RCs of corresponding WorksheetPart classes. The Workbook class represents the workbook.xml file and the Worksheet classes (typically) represent sheet1.xml and sheet2.xml files.

In orange, we have the CCs. The Workbook class contains the Sheets class, which in turn contains 2 Sheet classes. The Sheet classes have a property holding the relationship ID of the corresponding WorksheetPart classes, which is how they’re tied to the Worksheet classes.

One of the most confusing parts…

After working with the Open XML SDK for a while, you might find yourself asking these questions:

  • Why are there so many classes?
  • Why are some of these classes devoid of any meaningful functionality?
  • Why are some of these classes duplicates of each other?

When I was first using the SDK, I felt the same way when I first used the .NET Framework: Being overwhelmed. There were many namespaces, with many classes in them, and I didn’t know which class to use for a specific purpose until I looked it up and wrote a small test program for it. Having a comprehensive help database/file for the .NET Framework was a really good idea.

And so it was with the Open XML SDK. I mean, it’s a spreadsheet. I can see a couple dozen of classes. Maybe. It turns out to be a lot of classes. That’s why there are so many code samples out there, but you don’t really know why you need to use that particular class.

And there are classes without any meaningful properties or functions. They inherit from a base Open XML class, and that’s it. For example, the Sheets class.

Then there are classes with identical properties. For example, the InlineString class, the SharedStringItem class and CommentText class. Or the Color, BackgroundColor, ForegroundColor and TabColor classes.

The answer is the same for all the above questions. The Open XML SDK is meant to abstract away the XML file structure.

There are duplicate classes because each class eventually translates into an XML tag (if it’s a CC or RC). XML requires a different tag for different purposes, hence the Open XML SDK has different classes. Even though the classes are identical in programming functionality, they become rendered as different XML tags.

There are classes without any seemingly meaningful properties or functions because their sole purpose is to have children. (Ooh Open XML SDK joke!) The Sheets class has as children, the Sheet classes. In XML, they’re correspondingly the “sheets” tag with “sheet” tags as children. The final XML tags have no XML attributes, hence the corresponding SDK classes also have no properties. Tada!

And finally, there are so many classes, because frankly speaking, you need one SDK class corresponding to each individually different XML tag. There are a lot of XML tags used in Open XML, hence so many classes. And that’s before we add in the Relationship Part Classes.

If you have any questions, leave them in a comment or contact me. And I’ll see if I can answer them in an article.

The math behind 360 degree fisheye to landscape conversion

I wrote an article to convert a 360 degree fisheye image to a landscape view some time ago. I also realised I didn’t explain the math very much, mainly because I thought it’s fairly obvious. On hindsight, it doesn’t seem obvious. My apologies.

Commenter Eric pointed out a math technique called linear fractional transformation. It’s basically a function that can map lines and circles to lines or circles.

In theory, it seems applicable to our problem. When I tried working out the solution, I failed. 2 possible reasons: my math sucks, or the technique isn’t applicable to the problem. It’s probably the former…

My postulate is that, the fisheye-landscape conversion has an edge condition that maps a line to a point. Specifically, the bottom of the destination image maps to one point, the centre of the source image. Thus linear fractional transformation is probably not suitable. I’m happy to hear from you if you’ve found a way to make it work.

Let’s bring up the explanation diagram I had from before:

Fisheye to landscape explanation diagram

I have assumed a source image that’s square (width equals height) and its width has an even number of pixels. With this, let the given image be of width and height of 2l pixels. The idea is to construct an image with width of 4l pixels and height of l pixels, which is the landscape view of the source fisheye image.

The dimensions of the destination image was arbitrarily chosen. I just find that a height of l pixels (or half the height of the source image) to be convenient. The centre of the source image is assumed to be the centre of the small “planet”. This means the pixels along the horizontal and vertical bisectors of the source image will not be distorted (much) by the conversion.

Oh right, I haven’t told you about the exacts of the conversion math…

You should know that only the pixels in the inscribed circle of the source image would be in the destination image. This is due to the “uncurling” effect. The pixels not in the inscribed circle would be out of range in the destination image.

So, imagine the source image as a representation of the Cartesian plane. The centre of the image is the origin. The point A, is the eastern point of the inscribed circle. Points B, C and D are the northern, western and southern points of the inscribed circle respectively.

I’m using the Cartesian plane because the Cartesian quadrants make the math easier. Circles mean sines and cosines, so I’d rather work with angles in the typical form than do all the funny angle modifications. I’m not masochistic, you know…

What you should understand now is this: the pixels along the top of the destination image come from the pixels along the circumference of the inscribed circle on the source image.

We’ll be iterating over the destination image (remember my messy index preference?) Let’s start at the top left corner. We’ll be iterating 4l pixels to the top right corner. This is visualised as going clockwise on the source image from point A, to D, to C, to B and back to A.

So, 4l pixels is equivalent to 2 PI radians?

At the top left corner, we start with 2 PI radians (so to speak). As we iterate to the top right corner, the angle reduces to 0. Thus this formula:

theta = (4l – x)/(4l) * 2PI
where x is the Cartesian x axis.

Generally speaking, iterating from the left to right on the destination image is equivalent to going clockwise on the source image.

Now, as we iterate from the top left of the destination image to the bottom left, it’s equivalent to going from point A on the source image to the centre of the source image. Thus:

radius = l – y
where y is the Cartesian y axis.

Generally speaking, iterating from the top to bottom of the destination image is equivalent to going from edge of source image to centre of source image.

And once you understand that, the rest is just coding. I merged the code for converting Cartesian to raster coordinates together (more details here with code here). The code was deliberately left unoptimised so it’s easier to read.

For example,

theta = 2.0 * Math.PI * (double)(4.0 * l - j) / (double)(4.0 * l);

could be

theta = Math.PI * (double)(4.0 * l - j) / (double)(2.0 * l);

to save on operations. But the 2.0 * Math.PI makes it more meaningful.

The if condition

if (x >= 0 && x < (2 * l) && y >= 0 && y < (2 * l))

could have had the (2 * l) part assigned to a variable to avoid multiplication multiple times. You're welcome to use a variable perhaps named iSourceWidth for example.

And that's all I have. I hope you have fun with the code.