Open XML SDK class structure

I’ve gotten a few questions on the class structure of the Open XML SDK. There are articles on Open XML itself, where you work directly with XML files and tags, and zip them up yourself. Basically you work with WordprocessingML (Word), SpreadsheetML (Excel), PresentationML (PowerPoint) and possibly DrawingML (for images and stuff). Eric White did a lot of stuff on this.

There are also articles on the use of Open XML SDK itself. However, the articles I’ve found tend to give you code samples and some explanation of why you do those things, but didn’t really explain the deep “why”.

The fundamental question I was asked was “How are the Open XML SDK classes related to each other?“. A related question is “Why do I have to use that particular class?”.

While I’m familiar with the spreadsheet portion of the SDK, I believe the general class structure applies to the WordprocessingML and PresentationML parts too. So I’ll use the SpreadsheetML part as the example.

I also didn’t find any article giving names to the class categories I’m going to tell you, so I’m going to make up my own. If there’s an official Microsoft article on this, let me know. Generally speaking, there are 4 types of SDK classes:

  • Relationship Part Classes (henceforth referred to as RPCs)
  • Root Classes (henceforth referred to as RCs)
  • Content Classes (henceforth referred to as CCs)
  • Special Classes

Before I continue, keep in mind that Open XML documents are basically a bunch of XML files zipped together. Just like a relational database, certain XML files are related to each other (just like certain database tables are related to each other). Which brings us to the first type of class.

Relationship Part Classes

RPCs are the glue that holds certain SDK classes together. They are most easily recognisable by their class type name. For example, WorkbookPart and WorksheetPart.

However, not all SDK classes with names that end with “Part” are RPCs. For example, TablePart is not an RPC (it’s actually a Content Class). The corresponding RPC is actually TableDefinitionPart.

The most important part of an RPC is that it carries a relationship ID, and this relationship ID is used to tie relevant classes together. An RPC is also different in that it has as a property, a Root Class.

Root Classes

Remember that Open XML documents are a bunch of XML files zipped together? Well, an RC represents one of those XML files.

For example, the RPC WorksheetPart has as its RC, the Worksheet class. The Worksheet class holds information that basically translates into an XML file, typically sheet1.xml (and sheet2.xml and so on). The Worksheet class contains your worksheet cell data information.

Content Classes

If RCs represent an XML file, then CCs are basically XML tags.

For example, the Worksheet class contains the SheetData class, which contains the Row class(es), which in turn contains the Cell class(es). The corresponding XML tags are “worksheet”, “sheetData”, “row” and “c”.

Yes, an RC represents an XML file, and also translates to be the first XML tag of that XML file. That’s why it’s called a Root Class, because it also represents the root element (of the underlying XML structure/document).

Special Classes

These aren’t really that special. As far as I know, there are only 3 classes under this category: WordprocessingDocument, SpreadsheetDocument and PresentationDocument.

Those 3 classes form the starting point of any code relying on the Open XML SDK. You can consider them as Super Relationship Part Classes, because their properties are mainly RPCs.

An illustration

You might still be confused at this point (I don’t blame you…). Here’s a diagram for a simple Open XML spreadsheet:
Open XML SDK class structure

In green, we have the Special Class SpreadsheetDocument as the ultimate root.

In blue, we have the RPCs, 1 WorkbookPart class and 2 WorksheetPart classes. The SpreadsheetDocument class has the WorkbookPart class as a property. The WorkbookPart class contains a collection of WorksheetPart classes.

In grey, we have the RCs, 1 Workbook class and 2 Worksheet classes. The Workbook class is the RC of WorkbookPart class. The Worksheet classes are RCs of corresponding WorksheetPart classes. The Workbook class represents the workbook.xml file and the Worksheet classes (typically) represent sheet1.xml and sheet2.xml files.

In orange, we have the CCs. The Workbook class contains the Sheets class, which in turn contains 2 Sheet classes. The Sheet classes have a property holding the relationship ID of the corresponding WorksheetPart classes, which is how they’re tied to the Worksheet classes.

One of the most confusing parts…

After working with the Open XML SDK for a while, you might find yourself asking these questions:

  • Why are there so many classes?
  • Why are some of these classes devoid of any meaningful functionality?
  • Why are some of these classes duplicates of each other?

When I was first using the SDK, I felt the same way when I first used the .NET Framework: Being overwhelmed. There were many namespaces, with many classes in them, and I didn’t know which class to use for a specific purpose until I looked it up and wrote a small test program for it. Having a comprehensive help database/file for the .NET Framework was a really good idea.

And so it was with the Open XML SDK. I mean, it’s a spreadsheet. I can see a couple dozen of classes. Maybe. It turns out to be a lot of classes. That’s why there are so many code samples out there, but you don’t really know why you need to use that particular class.

And there are classes without any meaningful properties or functions. They inherit from a base Open XML class, and that’s it. For example, the Sheets class.

Then there are classes with identical properties. For example, the InlineString class, the SharedStringItem class and CommentText class. Or the Color, BackgroundColor, ForegroundColor and TabColor classes.

The answer is the same for all the above questions. The Open XML SDK is meant to abstract away the XML file structure.

There are duplicate classes because each class eventually translates into an XML tag (if it’s a CC or RC). XML requires a different tag for different purposes, hence the Open XML SDK has different classes. Even though the classes are identical in programming functionality, they become rendered as different XML tags.

There are classes without any seemingly meaningful properties or functions because their sole purpose is to have children. (Ooh Open XML SDK joke!) The Sheets class has as children, the Sheet classes. In XML, they’re correspondingly the “sheets” tag with “sheet” tags as children. The final XML tags have no XML attributes, hence the corresponding SDK classes also have no properties. Tada!

And finally, there are so many classes, because frankly speaking, you need one SDK class corresponding to each individually different XML tag. There are a lot of XML tags used in Open XML, hence so many classes. And that’s before we add in the Relationship Part Classes.

If you have any questions, leave them in a comment or contact me. And I’ll see if I can answer them in an article.

Playing the keyboard

I thought I’d let you see the notes I was playing on the keyboard for a previous video. Due to various reasons, the actual music was comprised of 2 separate parts that I recorded. There was some construction going on when I recorded that video. And yes, those were birds singing outside my house…

I’ve also never taken piano lessons, so stand down Mozart…

Original video:
http://www.youtube.com/watch?v=J4s_HgvGsRc

What creates jobs?

There were a few recent financial crises. The subprime mortgage. Bailouts of large companies by (American) government. The sovereign debt in Europe.

I thought we were past the worst of things. The economy seems to be recovering, if not already recovering (I’m not an economist). Then I saw this wonderful ad.

Ideal job ad

Jobs are in short supply globally. I just thought Singapore would have weathered this gracefully. But then that advertisement came up recently, so perhaps Singapore had a delayed reaction.

Governments don’t create jobs

I respect every civil servant out there. You should also be aware that their salary comes from taxes. Where else would the government get their money from? (There are probably foreign investments and donations and whatnot, but I presume the majority would be from taxes).

By that logic, hiring more government staff generally means higher or more taxes. If I’m wrong, please leave a comment.

For the purposes of discussion, I would include military forces as part of the government. The military is what businesses call a “cost centre”, meaning it doesn’t bring in revenue directly but costs something to maintain. Well, ok, the military can sell weapons and security technology. But mainly, the military just eats up money.

I am probably offending politicians and military officers left and right here… Go read something else then.

IT departments are usually considered cost centres (well, they were in my previous company). Unless your company is a software company, then your software department making your software products becomes your main cash generating department.

Big companies don’t create jobs

The recent waves of financial downturns, together with globalisation, have kept big companies on their toes. Cost cutting can only go so far.

The reason big companies are big is not because they hire a lot of people (that might have been true in the past), but because they can scale up quickly. They have the infrastructure and people in place. However, that scaling ability comes at a cost. Maintaining the status quo requires money too.

To remain competitive in these current times, there have been mergers and acquisitions. If your company buys up an IT company, your company no longer really needs an internal IT department. Staff from the original internal IT department and the recently bought IT company will be combined, and as expected, people will be let go due to job/task redundancy.

So big companies are not going to be creating many jobs because they’re busy trying to get their costs and profit margins in order in this increasingly chaotic (relatively speaking) economic environment.

Small companies don’t create jobs

Small companies have their own problems too. I will include startups in this. The main problem, in direct contrast to big companies, is scale. Although that can be overcome with some creativity and especially with the Internet.

There was once I applied for a job at a small startup here in Singapore. I asked a friend who was already working there. I was willing to be an unpaid intern. I didn’t get a reply. Either my “friend” didn’t put it up to the founder for consideration, or my “friend” was afraid I’d steal jobs. In this case, small companies are small because they have a small number of staff.

So while existing small companies might not create enough jobs to make a difference, new small companies will still create jobs. Which brings us to the next point…

Entrepreneurs (might) not create jobs

Well, entrepreneurs create companies, right? They usually start small, so they create jobs. Right?

I guess.

My definition of “entrepreneur” comes from the dictionary, which is “one who takes on risks of a business or enterprise”. Based on that, I consider myself an entrepreneur. I run a business selling software that I wrote. I assume the risks of a business. By definition, I’m an entrepreneur.

But I don’t hire anybody. I outsource some of my tasks (usually design work), but the only job I’ve created is custom made for me. And I’m filling it right now.

And typically, for those pursuing “lifestyle businesses”, they don’t hire a lot of people too. They usually outsource, and even if they do hire staff, it’s probably 2 to 5 people.

Not quite the 50 to 500 staff for small companies, huh?

So, what does create jobs?

Customers create jobs

I can’t remember who said this. Basically, a job exists when someone is willing to pay for something, and then pays it. When you have customers, you create jobs.

A customer wants a latte. Someone has to grow, harvest, transport, and grind the coffee beans. Someone has to milk a cow. Someone has to brew that coffee. Someone has to take the money from the customer.

Your customer wants a monthly report on how much money he’s making. He doesn’t want to learn how to write SQL statements to grab information from his database. That’s what you’re for.

On the most basic level, it’s about cash flow. A job exists to facilitate the transfer of money from the customer to you. If you’re an employee, then the money goes to your company, who then pays you.

So if you want to find a job, find out how to create customers for the company you’re interested in.

Tax season and SpreadsheetLight deal

It’s currently tax season. Making financial and tax reports can be taxing (haha!). I know, because I’ve spent years writing software that creates Excel reports for sales, revenue, debt and other financial reports.

So from now till the end 15th of April 2012, I’m offering my spreadsheet software library at the unbelievably low cost of USD 199 150. SpreadsheetLight runs on .NET Framework and Open XML SDK, is written in C#, uses the MIT License, and is designed to be easy to integrate into existing software projects with a minimum of fuss.

UPDATE: The promotion will be until 16th April 2012, and I’m selling it at USD 150. Need a spreadsheet library software? Get it now before I raise the price.

The reason is because I want you to have an easier time writing your programs. My experience in churning out Excel reports tells me it can be frustrating at times…

So check out SpreadsheetLight. Or get your manager to take a look.

A Sandbox In The Cloud

I am honoured and excited to bring you an article written by a Rackspace staff, Joseph Palumbo. My thoughts will be at the end of this article. Thanks Joseph! Disclaimer: I’m not paid by Rackspace.

As a founding member of Rackspace’s Managed Cloud support team, Joseph spends half of his time teaching customers about the Cloud and the other half learning about the Cloud from them. Follow him on Twitter.

Solid, high performing websites and web applications don’t happen by accident. From imagining an idea, creating code and developing an intuitive user experience, there are many behind the scenes tasks to ensure everything works smoothly.

Despite how simple a website or web app might appear, the reality is that even the simplest looking sites can have powerful and complex code behind them. The complexity means that one small change can take down the entire site. However, both business needs and technologies evolve, necessitating changes to your site. The choice, however, is how you implement these changes.

You can choose to make code changes to your live, production environment, but this is a dangerous proposition. By doing so, you assume the risk of making a mistake that can be visible to users, or creating an error that can make your entire site go dark for an extended period of time. The better alternative is for businesses to create a test and dev sandbox that mirrors the live environment, but in the recent past, this was expensive to do. The high cost presented a difficult decision: do you spend the money to create a test and development environment or do you assume the risk of introducing a bug or error into the live environment?

With the advances in cloud computing, you no longer have to choose. Businesses can easily clone their production environment and create a test and dev sandbox. In the cloned site, developers can replicate the ratio of usage rather than purchase all of the horsepower; this means that you can have a more cost effective version of your site because you aren’t serving up production traffic.

This cloned site can be created on demand for testing code changes and will literally cost just pennies per hour. Not only can businesses create a cloned site for temporary testing, the cloud presents a cost effective solution for a long-term test and dev sandbox.

Furthermore, the test dev sandbox allows experimentation to happen behind the scenes without anyone outside the company (or the IT department) ever knowing. While you are making changes to your test and dev sandbox, the production site is humming along, bringing in revenue, collecting customer data and maintaining your online presence.

Once your developers have perfected the changes and are ready to move the mirror site into production, it can be uploaded directly. If there is a load balancer in front of your configuration, you have the ability to make the test environment the new production environment. You simply make a change on the load balancer, redirecting traffic from the old production site to the new production site in the middle of the night. This is easily done from a systems administration point of view and can result in little or no downtime to your configuration.

In the past, I would receive frantic phone calls from people who didn’t have a test and dev site and didn’t know their code very well. They only had a production site and were trying to make changes, but they were concerned about the potential of bringing down their site, or even worse, making an irreparable error such as erasing part of their database.

The cloud lowers the cost of having a test and dev site, allowing businesses to prove out their code changes without adversely impacting their production site. You can have peace of mind that you won’t make visible mistakes to your users or delete any of their data. Peace of mind is worth every penny – and with the cloud, it won’t cost very many pennies to have.

Post-article thoughts

I have personally maintained development, test and production web servers, along with the corresponding web sites. It can get exhausting, especially when you have to coordinate the efforts of other developers and testers (from dev and test sites), and juggle inquiries from customers and customer service officers (from live sites).

I’ve also personally done server maintenance. There was this one time when there was a change in some wiring structure in the data centre, and I had to be there personally (because there’s no one else) to make sure my servers were still operational after the change. I’m really not a hardware kind of guy…

DNS propagation, IP address settings, SSL certificates, server upgrades. If there was an easy way to enclose all that into a standalone testing environment, my life would have been so much easier.

Different authors

Here’s the list of authors I was going to talk about.

Andy McNab – Firewall. (Too many weapons/tactics/militant details.)
Eoin Colfer – Artemis Fowl
Anthony Horowitz – Alex Rider
Matthew Reilly – Scarecrow, Jack West Jr (Highly recommended author)
Rick Riordan – Percy Jackson, Kane Chronicles
Jim Butcher – Harry Dresden (Also recommended if you like magic and stuff)
Graham Brown – Mayan Conspiracy
Pittacus Lore – I am Number Four, The Power of Six
Michael Crichton – Next, Timeline
David Baldacci – Hell’s Corner, Camel Club and First Family
Darren Shan – Demonata, Cirque du Freak