DefinedNames in Cells for Open XML spreadsheets

So a while back, a customer of mine asked me if I knew how to set names (or labels) to cells in a spreadsheet, so that a cell formula just referenced those names. Frankly, I didn’t even know I could do that. I’m not an Excel wizard, as you can tell.

In case you’re in a hurry, here’s the code and the resulting Excel spreadsheet. The code runs against the Open XML SDK.

To my surprise, the name given to the spreadsheet cell isn’t defined at the Cell class level. You’d think you would name your child and make sure to slap a name tag on your progeny just so everyone knows what to call your offspring, right? After looking through the documentation, and thinking it through, I guess it makes sense to separate it. It works by having a central depository with all the available names, contained within DefinedNames and DefinedName classes. Then the names are available throughout the spreadsheet’s workbook.

It’s sort of like the SharedStringTable, where every piece of text is stored in a SharedStringItem, and referenced with an index. This has problems, in that I don’t know what is contained in the Cell class itself, but I don’t really want to go there right now…

Let’s go through the code a bit. Here’s the part where you define DefinedName(s):

defname = new DefinedName();
defname.Name = "PrimeNum1";
defname.Text = "Sheet1!$C$2";
defnames.Append(defname);

defname = new DefinedName();
defname.Name = "PrimeNum2";
defname.Text = "Sheet1!$C$3";
defnames.Append(defname);

defname = new DefinedName();
defname.Name = "PrimeNum3";
defname.Text = "Sheet1!$C$4";
defnames.Append(defname);

defname = new DefinedName();
defname.Name = "PrimeNum4";
defname.Text = "Sheet1!$C$5";
defnames.Append(defname);

In case you’re unfamiliar with Excel, “Sheet1!$C$5″ means the cell C5 of the sheet named “Sheet1″. The dollar sign acts as a separator, I think. In this case, the relevant cells are C2, C3, C4 and C5 (containing the 1st, 2nd, 3rd and 4th prime numbers). And then we have this part:

r = new Row();
r.RowIndex = 6;
c = new Cell();
c.CellReference = "B6";
c.DataType = CellValues.String;
c.CellValue = new CellValue("SUM");
r.Append(c);
c = new Cell();
c.CellReference = "C6";
c.CellFormula = new CellFormula("SUM(PrimeNum1, PrimeNum2, PrimeNum3, PrimeNum4)");
c.CellValue = new CellValue("17");
r.Append(c);
sd.Append(r);

Note this particular line:

c.CellFormula = new CellFormula("SUM(PrimeNum1, PrimeNum2, PrimeNum3, PrimeNum4)");

That’s our defined names in effect. Otherwise, we would use this:

c.CellFormula = new CellFormula("SUM(C2, C3, C4, C5)");
// or even
// c.CellFormula = new CellFormula("SUM(C2:C5)");

I teach you more about the CellFormula class, as well as a whole bunch of Open XML concepts in my programming guide.

3 facts about Open XML SDK

In the course of writing my guide on Open XML and Excel spreadsheets, I discovered some things about the Open XML SDK. There’s been some flak about the SDK or even on Open XML in general. But the price of the SDK is free, so you can’t beat that. While there are some inconveniences with using the SDK, the solutions and workarounds aren’t particularly nasty.

Here we go.

Open XML SDK is not a convenience library

If you’re looking for a neat little function that will toast bread, mop your floor, press your shirt and wish you “Have a nice day!” as you’re leaving the house, Open XML SDK is not what you’re looking for. (Actually if you find a software library that does the above, I wanna know.)

The Open XML SDK is a software development kit (hence the “SDK”). It is not a conventional software library in that sense. It provides atomic functions that allow you to do all sorts of nifty things to create/manipulate Open XML spreadsheets, word processing documents and presentation slides. The keyword is “atomic”.

This isn’t necessarily a bad thing. I know some companies will prefer not to use a software library (or third-party software). For example, government agencies or financial institutions, where data security is a concern and all software is (preferably) written in-house. The furthest they would allow is probably standard libraries.

I’m not going into a discussion about whether Open XML SDK is a “standard” library, since it’s from Microsoft. There seems to be a lot of unhappiness with Microsoft. Is it because of their monopoly? Because Bill Gates is rich? I like the .NET Framework, because it makes coding easier. Open XML SDK is built on top of that, so that makes Open XML documents easier to create and manipulate.

That said, because Open XML SDK offers atomic functions to do atomic tasks, this allows you to create your own functions to encapsulate whatever you’re trying to achieve. This flexibility is sometimes more valuable than convenience. Why do you think software companies charge a lot more for the source code of their libraries? You want to know what’s going on behind that function from a third-party library, because you don’t know if that bug is due to your own code or their code. (It’s probably your own code though…)

Open XML SDK is not a data manipulation library

There are 4 main parts to the Open XML SDK:

  • Spreadsheets
  • Word processing documents
  • Presentation slides
  • Graphics

The graphics part is a common set of class objects used by the other 3. Out of the first 3, spreadsheets are typically used to hold lots of data. Financial data, fiscal summaries, call logs, database dumps and so on. Text documents and slides hold data too, but not usually as intensive as spreadsheets.

Now it might surprise you that Open XML SDK offers no functions to manipulate any kind of data. None at all.

Need to dump 100 database records with 8 fields into a spreadsheet? You need to write code to fill in a 8-column-100-row block of cells.

Need to sort data? Do it within the database environment (there’s a SORT BY clause, you know?), or write your own sorting function. Because Open XML SDK ain’t got it.

This isn’t a big problem, but it does make writing code a little bit of a hassle. Particularly when you’re creating or manipulating a data-intensive spreadsheet. It’s not like when you’re attaching the DataReader object to a DataTable, and voila! All the data is nicely formatted and shown on a web page.

But this is also where the flexibility of writing your own functions come in.

Code execution order is important

When writing code with Open XML SDK, you have to put aside how you normally work with spreadsheets, text documents and slides. I’ll lump them together as “office documents”. Small “o” so it’s not copyright, right?

As a user, you would normally just go about your task of typing text, inserting images, styling text, cut-and-copy sections of an office document. The order of how you go about doing it is irrelevant, because you get to see the final result. And the final result is all that matters.

When working with Open XML SDK, you need to keep to an ordering of code sections. Each class in the SDK has to be appended to the appropriate parent class in the correct order, or you’ll get a corrupt file. This means your code has to be executed in the correct order too.

Go to the Open XML SDK help file. Let’s say we look at the Worksheet class. There will be a section called XML schema. In that, there will be an XML tag called “sequence”. See those “element” tags? That’s the order.

This code order isn’t a big problem. In your spreadsheet, it looks like an image is at the top, with a table of cell data below it. In code, you need to code for the table of cells first, then the image. Because that’s the order.

The solution is to ignore the visual placements of the data sections in your office document, and look at what is in your office document. Then you rearrange all the data parts according to the order, and then write code for that.

Conclusion

The Open XML SDK offers you flexibility in how you write your code and a great price (it’s free). It is backed by a large software company (Microsoft), so it’s not likely to disappear with no support in the future.

Sure it’s not as easy to use, but sometimes your situation doesn’t allow you to use a third-party library (cost, licensing, security). Hey, it works well enough. And sometimes, when you’re in a tight project, that’s all you really need. That you produce software (that produces office documents) well enough.

If you’re interested, you can check out my guide. I teach you how to create a spreadsheet with the standard functions in Excel such as text styling, image insertions, multiple worksheets, cell formulas. I also show you how to use a template in case you need something really complicated. And there’s working source code in C# and VB.NET.

Art, Birthdays and Ceriphs

A video announcement (sort of) of the June 2011 issue of Singularity. Check it out here if you haven’t done so already.

Spreadsheet Open XML V2

I was rushing to get this out. The updated version of my programming guide is out! I first launched it on 17 Jan this year, so I was hoping to meet the personal-and-unseen deadline of 17 May, so it’s a nice 4 months interval. Ah well, I’m the only one who cares anyway…

I’ve added loads more content to help you with your Open XML spreadsheet needs. Version 1 was 53 pages. Version 2 is 147 pages. There are a lot of pages with screenshots, but still… 147 pages!

You can find out more here.

The last few weeks had been interesting while I rushed to get working source code and write explanations for the guide… I need to sleep… wait, I’ve got a magazine deadline! *sigh*

Programming guide update

So. Quick update.

I’ve finished writing a bunch of code for my Open XML guide to create spreadsheets. They include:

  • Inserting comments
  • Using formulas
  • Setting header/footer options
  • Setting page setup options
  • Creating multiple worksheets
  • Freezing/Splitting panes
  • Using and styling tables (but not pivot tables)
  • How to write the cells if you have multiple sets of tabular data

The last one was a problem that a customer faced. Let’s say you have 2 sets of tabular data, and they’re side by side in the worksheet. Using Open XML SDK, you have to run through the 1st row of the 1st set of data, then the 1st row of the 2nd set of data. You can’t completely iterate over the 1st set of data first. This is because with Open XML SDK, you have to completely define the first row to be inserted into the worksheet. That will include the 1st row of the 2nd set of data.

So I’m (feverishly) typing the explanation part of the guide (the code is all done and tested). I’m telling you now because when I release this update, I’m going to increase the price. I calculated that with the amount of work done on writing the code and the explanation, and that I currently offer a 1-year free technical support together with the guide, I’m severely undercharging. Besides, the number of hours saved studying the guide versus learning from scratch is worth the price increase.

Since I currently give customers a free update within one year of purchase, you might want to get the guide now before the price goes up. Then you’ll get the current guide and do awesome stuff with it right now, and when I release the updated guide, you get it at no extra cost.

Depending on how fast I type (and how many cups of tea I imbibe), I might release the update in a week’s time or two. I don’t really have a deadline. But the latest release date will be around mid-May (because I’ll have to work on my magazine by then. But I digress…). But it might be sooner, because I want to get this off my todo list so I can work on something else.