Spreadsheet chart reference manual now available

After weeks of working on this, I’ve finally finished my reference manual on Open XML charts in spreadsheets. It’s called (imaginatively) Spreadsheet Open XML Charts.

It’s sort of a sequel to my other reference manual on Open XML spreadsheets.

If you do any work on Open XML spreadsheets and in charting specifically, I’m certain you’ll find this useful. Please check it out and let me know what you think.

Now if you’ll excuse me, I’ve got to go crash and do nothing. Perhaps get a cup of tea or hot chocolate.

How to create an Open XML spreadsheet bar chart

Today, you’re going to learn how to insert a bar chart into an Open XML spreadsheet. Frankly speaking, the basics apply to the other types of charts, so if you master this, you can figure out how to create the other charts.

First, download the source code and the resulting Excel file.

I’m not going to go through every detailed step of the code. Instead, I’ll go through broad strokes of explanation. You’re expected to know the basics of using Open XML SDK to create cells and insert images. Refresher on cells and styles here, and inserting multiple images here.

Hardcoded data

You may have noticed that the spreadsheet cell data is hardcoded into arrays. The reason is it’s used in creating the cells, and creating a copy of the data when creating the chart. Open XML charts store the actual data it’s representing.

For charts, there are 3 parts to the data: the category entries, the “legend” entries, and the cell data. “Legend” is in quotes because most charts use that data part as the legend, but it’s not called legend entries. I believe internally, it’s referred as the series name of data series.

Absolute cell references

You may also notice that in the chart parts, cell references are absolute. For example, “Sheet1!$B$3”. It means the cell references are absolute, as opposed to relative (such as “Sheet1!B3”).

The sheet name is used

You may also notice that the sheet name is used. This means you can have a chart in sheet “Sheet1” but contain data from “Sheet2”.

Axis IDs

If you’ve explored an Open XML spreadsheet chart, you may have noticed magic numbers used as axis IDs. So how does Excel calculate them? I don’t know, but you don’t have to worry about that.

Within each chart, as long as the axis IDs are unique, you’re fine (this means within each ChartSpace class). This means theoretically, you only need 6 unique axis IDs. There are 3 axis IDs (category/date axis, value axis and series axis) for the primary axis and another 3 for the secondary axis. The secondary axis comes into play when you plot a particular data series on its own (or against an existing secondary axis).

When data series are plotted on a secondary axis, they can also be plotted as a different chart type. This is what’s known as a combination chart. For example, a column chart with a line chart.

I don’t know all the valid types of combination charts, thus the theoretical limit is 6. But I don’t think 3D charts can be part of combination charts. In practice, you only need 4 unique IDs.

And you don’t even need to fret. Just use 1 through 6 as the IDs.

Data series

Excel assumes that if your data is wider than it’s tall, then rows of horizontal cells are taken as data (with first row as category entries and the first cell of each row as the “legend” entry). Otherwise, it’s columns of vertical data (with first column as category entries and the first cell of each column as the “legend” entry). You can change this if you find your chart weird-looking on the Excel UI, but programmatically, the source code uses rows as data series.

Each data series has the category entries, the associated “legend” entry, and the actual data.

Different charts have different Open XML SDK classes representing data series. For example, the bar charts and column charts use the BarChartSeries. The pie charts (and surprisingly the doughnut charts) use the PieChartSeries. The reason is that there are different properties associated with different charts, and thus each chart series needs to take care of that. For example, line charts can have markers on each data point, and pie charts can have exploded pie slices.

Charts are also images

A chart has 2 main parts: the chart content, and the drawing part.

The chart content contains the chart data, legend, title, axes and what not. The drawing part refers to how big the chart is displayed on the worksheet, and where to display it. And in this sense, a chart technically behaves like an image.

So not only do you have to create the chart content, you have to take care of its “image-y” properties. Hence the reference link I gave on inserting multiple images earlier on.

Because of this, you have to take care of unique image IDs. Each worksheet needs a unique ID assigned to each image part. And a chart is considered an image for this purpose.

Here’s the gist. A WorksheetPart class is tied to each separate worksheet. Each WorksheetPart class has a DrawingsPart class. And a DrawingsPart class can have ImageParts (the “normal” images) and ChartParts (for charts).

For more Inception-like information, a ChartPart can have ImagePart’s too (to have picture fills for the chart title or legend for example). It’s sort of complicated, so let’s not go there in this article.

The dreaded TwoCellAnchor

Position of images (and thus charts) is done with either an AbsoluteAnchor, OneCellAnchor or TwoCellAnchor class. I explained a little why TwoCellAnchor took out more of my hair than 4 years of university mathematics here.

But for charts, you don’t really need fine-tuned positioning. So a TwoCellAnchor is fine. The column/row properties are zero-based indices of the column and row indices. The offsets are in EMUs. Theoretically, the offsets are less than the column width or row height of its corresponding column or row.

Chart anatomy

The typical chart goes like this. Each chart has a title, plot area and legend. The plot area contains a chart type (or several chart types for combination charts), and relevant axes.

Each chart type might have specific properties unique to the chart type assigned (bar charts will have the bar direction assigned for example). And data series unique to the chart type assigned (the BarChartSeries for bar and column charts for example). And any relevant axis IDs assigned (some charts don’t have axes, such as doughnut charts).

Relevant axes are typically a category (or text) axis on the horizontal, and a value axis on the vertical. If the category entries are date values, Excel might have used a date axis instead of a category axis. The source code provided assumes textual category axis is used.

For data series charted on the secondary axis, the order of appending axes is as follows:

  • Primary horizontal axis (whether category or date axis) [typically at bottom]
  • Primary vertical axis (value axis) [typically on the left]
  • Secondary vertical axis (value axis) [typically on the right]
  • Secondary horizontal axis (whether category or date axis) [typically on the top]

Bonus!

If you change the bar direction in the source code to Column, you’ll get a clustered column chart. Everything else works just fine.

In fact, all the charts look surprisingly similar in terms of code structure. Change the chart class, and the chart series class used. Fiddle with a couple of properties unique to the particular chart type. BOOM! New chart born.

The really hard part is getting the chart data series down pat. Which is why I hardcoded the data for this explanation. In practice, you’re probably getting your data from a database, so you’ll be iterating through that. Or obtaining the actual cell data somehow. So I might as well have a simple data structure so you can expand it yourself instead of hardcoding the absolute cell references everywhere (which will be a pain for you to follow).

In conclusion…

Charts are hard… I’m writing a reference manual for Open XML charts in spreadsheets. So stay tuned for that. In the meantime, my software library SpreadsheetLight takes care of all the above in a few lines of code. So take a look at that (it’s free and source code is available).