Collision detection in merging Excel cells

You can merge cells in Excel. “Duh!” you say. So how does Excel ensure that the cells you want to merge don’t overlap any existing merged cells? How do you check programmatically that the cells you want to merge won’t overlap any existing merged cells?

Well, that was the problem I faced while writing my spreadsheet library software (check out SpreadsheetLight here!). The inkling of an idea was formed when I visited StackOverflow. Can’t remember what I was there for, but the answer page wasn’t about Excel or spreadsheets. It was about collision detection.

Yes, the type of collision detection used in games so you don’t walk through walls and stuff.

And thus I learnt a new mathematical theorem: the separating axis theorem (or hyperplane separation theorem). From what I understand, given two N-dimensional thingies, if you can find one (N-1)-dimensional hyperplane that separates those two thingies, then those two thingies are separate (or not overlapping).

Yes, “thingies” is a technical term.

For example, given two 3D objects, if you can find one 2D plane that separates them, then the 3D objects are not overlapping. Another example is when you’re given two 2D objects, and if you can find one line (which is 1D) that separates them, then the 2D objects are not overlapping.

There are some conditions to be fulfilled, such as the objects being convex and disjoint and whatever. (yes I’m a mathematician…) I’ll leave it to you to do your own reading.

But for our purposes, how do we check if that rectangle of Excel cells we want to merge won’t overlap with any existing merged cells? We have these conditions that make our lives easier:

  • The merged cells are rectangular
  • The 4 end points (top-left, top-right, bottom-left, bottom-right) of a merged cell are in whole numbers
  • The (merged) cells map strictly to a 2D rectangular grid

Since the merged cells are rectangular, they’re also convex (I’m not going to explain why, just trust me). Since the 4 end points are in whole numbers, line boundaries can be easily calculated (it’s easy to check if something is <7 or >=8 for example). And since they map strictly to a 2D rectangular grid, the separating axis is a line. And even better, you can always find a subset of solutions of those separating axes that are either horizontal or vertical.

Ok, diagrams are in order.

Collision detection in merged cells

So in our case, as long as you can find one horizontal line or one vertical line that separates the 2 merged cells we’re checking, then the 2 merged cells are separate.

Let’s try an example. Excel uses cell references that are in the A1 format, which means row 3 column 4 becomes “D3″. The column index uses alphabets and is in base-26. Read more here.

We’re going to simplify that. Let’s say our existing set of merged cells only has one entry. This merged cell has the top-left corner at row 1 column 1, and the bottom-right corner at row 3 column 4. Suppose we want to check if this merge cell overlaps: top-left corner at row 7 column 8, bottom-right corner at row 10 column 10.

The horizontal line of 5 fits the bill, or the line y=5 if you want to be mathematical about it (but y goes from negative to positive downwards instead of the usual Cartesian y). Or y=6. Or even y=7 (note that the line can “touch” one of the merged cells, but not both. This is where the “whole number” condition comes in).

The vertical lines x=5, x=6 or x=8 also fit the bill.

Thus, our 2 merged cells don’t overlap.

So what’s the programmatic way to check? You’d be surprised at the simplicity. To make it easier, the variables with “StartRowIndex” mean the top row of the merged cell, “EndRowIndex” mean the bottom row of the merged cell. And “StartColumnIndex” mean the left-most column of the merged cell, and “EndColumnIndex” mean the right-most column of the merged cell.

if (!(iEndRowIndex < mc.StartRowIndex || iStartRowIndex > mc.EndRowIndex || iEndColumnIndex < mc.StartColumnIndex || iStartColumnIndex > mc.EndColumnIndex))
{
    // there's an overlap!
}

So the merge cell we want to check has top-left corner at (iStartRowIndex, iStartColumnIndex) and bottom-right corner at (iEndRowIndex, iEndColumnIndex).

The variable “mc” refers to a custom class I use to represent merged cells. Obviously, you’d run that condition in a loop through all your existing merged cells. If you can’t find an overlap after running through that loop, then the merge cell you’re checking is good to go.

Let’s run through the individual conditions.

  • (iEndRowIndex < mc.StartRowIndex) means our merged cell is completely above the existing merged cell
  • (iStartRowIndex > mc.EndRowIndex) means our merged cell is completely below the existing merged cell
  • (iEndColumnIndex < mc.StartColumnIndex) means our merged cell is completely to the left of the existing merged cell
  • (iStartColumnIndex > mc.EndColumnIndex) means our merged cell is completely to the right of the existing merged cell

The first 2 conditions check for the existence of a horizontal separating axis. The next 2 conditions check for the existence of a vertical separating axis.

Note the negation of the entire boolean condition. Those 4 conditions check for existence of solutions. Negation means checking for overlaps.

Note also that we use strictly greater than or strictly less than checks. If the merge cells share a row or column, then they overlap, right?

Did you know my first draft of writing the code had me checking 6 different conditions? Each condition had its own if statements. This one just had one. I’m so lucky to have learnt the separating axis theorem. I was checking if a point was above, within or below the merge cell, then I had to check if the point was to the left, within or to the right of the merge cell, and … urgh, it was horrible…

Named cell styles are still explicitly declared

Styling cells in Microsoft Excel has its difficulties (as I’ve written before). The biggest one is keeping track of all the indices. In Open XML SDK, you have the ability to have a named cell style.

At first glance, you might think that’s awesome. You just use a named cell style, and all the related styles are applied. It’s like there’s a red car that uses hybrid fuels. “Yes, I would like to have a red car that uses hybrid fuels.” Not quite.

You see, the named cell style is dependent on the implementing spreadsheet software. For example, Microsoft Excel has the “Normal”, “Bad”, “Good” and “Neutral” named cell styles. But Google Spreadsheets and OpenOffice.org Calc do not have to have those named cell styles, or even style it the same as Microsoft Excel. This is where Open XML SDK isn’t quite “open”… After much research and work, I discovered the SDK is basically Open-XML-ising Microsoft Excel (and Word and PowerPoint). I’m neutral on the stands of open source and “forcing standards”. I just use whatever there is, and make something within the limitations.

Because of the dependency, the underlying individual styles need to be declared explicitly. Actually more so because of the dependency.

So for my spreadsheet software library SpreadsheetLight, I used Excel as the guideline.

In researching Excel named cell styles, I had to look at the underlying XML files (because Open XML spreadsheets are made of XML files). While the Open XML SDK comes with a document explorer (the Productivity Tool), I needed to make notes and also that I felt the need to see the XML file itself, rather than using the explorer tool.

This gave me a problem because while XML files are supposed to be human-readable, it doesn’t make it easy to read. The “natural” XML file has no indents. Oh my Godiva chocolate, it’s so hard to read… Then I remembered I had an XML tool, XML Studio. I fired that up and a few clicks later, the XML file had nice indents and I could find out where the individual style tags were. XML Studio was amazing to use.

Disclaimer: I was given a free developer license of XML Studio by Liquid Technologies. But the software is really useful if you work with XML files a lot.

After doing my notes for a while, I discovered even that’s not enough. There were too many individual styles! I needed the indices for those styles, because only the index was referenced in the final style (CellFormat classes). I didn’t really feel up to annotating the indices… until I remembered my partially completed Open XML spreadsheet decompiler tool. When I created that tool, one of my aims was to put in comments on the index of the individual styles.

Note to Liquid Technologies: You might want to consider putting in XML comments on the index of an XML child tag with respect to its parent. But I don’t know if that’s useful to programming spheres other than Open XML…

Anyway, my hard work paid off, and SpreadsheetLight allows you to apply named cell styles. Here’s how the spreadsheet looks like:
Applying named cell styles

Note that some of the named cell styles use accent colours. The accent colours are part of the spreadsheet’s theme. So in offering named cell styles as a feature, I also had to allow you to create your own theme. And here’s the code using SpreadsheetLight:

System.Drawing.Color[] clrs = new System.Drawing.Color[12];
clrs[0] = System.Drawing.Color.White;
clrs[1] = System.Drawing.Color.Black;
clrs[2] = System.Drawing.Color.WhiteSmoke;
clrs[3] = System.Drawing.Color.DarkSlateGray;
clrs[4] = System.Drawing.Color.DarkRed;
clrs[5] = System.Drawing.Color.OrangeRed;
clrs[6] = System.Drawing.Color.DarkGoldenrod;
clrs[7] = System.Drawing.Color.DarkOliveGreen;
clrs[8] = System.Drawing.Color.Navy;
clrs[9] = System.Drawing.Color.Indigo;
clrs[10] = System.Drawing.Color.SkyBlue;
clrs[11] = System.Drawing.Color.MediumPurple;

SLDocument sl = new SLDocument("ColourWheel", "Castellar", "Harrington", clrs);

sl.SetRowHeight(6, 24);
sl.SetColumnWidth(1, 1);
sl.SetColumnWidth(2, 13);
sl.SetColumnWidth(3, 13);
sl.SetColumnWidth(4, 13);
sl.SetColumnWidth(5, 13);
sl.SetColumnWidth(6, 13);
sl.SetColumnWidth(7, 13);

sl.SetCellValue(2, 2, "Normal");
sl.ApplyNamedCellStyle(2, 2, SLNamedCellStyleValues.Normal);
sl.SetCellValue(2, 3, "Bad");
sl.ApplyNamedCellStyle(2, 3, SLNamedCellStyleValues.Bad);
sl.SetCellValue(2, 4, "Good");
sl.ApplyNamedCellStyle(2, 4, SLNamedCellStyleValues.Good);
sl.SetCellValue(2, 5, "Neutral");
sl.ApplyNamedCellStyle(2, 5, SLNamedCellStyleValues.Neutral);

sl.SetCellValue(3, 2, "Calculation");
sl.ApplyNamedCellStyle(3, 2, SLNamedCellStyleValues.Calculation);
sl.SetCellValue(3, 3, "Check Cell");
sl.ApplyNamedCellStyle(3, 3, SLNamedCellStyleValues.CheckCell);
sl.SetCellValue(3, 4, "Explanatory Text");
sl.ApplyNamedCellStyle(3, 4, SLNamedCellStyleValues.ExplanatoryText);
sl.SetCellValue(3, 5, "Input");
sl.ApplyNamedCellStyle(3, 5, SLNamedCellStyleValues.Input);

sl.SetCellValue(4, 2, "Linked Cell");
sl.ApplyNamedCellStyle(4, 2, SLNamedCellStyleValues.LinkedCell);
sl.SetCellValue(4, 3, "Note");
sl.ApplyNamedCellStyle(4, 3, SLNamedCellStyleValues.Note);
sl.SetCellValue(4, 4, "Output");
sl.ApplyNamedCellStyle(4, 4, SLNamedCellStyleValues.Output);
sl.SetCellValue(4, 5, "Warning Text");
sl.ApplyNamedCellStyle(4, 5, SLNamedCellStyleValues.WarningText);

sl.SetCellValue(6, 2, "Heading 1");
sl.ApplyNamedCellStyle(6, 2, SLNamedCellStyleValues.Heading1);
sl.SetCellValue(6, 3, "Heading 2");
sl.ApplyNamedCellStyle(6, 3, SLNamedCellStyleValues.Heading2);
sl.SetCellValue(6, 4, "Heading 3");
sl.ApplyNamedCellStyle(6, 4, SLNamedCellStyleValues.Heading3);
sl.SetCellValue(6, 5, "Heading 4");
sl.ApplyNamedCellStyle(6, 5, SLNamedCellStyleValues.Heading4);
sl.SetCellValue(6, 6, "Title");
sl.ApplyNamedCellStyle(6, 6, SLNamedCellStyleValues.Title);
sl.SetCellValue(6, 7, "Total");
sl.ApplyNamedCellStyle(6, 7, SLNamedCellStyleValues.Total);

sl.SetCellValue(8, 2, "Accent1");
sl.ApplyNamedCellStyle(8, 2, SLNamedCellStyleValues.Accent1);
sl.SetCellValue(8, 3, "Accent2");
sl.ApplyNamedCellStyle(8, 3, SLNamedCellStyleValues.Accent2);
sl.SetCellValue(8, 4, "Accent3");
sl.ApplyNamedCellStyle(8, 4, SLNamedCellStyleValues.Accent3);
sl.SetCellValue(8, 5, "Accent4");
sl.ApplyNamedCellStyle(8, 5, SLNamedCellStyleValues.Accent4);
sl.SetCellValue(8, 6, "Accent5");
sl.ApplyNamedCellStyle(8, 6, SLNamedCellStyleValues.Accent5);
sl.SetCellValue(8, 7, "Accent6");
sl.ApplyNamedCellStyle(8, 7, SLNamedCellStyleValues.Accent6);

sl.SetCellValue(9, 2, "Accent1Perc60");
sl.ApplyNamedCellStyle(9, 2, SLNamedCellStyleValues.Accent1Percentage60);
sl.SetCellValue(9, 3, "Accent2Perc60");
sl.ApplyNamedCellStyle(9, 3, SLNamedCellStyleValues.Accent2Percentage60);
sl.SetCellValue(9, 4, "Accent3Perc60");
sl.ApplyNamedCellStyle(9, 4, SLNamedCellStyleValues.Accent3Percentage60);
sl.SetCellValue(9, 5, "Accent4Perc60");
sl.ApplyNamedCellStyle(9, 5, SLNamedCellStyleValues.Accent4Percentage60);
sl.SetCellValue(9, 6, "Accent5Perc60");
sl.ApplyNamedCellStyle(9, 6, SLNamedCellStyleValues.Accent5Percentage60);
sl.SetCellValue(9, 7, "Accent6Perc60");
sl.ApplyNamedCellStyle(9, 7, SLNamedCellStyleValues.Accent6Percentage60);

sl.SetCellValue(10, 2, "Accent1Perc40");
sl.ApplyNamedCellStyle(10, 2, SLNamedCellStyleValues.Accent1Percentage40);
sl.SetCellValue(10, 3, "Accent2Perc40");
sl.ApplyNamedCellStyle(10, 3, SLNamedCellStyleValues.Accent2Percentage40);
sl.SetCellValue(10, 4, "Accent3Perc40");
sl.ApplyNamedCellStyle(10, 4, SLNamedCellStyleValues.Accent3Percentage40);
sl.SetCellValue(10, 5, "Accent4Perc40");
sl.ApplyNamedCellStyle(10, 5, SLNamedCellStyleValues.Accent4Percentage40);
sl.SetCellValue(10, 6, "Accent5Perc40");
sl.ApplyNamedCellStyle(10, 6, SLNamedCellStyleValues.Accent5Percentage40);
sl.SetCellValue(10, 7, "Accent6Perc40");
sl.ApplyNamedCellStyle(10, 7, SLNamedCellStyleValues.Accent6Percentage40);

sl.SetCellValue(11, 2, "Accent1Perc20");
sl.ApplyNamedCellStyle(11, 2, SLNamedCellStyleValues.Accent1Percentage20);
sl.SetCellValue(11, 3, "Accent2Perc20");
sl.ApplyNamedCellStyle(11, 3, SLNamedCellStyleValues.Accent2Percentage20);
sl.SetCellValue(11, 4, "Accent3Perc20");
sl.ApplyNamedCellStyle(11, 4, SLNamedCellStyleValues.Accent3Percentage20);
sl.SetCellValue(11, 5, "Accent4Perc20");
sl.ApplyNamedCellStyle(11, 5, SLNamedCellStyleValues.Accent4Percentage20);
sl.SetCellValue(11, 6, "Accent5Perc20");
sl.ApplyNamedCellStyle(11, 6, SLNamedCellStyleValues.Accent5Percentage20);
sl.SetCellValue(11, 7, "Accent6Perc20");
sl.ApplyNamedCellStyle(11, 7, SLNamedCellStyleValues.Accent6Percentage20);

sl.SetCellValue(13, 2, 12345678);
sl.ApplyNamedCellStyle(13, 2, SLNamedCellStyleValues.Comma);
sl.SetCellValue(13, 4, 12345678);
sl.ApplyNamedCellStyle(13, 4, SLNamedCellStyleValues.Comma0);
sl.SetCellValue(14, 2, 12345678);
sl.ApplyNamedCellStyle(14, 2, SLNamedCellStyleValues.Currency);
sl.SetCellValue(14, 4, 12345678);
sl.ApplyNamedCellStyle(14, 4, SLNamedCellStyleValues.Currency0);
sl.SetCellValue(15, 2, 123);
sl.ApplyNamedCellStyle(156, 2, SLNamedCellStyleValues.Percentage);

sl.SaveAs("NamedCellStyles.xlsx");

I set the column widths of the 2nd to 7th column, and the row height of the 6th row so it’s easier to see.

The main body text (the minor font) is in Harrington, and the title font (major font) is Castellar. You will note that even though the major font is supposedly used for heading and title texts, only the named cell style Title uses the major font. The headings 1 through 4 use the minor font.

Calculating Excel spreadsheet column names

I’ve been working with Open XML spreadsheets for the past, I don’t know how long… A year? I just realised that getting that Excel column header name is a frequent task. You know, given that it’s the 4th column, it’s “D”. I don’t work frequently with spreadsheets with lots of columns. So it was interesting that the 26th column is “Z” and the 27th column becomes “AA”. Basically, base-26 arithmetic, using the 26 letters of the English alphabet as tokens.

There are probably lots of code snippets out there showing you how to calculate a column name given the column index. Here’s mine:

string[] saExcelColumnHeaderNames = new string[16384];
string[] sa = new string[] { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z" };
string s = string.Empty;
int i, j, k, l;
i = j = k = -1;
for (l = 0; l < 16384; ++l)
{
    s = string.Empty;
    ++k;
    if (k == 26)
    {
        k = 0;
        ++j;
        if (j == 26)
        {
            j = 0;
            ++i;
        }
    }
    if (i >= 0) s += sa[i];
    if (j >= 0) s += sa[j];
    if (k >= 0) s += sa[k];
    saExcelColumnHeaderNames[l] = s;
}

That gives you a zero-based indexing version. So to get the 30th column name, you use saExcelColumnHeaderNames[29].

In case you’re wondering, 16384 is the maximum number of columns supported by Excel 2010.

You will notice that it’s not a function given the column index. I find that not as useful. Look, typically when you need the column name, you probably also need to get it frequently, usually with different parameters.

What I did was to store all the calculation results into a string array. Then you reference it with an index. The calculation function typically is a O(n) operation. With you needing to use the function multiple times, your whole algorithm probably goes up to O(n^2).

My method is also an O(n) operation. But referencing a string array is I think an O(1), meaning it’s a constant. I’ve never been good with big O notation…

This style of solving the problem is called pre-calculation. Pre-calculation is especially useful in the games development region, where speed is important. For example, selected values of sine and cosine were pre-calculated and stored in arrays, for use in the numerous 3D/2D calculations in games. Calculating sine’s and cosine’s in real-time were detrimental to a speedy game.

That’s not as useful now because you need a fuller range of floating point values as input. But the concept is still useful.

I think I read somewhere (while I was doing hobbyist game development) this quote:

Pre-calculate everything!

Maybe computers are now much faster. I don’t care. That doesn’t give you an excuse to be sloppy. It’s an optimisation that doesn’t take much effort.

If you need to calculate it, see if you can calculate it just once.

Spreadsheet Open XML V2

I was rushing to get this out. The updated version of my programming guide is out! I first launched it on 17 Jan this year, so I was hoping to meet the personal-and-unseen deadline of 17 May, so it’s a nice 4 months interval. Ah well, I’m the only one who cares anyway…

I’ve added loads more content to help you with your Open XML spreadsheet needs. Version 1 was 53 pages. Version 2 is 147 pages. There are a lot of pages with screenshots, but still… 147 pages!

You can find out more here.

The last few weeks had been interesting while I rushed to get working source code and write explanations for the guide… I need to sleep… wait, I’ve got a magazine deadline! *sigh*

Built-in styles for Excel Open XML

So a blog reader, Sebastien, once asked about built-in style numbers used in Open XML for Excel. I’m sorry to break the news to you. It’s not straightforward to use those built-in styles.

What are these built-in styles? Check these out:

Excel built-in styles

For instance, Sebastien was asking how to use the “Good” and “Bad” built-in styles. So the good news is, you can use those named styles. The bad news is, the various style effects (font colour, background colour, borders) aren’t automatically added in for you. From the Microsoft documentation of CellStyle:

This element represents the name and related formatting records for a named cell style in this workbook.

Annex H contains a listing of cellStyles whose corresponding formatting records are implied rather than explicitly saved in the file. In this case, a builtinId attribute is written on the cellStyle record, but no corresponding formatting records are written.

For all built-in cell styles, the builtinId determines the style, not the name. For all cell styles, Normal is applied by default.

Read the second paragraph again. It means even if you use the built-in style “Good”, you still need to create a style with the green background and green text font colour.

Since this is the case, I suggest you just stick to creating your own custom styles. It’s what’s going to happen anyway.

Now there is a magic number, 164. Your custom styles will start being numbered from 164 onwards. This implies there are 164 built-in styles (0-indexed). My search efforts only gave me a partial list.

Please refer to the ECMA-376 documentation for Office Open XML formats. Download the 1st edition, part 4. Inside, you’ll find a PDF. Turn to page 2135 (as of this writing. It’s page 2128 on the PDF itself though). You’ll find this:

Excel Open XML built-in style list

That list is probably that of “standard” styles. From reading the next few pages of the documentation, I believe most of those 164 built-in styles are due to internationalisation issues.

So my conclusion is, for the “standard” built-in styles you’re interested in, you still have to create the supporting style effects (font colour, background colour and the like). For the “non-standard” built-in styles, they are probably different based on the localisation of your Excel file. So you’re better off just creating your own custom styles, which I teach you how here and here. Or you can get my programming guide for detailed explanations and source code.