Modulo 26 and column names

I was sitting in the lecture theatre valiantly trying to keep awake. The professor was speaking on the rigorous application and proving of the modulus function. It’s basically the remainder, but I’ve never been introduced to it in such, uh, rigor.

He brought up an example using modulo 26, demonstrated how values wrap around, and mentioned its use in cryptography (a class I took later on, where the cryptography professor tasked me with writing a program to do simple encryption. But that’s another story, perhaps…).

Modulo 26 is similar to finding the remainder. The difference is that the remainder is unique. This is important to our discussion.

“About what?” you may ask.

Excel column names.

There are 2 types of cell references used in spreadsheets, the R1C1 format and the A1 format. The R1C1 is simple. If the row index is 5 and the column index is 7, the result is R5C7.

The A1 format takes on a column name and the row index. Using our example again, the result is G5, because “G” is the 7th letter of the alphabet. Yes, that list of 26 letters.

The current versions of Excel (Excel 2007 and 2010) use up to 3 letters, with XFD as the last column name (that’s the 16384th column). What happens is you have column names A, B, and then up to Z. Then the next column name is AA, then AB and then up to AZ. Then BA, BB and so on.

Basically, it’s base 26 arithmetic.

As far as I know, the typical method of getting the column name given the column index, is to run a loop. You add 1 until it hits 26, then you move to the next “position”, and then start from 1 again.

There’s nothing wrong with this method. It’s just that you have to iterate as many times as the given column index. If you’re given 16384, the loop runs 16384 times. This is regardless of the fact that the result is always the same. Given the range of values, the result can only be one of 16384 values.

So it was with this in mind, and my dabbling in game development (which said “Precalculate everything!”), that I precalculated an array of strings that are the column names. The array has 16384 items. The context is a spreadsheet software library.

Now to recoup that precalculation cost, I’d have to access my array at least 16384 times. This is where the context of the spreadsheet library comes in. Everybody (and their dog) wants to know if my library can handle millions of cells. This means if the column name calculation is in a method, that method is called millions of times. Given the iteration loop in it, calling an O(n) method n times makes the whole thing O(n^2), which isn’t efficient.

However, due to technical issues, I can’t keep the array of strings. The column name array needs to be static to be available throughout the library. This causes issues if multi-threads or multi-processors or multi-whatevers come in.

So I can’t use my static array anymore. Bummer. The O(1) of simple array access was working so well.

But I still want to have an efficient way of getting column names. So instead of iterating, I used simple division and modulus operations.

Consider 587. How do you know there’s 5 hundred? 587 / 100 equals 5 (truncating remainders). How do you know there’s 8 tens? 87 / 10 equals 8 (truncating remainders).

Yes, it’s elementary arithmetic, but it works great. So we can do the same for column names. There’s a problem though.

In the case above, we divided by 100. Why 100? Because it’s 10^2, and we’re concerned with the 2nd position after the ones position. The ones position is 10^0 by the way, which is 1.

So for our case, for the “hundreds” position, we divide by 676, which is 26^2. And for the “tens” position, we divide by 26.

Now 587 is 100*5 + 10*8 + 7. I’m going to use the notation (5,8,7) to denote this.

Now consider a column index of 3380. 3380 is equal to 676*5 + 26*0 + 0. This is (5,0,0).

However, in our case, our acceptable range of values is 1 through 26. It doesn’t contain zero. So (5,0,0) is not valid.
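
To see the problem concretely, here’s a minimal sketch of the plain divide-and-remainder split. The class and method names (PositionalDigits, Split) are made up for illustration, and it assumes the value fits in three positions.

```csharp
using System;

static class PositionalDigits
{
    // Splits value into { "hundreds", "tens", "ones" } for the given radix
    // using integer division and remainder.
    // Assumption: 0 <= value < radix * radix * radix.
    public static int[] Split(int value, int radix)
    {
        int square = radix * radix;      // 100 for base 10, 676 for base 26
        int high = value / square;       // integer division truncates
        int rest = value % square;
        int mid = rest / radix;
        int low = rest % radix;
        return new[] { high, mid, low };
    }
}
```

Split(587, 10) gives (5,8,7), which is fine. Split(3380, 26) gives (5,0,0), and that’s the trouble, because zero isn’t a letter.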

In the case of 587, we’re working in base 10, with the acceptable range of values being 0 to 9. This is “proper” remainder. Given any number in base 10, there’s a unique number within [0, 9] that’s the remainder.

However, for our purposes, there’s no unique number. Because we’re working in modulo 26, not just “remainder 26”.

The correct column name corresponding to column index 3380 is “DYZ”. This corresponds to (4,25,26), or 3380 = 676*4 + 26*25 + 26.

Note that 3380 is also 676*5 + 26*0 + 0.

My solution is to start from the “ones” position. If it’s greater than zero, fine. If it’s less than or equal to zero, borrow from the next larger position. Then we move to the next larger position, and check again. Continue to borrow until there are no zero values (or negatives) on the “right” side of the resulting notation (we can have “leading” zeroes).

So (5,0,0) becomes (5, -1, 0 + 26), or just (5,-1,26), borrowing 1 from the “tens” position. We cannot have -1, so that becomes (4, -1 + 26, 26), which becomes (4, 25, 26).
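
The borrowing steps can be sketched roughly like this. It’s an illustration of the idea under a few assumptions, not the exact code from the library: the names (BorrowSketch, ToColumnName) are made up, and it only handles the 1 to 16384 range, so three positions are enough.

```csharp
using System;
using System.Text;

static class BorrowSketch
{
    // Converts a 1-based column index (1..16384) to its column name by taking
    // plain base-26 "digits" and then borrowing so no used position holds zero.
    public static string ToColumnName(int columnIndex)
    {
        if (columnIndex < 1 || columnIndex > 16384)
            throw new ArgumentOutOfRangeException(nameof(columnIndex));

        // d[0] is the "hundreds" (676s), d[1] the "tens" (26s), d[2] the "ones".
        int[] d = { columnIndex / 676, (columnIndex % 676) / 26, columnIndex % 26 };

        // Ones position: a zero here always has something to borrow from,
        // because the index is at least 26 in that case.
        if (d[2] == 0)
        {
            d[2] = 26;
            d[1] -= 1;
        }

        // Tens position: borrow only if the "hundreds" has something to give.
        // Otherwise the zero is just a leading zero and stays as it is.
        if (d[1] <= 0 && d[0] > 0)
        {
            d[1] += 26;
            d[0] -= 1;
        }

        // Map 1..26 to A..Z, skipping leading zeroes.
        var name = new StringBuilder();
        foreach (int digit in d)
        {
            if (digit > 0) name.Append((char)('A' + digit - 1));
        }
        return name.ToString();
    }
}
```

With this sketch, 3380 goes (5,0,0), then (5,-1,26), then (4,25,26), which reads as “DYZ”.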

An interesting effect is that we typically assign 0 to A, 1 to B, and 25 to Z. In this case, 1 is assigned to A, 2 is to B, and most interestingly, both 0 and 26 map to Z. In fact, any multiple of 26 will map to Z.

Don’t think Z is special. Any multiple of 26 plus 1 also maps to A. So 1, 27, 53 and so on map to A. This is a property of the modulo thing.

Do you have a better way of converting (5,0,0) to (4,25,26)? Let me know in the comments.
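
For comparison, one alternative that’s commonly used is to subtract 1 before each modulo and division, which shifts the 1 to 26 range onto 0 to 25 so that no borrowing is needed. This is only a sketch of that technique, not the method described above, and the names (ColumnNameSketch, ToColumnName) are made up.

```csharp
using System;

static class ColumnNameSketch
{
    // Converts a 1-based column index to its column name by peeling off
    // letters from the right. Subtracting 1 first maps 1..26 onto 0..25.
    public static string ToColumnName(int columnIndex)
    {
        if (columnIndex < 1)
            throw new ArgumentOutOfRangeException(nameof(columnIndex));

        string name = string.Empty;
        while (columnIndex > 0)
        {
            int zeroBased = (columnIndex - 1) % 26;   // 0 = A, 25 = Z
            name = (char)('A' + zeroBased) + name;    // prepend the letter
            columnIndex = (columnIndex - 1) / 26;     // move to the next position
        }
        return name;
    }
}
```

The loop runs once per letter, so at most 3 times for the 16384 columns Excel supports.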

Calculate Excel column width pixel interval

Brace yourself. You’re about to learn the secret behind how Excel mysteriously calculates the column width intervals.

In this article, I’m not going into the details of the column widths, but the column width intervals. There’s a difference. From the Open XML SDK specs:

width = Truncate([{Number of Characters} * {Maximum Digit Width} + {5 pixel padding}] / {Maximum Digit Width} * 256) / 256

To put it mildly, that’s a load of hogwash. In the documentation, it says that for Calibri 11 point at 96 DPI, the maximum digit width is 7 pixels. That is also another load of hogwash. It’s actually 8 pixels (well, 7 point something…).

When you move the line on the column width in Excel, just 1 pixel to the left, what is the column width? When you move it 1 pixel to the right, what’s the column width?

It turns out that each pixel interval isn’t a simple multiple of an internal column width interval.

Let’s take Calibri 11 pt 96 DPI again. With a maximum digit width of 8 pixels, each column width interval per pixel is supposedly 1/7 or 1/(max digit width -1).

But wait! It’s not actually 1/7. It’s the largest multiple of 1/256 that is no greater than 1/7.

Now 1/7 is about 0.142857142857143. The actual interval is 0.140625, which is 36/256.

4/7 is about 0.571428571428571. The actual interval is 0.5703125, which is 146/256. And you will note that 146 is not equal to (4 * 36).

If you’re using Open XML SDK (or however you choose to access an Open XML Excel file), when you set the column width as 8.142857142857143, internally, Excel will save it as 8.140625.

Here’s some code:

int iPixelWidth = 8;
double fIntervalCheck;
double fInterval;
for (int step = 0; step < iPixelWidth; ++step)
{
    fIntervalCheck = (double)step / (double)(iPixelWidth - 1);
    fInterval = Math.Truncate(256.0 * fIntervalCheck) / 256.0;
    Console.WriteLine("{0:f15} {1:f15}", fIntervalCheck, fInterval);
}
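
The same truncation can be applied to a whole column width rather than just the fractional interval. Here’s a minimal sketch, assuming (based only on the observations above) that Excel truncates the stored width down to a multiple of 1/256. The names WidthSnapSketch and SnapTo256ths are made up.

```csharp
using System;

static class WidthSnapSketch
{
    // Truncates a width down to the nearest multiple of 1/256.
    // Assumption: this mirrors what was observed in saved files; it is not
    // taken from any official description of Excel's behaviour.
    public static double SnapTo256ths(double width)
    {
        return Math.Truncate(width * 256.0) / 256.0;
    }
}
```

SnapTo256ths(8.142857142857143) comes out as 8.140625, matching the saved value mentioned earlier.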

So now you know how the intervals are calculated. But what about the actual column width? Hmm... perhaps another article...

P.S. I'm currently doing research for how to do autofitting for rows and columns for my spreadsheet library. I found this "secret" after fiddling with Excel files for a couple of hours... I know I'm talking about my library a lot, but it's taking up a lot of my brain space right now, so yeah...

Calculating Excel spreadsheet column names

I’ve been working with Open XML spreadsheets for the past, I don’t know how long… A year? I just realised that getting that Excel column header name is a frequent task. You know, given that it’s the 4th column, it’s “D”. I don’t work frequently with spreadsheets with lots of columns. So it was interesting that the 26th column is “Z” and the 27th column becomes “AA”. Basically, base-26 arithmetic, using the 26 letters of the English alphabet as tokens.

There are probably lots of code snippets out there showing you how to calculate a column name given the column index. Here’s mine:

string[] saExcelColumnHeaderNames = new string[16384];
string[] sa = new string[] { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z" };
string s = string.Empty;
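// i, j and k index the first, second and third letters of the name.
// -1 means that position isn't in use yet. l is the zero-based column index.
int i, j, k, l;
i = j = k = -1;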
for (l = 0; l < 16384; ++l)
{
    s = string.Empty;
    ++k;
    if (k == 26)
    {
        k = 0;
        ++j;
        if (j == 26)
        {
            j = 0;
            ++i;
        }
    }
    if (i >= 0) s += sa[i];
    if (j >= 0) s += sa[j];
    if (k >= 0) s += sa[k];
    saExcelColumnHeaderNames[l] = s;
}

That gives you a zero-based indexing version. So to get the 30th column name, you use saExcelColumnHeaderNames[29].

In case you’re wondering, 16384 is the maximum number of columns supported by Excel 2010.

You will notice that it’s not a function given the column index. I find that not as useful. Look, typically when you need the column name, you probably also need to get it frequently, usually with different parameters.

What I did was to store all the calculation results into a string array. Then you reference it with an index. The calculation function typically is an O(n) operation. With you needing to use the function multiple times, your whole algorithm probably goes up to O(n^2).

My method is also an O(n) operation, but you pay for it only once. After that, referencing a string array is I think O(1), meaning it takes constant time. I’ve never been good with big O notation…

This style of solving the problem is called pre-calculation. Pre-calculation is especially useful in games development, where speed is important. For example, selected values of sine and cosine were pre-calculated and stored in arrays, for use in the numerous 3D/2D calculations in games. Calculating sines and cosines in real time was detrimental to a speedy game.

That’s not as useful now because you need a fuller range of floating point values as input. But the concept is still useful.
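
To illustrate the sine example, here’s a minimal sketch of such a table. The one-degree step and the names (SineTableSketch, Sin) are assumptions for illustration; games of that era used whatever resolution suited them.

```csharp
using System;

static class SineTableSketch
{
    // One pre-calculated entry per whole degree, 0..359.
    private static readonly double[] Table = BuildTable();

    private static double[] BuildTable()
    {
        var table = new double[360];
        for (int deg = 0; deg < 360; ++deg)
        {
            table[deg] = Math.Sin(deg * Math.PI / 180.0);
        }
        return table;
    }

    // Looks up the sine of a whole number of degrees. Negative angles and
    // angles of 360 or more are wrapped into 0..359 first.
    public static double Sin(int degrees)
    {
        int index = ((degrees % 360) + 360) % 360;
        return Table[index];
    }
}
```

The table is built once, and every call after that is just an array lookup.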

I think I read somewhere (while I was doing hobbyist game development) this quote:

Pre-calculate everything!

Maybe computers are now much faster. I don’t care. That doesn’t give you an excuse to be sloppy. It’s an optimisation that doesn’t take much effort.

If you need to calculate it, see if you can calculate it just once.

Calculating column widths in Excel Open XML

Commenter Roie said that the formula for calculating column widths in Excel Open XML should still be manageable. This is despite the scary formula mentioned before:

width = Truncate([{Number of Characters} * {Maximum Digit Width} + {5 pixel padding}] / {Maximum Digit Width} * 256) / 256

So I put on my explorer’s hat, and decided against my better judgment to delve into that inexplicable equation. Turns out, it wasn’t too bad. Here’s a screenshot of the resulting Excel file:

[Screenshot: Excel Open XML calculate column widths]

The code is almost the same as when we’re setting custom column widths, but here it is in its full glory:

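// Namespaces used: System, System.Drawing, System.IO, System.Linq,
// DocumentFormat.OpenXml, DocumentFormat.OpenXml.Packaging,
// DocumentFormat.OpenXml.Spreadsheet
static void Main(string[] args)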
{
	string sFile = "ExcelOpenXmlCalculateColumnWidth.xlsx";
	if (File.Exists(sFile))
	{
		File.Delete(sFile);
	}
	BuildWorkbook(sFile);
}

private static void BuildWorkbook(string filename)
{
	try
	{
		using (SpreadsheetDocument xl = SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook))
		{
			WorkbookPart wbp = xl.AddWorkbookPart();
			WorksheetPart wsp = wbp.AddNewPart<WorksheetPart>();
			Workbook wb = new Workbook();
			FileVersion fv = new FileVersion();
			fv.ApplicationName = "Microsoft Office Excel";
			Worksheet ws = new Worksheet();
			SheetData sd = new SheetData();

			WorkbookStylesPart wbsp = wbp.AddNewPart<WorkbookStylesPart>();
			wbsp.Stylesheet = CreateStylesheet();
			wbsp.Stylesheet.Save();

			string sILT = "Iced Lemon Tea Is An Awesome Drink!";
			double fSimpleWidth = 0.0f;
			double fWidthOfZero = 0.0f;
			double fDigitWidth = 0.0f;
			double fMaxDigitWidth = 0.0f;
			double fTruncWidth = 0.0f;

			System.Drawing.Font drawfont = new System.Drawing.Font("Calibri", 11);
			// I just need a Graphics object. Any reasonable bitmap size will do.
			Graphics g = Graphics.FromImage(new Bitmap(200,200));
			fWidthOfZero = (double)g.MeasureString("0", drawfont).Width;
			fSimpleWidth = (double)g.MeasureString(sILT, drawfont).Width;
			fSimpleWidth = fSimpleWidth / fWidthOfZero;

			for (int i = 0; i < 10; ++i)
			{
				fDigitWidth = (double)g.MeasureString(i.ToString(), drawfont).Width;
				if (fDigitWidth > fMaxDigitWidth)
				{
					fMaxDigitWidth = fDigitWidth;
				}
			}
			g.Dispose();
			
			// Truncate([{Number of Characters} * {Maximum Digit Width} + {5 pixel padding}] / {Maximum Digit Width} * 256) / 256
			fTruncWidth = Math.Truncate((sILT.ToCharArray().Count() * fMaxDigitWidth + 5.0) / fMaxDigitWidth * 256.0) / 256.0;

			Columns columns = new Columns();
			columns.Append(CreateColumnData(1, 1, fSimpleWidth));
			columns.Append(CreateColumnData(2, 2, fTruncWidth));
			columns.Append(CreateColumnData(3, 3, 35.42578125));
			ws.Append(columns);

			Row r;
			Cell c;

			r = new Row();
			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "A1";
			c.CellValue = new CellValue(sILT);
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "B1";
			c.CellValue = new CellValue(sILT);
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "C1";
			c.CellValue = new CellValue(sILT);
			r.Append(c);
			sd.Append(r);

			r = new Row();
			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "A2";
			c.CellValue = new CellValue(string.Format("Simple width: {0}", fSimpleWidth));
			r.Append(c);
			sd.Append(r);

			r = new Row();
			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "A3";
			c.CellValue = new CellValue(string.Format("Truncation width: {0}", fTruncWidth));
			r.Append(c);
			sd.Append(r);

			r = new Row();
			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "A4";
			c.CellValue = new CellValue(string.Format("Width of '0': {0}", fWidthOfZero));
			r.Append(c);
			sd.Append(r);

			r = new Row();
			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "A5";
			c.CellValue = new CellValue(string.Format("Max Width of Digits: {0}", fMaxDigitWidth));
			r.Append(c);
			sd.Append(r);

			ws.Append(sd);
			wsp.Worksheet = ws;
			wsp.Worksheet.Save();
			Sheets sheets = new Sheets();
			Sheet sheet = new Sheet();
			sheet.Name = "Sheet1";
			sheet.SheetId = 1;
			sheet.Id = wbp.GetIdOfPart(wsp);
			sheets.Append(sheet);
			wb.Append(fv);
			wb.Append(sheets);

			xl.WorkbookPart.Workbook = wb;
			xl.WorkbookPart.Workbook.Save();
			xl.Close();
		}
	}
	catch (Exception e)
	{
		Console.WriteLine(e.ToString());
		Console.ReadLine();
	}
}

private static Column CreateColumnData(UInt32 StartColumnIndex, UInt32 EndColumnIndex, double ColumnWidth)
{
	Column column;
	column = new Column();
	column.Min = StartColumnIndex;
	column.Max = EndColumnIndex;
	column.Width = ColumnWidth;
	column.CustomWidth = true;
	return column;
}

private static Stylesheet CreateStylesheet()
{
	Stylesheet ss = new Stylesheet();

	Fonts fts = new Fonts();
	DocumentFormat.OpenXml.Spreadsheet.Font ft = new DocumentFormat.OpenXml.Spreadsheet.Font();
	FontName ftn = new FontName();
	ftn.Val = "Calibri";
	FontSize ftsz = new FontSize();
	ftsz.Val = 11;
	ft.FontName = ftn;
	ft.FontSize = ftsz;
	fts.Append(ft);
	fts.Count = (uint)fts.ChildElements.Count;

	Fills fills = new Fills();
	Fill fill;
	PatternFill patternFill;
	fill = new Fill();
	patternFill = new PatternFill();
	patternFill.PatternType = PatternValues.None;
	fill.PatternFill = patternFill;
	fills.Append(fill);
	fill = new Fill();
	patternFill = new PatternFill();
	patternFill.PatternType = PatternValues.Gray125;
	fill.PatternFill = patternFill;
	fills.Append(fill);
	fills.Count = (uint)fills.ChildElements.Count;

	Borders borders = new Borders();
	Border border = new Border();
	border.LeftBorder = new LeftBorder();
	border.RightBorder = new RightBorder();
	border.TopBorder = new TopBorder();
	border.BottomBorder = new BottomBorder();
	border.DiagonalBorder = new DiagonalBorder();
	borders.Append(border);
	borders.Count = (uint)borders.ChildElements.Count;

	CellStyleFormats csfs = new CellStyleFormats();
	CellFormat cf = new CellFormat();
	cf.NumberFormatId = 0;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	csfs.Append(cf);
	csfs.Count = (uint)csfs.ChildElements.Count;

	uint iExcelIndex = 164;
	NumberFormats nfs = new NumberFormats();
	CellFormats cfs = new CellFormats();

	cf = new CellFormat();
	cf.NumberFormatId = 0;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cfs.Append(cf);

	NumberFormat nf;
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "dd/mm/yyyy hh:mm:ss";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "#,##0.0000";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	// #,##0.00 is also Excel style index 4
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "#,##0.00";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	// @ is also Excel style index 49
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "@";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	nfs.Count = (uint)nfs.ChildElements.Count;
	cfs.Count = (uint)cfs.ChildElements.Count;

	ss.Append(nfs);
	ss.Append(fts);
	ss.Append(fills);
	ss.Append(borders);
	ss.Append(csfs);
	ss.Append(cfs);

	CellStyles css = new CellStyles();
	CellStyle cs = new CellStyle();
	cs.Name = "Normal";
	cs.FormatId = 0;
	cs.BuiltinId = 0;
	css.Append(cs);
	css.Count = (uint)css.ChildElements.Count;
	ss.Append(css);

	DifferentialFormats dfs = new DifferentialFormats();
	dfs.Count = 0;
	ss.Append(dfs);

	TableStyles tss = new TableStyles();
	tss.Count = 0;
	tss.DefaultTableStyle = "TableStyleMedium9";
	tss.DefaultPivotStyle = "PivotStyleLight16";
	ss.Append(tss);

	return ss;
}

Note this new part:

string sILT = "Iced Lemon Tea Is An Awesome Drink!";
double fSimpleWidth = 0.0f;
double fWidthOfZero = 0.0f;
double fDigitWidth = 0.0f;
double fMaxDigitWidth = 0.0f;
double fTruncWidth = 0.0f;

System.Drawing.Font drawfont = new System.Drawing.Font("Calibri", 11);
// I just need a Graphics object. Any reasonable bitmap size will do.
Graphics g = Graphics.FromImage(new Bitmap(200,200));
fWidthOfZero = (double)g.MeasureString("0", drawfont).Width;
fSimpleWidth = (double)g.MeasureString(sILT, drawfont).Width;
fSimpleWidth = fSimpleWidth / fWidthOfZero;

for (int i = 0; i < 10; ++i)
{
	fDigitWidth = (double)g.MeasureString(i.ToString(), drawfont).Width;
	if (fDigitWidth > fMaxDigitWidth)
	{
		fMaxDigitWidth = fDigitWidth;
	}
}
g.Dispose();

// Truncate([{Number of Characters} * {Maximum Digit Width} + {5 pixel padding}] / {Maximum Digit Width} * 256) / 256
fTruncWidth = Math.Truncate((sILT.ToCharArray().Count() * fMaxDigitWidth + 5.0) / fMaxDigitWidth * 256.0) / 256.0;

I am using the MeasureString() function of the Graphics object to get the pixel width of a string when rendered in a particular font.

Roie gave a simple formula to calculate the column width, which is the pixel width of the string divided by the pixel width of the zero character “0”. So let’s test that.

Then there’s that beast of a formula. We’ll need the number of characters in the string and the maximum pixel width of the digits. Turns out that the maximum pixel width of all digits is that of “0”. So Roie was right on that. Actually it’s logical, because “0” spans the horizontal and vertical space. If you want to test for letters, then “M” and “W” are my guesses.

Anyway, I created the Excel file, then did the save-as-zip-and-open-the-XML-file trick, and found 35.42578125 to be the column width that comfortably covers the string “Iced Lemon Tea Is An Awesome Drink!” rendered in Calibri font.

What do you know, 35.42578125 is also what was calculated using that unwieldy formula. So yeah, that wasn’t so bad.
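
Pulled out on its own, the formula is small. Here’s a minimal sketch as a standalone function, with made-up names (ColumnWidthSketch, TruncatedWidth). The maximum digit width is passed in rather than measured, so you can try it with any value.

```csharp
using System;

static class ColumnWidthSketch
{
    // width = Truncate([chars * maxDigitWidth + 5 pixel padding]
    //                  / maxDigitWidth * 256) / 256
    public static double TruncatedWidth(int numberOfCharacters, double maxDigitWidth)
    {
        const double padding = 5.0; // the "5 pixel padding" from the formula
        double raw = (numberOfCharacters * maxDigitWidth + padding) / maxDigitWidth;
        return Math.Truncate(raw * 256.0) / 256.0;
    }
}
```

For example, 10 characters at a maximum digit width of 7 pixels gives 75/7, which truncates to 10.7109375.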

Roie simplified that formula, and my only gripe is that rounding errors might add up when the string is long. Ah well, the only way you know if your theory works is if you test it. I’m paranoid about these things because my math background practically forces me to look at calculation errors of any formula. You do not want a sin(x) = x error on your hands.

So there you have it. The custom column widths are calculable. Have fun with the code and the resulting Excel file.

There’s updated material and source code, together with more information on how to work with Open XML. Click here to find out more.

Custom column widths in Excel Open XML

Commenter Steven wanted to know how to set the column widths in Excel, in Open XML format. I replied, then thought you might also want to know how to do that. It’s fairly straightforward, and here’s what the sample Excel file looks like:

[Screenshot: Excel Open XML custom column widths]

I like iced lemon tea.

And here’s the code:

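// Namespaces used: System, System.IO, DocumentFormat.OpenXml,
// DocumentFormat.OpenXml.Packaging, DocumentFormat.OpenXml.Spreadsheet
static void Main(string[] args)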
{
	string sFile = "ExcelOpenXmlWithCustomWidths.xlsx";
	if (File.Exists(sFile))
	{
		File.Delete(sFile);
	}
	BuildWorkbook(sFile);
}

private static void BuildWorkbook(string filename)
{
	try
	{
		using (SpreadsheetDocument xl = SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook))
		{
			WorkbookPart wbp = xl.AddWorkbookPart();
			WorksheetPart wsp = wbp.AddNewPart<WorksheetPart>();
			Workbook wb = new Workbook();
			FileVersion fv = new FileVersion();
			fv.ApplicationName = "Microsoft Office Excel";
			Worksheet ws = new Worksheet();
			SheetData sd = new SheetData();

			WorkbookStylesPart wbsp = wbp.AddNewPart<WorkbookStylesPart>();
			wbsp.Stylesheet = CreateStylesheet();
			wbsp.Stylesheet.Save();

			Columns columns = new Columns();
			columns.Append(CreateColumnData(1, 1, 11));
			columns.Append(CreateColumnData(2, 4, 23.5703125));
			columns.Append(CreateColumnData(6, 6, 6.5703125));
			ws.Append(columns);

			Row r;
			Cell c;

			// header
			r = new Row();
			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "A1";
			c.CellValue = new CellValue("Product ID");
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "B1";
			c.CellValue = new CellValue("Product Description");
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "C1";
			c.CellValue = new CellValue("Bill Description");
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "D1";
			c.CellValue = new CellValue("Discount Description");
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "E1";
			c.CellValue = new CellValue("Currency");
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "F1";
			c.CellValue = new CellValue("Cost");
			r.Append(c);
			sd.Append(r);

			// content
			r = new Row();
			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "A2";
			c.CellValue = new CellValue("PROD12345");
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "B2";
			c.CellValue = new CellValue("Iced Lemon Tea");
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "C2";
			c.CellValue = new CellValue("Special Iced Lemon Tea");
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "D2";
			c.CellValue = new CellValue("Iced Lemon Tea (50% off)");
			r.Append(c);

			c = new Cell();
			c.DataType = CellValues.String;
			c.CellReference = "E2";
			c.CellValue = new CellValue("USD");
			r.Append(c);

			c = new Cell();
			c.StyleIndex = 3;
			c.DataType = CellValues.Number;
			c.CellReference = "F2";
			c.CellValue = new CellValue("5.95");
			r.Append(c);
			sd.Append(r);

			ws.Append(sd);
			wsp.Worksheet = ws;
			wsp.Worksheet.Save();
			Sheets sheets = new Sheets();
			Sheet sheet = new Sheet();
			sheet.Name = "Sheet1";
			sheet.SheetId = 1;
			sheet.Id = wbp.GetIdOfPart(wsp);
			sheets.Append(sheet);
			wb.Append(fv);
			wb.Append(sheets);

			xl.WorkbookPart.Workbook = wb;
			xl.WorkbookPart.Workbook.Save();
			xl.Close();
		}
	}
	catch (Exception e)
	{
		Console.WriteLine(e.ToString());
		Console.ReadLine();
	}
}

private static Column CreateColumnData(UInt32 StartColumnIndex, UInt32 EndColumnIndex, double ColumnWidth)
{
	Column column;
	column = new Column();
	column.Min = StartColumnIndex;
	column.Max = EndColumnIndex;
	column.Width = ColumnWidth;
	column.CustomWidth = true;
	return column;
}

private static Stylesheet CreateStylesheet()
{
	Stylesheet ss = new Stylesheet();

	Fonts fts = new Fonts();
	DocumentFormat.OpenXml.Spreadsheet.Font ft = new DocumentFormat.OpenXml.Spreadsheet.Font();
	FontName ftn = new FontName();
	ftn.Val = "Calibri";
	FontSize ftsz = new FontSize();
	ftsz.Val = 11;
	ft.FontName = ftn;
	ft.FontSize = ftsz;
	fts.Append(ft);
	fts.Count = (uint)fts.ChildElements.Count;

	Fills fills = new Fills();
	Fill fill;
	PatternFill patternFill;
	fill = new Fill();
	patternFill = new PatternFill();
	patternFill.PatternType = PatternValues.None;
	fill.PatternFill = patternFill;
	fills.Append(fill);
	fill = new Fill();
	patternFill = new PatternFill();
	patternFill.PatternType = PatternValues.Gray125;
	fill.PatternFill = patternFill;
	fills.Append(fill);
	fills.Count = (uint)fills.ChildElements.Count;

	Borders borders = new Borders();
	Border border = new Border();
	border.LeftBorder = new LeftBorder();
	border.RightBorder = new RightBorder();
	border.TopBorder = new TopBorder();
	border.BottomBorder = new BottomBorder();
	border.DiagonalBorder = new DiagonalBorder();
	borders.Append(border);
	borders.Count = (uint)borders.ChildElements.Count;

	CellStyleFormats csfs = new CellStyleFormats();
	CellFormat cf = new CellFormat();
	cf.NumberFormatId = 0;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	csfs.Append(cf);
	csfs.Count = (uint)csfs.ChildElements.Count;

	uint iExcelIndex = 164;
	NumberFormats nfs = new NumberFormats();
	CellFormats cfs = new CellFormats();

	cf = new CellFormat();
	cf.NumberFormatId = 0;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cfs.Append(cf);

	NumberFormat nf;
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "dd/mm/yyyy hh:mm:ss";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "#,##0.0000";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	// #,##0.00 is also Excel style index 4
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "#,##0.00";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	// @ is also Excel style index 49
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "@";
	nfs.Append(nf);
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;
	cfs.Append(cf);

	nfs.Count = (uint)nfs.ChildElements.Count;
	cfs.Count = (uint)cfs.ChildElements.Count;

	ss.Append(nfs);
	ss.Append(fts);
	ss.Append(fills);
	ss.Append(borders);
	ss.Append(csfs);
	ss.Append(cfs);

	CellStyles css = new CellStyles();
	CellStyle cs = new CellStyle();
	cs.Name = "Normal";
	cs.FormatId = 0;
	cs.BuiltinId = 0;
	css.Append(cs);
	css.Count = (uint)css.ChildElements.Count;
	ss.Append(css);

	DifferentialFormats dfs = new DifferentialFormats();
	dfs.Count = 0;
	ss.Append(dfs);

	TableStyles tss = new TableStyles();
	tss.Count = 0;
	tss.DefaultTableStyle = "TableStyleMedium9";
	tss.DefaultPivotStyle = "PivotStyleLight16";
	ss.Append(tss);

	return ss;
}

You already know how to create stylesheets, so I’ll skip explaining that. I deliberately simplified the content of the Excel cells so we can concentrate on just the column widths. There’s this helper function:

private static Column CreateColumnData(UInt32 StartColumnIndex, UInt32 EndColumnIndex, double ColumnWidth)
{
	Column column;
	column = new Column();
	column.Min = StartColumnIndex;
	column.Max = EndColumnIndex;
	column.Width = ColumnWidth;
	column.CustomWidth = true;
	return column;
}

That should be fairly straightforward to understand. There’s a Min and Max property because Excel is lazy: if you have a set of contiguous columns with the same width, you describe them all with a single Column object, so to speak.

Now, the Width property is… hard to calculate. So where did I get my values? By generating the .xlsx file, opening it up, adjusting the column widths to taste, saving it, changing the extension to .zip, opening up the zip file, searching for my data XML worksheet, opening that up, and looking for something like this:

<cols>
<col min="1" max="1" width="11" customWidth="1" />
<col min="2" max="4" width="23.5703125" customWidth="1" />
<col min="6" max="6" width="6.5703125" customWidth="1" />
</cols>

There, you have your widths. Go back to your code and plug that in. Tada, mission accomplished.
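
If copying the numbers by hand gets tedious, the widths can be read out of that XML with a few lines of code. This is only a sketch: the names (ColsReaderSketch, ReadCols) are made up, and it takes the cols element as a string, so getting that string out of the zipped worksheet file is left to you.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

static class ColsReaderSketch
{
    // Reads min, max and width from each col element inside a cols element.
    // Matching on the local name means it works with or without the
    // spreadsheet namespace that a real worksheet file carries.
    public static List<(int Min, int Max, double Width)> ReadCols(string colsXml)
    {
        var result = new List<(int Min, int Max, double Width)>();
        XElement cols = XElement.Parse(colsXml);
        foreach (XElement col in cols.Elements())
        {
            if (col.Name.LocalName != "col") continue;
            int min = int.Parse(col.Attribute("min").Value, CultureInfo.InvariantCulture);
            int max = int.Parse(col.Attribute("max").Value, CultureInfo.InvariantCulture);
            double width = double.Parse(col.Attribute("width").Value, CultureInfo.InvariantCulture);
            result.Add((min, max, width));
        }
        return result;
    }
}
```

Each entry maps straight onto a CreateColumnData(min, max, width) call.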

“Wait, so why can’t those widths be calculated?”

Well, they can. I just think the effort’s not worth it. First, you need to know the font of the text in that cell/column. Then there’s some weird truncation calculation based on the width of characters in that font… You know what, here’s an extract from the Open XML SDK documentation:

width = Truncate([{Number of Characters} * {Maximum Digit Width} + {5 pixel padding}] / {Maximum Digit Width} * 256) / 256

You are free to go perform the calculation. Knock yourself out. Let me know how it goes.

Here’s the source code, and here’s the resulting Excel file.


Deciphering column types in design documents

When I first started working, I’d never even heard of design specifications. The few sentences of a programming question for a university assignment barely made it as design requirements. I think the longest description went slightly over half a page, and that’s because it was explaining some scientific logic behind the question.

So when I was first handed the design documents of an existing application system, my eyes kind of glazed over the arcane language… The first few pages were usually full of important-sounding sentences that really meant very little to the programmer. Well, most of it anyway. They’re about how this application was to do X, because Y happened and Z wasn’t very happy about it, and application A could almost do the same thing except for condition B.

It wasn’t a critical period when I joined the team, so things were a bit quiet and I had time to learn. Have I told you I didn’t know a single thing about SQL at the time? I was picking that up too.

Flipping through the pages, I found a table describing column information. There were input files, and that table described the columns in the file. This was a few years ago, so the input files were what was termed “flat files”.

Each line in those files was of a fixed length, and each column occupied a specific position and a specific length in a line. The usual line types were the header, trailer, and data. The header and trailer lines were usually shorter than the data lines.
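
Reading a column out of such a line is just a substring. Here’s a minimal sketch, assuming the design document gives 1-based start positions (that’s an assumption; check what yours uses). The names FlatFileSketch and Field are made up.

```csharp
using System;

static class FlatFileSketch
{
    // Returns the column that starts at a 1-based position and spans
    // 'length' characters. Assumption: positions are 1-based.
    public static string Field(string line, int start, int length)
    {
        return line.Substring(start - 1, length);
    }
}
```

With a made-up line of “20100315PROD12345”, Field(line, 1, 8) gives “20100315” and Field(line, 9, 9) gives “PROD12345”.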

The header probably contains information such as

  • Timestamp of file (usually just the date it was generated)
  • Name of file
  • Application code (not our kind of code. Short acronym identifier of program)

The trailer probably contains information such as

  • Number of data lines (for reconciliation purposes)
  • Sum totals of stuff (monetary amount, duration and so on)
  • … you know, I think it’s usually just the above 2

Now the data lines were more interesting. They were loaded into the database, so the columns in the file usually match closely to that of the database table. Here’s where I both learned to read design documents and file formats, and picked up SQL all at one go…

Here are 2 examples:

9(8) with comment “ccyymmdd”. It means “8 numerals”, and the comment hints at … ? Century, last 2 digits of the year, the month, and the day.

9(6) with comment “ccyymm”. It means “6 numerals”, and I’m sure you can figure out what the comment means.
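
In code, those two columns could be turned into dates like this. It’s a sketch with made-up names (FlatDateSketch, ParseCcyymmdd, ParseCcyymm), and it assumes a ccyymm value should map to the first day of that month.

```csharp
using System;
using System.Globalization;

static class FlatDateSketch
{
    // 9(8) "ccyymmdd": century and year, month, day.
    public static DateTime ParseCcyymmdd(string field)
    {
        return DateTime.ParseExact(field, "yyyyMMdd", CultureInfo.InvariantCulture);
    }

    // 9(6) "ccyymm": no day in the column, so the 1st is assumed.
    public static DateTime ParseCcyymm(string field)
    {
        return DateTime.ParseExact(field + "01", "yyyyMMdd", CultureInfo.InvariantCulture);
    }
}
```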

The “9” is a notation used to denote digits or numerals only. The number within brackets denotes the number of digits. Let’s try…

X(9) which means 9 alphanumeric characters.
X(57) with comment “filler”. It means … ? 57 alphanumeric characters, probably just spaces because this column is a filler.

I have no idea why “X” denotes alphanumeric… For that matter, I don’t know why “9” is used to represent digits too. As for the filler column, remember the header and trailer lines? They are shorter than the data lines, so a column is specially made so that each line, whether it’s a header, data or trailer line, can fit snugly into one line. No, XML wasn’t invented yet… I think.

Now for some obscure ones…

9(7)v99 which means there are 7 digits, followed by 2 digits.
9v9(5) means 1 digit, followed by 5 digits.

If they are all digits, what’s with the weird notation? The “v” means there’s an implied decimal point. So “9(7)v99” means a number which is up to 7 digits long, followed by 2 digits representing a number (below one) up to 2 decimal places.

Confused? “9(7)v99” is equivalent to numeric(9,2) in SQL-speak. 1234567.89 is an example.
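
Putting the decimal point back is a matter of dividing by the right power of ten. Here’s a minimal sketch with made-up names (ImpliedDecimalSketch, Parse). It assumes the column holds digits only, with no sign and no spaces.

```csharp
using System;
using System.Globalization;

static class ImpliedDecimalSketch
{
    // Parses a digits-only column and re-inserts the implied decimal point.
    // For "9(7)v99" pass decimalPlaces = 2; for "9v9(5)" pass 5.
    public static decimal Parse(string digits, int decimalPlaces)
    {
        // NumberStyles.None accepts digits only (no sign, spaces or separators).
        long raw = long.Parse(digits, NumberStyles.None, CultureInfo.InvariantCulture);

        decimal scale = 1m;
        for (int i = 0; i < decimalPlaces; ++i)
        {
            scale *= 10m;
        }
        return raw / scale;
    }
}
```

So the 9 characters “123456789” in a “9(7)v99” column come back as 1234567.89.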

So what’s the implied decimal point for? If I understand it correctly, the notations came from programming practices in COBOL, and the banking industry was making use of flat files to transfer data around. Since transmitting data was expensive (they didn’t have 500 gigabytes of hard disk space then…), every single byte counted.

Since it was understood that the figure in that particular column was a money value, the decimal point was taken out to save space. Tada! Instant saving of, I don’t know, tens and hundreds of kilobytes. And that practice flowed to other industries.

It’s a good thing my current team uses notations such as “char(8)”, “numeric(15,2)” and “int” to define column types. Hey wait, those look familiar…