How to insert an image in Excel Open XML

This is a continuation of a series of articles on using Excel Open XML. Previously we looked at how to create stylesheets. Today, we’ll look at inserting images. (You will need the Open XML SDK from Microsoft)

Similar to when I was writing code for the stylesheet creation, I have little idea on why any piece of code has to exist for an image to be inserted successfully. All I can advise you is to insert one image into an Excel file, save it as an Excel Open XML format, rename it to a .zip extension, unzip it, and study the contents like crazy. That’s what I did.

Here, I am going to show you the shortcut, and relieve you of all that pain. You will learn to create a simple Excel file, with just one Excel sheet in it. And on that one sheet, is an image. That’s it. You’ll be surprised to know how much code you have to write…

static void Main(string[] args)
	string sFile = "ExcelOpenXmlWithImage.xlsx";
	if (File.Exists(sFile))

private static void BuildWorkbook(string filename)
		using (SpreadsheetDocument xl = SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook))
			WorkbookPart wbp = xl.AddWorkbookPart();
			WorksheetPart wsp = wbp.AddNewPart<WorksheetPart>();
			Workbook wb = new Workbook();
			FileVersion fv = new FileVersion();
			fv.ApplicationName = "Microsoft Office Excel";
			Worksheet ws = new Worksheet();
			SheetData sd = new SheetData();

			string sImagePath = "polymathlogo.png";
			DrawingsPart dp = wsp.AddNewPart<DrawingsPart>();
			ImagePart imgp = dp.AddImagePart(ImagePartType.Png, wsp.GetIdOfPart(dp));
			using (FileStream fs = new FileStream(sImagePath, FileMode.Open))

			NonVisualDrawingProperties nvdp = new NonVisualDrawingProperties();
			nvdp.Id = 1025;
			nvdp.Name = "Picture 1";
			nvdp.Description = "polymathlogo";
			DocumentFormat.OpenXml.Drawing.PictureLocks picLocks = new DocumentFormat.OpenXml.Drawing.PictureLocks();
			picLocks.NoChangeAspect = true;
			picLocks.NoChangeArrowheads = true;
			NonVisualPictureDrawingProperties nvpdp = new NonVisualPictureDrawingProperties();
			nvpdp.PictureLocks = picLocks;
			NonVisualPictureProperties nvpp = new NonVisualPictureProperties();
			nvpp.NonVisualDrawingProperties = nvdp;
			nvpp.NonVisualPictureDrawingProperties = nvpdp;

			DocumentFormat.OpenXml.Drawing.Stretch stretch = new DocumentFormat.OpenXml.Drawing.Stretch();
			stretch.FillRectangle = new DocumentFormat.OpenXml.Drawing.FillRectangle();

			BlipFill blipFill = new BlipFill();
			DocumentFormat.OpenXml.Drawing.Blip blip = new DocumentFormat.OpenXml.Drawing.Blip();
			blip.Embed = dp.GetIdOfPart(imgp);
			blip.CompressionState = DocumentFormat.OpenXml.Drawing.BlipCompressionValues.Print;
			blipFill.Blip = blip;
			blipFill.SourceRectangle = new DocumentFormat.OpenXml.Drawing.SourceRectangle();

			DocumentFormat.OpenXml.Drawing.Transform2D t2d = new DocumentFormat.OpenXml.Drawing.Transform2D();
			DocumentFormat.OpenXml.Drawing.Offset offset = new DocumentFormat.OpenXml.Drawing.Offset();
			offset.X = 0;
			offset.Y = 0;
			t2d.Offset = offset;
			Bitmap bm = new Bitmap(sImagePath);
			DocumentFormat.OpenXml.Drawing.Extents extents = new DocumentFormat.OpenXml.Drawing.Extents();
			extents.Cx = (long)bm.Width * (long)((float)914400 / bm.HorizontalResolution);
			extents.Cy = (long)bm.Height * (long)((float)914400 / bm.VerticalResolution);
			t2d.Extents = extents;
			ShapeProperties sp = new ShapeProperties();
			sp.BlackWhiteMode = DocumentFormat.OpenXml.Drawing.BlackWhiteModeValues.Auto;
			sp.Transform2D = t2d;
			DocumentFormat.OpenXml.Drawing.PresetGeometry prstGeom = new DocumentFormat.OpenXml.Drawing.PresetGeometry();
			prstGeom.Preset = DocumentFormat.OpenXml.Drawing.ShapeTypeValues.Rectangle;
			prstGeom.AdjustValueList = new DocumentFormat.OpenXml.Drawing.AdjustValueList();
			sp.Append(new DocumentFormat.OpenXml.Drawing.NoFill());

			DocumentFormat.OpenXml.Drawing.Spreadsheet.Picture picture = new DocumentFormat.OpenXml.Drawing.Spreadsheet.Picture();
			picture.NonVisualPictureProperties = nvpp;
			picture.BlipFill = blipFill;
			picture.ShapeProperties = sp;

			Position pos = new Position();
			pos.X = 0;
			pos.Y = 0;
			Extent ext = new Extent();
			ext.Cx = extents.Cx;
			ext.Cy = extents.Cy;
			AbsoluteAnchor anchor = new AbsoluteAnchor();
			anchor.Position = pos;
			anchor.Extent = ext;
			anchor.Append(new ClientData());
			WorksheetDrawing wsd = new WorksheetDrawing();
			Drawing drawing = new Drawing();
			drawing.Id = dp.GetIdOfPart(imgp);


			wsp.Worksheet = ws;
			Sheets sheets = new Sheets();
			Sheet sheet = new Sheet();
			sheet.Name = "Sheet1";
			sheet.SheetId = 1;
			sheet.Id = wbp.GetIdOfPart(wsp);

			xl.WorkbookPart.Workbook = wb;
	catch (Exception e)

It’s a console program, so create your C# project accordingly. There were 3 reference articles I used:

The use of the ImagePart I can understand. It’s to load the image file you had in mind. The rest is pure gibberish until you hit the part on resolution. That’s where I used the Stack Overflow articles as reference. Then it was gibberish until the positioning of the image.

This is where I did something different. If you did the trick on inserting image into Excel, saving it as Open XML format, renaming to .zip, and unzipping that file, you might find this in the xl/drawings/drawing1.xml file:

-<xdr:twoCellAnchor editAs="oneCell">
- <xdr:from>
- <xdr:to>

This is for positioning the image. I’m using the AbsoluteAnchor instead of the TwoCellAnchor. After much hairtearing, I’ve decided the TwoCellAnchor class is too hard to use. Let me tell you why.

[UPDATE: A reader Kevin noted the following:

I’ve been using the OneCellAnchor and find it perfect: position by row/column and size by image extent in EMU and best of all, the image will not resize when column widths are set.

So you can look at the OneCellAnchor class and see if it suits you better.

The from tag should be easy to understand. The col tag refers to which Excel column (zero-based) is the left side of the image in. The colOff tag is the offset. What offset you ask? The offset from the left-border of the column referenced in the col tag, in EMU. Yes, that English Metric Unit. The row and rowOff tags refer to the Excel row and offset from said Excel row (based on the top border of the image).

So far, it’s 0 for all 4 properties, because we’re inserting the image at the top left corner. Now comes the fun part.

In the to tag, the bottom right of the image is in the Excel cell, column 6, row 7. That’s not all. The bottom right of the image is 248195 EMUs from the left of column 6, and 152608 EMUs from the top of row 7. Note that the image is not 248195 EMUs wide nor 152608 high.

Since those offsets are calculated based on its current cell, if the widths and heights of previous cells (column and row indices less than current ones) change, these offsets change. This means, if I widen the first cell, the offset will change from 248195 EMUs to say 1000 EMUs.

The fun part is, if I widen the first cell enough, the bottom right of the image could end up in column 5. So the column and column offset values are in flux.

To really nail this part down, I will need to know the exact widths of every column up to where the bottom right of the image is. I will probably decide on some default column width in EMU, set every column to be that width. Then I’ll convert the width of the image from pixels to EMU and calculate the offset based on that default column width. And woe befall me if I happen to place the image other than at the top left corner, since I would have to do the same thing for the top left corner of the image.

That wouldn’t be feasible if your document happen to require a fixed width for some columns, due to the content of those columns. So your columns won’t all have the same width. So you’ll have to keep track of all the column widths.

Needless to say, I researched and found the AbsoluteAnchor class. Much fewer properties to deal with…

Just in case you’re wondering, no, I don’t know why the ID of NonVisualDrawingProperties is 1025. That’s the value I found when I did the renaming-unzipping-gawk-at-xlsx-content trick. I believe it can be some other value, but I don’t know the valid range. If you do know, chime in with a comment.

Oh and here’s the image used.
Polymath Programmer logo text
Lovely, isn’t it? *smile*

And you can download the resulting Excel file ExcelOpenXmlWithImage.xlsx.

There’s updated material and source code, together with more information on how to work with Open XML. Click here to find out more.

FAHTR aka Free And Happen To Remember

The following completely fake conversation was inspired by an actual conversation with an internal company user.

User: The web application doesn’t let me do XYZ. I can’t download ABC data for customer PQR.
Me: Well, your customer database don’t have JKL. The code changes will take some time.

User: So how long will it take?
Me: It’s not urgent, right?

User: Well…
Me: Then I’ll do it on a FAHTR basis.

User: What father?
Me: FAHTR. It means I’ll do it when I’m free and I happen to remember to do it.

Writing is hard

Found this from Merlin Mann:

If you don’t feel that you are possibly on the edge of humiliating yourself, of losing control of the whole thing, then probably what you are doing isn’t very vital.

John Irving

Darn right.

That said, many things get in the way of writing articles, as Brent Diggs found out.

I also found that writing articles is harder than writing code. It’s even harder when your wireless adaptor dies on you. Which was exactly what happened to me. Yup, that wireless adaptor. My super power apparently failed this time.

There will probably be no new articles for a week. Maybe 2. It depends on how fast my new computer arrives. Yes, I’m scrapping the current one. Because the battery for the internal clock appears to be dying. And because the USB connection from the keyboard seems to be acting up. And because the RAM thought there wasn’t enough space to hold variable values and decided to store those variable values in a buffer overflow, which apparently overflowed into the physical world and ATE MY COMPUTER FROM THE INSIDE OUT. And I might as well upgrade from Windows XP to Windows 7 (my computer’s 5 years old).

If you’re positively dying for some action, go to Stack Overflow. Ask a question. Answer a question. Then come back and tell me how awesome you are.

I planned to write about inserting an image into Excel in Open XML format (following the article on stylesheet creation in Open XML). In case you’re really interested in that, subscribe to my RSS feed so you’ll know the moment I publish that article.

How to create a stylesheet in Excel Open XML

Today, I’ll show you how to create a stylesheet in Excel Open XML with the minimum required. The styles I need are:

  • Forced text format for long consecutive string of digits
  • Date format
  • Decimal format

We’ll be using the Open XML SDK 2.0 from Microsoft. As of this writing, it’s still in Community Technical Preview state (August 2009), so I’ll just let you search online, in case the final product is released by the time you read this article.

The stylesheet is represented by the DocumentFormat.OpenXml.Spreadsheet.Stylesheet class. Through my hours (and hours and hours…) of playing around with the code, I still had to use almost the same default stylesheet when I unzip a blank Excel file in Open XML format. In case you don’t know, an Excel Open XML file (or any of the Microsoft Office products in Open XML format such as Word and PowerPoint) is basically a zip file of folders and XML files (and perhaps some media resources).

This is the stylesheet XML file produced by the code which you’ll see in a bit. Download ExcelOpenXmlStyles.xml (which is actually named styles.xml in the original zip file, but I renamed it to avoid clashing with my other files).

You could write an XML file directly with that content (say, using the StreamWriter class). You just have to be careful of the XML structure, such as opening and closing of tags, and taking care of child tags.

Or you could use the SDK.

I haven’t found anyone writing on how to create styles to format the content of the Excel file. Maybe it’s because just creating an Excel file is already an awesome accomplishment… Unfortunately, I can’t remember where are the one or two articles I read to create a basic Excel file…

Well, I figured out how to use the SDK to create the file. What I did was look at a tag in the XML file (downloadable from above), search for the corresponding class entry in the Open XML SDK help file, and use that class in the code. It’s a tedious process, and some tags have a different-looking name as the class (such as the cellStyleXfs tag and its CellStyleFormats class, or the unintuitive xf tag and its CellFormat class.)

Anyway, here’s the code:

private static Stylesheet CreateStylesheet()
	Stylesheet ss = new Stylesheet();

	Fonts fts = new Fonts();
	DocumentFormat.OpenXml.Spreadsheet.Font ft = new DocumentFormat.OpenXml.Spreadsheet.Font();
	FontName ftn = new FontName();
	ftn.Val = "Calibri";
	FontSize ftsz = new FontSize();
	ftsz.Val = 11;
	ft.FontName = ftn;
	ft.FontSize = ftsz;
	fts.Count = (uint)fts.ChildElements.Count;

	Fills fills = new Fills();
	Fill fill;
	PatternFill patternFill;
	fill = new Fill();
	patternFill = new PatternFill();
	patternFill.PatternType = PatternValues.None;
	fill.PatternFill = patternFill;
	fill = new Fill();
	patternFill = new PatternFill();
	patternFill.PatternType = PatternValues.Gray125;
	fill.PatternFill = patternFill;
	fills.Count = (uint)fills.ChildElements.Count;

	Borders borders = new Borders();
	Border border = new Border();
	border.LeftBorder = new LeftBorder();
	border.RightBorder = new RightBorder();
	border.TopBorder = new TopBorder();
	border.BottomBorder = new BottomBorder();
	border.DiagonalBorder = new DiagonalBorder();
	borders.Count = (uint)borders.ChildElements.Count;

	CellStyleFormats csfs = new CellStyleFormats();
	CellFormat cf = new CellFormat();
	cf.NumberFormatId = 0;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	csfs.Count = (uint)csfs.ChildElements.Count;

	uint iExcelIndex = 164;
	NumberFormats nfs = new NumberFormats();
	CellFormats cfs = new CellFormats();

	cf = new CellFormat();
	cf.NumberFormatId = 0;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;

	NumberFormat nf;
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "dd/mm/yyyy hh:mm:ss";
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;

	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "#,##0.0000";
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;

	// #,##0.00 is also Excel style index 4
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "#,##0.00";
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;

	// @ is also Excel style index 49
	nf = new NumberFormat();
	nf.NumberFormatId = iExcelIndex++;
	nf.FormatCode = "@";
	cf = new CellFormat();
	cf.NumberFormatId = nf.NumberFormatId;
	cf.FontId = 0;
	cf.FillId = 0;
	cf.BorderId = 0;
	cf.FormatId = 0;
	cf.ApplyNumberFormat = true;

	nfs.Count = (uint)nfs.ChildElements.Count;
	cfs.Count = (uint)cfs.ChildElements.Count;


	CellStyles css = new CellStyles();
	CellStyle cs = new CellStyle();
	cs.Name = "Normal";
	cs.FormatId = 0;
	cs.BuiltinId = 0;
	css.Count = (uint)css.ChildElements.Count;

	DifferentialFormats dfs = new DifferentialFormats();
	dfs.Count = 0;

	TableStyles tss = new TableStyles();
	tss.Count = 0;
	tss.DefaultTableStyle = "TableStyleMedium9";
	tss.DefaultPivotStyle = "PivotStyleLight16";

	return ss;

The whole thing is actually very simple. There’s a SpreadsheetDocument class, which has as a child a WorkbookPart class, which has as a child a WorkbookStylesPart class, which has a Stylesheet property which you assign with the result of that function you see in the code above. *whew*

It’s logically structured. It’s just that I can’t find anything online or in the documentation about which classes I needed to use… hence the hours (and hours and hours…) of research and testing.

A few points to note:

  • The Font class used is different from System.Drawing.Font
  • System-defined style numbers are less than 164 (based on my experiments on custom styles). Hence the magic number. So custom style index numbers are 164 and above.
  • The style “#,##0.0000” is typically used by me for representing Internet traffic, as in 1,234.5670 MB.
  • The style “#,##0.00” is a standard format in Excel, with the style number 4
  • Forced text format is “@”, which is also a standard format in Excel, with the style number 49
  • The function is static because I’m using it in a console program.

There are a couple of classes used that I have no idea what they are used for. For example, the DifferentialFormats class and the TableStyles class. I just know that if I don’t create them as a child of the Stylesheet class, the Excel file will fail to open. This is the major time-drain of my research and experiments: determining the classes used to write the minimum code (or XML file).

Next time, I’ll show you how to insert an image into the Excel file. That one takes up a whole lot of code when compared to inserting an image file in HTML. I’ll conclude the whole Excel Open XML creation with the full code on generating a working Excel file. I’m setting all the pieces here piecemeal so I don’t have to explain everything in one shot.

There’s updated material and source code, together with more information on how to work with Open XML. Click here to find out more.

Upgraded: Reverse Polish Notation with C#

I wrote a reverse polish notation tutorial for Dream In Code almost 2 years ago (wow that’s a long time ago!). The code only parsed for PI, E, numbers, the basic operators (plus, minus, multiply, divide), and the 3 basic trigonometric functions sine, cosine and tangent.

I’ve added more functions to it, and you can download the class here: ReversePolishNotation.cs

You can use the source code in a personal or commercial project. The disclaimer is that it’s given as is, and you’re ultimately responsible for what goes on in your project. Attribution to me (Vincent Tan) or a link to this blog is appreciated, but not necessary. Just do something awesome with it!

The new functions are

  • Absolute function
  • Arc sine (asin, inverse sine function)
  • Arc cosine (acos, inverse cosine function)
  • Arc tangent (atan, inverse tangent function)
  • Hyperbolic sine (sinh)
  • Hyperbolic cosine (cosh)
  • Hyperbolic tangent (tanh)
  • Square root function
  • Sign function

All in all, not very much added. But I’m using regular expressions to parse the input, so the more functions I support, the more complex the regular expression becomes. For example, I have to tell “sin”, “asin” and “sinh” apart. It’s harder because “sin” is a subset of “asin” and “sinh”.

Don’t get it? What if the input is “sing 1”? The parser should output 2 tokens, “sing” and “1”. It shouldn’t output “sin”, “g” and “1” (or “sin” and “g 1”, or whatever weird case that shouldn’t happen).

Anyway, I haven’t been to Dream In Code for a while now… and someone’s commented that the regular expression for detecting the unary minus should be changed. This was the original code (broken up for legibility):

sBuffer =
"${number} -");

This was what that person suggested:

sBuffer =
"${number} ) - (");

If I understand it correctly, he (or she) wants to catch the situation where the input has something like this:
( 5 ) – ( 3 )

I checked, and my original parsing would indeed fail. I do think he overparsed (I made that word up) though. He made the assumption that the next number is also encapsulated in a round bracket. It doesn’t have to. My code will fail for this too:
( 5 ) – 3

The error was in the right round bracket, so the corrected code is

sBuffer =
"${number} ${bracket} -");

Basically, I checked for the existence of the round bracket. Let me take it out for you to see it better:


Anyway, I’m putting the upgraded RPN code here because I’m going to shut down the site where I put it. I made a playground site called Ragnarok Code (now defunct). After months, I still only have the RPN code up.

Let me just say that writing articles is hard, and takes a non-trivial amount of time. So that site’s been stagnating, which ironically is the very thing I wanted my coding skills to not be. *sigh*

So have a play with an implementation of RPN while it’s still up.

P.S. I checked that person’s profile on Dream In Code. Seems like the only thing he did was comment on my tutorial. Wow, it must have bugged him really bad for him to register an account just so he could comment on my tutorial. After about 2 months, there’s no more activity from him. Maybe he got tired of waiting for a reply from me… oops…

Why are signals from passive optical networks split into 32?

World map information

I attended a course on fibre technology recently. The presenter was Dr. Jeffrey Bannister from Orbitage.

He was talking about fibre optics being a relatively old technology, and is now being used as a means of transporting the vast amounts of information that’s the Internet. Remember the earthquakes near Taiwan, which halted Internet traffic in Asia?

There’s an interesting point he made, that there are only 4 of these hair-thin optical fibres supporting the Asian Internet traffic. And if I remember correctly, these optical fibres run in between Vietnam and Philippines, to Taiwan, and to Japan. I can’t find any reference on the number of fibres used, but 4 seems incredulous. I mean, it takes a lot of money, time and effort to set those submarine cables. It makes sense to use more, since optical fibres are cheap (as cheap as fishing lines, so says Dr Bannister). Maybe there are dark fibres.

Another interesting point is that optical fibres do not rely on electricity to convey information. Shine a light at one end of a fibre, and it’s interpreted as a “1” at the other end. No light means a “0”. Voila! Zeroes and ones for digital use. Using a physics property called Brewster’s angle, light can be transmitted for long distances with little loss of energy or result in data corruption (light keeps bouncing around along the optical fibre).

A third interesting point is that the light used in transmitting our Internet data is not visible at all! It’s actually infrared light, because it has the best result for the single-mode optical fibres used.

A fourth interesting point is that upstream and downstream data use the same optical fibre. It’s accomplished by using different wavelengths of infrared light, using a technique called wavelength-division multiplexing.

So where do passive optical networks come in, and what are they? If I understand it correctly, it’s an architecture for housing splitters, and

Each splitter typically splits the signal from a single fiber into 16, 32, or 64 fibers, depending on the manufacturer, and several splitters can be aggregated in a single cabinet.

Remember the data travelling along just 4 optical fibres mentioned before? There are many endpoints for that data, so somewhere along the line, the data have to be split up. That’s where the splitters come in.

By now, your coder senses should be tingling. Let me highlight the source of the tingling:

single fiber into 16, 32, or 64 fibers

16? 32? 64? They look familiar…

In his talk, Dr Bannister mentioned that splitters split the signals into 32 or 64. Now if he mentioned only “32”, I might have waved it off. But he mentioned “64” in the same breath too, and that’s what triggered my coder senses.

And in case you haven’t caught on,
16 = 2^4
32 = 2^5
64 = 2^6

So after his talk, I went up to ask him about this. At first, he misunderstood my question, and explained more on how the splitting was done.

My question was actually something else. Fibre optics do not need electricity to transmit data. The splitters do not need electricity to split signals. Basically everything is analog. Why is the binary concept, the basis of digital, used in the number of splits?

His answer was actually very simple. It’s easy to calculate the efficiency. (Or light energy. Or wavelength.) Splitting a signal into 2 means it’s a simple 50% divide.

He thought about it, and said the engineers could probably split signals into 10 or powers of 10. But splitting in powers of 2 is easy for the math.

Frankly speaking, I didn’t expect such a simple and logical answer. I was actually stunned for a few seconds.

Does that make you feel computers have a completely efficient understanding of the world?

[Update: Commenter John Bartell has more information on splitters and passive optical networks.]

[image by ktsimage]