How to properly use OpenXmlWriter to write large Excel files

Brian Jones already wrote something on using the OpenXmlWriter with the Open XML SDK to write large Excel files. There are a couple of things that weren’t mentioned though.

OpenXmlWriter works best for creating new things

For the purposes of explanation, we’ll assume the “new thing” is a new worksheet, although OpenXmlWriter can be used to write any Open XML SDK part (such as WorkbookPart).

The reason for this is that OpenXmlWriter is essentially writing out XML tags. The SDK offers classes and properties that we’re familiar with, but underneath it all, an Open XML spreadsheet’s internal structure are XML files zipped together. So instead of using the SDK classes to form the “XML” parts, we use the OpenXmlWriter to directly write out the XML.

The problem comes when we need to save. If you’re working with an existing worksheet, there’s already an existing XML structure in place. This means there’s a corresponding WorksheetPart (to your Worksheet class), and this WorksheetPart already has the XML structure (and more importantly, a DOM structure).

So you can’t just use the OpenXmlWriter to write the XML stuff into an existing WorksheetPart, because any changes you made will be ignored.

This is why in Brian’s code, he reads in the existing worksheet and writes the content into a new worksheet, thus using a new WorksheetPart that’s “attached” to the WorkbookPart. After he’s done reading, he deletes the old WorksheetPart.

In conclusion, either you

  • Work with the DOM structure using the SDK, which means snail-pace if you’re writing many cells
  • Work with new WorksheetPart

If you have something like worksheetPart.Worksheet.Save() mixed with the use of OpenXmlWriter, it’s not going to work.

XML attributes

The more subtle (and possibly more crucial) information that’s missing is of attributes. When you use the OpenXmlWriter, it works best if it’s used all the way. As in you don’t mix SDK class/property usage code with OpenXmlWriter code.

Basically, if there’s an SDK class property you want to set, don’t assign to that property in code. Use the OpenXmlWriter to write that property, because that property is eventually an XML attribute anyway. In this case, think of the SDK as getting in the way of making your code run faster.

In Brian’s code, notice that the RowIndex property of the Row class is not assigned. Which is fine since it’s optional. In this case, Excel tries to guess which row it is, and places the first Row class content into the first row on the spreadsheet. This is good if that’s what you want. And bad if your want your “first” Row class to be on the 3rd row instead.

You might also notice that the Cell class has no properties assigned. And no, CellFormula and CellValue are not properties of the Cell class, they’re actually child classes. The SDK allows you to access them as though they’re properties. This means the “standard” set of cell properties are not assigned:

  • The cell reference, such as “A1”
  • The cell data type, such as string or number
  • The cell style index

The code works because the cell content is a random number. And the default data type of a cell is a number (so it doesn’t need to be explicitly assigned). It will fail if you need strings or other data types.

And because there’s no cell reference given, Excel tries to guess where each cell is. Which is to say, place all the cells in that row flush to the left. This means you can’t have a cell in A1, and a blank cell in B1, and then a cell in C1, because Excel mooshes all the Cell classes together.

And because there’s no style index given, the default style is used. This is typically the minor font (such as Calibri) with no special formatting (such as font colour or bold or italics or borders).

So here’s sample code on how to use OpenXmlWriter. Note the class OpenXmlAttribute. And if you understand this, you will know that the OpenXmlWriter isn’t only for writing large Excel files. It just happens that it directly writes XML tags and attributes, and thus works the fastest for writing a lot of data.

using (SpreadsheetDocument xl = SpreadsheetDocument.Create("LargeFile.xlsx", SpreadsheetDocumentType.Workbook))
{
    List<OpenXmlAttribute> oxa;
    OpenXmlWriter oxw;

    xl.AddWorkbookPart();
    WorksheetPart wsp = xl.WorkbookPart.AddNewPart<WorksheetPart>();

    oxw = OpenXmlWriter.Create(wsp);
    oxw.WriteStartElement(new Worksheet());
    oxw.WriteStartElement(new SheetData());

    for (int i = 1; i <= 50000; ++i)
    {
        oxa = new List<OpenXmlAttribute>();
        // this is the row index
        oxa.Add(new OpenXmlAttribute("r", null, i.ToString()));

        oxw.WriteStartElement(new Row(), oxa);

        for (int j = 1; j <= 100; ++j)
        {
            oxa = new List<OpenXmlAttribute>();
            // this is the data type ("t"), with CellValues.String ("str")
            oxa.Add(new OpenXmlAttribute("t", null, "str"));

            // it's suggested you also have the cell reference, but
            // you'll have to calculate the correct cell reference yourself.
            // Here's an example:
            //oxa.Add(new OpenXmlAttribute("r", null, "A1"));

            oxw.WriteStartElement(new Cell(), oxa);

            oxw.WriteElement(new CellValue(string.Format("R{0}C{1}", i, j)));

            // this is for Cell
            oxw.WriteEndElement();
        }

        // this is for Row
        oxw.WriteEndElement();
    }

    // this is for SheetData
    oxw.WriteEndElement();
    // this is for Worksheet
    oxw.WriteEndElement();
    oxw.Close();

    oxw = OpenXmlWriter.Create(xl.WorkbookPart);
    oxw.WriteStartElement(new Workbook());
    oxw.WriteStartElement(new Sheets());

    // you can use object initialisers like this only when the properties
    // are actual properties. SDK classes sometimes have property-like properties
    // but are actually classes. For example, the Cell class has the CellValue
    // "property" but is actually a child class internally.
    // If the properties correspond to actual XML attributes, then you're fine.
    oxw.WriteElement(new Sheet()
    {
        Name = "Sheet1",
        SheetId = 1,
        Id = xl.WorkbookPart.GetIdOfPart(wsp)
    });

    // this is for Sheets
    oxw.WriteEndElement();
    // this is for Workbook
    oxw.WriteEndElement();
    oxw.Close();

    xl.Close();
}

This creates a spreadsheet with 100 columns and 50000 rows of data. On my machine, it ran in about 18 seconds.

I’m not going to list all the available XML attributes, but you can find out the correct attribute name by exploring the Open XML SDK documentation, or the official ECMA documentation.

Sorry, comments are closed, but you can contact me directly if you like.