How to fix Open XML SDK created spreadsheet for iOS devices

Note that this is only for Open XML SDK 2.0. I haven’t tested for SDK 2.5 (or later, but there’s no “later” version at the point of this writing).

The fix

You have to manipulate the [Content_Types].xml file. This XML file is zipped together in the spreadsheet file, and is at the root of the file. If you don’t know what I’m talking about, you’re probably reading the wrong article.

Unfortunately, this can’t be done within Open XML SDK. As I understand it, the SDK runs internally on System.IO.Packaging namespace objects.

Now a package is a zip file with structure. But a zip file is not necessarily a package. (go maths logic!) See Microsoft reference (bottom of page).

As I understand it, you can’t manipulate a package’s [Content_Types].xml with any of the methods or classes from the System.IO.Packaging namespace. This XML file is supposed to be tamper-proof.

So if you can’t change it with the Packaging stuff, then you’ll have to manipulate it as a pure zip file. And as at .NET Framework 3.5 (because I’m using Open XML SDK 2.0 as reference), there’s no in-built zipping mechanism for the kind of zipping algorithm you need (the Gzip and Deflate isn’t exactly the tool).

This means you need an outside zip library. I leave it to you to find your favourite library. Assuming you found one, here’s why you need to change the [Content_Types].xml file.

The one generated by Excel looks something like this:

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default ContentType="application/vnd.openxmlformats-package.relationships+xml" Extension="rels"/>
<Default ContentType="application/xml" Extension="xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" PartName="/xl/workbook.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" PartName="/xl/worksheets/sheet.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.theme+xml" PartName="/xl/theme/theme.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml" PartName="/xl/styles.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml" PartName="/xl/sharedStrings.xml"/>
<Override ContentType="application/vnd.openxmlformats-package.core-properties+xml" PartName="/docProps/core.xml"/>
</Types>

The one generated by Open XML SDK 2.0 looks something like:

<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" Extension="xml"/>
<Default ContentType="application/vnd.openxmlformats-package.relationships+xml" Extension="rels"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" PartName="/xl/worksheets/sheet.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.theme+xml" PartName="/xl/theme/theme.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml" PartName="/xl/styles.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml" PartName="/xl/sharedStrings.xml"/>
<Override ContentType="application/vnd.openxmlformats-package.core-properties+xml" PartName="/docProps/core.xml"/>
</Types>

Spot the difference time!

The workbook part is missing. Specifically, look for the XML tag with the ContentType attribute equal to “application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml”

Open XML SDK, for whatever reason, doesn’t explicitly set the PartName=”/xl/workbook.xml” part. This is why iOS devices can’t see the spreadsheet file.

I don’t know whether the fix will work on Android devices or Windows phones. But a developer emailed me that iOS 6.0 devices seem to still be able to read the spreadsheet files by Open XML SDK, but iOS 7.0 just gives up.

Microsoft, please, you created the Open XML specs. Other companies are following the specs and the one useful tool you made to help generate Open XML documents fail to follow the specs.

Specious spreadsheet security

If not for Chinese, it would’ve worked.

So in Excel, you can set a password for either protecting a particular worksheet, or protecting the entire workbook/spreadsheet. This password is then hashed, and the result is stored within the spreadsheet contents.

Now with the use of Open XML spreadsheets, this means the resulting hash is stored in “plain text” within XML files. Without going into too much detail, here’s the algorithm for the hash as documented in the Open XML SDK 2.0 help docs:

// Function Input:
//    szPassword: NULL-terminated C-style string
//    cchPassword: The number of characters in szPassword (not including the NULL terminator)
WORD GetPasswordHash(const CHAR *szPassword, int cchPassword) {
      WORD wPasswordHash;
      const CHAR *pch;
 
      wPasswordHash = 0;
 
      if (cchPassword > 0)
            {
            pch = &szPassword[cchPassword];
            while (pch-- != szPassword)
                  {
                  wPasswordHash = ((wPasswordHash >> 14) & 0x01) | ((wPasswordHash << 1) & 0x7fff);
                  wPasswordHash ^= *pch;
                  }
            wPasswordHash ^= (0x8000 | ('N' << 8) | 'K');
            }
      
      return(wPasswordHash);
}

This algorithm is wrong. Or at least it doesn't give the resulting hash that Excel produces.

Granted, I didn't really expect the algorithm to be correct. Because from the SDK help:

An example algorithm to hash the user input into the value stored is as follows:

It's an example algorithm, so it may or may not be the one that Excel actually use. However, for the purposes of usability, the password used by users have to be encrypted using Excel's algorithm, so we have to somehow get the resulting hash. Either we get the algorithm itself, or we simulate an algorithm such that the resulting hash matches that of Excel's.

Which is what Kohei Yoshida did. See his modified algorithm. This algorithm worked!

Given that I'm Chinese, I did what's natural: I used Chinese characters as the password.

And the modified algorithm failed. It only worked if the password consisted only of Latin alphabets. I tried Japanese characters. Failed too.

This is why I don't support password protection in SpreadsheetLight. I don't want to give the false impression that the worksheet/workbook encryption works. This is one feature where "partially worked" is unacceptable.

Granted Open XML spreadsheets also support other types of encryption, and you can store the name of the algorithm and salt value you used, and even the number of times the hash algorithm was run.

But.

This is in the context of a spreadsheet library.

What are spreadsheet libraries mostly used for? Automation.

Which means minimal (if at all) human interaction and intervention.

And so the question comes up.

Where do you store the password?

If a human is protecting the worksheet/workbook, she will provide the password herself, and then encrypts it (well Excel does it), and she just remembers the password.

If a spreadsheet library is generating the workbook, and encrypting it, the password has to be gotten from somewhere, right? So it's stored, maybe in a text file, maybe in a database, maybe hardcoded in code, or whatever.

The point being that the password is not held solely by a human being. And a computer hard drive is easier to hack into than a human brain.

And I will prefer not to be a party to facilitating insecure spreadsheet generation. Besides, it's Open XML. The data is supposed to be open and sharable. Password protection seems to be the opposite. Even Microsoft cautions the use of depending just on encrypted Excel files.

We already have social engineering used by devious people to deceive people into giving their passwords over. Storing passwords on a computer seems suicidal because computers have no common sense at all.

As a final word, I'd say using the Open XML SDK can be either verbose, or obscenely painstakingly verbose. Simple tasks need a couple of dozens of lines of code, and complex tasks take at least 2 magnitudes of work to do. For individualistic, compartmentalisable (that's not a word, right?) tasks, you can do it from scratch. Add even a smidgeon of complexity, and you'll find a library more useful. Try mine.

SpreadsheetLight version 3

Version 3 of my spreadsheet library is now available. There’s a whole bunch of updates, including Excel 2010 conditional formatting such as data bars with negative value fill colours and icon sets with no icons.

SpreadsheetLight is possibly the most developer-friendly spreadsheet library ever. Even if I do say so myself. 🙂

Calculate Excel column width pixel interval

Brace yourself. You’re about to learn the secret behind how Excel mysteriously calculates the column width intervals.

In this article, I’m not going into the details of the column widths, but the column width intervals. There’s a difference. From the Open XML SDK specs:

width = Truncate([{Number of Characters} * {Maximum Digit Width} + {5 pixel padding}] / {Maximum Digit Width} * 256) / 256

To put it mildly, that’s a load of hogwash. In the documentation, it says that for Calibri 11 point at 96 DPI, the maximum digit width is 7 pixels. That is also another load of hogwash. It’s actually 8 pixels (well, 7 point something…).

When you move the line on the column width in Excel, just 1 pixel to the left, what is the column width? When you move it 1 pixel to the right, what’s the column width?

It turns out the each pixel interval isn’t a simple multiple of an internal column width interval.

Let’s take Calibri 11 pt 96 DPI again. With a maximum digit width of 8 pixels, each column width interval per pixel is supposedly 1/7 or 1/(max digit width -1).

But wait! It’s not actually 1/7. It’s the largest number of 1/256 multiples that is less than 1/7.

Now 1/7 is about 0.142857142857143. The actual interval is 0.140625, which is 36/256.

4/7 is about 0.571428571428571. The actual interval is 0.5703125, which is 146/256. And you will note that 146 is not equal to (4 * 36).

If you’re using Open XML SDK (or however you choose to access an Open XML Excel file), when you set the column width as 8.142857142857143, internally, Excel will save it as 8.140625.

Here’s some code:

int iPixelWidth = 8;
double fIntervalCheck;
double fInterval;
for (int step = 0; step < iPixelWidth; ++step)
{
    fIntervalCheck = (double)step / (double)(iPixelWidth - 1);
    fInterval = Math.Truncate(256.0 * fIntervalCheck) / 256.0;
    Console.WriteLine("{0:f15} {1:f15}", fIntervalCheck, fInterval);
}

So now you know how the intervals are calculated. But what about the actual column width? Hmm... perhaps another article...

P.S. I'm currently doing research for how to do autofitting for rows and columns for my spreadsheet library. I found this "secret" after fiddling with Excel files for a couple of hours... I know I'm talking about my library a lot, but it's taking up a lot of my brain space right now, so yeah...

Spreadsheet chart reference manual now available

After weeks of working on this, I’ve finally finished my reference manual on Open XML charts in spreadsheets. It’s called (imaginatively) Spreadsheet Open XML Charts.

It’s sort of a sequel to my other reference manual on Open XML spreadsheets.

If you do any work on Open XML spreadsheets and in charting specifically, I’m certain you’ll find this useful. Please check it out and let me know what you think.

Now if you’ll excuse me, I’ve got to go crash and do nothing. Perhaps get a cup of tea or hot chocolate.