Considerations for storing Excel cell value in code

You may hate Excel, but you may find a discussion of how Excel stores cell values interesting.

So I have a spreadsheet library. The biggest concern at the initial stage was how to store all the spreadsheet data efficiently. I hear people talking about millions of cells, so I’m scared. If my program stores a spreadsheet cell using 10 bytes (for example), a million cells would take up 10 million bytes in memory.

Let’s start by looking at all the different types of information you can type into a spreadsheet cell. You have:

  • booleans: TRUE or FALSE
  • numbers
  • text
  • rich text (different styled text within the entire text itself)
  • dates and times

For us programmers, “numbers” can be separated into floating point or integer types. An Excel user won’t see a difference.

So how does Excel actually store those values? I’m going to focus only on Open XML because I’m not interested in BIFF files…

  • booleans: TRUE stored as text “1” and FALSE stored as text “0”
  • numbers: stored as text
  • text: duh
  • rich text: stored in a separate shared strings list, with the index to that list stored as text here.
  • dates and times: stored as number that’s in text form

You will see everything is basically stored as text. That’s because the underlying XML files are text files. There’s a property (XML attribute) that differentiates the data, such as boolean, number, string, inline string, shared string.

So why are dates stored as a number? It’s easier to do date calculations with 41449 than “24 June 2013”. So how is this number obtained? See here.

So if you’ve been looking closely enough, Excel’s optimisation tactic is to store everything as numeric text as far as possible. So I want to follow that.

Before doing so however, I went to read what other people are doing AKA open source spreadsheet libraries. In code, they use an object to store the cell value. As in System.Object, the mother of all data types in .NET.

So you have an integer? Dump it into the object variable. Floating point? Dump into object. String of characters? Dump.

How do you read it out? Boxing and unboxing. You remember it’s a floating point value and cast it back from an object to a double variable type.

So what did I do? I have a double variable and a string variable, and I store the cell value in one or the other based on the input.

The “all in object” way has variable (no pun intended) memory size, based on the contents. Sort of. I’m not an expert in this.

My way has a fixed memory size for double’s. Each double takes up 8 bytes (for sure?). A string variable takes up variable size, but because the optimisation tactic is to store data as a number, I can assign the data to the double variable and set the string variable to null. This means the string variable size is sort of fixed too.

So this is what I do. If it’s a number, I store it in the double variable and set the string variable to null. If it’s text, I convert it to a number by using shared strings (out of scope for discussion here) and store the index into the double variable and set the string variable to null. The only cases where the string variable is actually used is if I store the text there, or if I want to store the actual number there (because “1.23456789” may not be stored exactly as that in a double variable. Go read on how floating points are implemented for details), which are rare.

According to Jon Skeet, strings take up 20 + (n/2)*4 bytes (where n is the number of characters). But a null string takes up 8 bytes (it’s either 4 or 8 bytes. I’ll assume the worse scenario).

This means for the most part, each cell has a double variable that takes up 8 bytes and a null string that takes up 8 bytes. A cell value of 10 or 3.14 or 12345678.9 takes up 16 bytes regardless.

Since 16 bytes is less than 20 + (n/2)*4 bytes, I save more memory in most cases. I also have less boxing and unboxing operations, which make things go faster.

People buy what they value

“I don’t have money leh. After 15th?”

2 freaking dollars. They don’t have 2 freaking dollars.

Side note: The “leh” is an affectation of Singaporean English speech. It’s appended to most sentences as a sort of finishing element. By itself, it doesn’t mean anything.

Collecting money can be tough

I was tasked to collect mess fees from non-specialists in my unit. I was a lance corporal in the military. I was 20 years old.

In case you’re not familiar with military terms (I know I’m not…), the “mess” refers to the place where soldiers eat. Specialists refer to sergeants and above, until you hit officer ranks. For the purpose of this article, non-specialists are recruits (just joined), privates, lance corporals and corporals (ranked in that order).

I can’t remember why mess fees were needed, but I was to collect them from non-specialists (in my unit only). The specialists have their own specialists mess. The whole military compound had a food hall, which was free. Then there’s the non-specialists mess (which we hardly visit, but maybe other units frequent). Then there’s the specialists mess. And then there’s the 1 stall just outside my unit (the men in my unit preferred this than trekking all the way to the non-specialists mess).

Anyway, I was sort of favoured by the S4. I type bloody fast and he gave me paper documents which I was to transform into digital Word documents and save into a floppy disk. (Haha! Floppy disks! It was 1997.)

Yes, some of those documents were sensitive. No, I can’t remember anything. Torturing me will be a waste of your time. Have I mentioned it was 1997?

As a reference point, the S4 was the officer in charge of logistics and was one of the highest ranking officers in the compound. He had his own personal clerk. When his clerk left (the clerk finished his mandatory period of service), his duties were somehow passed on to me. One of those duties was to collect mess fees.

Coincidentally, I was the treasurer when I was in the Chinese Orchestra in secondary school. My advice? Do not be directly responsible for other people’s money if you can help it. I couldn’t sleep when I found the money I had on hand was different from what the record books said. I was about 15 years old. Good grief…

So. Recruits and privates were to pay $1, lance corporals to pay $2, and corporals to pay $3. The men were good-natured enough, but getting them to cough up money was a pain…

Why the 15th? Well, I was to get the money to the mess hall by the 10th of the month (I can’t remember the exact payment date. Let’s go with the 10th). After several months of failed attempts to submit on time, I managed to persuade the mess hall people to let me pay after 15th. This was because the army pays everybody on the 15th.

Granted, we weren’t paid a lot. It’s about a couple of hundred dollars a month, depending on your rank and length of service. $2 was maybe 1% or less of your military salary. But in absolute terms, $2 is nothing. The men typically spend more than that at the canteen every day.

Recession? What recession?

People pay for what they value. The men didn’t value the mess that much. Hence the reluctance to pay.

People still buy the latest iPhone, even though they still own a perfectly working previous version. People still go on vacations. People still go to expensive restaurants. The price isn’t the issue. If people value something enough to overcome the price, they’ll pay for it.

Here’s an interesting observation. I had little trouble with the recruits, privates and corporals. The recruits and privates were new to the military, and as a lance corporal *ahem* I was able to get them to pay up. The corporals were people who were going to the university after they finish their military service. They’d pay up so that I’m out of their hair or they don’t want my life to be miserable or whatever.

The lance corporals were from the hardier sides of Singapore. Polytechnic students or with lower education status.

Now I’m not saying the education status was the cause. I’m saying the attitude is different. The lance corporals were negotiating the terms. (My own rank was a different story. I was eventually promoted to a full corporal).

Once it was after the 15th, the men didn’t give me any more excuses. They’d just pay up. They weren’t trying to make my life difficult in the first place.

Entrepreneurs and polymaths

Entrepreneurs and polymaths have something in common. Entrepreneurs need to be knowledgeable in many areas, though mostly in the management of business, and mostly being knowledgeable enough is enough (delegation then comes into the picture). Polymaths are knowledgeable in many areas, though the reasons for it may be different. I pin the common reason as simply being curious.

Scott Adams (who draws Dilbert) wrote an article on entrepreneurship.

The first thing you should learn in a course on entrepreneurship is how to make yourself valuable. […] The world has plenty of better artists, smarter writers, funnier humorists and more experienced business people. The rare part is that each of those modest skills is collected in one person. That’s how value is created.

We still need better artists, smarter writers, funnier humorists and more experienced business people. But to have one person advise you on art(istic inclinations), writing, humor and business? Priceless.