Academic self-publishing

As of this writing, I’ve worked in the academic industry for slightly over 3 months. More on this later.

Claire Morgan sent me this article on academic self-publishing. There are good points and bad points regarding self-publishing, which are exaggerated particularly so for the academic industry. I should know, since I’m making a web application for internal use in a university.

As part of my work, I learnt a lot about how the Ph.D. degree students go through their academic years, and what are the processes they go through. Yeah my web application is going to be used by staff and future doctors (albeit Ph.D.’s). Go me.

One of the more important parts of being a Ph.D. student is publications. It’s not necessary to publish any papers but they help increase your credibility. Perhaps in helping with raising the impressions of examiners who attend your thesis defence. Every little bit helps, I guess.

But you can’t just publish your work anywhere. The publisher has to be of repute and your work must be peer reviewed.

This means self-publishing is looked down upon even more by the academic industry.

But wait!

There are 2 points I found interesting in Claire’s article.

The 1st point is that you don’t have to publish your entire work. Maybe a non-vital piece of your research but is still informative and useful. “Non-vital” is of course relative to your entire research work as a whole.

The 2nd point is that you can publish mistakes or research results that didn’t go anywhere. These don’t help you and your final research publication, but they might help some other poor fellow from wasting resources going down that route.

So what’s happening lately for me?

As I said, I’m working in a university now. Well, technically they outsourced the work to me so I’m still considered self-employed. We’ll see…

I’m stationed in a research lab. I’m surrounded by Ph.D. students. My manager is a Ph.D. student. Or at least he’s the project officer who’s my liaison between the IT support department and me. But psshht, semantics.

As part of my work, I’m using ASP.NET MVC (model-view-controller) and jQuery. Both of which I’m not familiar with at the start of my contract. Fun times… I was more of a back-end kind of guy.

Speaking of back-end, my spreadsheet library is doing well. Go check out SpreadsheetLight if you haven’t, and it’s also on NuGet (which I just learnt about, even though I’ve been using Visual Studio for like donkey years. Like I said, I’m more of a back-end guy. How do you pronounce “NuGet”? Like “nugget”?).

So yeah, if you haven’t, go read that article. Tell me what you think.

Specious spreadsheet security

If not for Chinese, it would’ve worked.

So in Excel, you can set a password for either protecting a particular worksheet, or protecting the entire workbook/spreadsheet. This password is then hashed, and the result is stored within the spreadsheet contents.

Now with the use of Open XML spreadsheets, this means the resulting hash is stored in “plain text” within XML files. Without going into too much detail, here’s the algorithm for the hash as documented in the Open XML SDK 2.0 help docs:

// Function Input:
//    szPassword: NULL-terminated C-style string
//    cchPassword: The number of characters in szPassword (not including the NULL terminator)
WORD GetPasswordHash(const CHAR *szPassword, int cchPassword) {
      WORD wPasswordHash;
      const CHAR *pch;
 
      wPasswordHash = 0;
 
      if (cchPassword > 0)
            {
            pch = &szPassword[cchPassword];
            while (pch-- != szPassword)
                  {
                  wPasswordHash = ((wPasswordHash >> 14) & 0x01) | ((wPasswordHash << 1) & 0x7fff);
                  wPasswordHash ^= *pch;
                  }
            wPasswordHash ^= (0x8000 | ('N' << 8) | 'K');
            }
      
      return(wPasswordHash);
}

This algorithm is wrong. Or at least it doesn't give the resulting hash that Excel produces.

Granted, I didn't really expect the algorithm to be correct. Because from the SDK help:

An example algorithm to hash the user input into the value stored is as follows:

It's an example algorithm, so it may or may not be the one that Excel actually use. However, for the purposes of usability, the password used by users have to be encrypted using Excel's algorithm, so we have to somehow get the resulting hash. Either we get the algorithm itself, or we simulate an algorithm such that the resulting hash matches that of Excel's.

Which is what Kohei Yoshida did. See his modified algorithm. This algorithm worked!

Given that I'm Chinese, I did what's natural: I used Chinese characters as the password.

And the modified algorithm failed. It only worked if the password consisted only of Latin alphabets. I tried Japanese characters. Failed too.

This is why I don't support password protection in SpreadsheetLight. I don't want to give the false impression that the worksheet/workbook encryption works. This is one feature where "partially worked" is unacceptable.

Granted Open XML spreadsheets also support other types of encryption, and you can store the name of the algorithm and salt value you used, and even the number of times the hash algorithm was run.

But.

This is in the context of a spreadsheet library.

What are spreadsheet libraries mostly used for? Automation.

Which means minimal (if at all) human interaction and intervention.

And so the question comes up.

Where do you store the password?

If a human is protecting the worksheet/workbook, she will provide the password herself, and then encrypts it (well Excel does it), and she just remembers the password.

If a spreadsheet library is generating the workbook, and encrypting it, the password has to be gotten from somewhere, right? So it's stored, maybe in a text file, maybe in a database, maybe hardcoded in code, or whatever.

The point being that the password is not held solely by a human being. And a computer hard drive is easier to hack into than a human brain.

And I will prefer not to be a party to facilitating insecure spreadsheet generation. Besides, it's Open XML. The data is supposed to be open and sharable. Password protection seems to be the opposite. Even Microsoft cautions the use of depending just on encrypted Excel files.

We already have social engineering used by devious people to deceive people into giving their passwords over. Storing passwords on a computer seems suicidal because computers have no common sense at all.

As a final word, I'd say using the Open XML SDK can be either verbose, or obscenely painstakingly verbose. Simple tasks need a couple of dozens of lines of code, and complex tasks take at least 2 magnitudes of work to do. For individualistic, compartmentalisable (that's not a word, right?) tasks, you can do it from scratch. Add even a smidgeon of complexity, and you'll find a library more useful. Try mine.

SpreadsheetLight version 3

Version 3 of my spreadsheet library is now available. There’s a whole bunch of updates, including Excel 2010 conditional formatting such as data bars with negative value fill colours and icon sets with no icons.

SpreadsheetLight is possibly the most developer-friendly spreadsheet library ever. Even if I do say so myself. 🙂

In The Mirror (cover)

This is one of my favourite piano pieces by Yanni.

I’m not a pianist. I’ve never taken piano lessons. I’m like grade -1 or something. This took me over a dozen tries…

Credits and permission granted from:

Music by Yanni
23rd Street publishing Inc/Yanni Music Publishing (ASCAP)
Used by Permission. All Rights Reserved

Passion problems

It was a fluke. I just fell into the spreadsheet business.

My main products currently, both in writing and in code, revolve around spreadsheets. Specifically around Excel workbooks or spreadsheets or files or whatever you call them.

I like programming and writing code. I’m great at it. I just didn’t specifically choose making spreadsheet libraries.

There’s a say out there about following your passion and “then the money will follow”. The problem is that you might just wait for that passion of yours to appear.

People pay for skill. They pay for convenience, pain relief, time saving, effort saving, money saving, revenue generation. They pay for what’s important to them. They don’t pay for your passion.

When it all started, I was working in the billing support department. Lots of financial numbers flying around. Lots of spreadsheets needed.

So I was forced to figure out a way to create Excel files with a minimum of fuss. I started with just XML files which Excel accepted.

Then more complexity was needed. Slowly I honed my skills at deciphering what users want, what Excel does behind the scenes and how to create Excel files.

By making my spreadsheet library as easy to use as using Excel, I am now an expert at using Excel.

A user unchecks a checkbox. I do the same thing and see what changes in the spreadsheet.

I may not be one of those Excel gurus who write 500 page Excel user guides, but I’m close enough. My spreadsheet library may not have all the features of commercial libraries that cost thousands of dollars, but it bloody well does the job in the most accessible manner possible.

This is skill honed over hours of study and practice and experimentation.

The funny thing is, I kinda have a passion for it developed over time. Oh I still hate Excel sometimes for being weird or incomprehensible in behaviour, but it’s mostly in jest. Ok I have an unseemly hatred for how Excel calculates column widths…

Develop a skill worth paying for. You may have a passion for playing video games but only a few people manage to be paid doing so. (if you’re interested, check out the gameplay videos on YouTube. That’s one way of being paid. Advertising or even sponsorship)

What are your marketable skills? How can you become so good that people have to pay attention to you, and thus pay you to help them?