Software is everywhere

Coming soon to toasters and refrigerators. YouTube hates me, or WMV files, because the rendered video has visual static…

And celebrate whatever holiday is meaningful to you. Happy holidays!

Regular polygon equation

A while ago, a blog reader named BJ (he seems to prefer being called BJ. He? *checks email…* Yeah, he) emailed me with his answer to this question: Is there an equation to describe regular polygons?

I’m not clever enough to do much editing and explanation, so I’ll post his email (got his permission and clarified some points) here.

*Start email quote*

An example: A polygon equation can be approximated by a single continuous implicit equation. Suppose x*y=0.1 The concept is to think of this hyperbola as almost being the product of two lines intersecting at x=0 and y=0.

Construct a (3,4,5) right triangle like this: Begin with the multiplication (x – 1)(y + 1) = 0.1. This product is the first approximated vertex. Multiply a second time using (3y – 4x – 5). This creates two more curved vertices. The multiplication expression now has three factors on the LHS. The RHS remains 0.1. The final equation will form a figure with the approximated triangle being a central “island”. Below is an equation for a (3,4,5) right triangle that morphs from intersecting lines to the triangle and then on to the circle. I sandwiched the approximated triangle between two circles, and it becomes one circle in the limit as 0.1 -> 0.

((x + 0.5)^2 + (y – 1)^2 – 6.125) – ((x + 0.5)^2 + (y – 1)^2 – 6.125)(3y – 4x – 5)(x – 1)(y + 1) = 0.1

Notice that I have multiplied by a circle in the last term. I sandwiched the approximated triangle between two circles, and it becomes one circle in the limit as .1 -> 0. The last term is subtracted from the circle that in the first term. The multiplication expression goes to zero, and only the circle remains. This method is somewhat analogous to the method used in Euclidean geometry.

*End email quote*

He also sent a follow-up email:

*Start email quote*

Here is a quadratic form for the triangle [(0,0),(4,0),(4,3)].
(x – 4) y (y – 0.75x) = 0.01

Solve for y=f(x) on the range 0 <= x <= 4 y = (15x^2 - 60x + sqrt(225x^4 - 1800x^3 + 3600x^2 + 16x - 64)) / (40x - 160) y = (15x^2 - 60x - sqrt(225x^4 - 1800x^3 + 3600x^2 + 16x - 64)) / (40x - 160) Plot using two colors, one for each solution. *End email quote* You can add your thoughts on this in the comments.

Story time

I don’t have much else to add to that, so I’ll just tell you a story instead.

Back when I was in university, there was a programming problem that’s to calculate the value of PI. There were 2 methods involved.

The first method used Monte Carlo simulation. Basically you have a circle with radius 1 unit. So the diameter is 2 units. And you have a square that just contains this circle, so the square is 2 units wide by 2 units high.

The area of the square is 2*2 = 4 units. The area of the circle is given by PI*r*r which is PI (because r is 1 unit). And the ratio of area of circle to area of square is PI/4.

Using Monte Carlo simulation, I randomly selected a point within the square. I made a note of whether the point was within the circle or without. And

[Number of points within circle]/[Total number of points] = PI/4

Since PI is the only “unknown”, there you have it. Solve for PI. The more points you use, the more accurate PI is.

The other method involves calculating the circumference of a circle. Suppose you have a square with a width of square root 2. This is chosen such that the diagonal length of the square is 2 units. This means the distance from the centre of the square to the corner point of the square (any of the 4 of them) is 1 unit.

Are you getting the idea yet?

The perimeter of this square is 4 * (square root of 2). Then we double the number of sides so we get an 8-sided polygon, an octagon. But still keeping the “from centre to outer-most point is 1 unit length” condition. Using some more maths, we get the length of one side of this octagon and multiply it by 8. That will be the perimeter.

As we keep doubling the number of sides, this polygon eventually approximates a circle. And so the perimeter of said (regular) polygon approaches the circumference of a circle, which is 2*PI*r, or just 2*PI (because r is 1 unit). Solve for PI.

Exercise

To end this, I will leave it to you as an exercise to calculate the length of that regular polygon in the 2nd method. Start by understanding why one side of a square is square root of 2. Then continue with calculating the side length for an octagon.

Side note: The 2nd method terminates faster as an iterative process than the Monte Carlo simulation. However, it is also less “stable”. Hahaha… for extra credit, explain why it is less “stable”.

Extra side note: If people knew the formula for a circle is PI*r*r, wouldn’t they have known the value of PI already? This was why I found the Monte Carlo simulation method a little on the self-fulfilling side. You’re solving for something that you sort of know the value of. That’s weird, almost like cheating.

Modulo 26 and column names

I was sitting in the lecture theatre valiantly trying to keep awake. The professor was speaking on the rigorous application and proving of the modulus function. It’s basically the remainder, but I’ve never been introduced to it in such, uh, rigor.

He brought up an example using modulo 26. And demonstrated the wrapping around of values. And the use of it in cryptology (a class I took later on, and I got tasked by the cryptography professor to write a program to do simple encryption. But another story perhaps…).

Modulo 26 is similar to finding the remainder. The difference is that the remainder is unique. This is important to our discussion.

“About what?” you may ask.

Excel column names.

There are 2 types of cell references used in spreadsheets, the R1C1 format and the A1 format. The R1C1 is simple. If the row index is 5 and the column index is 7, the result is R5C7.

The A1 format takes on a column name and the row index. Using our example again, the result is G5, because “G” is the 7th alphabet. Yes, that list of 26 alphabets.

The version of Excel currently (Excel 2007 and 2010) has up to 3 letters, with XFD as the last column name (that’s the 16384th column). What happens is you have column names A, B, and then up to Z. Then the next column name is AA, then AB and then up AZ. Then BA, BB and so on.

Basically, it’s base 26 arithmetic.

As far as I know, the typical method of getting the column name given the column index, is to run a loop. You add 1 until it hits 26, then you move to the next “position”, and then start from 1 again.

There’s nothing wrong with this method. It’s just that you have to iterate as many times as the given column index. If you’re given 16384, the loop runs 16384 times. This is regardless of the fact that the result is always the same. Given the range of values, the result can only be one of 16384 values.

So it was with this in mind, and my dabbling in game development (which said “Precalculate everything!”), that I precalculated an array of strings that are the column names. The array has 16384 items. The context is a spreadsheet software library.

Now to recoup that precalculation cost, I’d have to access my array at least 16384 times. This is where the context of the spreadsheet library comes in. Everybody (and their dog) wants to know if my library can handle millions of cells. This means if the column name calculation is in a method, that method is called millions of times. Given the iteration loop in it, that means the method with the iteration loop thing isn’t efficient (it’s O(n^2)).

However, due to technical issues, I can’t keep the array of strings. The column name array needs to be static to be available throughout the library. This causes issues if multi-threads or multi-processors or multi-whatevers comes in.

So I can’t use my static array anymore. Bummer. The O(n) of simple array access was working so well.

But I still want to have an efficient way of getting column names. So instead of iterating, I used simple division and modulus operations.

Consider 587. How do you know there’s 5 hundred? 587 / 100 equals 5 (truncating remainders). How do you know there’s 8 tens? 87 / 10 equals 8 (truncating remainders).

Yes, it’s elementary arithmetic, but it works great. So we can do the same for column names. There’s a problem though.

In the case above, we divided by 100. Why 100? Because it’s 10^2, and we’re concerned with the 2nd position after the ones position. The ones position is 10^0 by the way, which is 1.

So for our case, for the “hundreds” position, we divide by 676, which is 26^2. And for the “tens” position, we divide by 26.

Now 587 is 100*5 + 10*8 + 7. I’m going to use the notation (5,8,7) to denote this.

Now consider a column index of 3380. 3380 is equal to 676*5 + 26*0 + 0. This is (5,0,0).

However, in our case, our acceptable range of values is 1 through 26. It doesn’t contain zero. So (5,0,0) is not valid.

In the case of 587, we’re working in base 10, with the acceptable range of values being 0 to 9. This is “proper” remainder. Given any number in base 10, there’s a unique number within [0, 9] that’s the remainder.

However, for our purposes, there’s no unique number. Because we’re working in modulo 26, not just “remainder 26”.

The correct column name corresponding to column index 3380 is “DYZ”. This corresponds to (4,25,26). Or
3380 = 676*4 +26*25 + 26.

Note that 3380 is also 676*5 + 26*0 + 0.

My solution is to start from the “ones” position. If it’s greater than zero, fine. If it’s less than or equal to zero, borrow from the next larger position. Then we move to the next larger position, and check again. Continue to borrow until there are no zero values (or negatives) on the “right” side of the resulting notation (we can have “leading” zeroes).

So (5,0,0) becomes (5, -1, 0 + 26), or just (5,-1,26), borrowing 1 from the “tens” position. We cannot have -1, so that becomes (4, -1 + 26, 26), which becomes (4, 25, 26).

An interesting effect is that we typically assign 0 to A, 1 to B, and 25 to Z. In this case, 1 is assigned to A, 2 is to B, and most interestingly, both 0 and 26 map to Z. In fact, any multiple of 26 will map to Z.

Don’t think Z is special. Any multiple of 26 plus 1 also maps to A. So 1, 27, 53 and so on map to A. This is a property of the modulo thing.

Do you have a better way of converting (5,0,0) to (4,25,26)? Let me know in the comments.

Being a software god is tough

“Can this value be negative?” asked my colleague.

We were in a meeting with a product manager to get project requirements. The software application was to calculate settlement revenue between our company and our company’s partners, who were content providers. We charge the public customers for the content, then we share the revenue with the content providers.

The ideal situation was that all the numbers are nice and neat, people pay on time, everyone plays nice and so on and so forth. But reality isn’t this simple.

The value in question was that the product manager wanted to have a mechanism for him to introduce adjustments. You know, in case something happens and we should bill the content provider more, meaning we share less revenue with them.

But you know, you can’t just give the content provider a net number. They’d want to know why they’re receiving less money. So the adjustment had to be a line item on the settlement report.

Since I wasn’t the senior developer there, I decided to voice the concern to my colleague later on. But my colleague anticipated it. “Can this value be negative?” asked my colleague.

The product manager thought about it, and said yes, it could be negative. It means our company had to pay the content provider more money. Say we calculated last month’s settlement wrongly and we’re correcting that this month *cough*.

Let me tell you, finance people are getting blase with the number of zeroes in financial figures, but put one hyphen in front of a number and the whole financial sector goes into collective apoplexy. Sheesh…

My point is that unthinkable input values are always possible. Like negative values. Sometimes, I think developers forget the entire infinity of numbers on the other side of the real line…

I’m continuing sort of a discussion of the book “Geekonomics” by David Rice. Rice wrote that writing software was akin to creating an entire world in which the developer was, well, the supreme being.

Every single rule is determined by the developer. Every limit. Every calculation. Every display.

Well, everything that the developer is aware of anyway.

If the developer didn’t remember to put in the laws of gravity, that cannonball would’ve fired straight into space instead of landing nicely on the enemy tanks in that simulation war game.

Mother Nature takes care of anomalies in her stride. Entire species of dinosaurs dying out? No problem. Hey this mammalian species looks interesting. Let me give them a chance.

Software anomalies (read: bugs) aren’t so easily absorbed… With the world increasingly overrun by software, where medical devices, stock markets, airline ticket prices (supposedly tuned by software to remain competitive by comparing prices of other airlines), traffic lights, cars (Google cars are driven by software), online commerce and possibly even your toaster (if it isn’t already wired to the Internet), software anomalies can have a huge impact.

Software is written by developers. A developer is human. A human is flawed. Hence software is by design, flawed.

The Architect in The Matrix designed all the software in the world of Matrix (ok, maybe there’s delegation…). Right down to leaves falling and wind blowing in your face and steak tasting like steak (and not chicken) and pigeons flying like they’re supposed to.

But the Architect is also the creation of a human being. The Architect simulated the “real” world flawlessly, but only so far as human knowledge goes. What if Newtonian physics didn’t exist? If the imperial system or the metric system wasn’t invented, would the Architect invent some other measurement system?

In the end, the entire Matrix was bugged by an anomaly. Agent Smith.

And before that, the Matrix was bugged by another systemic anomaly. Which was sort of solved by creating the notion of The One. Hence Neo. (Ironically, the anomaly Agent Smith was the creation of the then current The One, the solution to solving the original systemic anomaly).

Get this. The Architect had to solve the systemic anomaly by flushing the entire Matrix, killing everyone except The One and the people chosen by The One to repopulate the Matrix.

Software is precise and elegant. Until it meets humans. Then everything hits the fan.

The Architect couldn’t solve his own software bugs (unless you call purging the entire Matrix as “solving”). Being a software god is tough.

Would you say the computer software entity known as The Architect is proficient in writing code? Because coming up soon, I’m going to write about another topic, that of software developer licensing, something that Rice also touched on. I’m talking “pass the bar or you don’t get to practice law” kind of accreditation.

Calculate Excel column width pixel interval

Brace yourself. You’re about to learn the secret behind how Excel mysteriously calculates the column width intervals.

In this article, I’m not going into the details of the column widths, but the column width intervals. There’s a difference. From the Open XML SDK specs:

width = Truncate([{Number of Characters} * {Maximum Digit Width} + {5 pixel padding}] / {Maximum Digit Width} * 256) / 256

To put it mildly, that’s a load of hogwash. In the documentation, it says that for Calibri 11 point at 96 DPI, the maximum digit width is 7 pixels. That is also another load of hogwash. It’s actually 8 pixels (well, 7 point something…).

When you move the line on the column width in Excel, just 1 pixel to the left, what is the column width? When you move it 1 pixel to the right, what’s the column width?

It turns out the each pixel interval isn’t a simple multiple of an internal column width interval.

Let’s take Calibri 11 pt 96 DPI again. With a maximum digit width of 8 pixels, each column width interval per pixel is supposedly 1/7 or 1/(max digit width -1).

But wait! It’s not actually 1/7. It’s the largest number of 1/256 multiples that is less than 1/7.

Now 1/7 is about 0.142857142857143. The actual interval is 0.140625, which is 36/256.

4/7 is about 0.571428571428571. The actual interval is 0.5703125, which is 146/256. And you will note that 146 is not equal to (4 * 36).

If you’re using Open XML SDK (or however you choose to access an Open XML Excel file), when you set the column width as 8.142857142857143, internally, Excel will save it as 8.140625.

Here’s some code:

int iPixelWidth = 8;
double fIntervalCheck;
double fInterval;
for (int step = 0; step < iPixelWidth; ++step)
{
    fIntervalCheck = (double)step / (double)(iPixelWidth - 1);
    fInterval = Math.Truncate(256.0 * fIntervalCheck) / 256.0;
    Console.WriteLine("{0:f15} {1:f15}", fIntervalCheck, fInterval);
}

So now you know how the intervals are calculated. But what about the actual column width? Hmm... perhaps another article...

P.S. I'm currently doing research for how to do autofitting for rows and columns for my spreadsheet library. I found this "secret" after fiddling with Excel files for a couple of hours... I know I'm talking about my library a lot, but it's taking up a lot of my brain space right now, so yeah...