Only quarters counted

Last week, at the end of the article, I presented a problem on floating point values:

Given the range of non-negative 2-decimal-point values less than 1, what are the values that can be represented exactly?

I was referring to dollars and cents, monetary values. As of this writing, there’s no answer submitted. I will assume that you’re very modest, and so will provide my version of the answer. First, we dissect the question. “non-negative” means cannot be negative, so zero and positive numbers only. “less than 1” means, well less than 1. And the values are restricted to numbers with up to 2 decimal points of accuracy.

So this means the set of values 0.00, 0.01, 0.02 … 0.97, 0.98, and 0.99 (1.00 not counted).

Before writing this article, I already an answer, intuited without actual calculations. Then I went to do some research of mine to verify that answer. I was mortified that my understanding of the IEEE floating point standard was wrong. I dug up my old university programming textbook, where there’s a section on internal floating point representation, and confirmed my misunderstanding.

My error was in how the exponent and mantissa were represented. In the end, I realised I had to convert all the 100 values (0.00 to 0.99) into binary representation to check. I didn’t relish doing long division by hand for a possible indefinite number of iterations (up to 32 anyway), and for 100 values. So I took the easier way out. I wrote a program. *smile*

decimal decBinary = 0.5m;
decimal decBuffer = 0;
int i, j;
for (i = 0; i < 100; ++i)
    decBuffer = i * 0.01m;
    Console.Write("{0} ", decBuffer.ToString("f2"));
    decBinary = 0.5m;
    for (j = 0; j < 32; ++j)
        if (decBuffer < decBinary)
            decBuffer -= decBinary;
        decBinary /= 2;

    if (decBuffer != 0)
        Console.WriteLine(" non-terminating");
        Console.WriteLine(" terminated");

It's in C# and there's not much comments. But you should still be able to follow much of the code, even if you aren't familiar in C#. This is assuming a 32-bit is used to represent a float. I ran the long division iteration up to 32 times, even though in a real representation, only 23 bits are used. Here's a Java applet IEEE binary converter you can play with too.

What happens is that, if after 32 bits are assigned (a 0 or 1), if more bits need to be assigned (not terminated), then the floating point representation isn't exact (since there's extra information truncated). You will then note that only 0.00, 0.25, 0.50 and 0.75 have terminating binary representations. This means, only those 4 values can be represented in a float exactly.

So, what was my original intuitive answer? Let's look at only 4 binary places in the mantissa. So 1010 represents 1*2^(-1) + 0*2^(-2) + 1*2^(-3) + 0*2^(-4), which is 0.5 + 0.125 = 0.625

Look at the value with only the 3rd mantissa digit as 1, 0010. It represents 0.125 in value. How do you "get rid" of the 0.005 to get 0.12 or 0.13? You can't, not from switching on bits in any of the mantissa digits. So what does this mean? The moment the 3rd mantissa digit is a 1, the floating point representation can never represent any of the 100 2-decimal-point values exactly.

In fact, the moment any mantissa digit that's the 3rd or after (4th, 5th and so on) is a 1, that representation is inexact already. So only the 1st and 2nd mantissa digits are "allowed" to be 1. So 0000 is 0.00, 0100 is 0.25, 1000 is 0.5, and 1100 is 0.75.

Same answer as from my laborious calculations. Out of the 100 values, only combinations of quarters or 25 cents have an exact floating point representation.

I hope this has been an interesting thought process for you. Has this changed the way you think about storing monetary values?

The float, the finance and the folly

When I was studying mathematics in university, I was introduced to 3 number “lines”, for lack of a better word. They were the integers (denoted by Z with a double line at the slant), natural numbers (denoted by N with a double line at the slant) and real numbers (denoted by R with a double line at the vertical). There were others, such as complex numbers, rational and irrational numbers. Let’s keep this simple, shall we?

Mathematical symbol for natural, integer and real numbers

Yeah, that’s how I wrote those symbols. I particularly like writing the Z with the 2 disconnected strokes. Ah, those times of writing mathematical proofs… To forestall a question you might have, the I is used to denote imaginary numbers, that’s why Z is used for integers. As to why Z is used for integers, Z

stands for Zahlen (German for numbers)

Integers (or whole numbers) are numbers without fractional parts, such as 5 or 72 or -3, including zero. Natural numbers are numbers for counting, referring to zero and positive integers (sometimes referred to as non-negative integers). Depending on context, zero might not be included. Real numbers are every single existing number available. Numbers such as 1.23456 and 3.14159 and 1.414. They also include integers, and naturally include the natural numbers.

Where am I going with this?

When I learned programming, the idea of having 2 sets of variable types dealing with numbers, one of integral types and one of non-integral types (which I gave a basic introduction here and here), was very natural. On the one hand, we have byte, short, int and long. On the other, we have float and double. I also learned the limitations of each variable type and the appropriate usage of each type.

You should be familiar with the integral types. It’s the non-integral types, or the floating points that I’m worried about. You will note that, while the integrals can’t represent all integers (they have a limit, like 2 ^ 31), they do represent the particular integer value exactly. A 100 is definitely a 100, for example (unless you change the form of representation).

The float and double variable types present a problem. They only hold a number value that’s close to the value you want (most of the time). It’s something about the IEEE representation, the mantissa and the exponent. Research on your own; it’s good for your learning.

Real numbers can have a precision stretching up to infinity. Obviously, our float and double have a bit of a problem with that. Actually this reminds me of a song I remembered from a children’s show (from way back when I was a child):

Thaaaaaaattt’ss iiiiinnnfinnnity
Yooouu can count forever
There’ll always be one more

That’s infinity
Count from dusk till dawn
You’ll never reach infinity
It’ll just go on, and on, and on, and on, and on…

You know, maybe I’ll sing and record it down… you know what? Maybe not. I’ll spare you the agony…

Ok, where was I? Oh yes, precision.

Games don’t need it

More precisely, precision usually isn’t an issue in games. floats and doubles are used in games for storing values such as the (x,y,z) positional coordinate of a player for instance. They don’t have to exact. They just have to be relatively close to the exact value. It’s not like you’re going to notice a 0.00000005 unit left shift, or you rotated like 0.000003 radians clockwise more than is required.

This is why floats are preferable to doubles in games (I might be dated on this…), because they’re faster to add and multiply to, and you don’t need the extra precision.

Financial applications depend on it

If you work with financial applications, it’s a whole new story. The Singapore and American currency use the dollar and cents. 100 cents equal a dollar. It’s typical to use $12.34 to represent 12 dollars and 34 cents (my apologies to the comma-toting Europeans). So the dollar is correct (and exact) up to 2 decimal places.

Does anyone see a problem with using floats on this?

Sure, the float is more than capable of displaying just 2 decimal places. But it cannot represent the smallest unit of currency 0.01 exactly. To illustrate this, try the following piece of code:

float f = 0f;
int i;
for (i = 0; i < 100; ++i)
    f += 0.01f;

It's in C#, and you should be able to translate to C or some other language you use. Note the output. 0.01 is not 0.01 when stored in a floating point variable type.

This presents some exceedingly infuriating debugging sessions while figuring out why certain values don't work out correctly. It's also why you cannot do equality checks with floating point values (I mentioned this in my recent newsletter, and if you're not on it, please go sign up).

If you're using C# or some other modern language, there should be a variable type for fixed-point numbers (fixed number of digits after radix [or the decimal point]), the decimal type being the one for C#. Or the numeric data type in databases, but stay away from the money type. I don't know, I find it funny to store my monetary values in a money variable type. *shudder*

If you're using C or some other language without fixed-point number variable types, then my advice is to avoid calculation with floating points as much as possible. Do them in the environment where they are still exact.

For example, you want to retrieve some values from the database and bind them to local program variables. Then you're going to do some calculation with the variables, and update them back into the database. Can you do the calculation entirely in the database environment?

You might not have grasped the significance of the code above. Sure, the values are still exactly represented for most of the 100 decimal values. Imagine adding, subtracting, multiplying and dividing those values tens of times, hundreds of times. Ever heard of rounding errors? Errors don't disappear, they accumulate. If you're lucky, they cancel each other out. If not...

Here's a question for you. Even though it's a folly to represent the 2-decimal-point currency values with a floating point variable, there are values which a float can represent exactly. Given the range of non-negative 2-decimal-point values less than 1, what are the values that can be represented exactly?

I'll publish my answer and reasoning in another (later) article. I also want to hear your answers too. Add your answer in a comment. Explain your reasoning too. Frankly, I'd much rather publish your answers, and add in some of my comments.