## Floating point errors – Sine X Taylor series

Stop rolling your eyes. I actually have a point to make, and the Taylor series (from math) is the best candidate to illustrate it. Think of it as expanding your education.

In simple terms, the Taylor series is a polynomial approximating another function. For example, the Taylor series for the trigonometry function sin(x) is

x – x^3/3! + x^5/5! – x^7/7! …

(x^3 means x to the power of 3, or x*x*x. 5! means 5 factorial, or 1*2*3*4*5)

As you can see, the series is infinite. You’ll also note that if you leave only the first term, sin(x) = x. Remember my warning about the pitfall.

So, that’s about all the math knowledge you need. That was easy, right? Our problem is in calculating that series. Suppose you don’t have a math library in your code base, then you’ll need to resort to the Taylor series to help approximate the sine function.

In an ideal world, the math logic you implement in code will evaluate exactly in value. Because of the finiteness of representing numbers in computers, you will get an approximate answer. And it’s more “approximaty” if you come up with the wrong algorithm for implementation too.

Note that factorial values can grow very big. 12! is 479001600, which is the limit for a 32-bit integer (13! is 6227020800, exceeding 2147483648 = 2^31). Suppose we want to stick to 32-bit integer variables, then this means the Taylor series for sin(x) can only go up to

x – x^3/3! + x^5/5! – x^7/7! + x^9/9! – x^11/11!

6 terms only. It doesn’t feel very accurate… What if we could “remove” the factorial? Note that

x^5/5! = x/1 * x/2 * x/3 * x/4 * x/5

Then we could have more terms when we reduce the factorials into floating point values.

And I’m going to let you in on another secret. **The direction in which you add the terms together yields different results!** So your calculation for

x – x^3/3! + x^5/5! – x^7/7! + x^9/9! – x^11/11!

gives a slightly different answer for

– x^11/11! + x^9/9! – x^7/7! + x^5/5! – x^3/3! + x

This happens because of the limitation of the variable type you use to store intermediate results. Generally speaking, you should start with the smallest intermediate values and work your way up. So this:

– x^11/11! + x^9/9! – x^7/7! + x^5/5! – x^3/3! + x

is preferred. The factorial in the denominator increases faster than the exponential in the numerator, so higher order terms are smaller in value.

In a following code sample, I’ve labelled sections A, B, C, D, E, and F.

Sections A, C and E contain code where the `for`

loops are in ascending order. Sections B, D and F contain code where the `for`

loops are in descending order.

Sections A and B form the pair where the `float`

variable type is used to store intermediate values.

Sections C and D form the pair where the `double`

variable type is used.

Sections E and F still use the `double`

variable type, and the implementation logic for x^n/n! is do x^n and n! separately, then divide x^n by n!. This reduces the number of floating point operations (by about half actually), and so reduces the calculation errors.

Because of the use of factorial, we’ll use a `long`

(a 64-bit integer in C# and appropriate operating system) which has more capacity to store the intermediate factorial value.

We’ll be calculating sin(PI), so the Taylor series is

PI – PI^3/3! + PI^5/5! – …

Alright, a short lesson in math. Odd numbers can be represented in the form of (2n + 1), where n is an integer. Hence this part of the code

k = 2 * i + 1;

And this part does the alternating minus and plus sign of the series

if (i % 2 == 1) fBuffer = -fBuffer;

And here’s the code:

const int cnTerms = 9; float floatx = (float)Math.PI; double doublex = Math.PI; float fAsc, fDesc, fBuffer; double fAscInDouble, fDescInDouble, fBufferInDouble; double fAscAlt, fDescAlt; long iBuffer; int i, j, k; StreamWriter sw = new StreamWriter("taylorsinepi.txt"); fAsc = 0.0f; for (i = 0; i < cnTerms; ++i) { k = 2 * i + 1; fBuffer = 1.0f; for (j = 0; j < k; ++j) { fBuffer *= floatx / (float)(j + 1); } if (i % 2 == 1) fBuffer = -fBuffer; fAsc += fBuffer; sw.WriteLine("A {0,0:d2} {1,0:f8}", i, fBuffer); } sw.WriteLine(); fDesc = 0.0f; for (i = cnTerms - 1; i >= 0; ----i) { k = 2 * i + 1; fBuffer = 1.0f; for (j = k - 1; j >= 0; ----j) { fBuffer *= floatx / (float)(j + 1); } if (i % 2 == 1) fBuffer = -fBuffer; fDesc += fBuffer; sw.WriteLine("B {0,0:d2} {1,0:f8}", i, fBuffer); } sw.WriteLine(); fAscInDouble = 0.0; for (i = 0; i < cnTerms; ++i) { k = 2 * i + 1; fBufferInDouble = 1.0; for (j = 0; j < k; ++j) { fBufferInDouble *= doublex / (double)(j + 1); } if (i % 2 == 1) fBufferInDouble = -fBufferInDouble; fAscInDouble += fBufferInDouble; sw.WriteLine("C {0,0:d2} {1,0:f16}", i, fBufferInDouble); } sw.WriteLine(); fDescInDouble = 0.0; for (i = cnTerms - 1; i >= 0; ----i) { k = 2 * i + 1; fBufferInDouble = 1.0; for (j = k - 1; j >= 0; ----j) { fBufferInDouble *= doublex / (double)(j + 1); } if (i % 2 == 1) fBufferInDouble = -fBufferInDouble; fDescInDouble += fBufferInDouble; sw.WriteLine("D {0,0:d2} {1,0:f16}", i, fBufferInDouble); } sw.WriteLine(); fAscAlt = 0.0; for (i = 0; i < cnTerms; ++i) { k = 2 * i + 1; fBufferInDouble = 1.0; iBuffer = 1; for (j = 0; j < k; ++j) { fBufferInDouble *= doublex; iBuffer *= (j + 1); } if (i % 2 == 1) fBufferInDouble = -fBufferInDouble; fAscAlt += (fBufferInDouble / (double)iBuffer); sw.WriteLine("E {0,0:d2} {1,0:f16}", i, fBufferInDouble); } sw.WriteLine(); fDescAlt = 0.0; for (i = cnTerms - 1; i >= 0; ----i) { k = 2 * i + 1; fBufferInDouble = 1.0; iBuffer = 1; for (j = k - 1; j >= 0; ----j) { fBufferInDouble *= doublex; iBuffer *= (j + 1); } if (i % 2 == 1) fBufferInDouble = -fBufferInDouble; fDescAlt += (fBufferInDouble / (double)iBuffer); sw.WriteLine("F {0,0:d2} {1,0:f16}", i, fBufferInDouble); } sw.WriteLine(); sw.WriteLine("ASC : {0}", fAsc.ToString("f16")); sw.WriteLine("DESC : {0}", fDesc.ToString("f16")); sw.WriteLine("ASC x2: {0}", fAscInDouble.ToString("f16")); sw.WriteLine("DESC x2: {0}", fDescInDouble.ToString("f16")); sw.WriteLine("ASC ALT: {0}", fAscAlt.ToString("f16")); sw.WriteLine("DESC ALT: {0}", fDescAlt.ToString("f16")); sw.Close();

And this is the output:

A 00 3.14159300 A 01 -5.16771300 A 02 2.55016400 A 03 -0.59926460 A 04 0.08214591 A 05 -0.00737043 A 06 0.00046630 A 07 -0.00002192 A 08 0.00000080 B 08 0.00000080 B 07 -0.00002192 B 06 0.00046630 B 05 -0.00737043 B 04 0.08214591 B 03 -0.59926460 B 02 2.55016400 B 01 -5.16771300 B 00 3.14159300 C 00 3.1415926535897900 C 01 -5.1677127800499700 C 02 2.5501640398773500 C 03 -0.5992645293207920 C 04 0.0821458866111282 C 05 -0.0073704309457144 C 06 0.0004663028057676 C 07 -0.0000219153534478 C 08 0.0000007952054001 D 08 0.0000007952054001 D 07 -0.0000219153534478 D 06 0.0004663028057676 D 05 -0.0073704309457144 D 04 0.0821458866111282 D 03 -0.5992645293207920 D 02 2.5501640398773400 D 01 -5.1677127800499700 D 00 3.1415926535897900 E 00 3.1415926535897900 E 01 -31.0062766802998000 E 02 306.0196847852810000 E 03 -3020.2932277767900000 E 04 29809.0993334462000000 E 05 -294204.0179738900000000 E 06 2903677.2706132800000000 E 07 -28658145.9693880000000000 E 08 282844563.5865330000000000 F 08 282844563.5865330000000000 F 07 -28658145.9693880000000000 F 06 2903677.2706132800000000 F 05 -294204.0179738900000000 F 04 29809.0993334462000000 F 03 -3020.2932277767900000 F 02 306.0196847852810000 F 01 -31.0062766802998000 F 00 3.1415926535897900 ASC : -0.0000000102118100 DESC : 0.0000000000000000 ASC x2: 0.0000000224195107 DESC x2: 0.0000000224195102 ASC ALT: 0.0000000224195103 DESC ALT: 0.0000000224195102

I’ve printed the intermediate values for better comparison. And it looks beautiful that the section pairs appear symmetrical. So let’s see the first pair of results.

-0.0000000102118100 versus 0.0000000000000000

That don’t look very equal to me. I don’t know why we got zero for the second one (maybe the alternating sign code is wrong). I mean the correct answer *is* zero, just that it shouldn’t be from an approximation algorithm…

Anyway, the next pair looks promising:

0.0000000224195107 versus 0.0000000224195102

Except the last digit.

The last pair has different last digits too.

So what have we learnt? Use the correct variable type. And come up with a good algorithm. The most accurate variable type can’t save you if your algorithm is faulty. Change the cnTerms constant to something else, like 8 (which gives a more believable disparity for the first pair), to get a feel of the algorithm, code and results.

For your information, sin(PI) is zero, which is the value the algorithms are iteratively approaching.