## Ancient Egyptians and Chinese got math simple

And here I thought bit shifts were already very simple to understand.

I was actually hoping that the video would shed some light on the Udja Eye thing, but it didn’t… oh well.

## Help: Systolic arrays and matrix multiplication

Commenter apit had given me a problem which I am unable to help with: Matrix multiplication using systolic arrays.

I searched and I came up with very little. About the only thing I understood was that the “array” was not the variable array in code. It’s an array of processors. Systolic arrays are used for parallel processing.

So how is it used for matrix multiplication?

Hmm… I’ve never heard of systolic arrays before this. So I asked my friend and he found a PowerPoint presentation that gave the best explanation that I could understand.

Systolic arrays & their applications, by Jonathan Break.

I believe apit didn’t understand that each “array cell” don’t have past memory of operations, only an intermediate result.

If you happen to understand it, and can explain it better than I do, please chime in with your comment. Thanks.

## Multiplications, additions and bit shifts

In a previous article, I asked what this does

```int foo(int para)
{
int temp = para << 3;
return temp + temp + temp;
}
```

The function basically returns a value of the parameter multiplied by 24 (thanks Mark!).

So why was it written that way? Speed.

From what I understand, in the olden days, multiplication was slower than addition and bit shift operations. I don't know how slow, but it was slow. It was so slow that programmers (particularly game programmers) started replacing multiplication operations with a combination of additions and bit shifts.

Our example above with 1 bit shift and 2 additions was faster than just 1 multiplication operation. Hmm... that example may not be the best to use... Let me give another one.

The idea is to split the multiplication into additions first. So
i * 80
becomes
i * (64 + 16)
which becomes
(i * 64) + (i * 16)
which becomes
(i << 6) + (i << 4) where we swap multiplications with powers of 2 to bit shifts. See, you do need some math in game programming.

Bit shifts are faster than additions, which are faster than multiplications. The speed difference obtained from rewriting the code was enough for some game programmers to adopt for use.

With modern computer chips, multiplications are just as fast as additions and bit shifts. So I did some comparison. Before I show you the code, there are 2 warnings:

• The code segments aren't supposed to be tested on their own. The original speed improvements made sense when combined with their surrounding code.
• Measuring something changes it, so you're not measuring the true value (some principle from physics. If you can find me a reference, I'd appreciate it).

I will present 4 different ways of multiplying by 24 using 4 functions. The first is the one posed in the previous post:

```int foo(int para)
{
int temp = para << 3;
return temp + temp + temp;
}
```

The second function uses multiplication.

```int goo(int para)
{
int temp = para;
return temp * 24;
}
```

I didn't really have to declare the variable `temp`. I did it so all 4 functions declare a variable. Even variable declaration takes time, minute as it is, so I had to take that into account and standardise it across the 4 functions. This is the part where "measuring something changes it"...

The third function uses 2 bit shifts and 1 addition.

```int hoo(int para)
{
int temp = para;
return (temp << 4) + (temp << 3);
}
```

Convince yourself that the return value is a multiplication of the parameter by 24.

The fourth function is a modification of the first function.

```int loo(int para)
{
int temp = para << 3;
return (temp << 1) + temp;
}
```

If bit shifts are faster than additions, `temp << 1` is faster than `temp + temp`, right? The use of temporary variables to hold intermediate values for another calculation is another technique used in game programming. This sounds a little vague, so I'm publishing another article to explain this further.

And here's the timing code

```const int cnMax = 500000000;
const int cnLoops = 20;
const int cnValue = 5;
DateTime dtStart = DateTime.Now;
DateTime dtEnd = DateTime.Now;
TimeSpan ts;
int i, j, temp = 0;

temp = 0;
dtStart = DateTime.Now;
for (i = 0; i < cnLoops; ++i)
{
for (j = 0; j < cnMax; ++j)
{
temp = foo(cnValue);
}
}
dtEnd = DateTime.Now;
ts = dtEnd - dtStart;
Console.WriteLine(ts.TotalSeconds);
Console.WriteLine(temp);

temp = 0;
dtStart = DateTime.Now;
for (i = 0; i < cnLoops; ++i)
{
for (j = 0; j < cnMax; ++j)
{
temp = goo(cnValue);
}
}
dtEnd = DateTime.Now;
ts = dtEnd - dtStart;
Console.WriteLine(ts.TotalSeconds);
Console.WriteLine(temp);

temp = 0;
dtStart = DateTime.Now;
for (i = 0; i < cnLoops; ++i)
{
for (j = 0; j < cnMax; ++j)
{
temp = hoo(cnValue);
}
}
dtEnd = DateTime.Now;
ts = dtEnd - dtStart;
Console.WriteLine(ts.TotalSeconds);
Console.WriteLine(temp);

temp = 0;
dtStart = DateTime.Now;
for (i = 0; i < cnLoops; ++i)
{
for (j = 0; j < cnMax; ++j)
{
temp = loo(cnValue);
}
}
dtEnd = DateTime.Now;
ts = dtEnd - dtStart;
Console.WriteLine(ts.TotalSeconds);
Console.WriteLine(temp);
```

I've initialised `temp` to 0 just before the test loops, and printed its value after the loops to check that the calculations give correct results. The loops are coded with a standard structure, and the only difference is the function used. All 4 functions are coded in as similar a structure as possible, to isolate only the method of calculation.

I've used nested `for` loops to increase the number of iterations. One loop can only go as high as 2^31 iterations (unless I use `long`). Nested loops increase this limit by doing a lower iteration limit multiple times.

And the conclusion? Multiplications are now just as fast. I didn't get consistent results to rank the methods in order of speed though. Generally speaking, the 2-shift-1-add method (3rd function) is faster than multiply-by-24 method (2nd function).

That's it. As an exercise, you might want to come up with better code to time them. Follow some simple rules:

• The results must be repeatable
• Tests must be standardised

A suggestion? Perhaps you can add more code to the loop structure (or the functions). So long as all 4 loops (or function code) are similar, you don't need a minimalistic approach. This allows you to increase the percentage of identical parts between the tests, and thus highlight the difference in only the calculation method (let me know if you're confused by this).

Have fun!