29 September, 2008 | Written by Vincent 4 Comments

Linear and cubic interpolation

Interpolation is a method of calculating a value from a set of given values. We’ll be looking at interpolation with a bias towards image processing, but the theory can be generalised for other purposes. You’ve probably already solved some interpolation problems without knowing it. Let me give you an example.

A distance problem

Suppose there are 3 towns A, B, C and they happen to lie on a straight line, in that order. B is 5 kilometres away from A, and C is 15 kilometres away from A. If you travel one quarter of the way from town B to town C, how far are you from town A?

Distance problem

To solve it, you can figure out the distance between B and C, which is 15 – 5 = 10 km. One quarter of the way means 1/4 * 10 = 2.5 km. Then add the distance between A and B to this and you have 5 + 2.5 = 7.5 km.

Linear interpolation

If you visualise the problem as interpolating between 2 points, then B becomes the point p0 with a value of 5 (km) and C becomes the point p1 with a value of 15 (km). The usual variable used is t, so the generic formula is:
f(t) = (1 – t) * p0 + t * p1, where t lies between 0 and 1 inclusive.

Using this, we have
f(1/4) = (1 – 1/4) * 5 + 1/4 * 15
= 3/4 * 5 + 1/4 * 15
= 7.5

This is linear interpolation. Linearity refers to the power of the variable t, which is 1. Note that there’s no stopping you from using negative values of t or values greater than 1.

Suppose you travelled from B to A one quarter of the distance between B and C. How far are you from town A?
f(-1/4) = (1 – (-1/4)) * 5 + (-1/4) * 15
= 5/4 * 5 – 1/4 * 15
= 2.5

Suppose you travelled from B to C and went past C by a quarter of the way. How far are you from town A?
f(5/4) = (1 – 5/4) * 5 + 5/4 * 15
= -1/4 * 5 + 5/4 * 15
= 17.5

What happens if you get a negative result?
f(-1) = (1 – (-1)) * 5 + (-1) * 15
= 2 * 5 – 15
= -5

It means you’re 5 kilometres away from town A. You’re just in the opposite direction from towns B and C. The calculation result is correct. It’s how you interpret the value.

Applications in image processing

A common operation in image processing is manipulating height maps. Height maps are usually greyscale bitmap files where a white pixel (RGB values are 255 for all 3) is the highest point, and a black pixel (RGB values are 0 for all 3) is the lowest point.

Terrain editor in Bryce

You know enlarging photographs can give you some weird results. What happens is you’re trying to fill in the blanks in a larger image using values from the original image. Where do you think the image editing software comes up with values? Interpolation.

If you think of the red, green and blue values of image pixels as 3 different “height maps”, then you’re just performing interpolation on 3 values. Suppose we’re talking about linear interpolation between two pixels. You’ll interpolate between the red component of the 2 pixels and get a value. Similarly you do it for the green and blue components. The calculated results of the red, green and blue become the interpolated colour.

Cubic Bezier interpolation

There are all kinds of cubic curves available. The Catmull–Rom spline, the non-uniform rational B-spline (NURBS) and I didn’t really want to write anything on the subject after I remember my Hermite splines… I love Bezier curves though, so I thought maybe I can write something with that.

Instead of 2 points used in linear interpolation, cubic interpolation uses 4 points. To illustrate, suppose you’re on an undulating plain with small hills undulating in their usual carefree manner. You’re in between two such (undulating) hills and you want to find out how high you are.

Instead of linear interpolating your way through these two (undulating) hills, the better way will be to interpolate with another 2 (undulating) hills! Ok, I’m stopping with the undulating thing…

Undulating hill interpolation

The Bezier curve equation looks like this:
B(t) = (1-t)^3 * p0 + 3*(1-t)^2 * t * p1 + 3*(1-t)* t^2 * p2 + t^3 * p3
where p0, p1, p2, p3 are the (height) values, and t lies between 0 and 1 inclusive.

You will be between p1 and p2. Let’s also assume that the hills are equidistant from each other. Like the pixels on an image, the hills shall be of equal distance from its neighbour.

Because of this equidistant property, p1 is 0.33 (roughly 1/3) units away from p0, p2 is 0.67 (roughly 2/3) units away from p0 and p3 is 1 unit away from p0.

How do you know what’s the value of t to use? You might be able to calculate the t if you do linear interpolation between p1 and p2. But that t value is different from the t value in the Bezier curve.

Ahhh… once you get the t-linear value, you interpolate with 0.33 and 0.67 to get the t-Bezier value. Confused? Suppose you’re one quarter way from p1 to p2. Your t-linear value is 1/4. Interpolate that with 0.33 and 0.67 to get
f(1/4) = (1 – 1/4) * 0.33 + 1/4 * 0.67
= 0.415

And 0.415 is your t-Bezier value. Voila!

You skipped the quadratic power!

I know. It’s logical to think that there’s a power 2 somewhere. But there isn’t. There is one fundamental flaw with quadratic interpolation. Which segment do you use?

Quadratic interpolation has issues...

In closing

Interpolation is just a method of creating data values using a set of existing data. What those created values mean is up to you to interpret.

In image processing, interpolation can be used to fill in blanks when enlarging an image. It doesn’t guarantee that the enlarged image looks good. Image processing is very much an aesthetic-based operation. I’ll talk a bit more on this when I get to writing code to rotate images.

26 September, 2008 | Written by Vincent Comments Off

Solving the Resident Evil Zero game math puzzle

Previously,

On the screen is shown a “00/81″. A number pad with number keys 1 to 9 is on the right. When a number key is entered, a knob is lit in red and the value of the number is added to the numerator of the displayed ratio. There are 10 unlit knobs.

The puzzle should be obvious. Match the numerator and the denominator with 10 entries.

So how would I solve it? My hint as stated, was:

the last entry is the most important

The first 9 entries are to get you as close to the required number as possible. This is actually similar to blackjack, a card game where you try to get as close to 21 points as possible without going over. The difference is that you don’t get to choose your next number (or card value) in blackjack. You do here.

Blackjack
[image by pavlen]

So my first instinctual thought was to finish up the first 9 entries to get to the last entry. My second instinctual thought was to make these 9 entries all the same number value. It’s a timed effort. I don’t have time to go figure out different combinations of sums.

Based on those thoughts, my next (instinctual) step was to divide the required number by 9, rounding down to nearest integer. Don’t ask me why at this point, because I’ll know the reason only after thinking it through, and I’m not thinking it through at this point.

So we have 81 divide by 9, and rounded down, we get 9. We have a problem. It’s not just 9. It’s exactly 9. There’s no number value 0 for the final entry. So we use 8 (1 less than 9).

Then we multiply 8 by 9 (entries) to get 72. And then 81 (required number) – 72 is 9, the final entry. So the answer is 8,8,8,8,8,8,8,8,8,9.

This method gives the simplest of combinations (only 2 distinct numbers used) and is the fastest to get you to the end point, the final entry. It’s the final entry that “corrects” the sum to the required number.

I have to say, the changing of 9 to 8 is (I’m starting to use the word generously) instinctive. I don’t know how changing 9 to 8 would solve the problem. I just know.

Let’s use this method on the second part of the puzzle: 67.

67 divide by 9 rounded down gives 7. 7 multiplied by 9 is 63. 67 – 63 is 4. So the answer is 7,7,7,7,7,7,7,7,7,4.

There you have it, the algorithm to solving the math puzzle. The second puzzle uses the “normal” mode of the algorithm. The first puzzle tested the edge case of the algorithm.

Hmm… can we write a function that spits out the combination using the above algorithm? Can we write a function that spits out all possible combinations?

25 September, 2008 | Written by Vincent 1 Comment

Messy indices on the right please

In case you don’t know, indices are the plural form of index. Wait, isn’t the plural form of index, indexes? Well, indexes is the plural form, if you’re talking about database table indexes (I’m confused at first too). I’m referring to the offset of an array, the index of an array.

So what do I mean by messy indices on the right? Variable assignment. I mentioned something of this in the matrix rotation article, and I want to talk more on it here. And I will tell you why I prefer them on the right side of the assignment. First, I need to mention raytracing.

Tracing a light ray from the eye back to the scene

In 3D graphics, a scene is rendered and you need to know what colour each resulting pixel is. You can throw light rays everywhere, let them hit an object, bounce off that object, and continue bouncing and if that ray happen to shoot through to the camera, render it. That doesn’t sound very efficient, does it?

There are millions of light rays to shoot from a light source. They can bounce off objects in any number of ways. And only a small percentage of them will ever enter the camera.

Raytracing simplified

What you can do is, start from the camera. Trace a ray from the camera through the resulting pixel, back out into the scene and see what it hits. If it doesn’t hit an object, it will travel all the way into the sky or space. Render accordingly. If it does hit an object, continue to track the bounced (or reflected) ray until it hits nothing. Or hit a light source. Either way, you know how to render the colour. It’s a bit more involved than that, and I’m simplifying the explanation.

Basically you start from the end.

Rotating the square matrix

Let’s revisit that rotating-the-matrix problem.

For Y = 0 to 3
    For X = 0 to 3
        Destination(3-Y,X) = Source(X,Y)
    Next
Next

Essentially, the author concentrated on the source, iterating through it with appropriate indices and assigning those values to the destination. See how “messy” the indices on left side are? I call it, the “throw values over” assignment:

Throwing values over

Then there’s the raytracing way, the “finding values” way:

Finding values assignment

There doesn’t seem to be any difference between the two. Until I tell you about rotating images…

Rotating an image 30 degrees clockwise

Suppose you want to rotate an image in non-right-angled degrees, say 30 degrees clockwise.

Rotated image outside rectangle

We can discuss the code to rotate the image some other time. Hmm… new article fodder…

Anyway, you can start from the source image, iterating through each pixel and “throw values” to the destination image. Or you can start iterating the destination image and “find values” from the source image. Each method requires you to check the boundaries of the image (index must be within width and height).

If both seem equivalent, what’s the problem? Image quality.

You see, with the algorithm involved, there’s the possibility that you’d throw different pixels from the source image onto the same pixel on the destination image. You’d be doing sine’s and cosine’s and rounding the calculation into an integer so you can map it to an index. This isn’t good, since you can override a previous calculation.

With the “find value” way, you have a similar problem. There’s the possibility that for different pixels on the destination image, you’d map back to the same pixel on the source image. However, you don’t have to use the exact colour of that source pixel. You can use interpolation to get a sort of “average” colour from the surrounding pixels.

You know, I should write something on the image rotation part. Alright, coming up soon, an image processing article focused on image rotation!

And that’s the reason why…

I prefer the messy indices on the right side of the assignment. Because I start from the end (or destination), the iteration indices are “clean” on the left side.

And now, homework! See if you can come up with code (or algorithm) to rotate an image any number of degrees, assuming the rotation is about the image’s centre. You can make it easier by rotating about the top-left corner. And you have to iterate through the pixels, not use some in-built function to manipulate the image. You can also make the image black and white, so you focus on grey values instead of an RGB triplet, to make the code simpler to write.

If you’re adventurous, and want to try the interpolation thingy to make the resulting image look better, you can research on linear interpolation and cubic splines.

Next Page →