Matching textures with heuristics

I was in my 4th year in university. The course was on digital image processing, touching on both theory and application in equal measure. There were only 3 students, including me.

The course was interesting, albeit mind-numbing when some of the equations marched into the lecture. The programming assignments were more fun, since we got to apply the theories. One of them was a rotate-an-image assignment, which formed the basis of my bilinear interpolation code. That was fun.

There’s this assignment where the professor gave us a set of texture images as samples. I can’t remember how many there were, so let’s say there were 200 of them. Then he gave us, say, 50 images. The assignment was to match those 50 images against the control set of textures. All textures were greyscale to simplify the assignment.

The 50 unknowns didn’t match pixel for pixel with the control samples. But they were of the same textures. For example, the control samples had one of a marbled floor. One of the unknown images was taken of that marble floor, but at a different position. Of course, the professor could have given us red herrings to match, but he said all 50 were taken from the sample set.

Then there was the fact that he wanted to play with his new camera back then (he admitted to it), and took lots of pictures to give us as assignment material… There was an assignment with a picture of a rubber ducky…

I can’t remember exactly which tests I used to match the textures. What I did was come up with a theory/test, and compute that test for all the samples. Then I did the same thing for the unknown textures. Then I matched the unknowns against the knowns. If an unknown was within some acceptance threshold of a sample, that unknown texture was deemed matched to that sample texture.

Basically, I was matching the textures using heuristics.
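
In Python-flavoured form, the skeleton looked something like this (the actual code was MATLAB and is long gone, so compute_test and threshold here are placeholders, not my real tests):

```python
import numpy as np

def match_textures(samples, unknowns, compute_test, threshold):
    """Match each unknown texture to the closest sample texture.

    samples, unknowns: dicts mapping names to 2D greyscale arrays.
    compute_test: a function turning an image into a feature vector.
    threshold: the acceptance threshold for a match.
    """
    # Compute the test once for every sample texture.
    sample_features = {name: compute_test(img) for name, img in samples.items()}

    matches = {}
    for name, img in unknowns.items():
        feature = compute_test(img)
        # Find the sample whose feature has the least mean squared error.
        best_name, best_err = None, float("inf")
        for s_name, s_feature in sample_features.items():
            err = np.mean((feature - s_feature) ** 2)
            if err < best_err:
                best_name, best_err = s_name, err
        # Accept the match only if it falls within the threshold.
        matches[name] = best_name if best_err < threshold else None
    return matches
```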

One of the tests used histograms. Basically, I charted, from 0 to 255, the number of pixels with each greyscale value. Pure white pixels have a value of 255, and pure black pixels have a value of 0. Then I matched the unknowns against the samples using mean squared error. If the sample with the least error was within some threshold I set, then that sample was the matched texture.
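
A sketch of that histogram test, in Python rather than the original MATLAB (the normalisation by pixel count is my addition, so images of different sizes compare fairly):

```python
import numpy as np

def histogram_feature(img):
    """Count the pixels at each greyscale value, 0 (black) to 255 (white)."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    return hist / img.size  # normalise so image size doesn't matter

def histogram_error(unknown, sample):
    """Mean squared error between the two histograms."""
    return np.mean((histogram_feature(unknown) - histogram_feature(sample)) ** 2)
```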

I had another test involving Fast Fourier Transforms (FFT). I think I discarded the imaginary parts and matched the unknowns using just the real parts.
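
It might have looked something along these lines (numpy’s fft2 standing in for MATLAB’s; this sketch assumes all the images are the same size):

```python
import numpy as np

def fft_feature(img):
    """2D FFT of the image, keeping only the real parts."""
    return np.real(np.fft.fft2(img))

def fft_error(unknown, sample):
    """Mean squared error between the real parts of the two spectra."""
    return np.mean((fft_feature(unknown) - fft_feature(sample)) ** 2)
```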

There was another test involving median filtering. The idea was to capture groups of neighbouring pixels as some usable data. So instead of a 128 by 128 pixel sample, I reduced it to a 16 by 16 matrix. You know, this one’s a bit iffy… I can’t remember whether I actually did it, or whether I just came up with it while writing this…
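
If I did do that test, a sketch of it might look like this: chop the 128 by 128 image into 8 by 8 blocks and take the median of each, giving a 16 by 16 summary that keeps some coarse positional information.

```python
import numpy as np

def median_blocks(img, out_size=16):
    """Reduce a square image to out_size x out_size by taking block medians."""
    block = img.shape[0] // out_size  # e.g. 128 // 16 = 8 pixel blocks
    trimmed = img[:out_size * block, :out_size * block]
    # Reshape into a grid of blocks, then take the median of each block.
    blocks = trimmed.reshape(out_size, block, out_size, block)
    return np.median(blocks, axis=(1, 3))
```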

Anyway, the point was to have a test that captures “pattern” data. The histogram test involves all pixels individually. The median filter test (if I actually did one) clusters pixel information in groups. Let me see if I can explain this better…

Marble texture

In the image above, the top right corner has more black swirly thingies close together than other parts of the image. The histogram test cannot detect that the top right corner has more black. It can only detect how much black in total there is in the image. Positional information is lost. Hence the need for a pattern test.

The histogram test is objective. Test results are verifiable and repeatable. However, matching the unknown textures requires that I set a threshold. This is where the tests become subjective. Who’s to say a particular threshold value is more accurate than another?

In the end, I think I had 5 or 6 tests, and got a 94 (or was it 96?) percent accuracy rate. I kept tweaking my threshold values to yield higher accuracy rates. See how subjective those tests of mine were? *smile*

The programming language of choice was MATLAB (yes, Will?), as dictated by the professor. So everything was coded in MATLAB. Which was good, because I’d hate to implement FFT on my own…

There’s something else too. I weighted the test results. Say test A was supposedly more accurate than test B. Then I gave the results of test A more weight in the final calculation. Thus, roughly speaking, if 3 tests out of 6 said texture A was the one, then that was the one. It could also mean 2 tests had more sway if both carried high weights, and the other 4 tests weren’t conclusive enough.
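
Roughly, the final decision worked like a weighted vote. The test names and weights below are invented for illustration; mine came from endless tweaking:

```python
def weighted_vote(votes, weights):
    """Combine the per-test picks into one final match.

    votes: dict mapping test name -> the sample that test picked (or None).
    weights: dict mapping test name -> how much that test's opinion counts.
    """
    scores = {}
    for test, pick in votes.items():
        if pick is not None:  # inconclusive tests don't get a say
            scores[pick] = scores.get(pick, 0.0) + weights[test]
    # The sample with the highest weighted score wins.
    return max(scores, key=scores.get) if scores else None

# Two heavily weighted tests outvote three lightly weighted ones.
votes = {"histogram": "marble", "fft": "marble", "median": "granite",
         "test4": "granite", "test5": "granite"}
weights = {"histogram": 3.0, "fft": 3.0, "median": 1.0,
           "test4": 1.0, "test5": 1.0}
print(weighted_vote(votes, weights))  # marble (6.0 against 3.0)
```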

One of my classmates got higher accuracy rates (97 or 98 percent) than I did, no matter how much I tweaked threshold values and weights, no matter how many kinds of tests I added (or took out).

But here’s the thing, and I want you to note this. Given a larger sample size, and a different set of unknown textures to match, my set of tests might actually yield better results than those of that irritatingly smug classmate of mine.

Here’s another takeaway. No one test can conclusively confirm and match the unknowns (even with some error margin). It took a few tests working in concert to obtain a relatively high accuracy rate. Think about that.

Digital image processing is mostly for aesthetics

I intended to write an explanation of some digital image processing methods, followed by the point of this article’s existence (see title). Since I can’t remember much about the topic any more, I rummaged through my pile of stuff to find my university textbook. Found it, and skimmed over the contents pages to see what I needed to research.

I am ashamed to say, I have forgotten a lot about the topic already… This is hard! There are graphs, functions, formulas, integrals, summations and matrices.

The book is “Digital Image Processing” (second edition) by Rafael C. Gonzalez and Richard E. Woods, published by Prentice Hall. Unfortunately, it’s not for sale in the USA, Mexico or Canada, so if you’re from those countries, sorry. But it’s really good.

[UPDATE: Commenter Will has pointed out that Gonzalez and Woods’ book is in fact available in the United States. Go get it.]

Anyway, I was planning on stunning you with my brilliance on histograms, median filtering, Gaussian blurs, pixel neighbours and the like. Sadly, I’m woefully ill-equipped to do that… At least, not without doing some serious reading and research first. Well, I can’t do that and still write about it in one night.

But I want to tell you about something my professor said. There are many things we do to process images. We apply median filtering to remove noise. We apply operations to sharpen or blur images. Images can be converted to black and white, or given a faded tone to simulate old photographs.

There are other kinds of information we can extract by processing images, such as detecting edges, shapes and even faces. But as far as I can tell, these operations are mostly subjective. Subjective because it is up to us humans to finally dictate whether the final image is what’s required.

For example, noise reduction. Simply put, image noise is the small specks of “something” in the image. They could be caused by an inferior camera, or simply by the environment (such as in space images). Median filtering means that for each pixel, you look at the neighbouring pixels and take their median value as that pixel’s “correct” colour.

But there are different strengths when applying the median filter, such as the size of the neighbourhood considered. The software cannot tell you how strong the filter should be. It can guess, it can have default values, it can apply the strength most commonly used. But it’s ultimately up to the user to decide when enough is enough. Perhaps at the highest strength, the algorithm produces an image that’s somehow pleasing (or useful) to the user, even if that result isn’t what the user originally wanted.
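
To make the “strength” idea concrete, here’s a little experiment using scipy’s median filter (the noise and the sizes are made up). The size parameter is the neighbourhood considered, and nothing in the code can tell you which size is right:

```python
import numpy as np
from scipy.ndimage import median_filter

# A flat grey image with salt-and-pepper noise sprinkled on.
rng = np.random.default_rng(0)
clean = np.full((64, 64), 128.0)
noisy = clean.copy()
speckles = rng.random(clean.shape) < 0.05
noisy[speckles] = rng.choice([0.0, 255.0], size=int(speckles.sum()))

# Larger neighbourhoods scrub harder, but also smear fine detail.
for size in (3, 5, 7):
    cleaned = median_filter(noisy, size=size)
    print(size, np.abs(cleaned - clean).mean())
```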

And this was what my professor said. Image processing is mostly about aesthetics, digital or analog. Digital computations just make it faster and easier to play around with the images. You can physically process images to produce sepia tones. Or you could do it digitally.

I’m introducing the concept of image processing because I have another story to tell you. And it’ll make more sense if you know about image processing first.

Have you written code to do image processing before? Please share your experience with your comments.

Linear and cubic interpolation

Interpolation is a method of calculating a value from a set of given values. We’ll be looking at interpolation with a bias towards image processing, but the theory can be generalised for other purposes. You’ve probably already solved some interpolation problems without knowing it. Let me give you an example.

A distance problem

Suppose there are 3 towns A, B, C and they happen to lie on a straight line, in that order. B is 5 kilometres away from A, and C is 15 kilometres away from A. If you travel one quarter of the way from town B to town C, how far are you from town A?

Distance problem

To solve it, you can figure out the distance between B and C, which is 15 - 5 = 10 km. One quarter of the way means 1/4 * 10 = 2.5 km. Then add the distance between A and B to this and you have 5 + 2.5 = 7.5 km.

Linear interpolation

If you visualise the problem as interpolating between 2 points, then B becomes the point p0 with a value of 5 (km) and C becomes the point p1 with a value of 15 (km). The usual variable used is t, so the generic formula is:
f(t) = (1 - t) * p0 + t * p1, where t lies between 0 and 1 inclusive.

Using this, we have
f(1/4) = (1 - 1/4) * 5 + 1/4 * 15
= 3/4 * 5 + 1/4 * 15
= 7.5

This is linear interpolation. Linearity refers to the power of the variable t, which is 1. Note that there’s nothing stopping you from using negative values of t or values greater than 1.

Suppose you travelled from B towards A, covering one quarter of the distance between B and C. How far are you from town A?
f(-1/4) = (1 - (-1/4)) * 5 + (-1/4) * 15
= 5/4 * 5 - 1/4 * 15
= 2.5

Suppose you travelled from B to C and went past C by a quarter of the way. How far are you from town A?
f(5/4) = (1 - 5/4) * 5 + 5/4 * 15
= -1/4 * 5 + 5/4 * 15
= 17.5

What happens if you get a negative result?
f(-1) = (1 - (-1)) * 5 + (-1) * 15
= 2 * 5 - 15
= -5

It means you’re 5 kilometres away from town A. You’re just in the opposite direction from towns B and C. The calculation result is correct. It’s how you interpret the value.
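
The formula drops straight into code. A quick Python check of all the cases above:

```python
def lerp(p0, p1, t):
    """Linear interpolation: f(t) = (1 - t) * p0 + t * p1."""
    return (1 - t) * p0 + t * p1

print(lerp(5, 15, 1/4))   # 7.5  (a quarter of the way from B to C)
print(lerp(5, 15, -1/4))  # 2.5  (a quarter of the way from B towards A)
print(lerp(5, 15, 5/4))   # 17.5 (a quarter of the way past C)
print(lerp(5, 15, -1))    # -5   (5 km from A, on the opposite side)
```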

Applications in image processing

A common operation in image processing is manipulating height maps. Height maps are usually greyscale bitmap files where a white pixel (RGB values are 255 for all 3) is the highest point, and a black pixel (RGB values are 0 for all 3) is the lowest point.

Terrain editor in Bryce

You know how enlarging photographs can give you some weird results. What happens is you’re trying to fill in the blanks in a larger image using values from the original image. Where do you think the image editing software comes up with those values? Interpolation.

If you think of the red, green and blue values of image pixels as 3 different “height maps”, then you’re just performing interpolation on 3 values. Suppose we’re talking about linear interpolation between two pixels. You’ll interpolate between the red components of the 2 pixels and get a value. You do the same for the green and blue components. The calculated results of the red, green and blue become the interpolated colour.
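
In code, interpolating a colour is just the same lerp applied three times, once per component (the pixel values here are arbitrary):

```python
def lerp(p0, p1, t):
    return (1 - t) * p0 + t * p1

def lerp_pixel(c0, c1, t):
    """Interpolate the red, green and blue components separately."""
    return tuple(lerp(a, b, t) for a, b in zip(c0, c1))

# Halfway between a dark red pixel and a bright blue-green one.
print(lerp_pixel((128, 0, 0), (0, 200, 255), 0.5))  # (64.0, 100.0, 127.5)
```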

Cubic Bezier interpolation

There are all kinds of cubic curves available: the Catmull-Rom spline, the non-uniform rational B-spline (NURBS)… and I didn’t really want to write anything on the subject after I remembered my Hermite splines… I love Bezier curves though, so I thought maybe I could write something with that.

Instead of 2 points used in linear interpolation, cubic interpolation uses 4 points. To illustrate, suppose you’re on an undulating plain with small hills undulating in their usual carefree manner. You’re in between two such (undulating) hills and you want to find out how high you are.

Instead of linear interpolating your way through these two (undulating) hills, the better way will be to interpolate with another 2 (undulating) hills! Ok, I’m stopping with the undulating thing…

Undulating hill interpolation

The Bezier curve equation looks like this:
B(t) = (1-t)^3 * p0 + 3*(1-t)^2 * t * p1 + 3*(1-t) * t^2 * p2 + t^3 * p3
where p0, p1, p2, p3 are the (height) values, and t lies between 0 and 1 inclusive.

You will be between p1 and p2. Let’s also assume that the hills are equidistant from each other. Like the pixels of an image, each hill is the same distance from its neighbours.

Because of this equidistant property, p1 is 0.33 (roughly 1/3) units away from p0, p2 is 0.67 (roughly 2/3) units away from p0 and p3 is 1 unit away from p0.

How do you know what value of t to use? You can calculate a t by doing linear interpolation between p1 and p2. But that t value is different from the t value in the Bezier curve.

Ahhh… once you get the t-linear value, you linearly interpolate between 0.33 and 0.67 with it to get the t-Bezier value. Confused? Suppose you’re one quarter of the way from p1 to p2. Your t-linear value is 1/4. Interpolate between 0.33 and 0.67 with that to get
f(1/4) = (1 - 1/4) * 0.33 + 1/4 * 0.67
= 0.415

And 0.415 is your t-Bezier value. Voila!
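
Here’s the whole trick in code, with 0.33 and 0.67 as the rough positions of p1 and p2 from above:

```python
def lerp(p0, p1, t):
    return (1 - t) * p0 + t * p1

def cubic_bezier(p0, p1, p2, p3, t):
    """B(t) = (1-t)^3 * p0 + 3*(1-t)^2 * t * p1 + 3*(1-t) * t^2 * p2 + t^3 * p3."""
    u = 1 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3

def height_between_hills(p0, p1, p2, p3, t_linear):
    """Height between the middle two of 4 equidistant hills."""
    t_bezier = lerp(0.33, 0.67, t_linear)  # remap t-linear to t-Bezier
    return cubic_bezier(p0, p1, p2, p3, t_bezier)

print(lerp(0.33, 0.67, 1/4))  # 0.415, the t-Bezier value from above
```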

You skipped the quadratic power!

I know. It’s logical to think that there’s a power 2 somewhere. But there isn’t. There is one fundamental flaw with quadratic interpolation: it takes 3 points, so when you’re between the middle two of 4 points, which segment do you use? The 3 points on the left, or the 3 points on the right? Neither choice is obviously right.

Quadratic interpolation has issues...

In closing

Interpolation is just a method of creating data values using a set of existing data. What those created values mean is up to you to interpret.

In image processing, interpolation can be used to fill in the blanks when enlarging an image. It doesn’t guarantee that the enlarged image looks good. Image processing is very much an aesthetics-based operation. I’ll talk a bit more about this when I get to writing code to rotate images.