So the last time, I told you I came up with my own video format so I could load videos into my game engine. Today, I’m going to tell you what it is.
The concept isn’t revolutionary. Image compression is the art of expressing an image in as small a form as possible. There are 2 kinds: lossy and lossless. Lossy compression throws some data away, so the file size can be smaller, but overall you won’t notice the difference (much). The JPEG format uses this. Lossless compression doesn’t lose any data at all. The PNG format uses lossless compression.
I don’t know much about video compression. However, a video is a series of images. So what I did was instead of focusing on image-by-image compression, I adopted time-based compression.
For each pixel position, I try to come up with an algorithm (or specifically a maths formula) that will represent all the pixels in that pixel position throughout the entire video.
Let’s say we use the top-left pixel of the video. And let’s say that pixel is pure white for half of the video, and black for the other half of the video. My aim is to represent this “pure white 1st half, pure black 2nd half” information in 1 formula.
With this in mind, let’s say I’ve got a maths formula somehow. Let’s say it’s a 640 by 480 pixel video (it was considered good resolution back then. I know we have HD now…). Let’s say there are 3 variables in that maths formula. This means, I have to store 3 * 640 * 480 variables in my movie format.
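To make that arithmetic concrete, here is a tiny sketch. The function name and constants are made up for illustration, and the "3 variables per formula" figure is just the assumption from the paragraph above.

```python
# Hypothetical numbers taken from the example in the text.
WIDTH, HEIGHT = 640, 480
VARIABLES_PER_PIXEL = 3  # assumed: one formula with 3 variables per pixel position


def stored_variables(width, height, per_pixel):
    """Number of values to store if every pixel position gets its own formula."""
    return width * height * per_pixel


total = stored_variables(WIDTH, HEIGHT, VARIABLES_PER_PIXEL)
print(total)  # 921600
```

Notice that the video length doesn't appear anywhere in that calculation. In this idealised case the cost depends only on the resolution.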
Yes, that’s the compression tactic.
Well, more specifically, I had to come up with one maths formula per colour component per pixel position, so 3 * 640 * 480 of them, each compressing as efficiently as possible. With some research, I found that the less the variables vary (hahahah…), the better the compression. This means the colour space I use is important. RGB values change too much.
And with some more research, I found the YUV colour space ideal for use. I think what I actually did was count the number of consecutive identical values and store that (basically run-length encoding, but running through time instead of across the image). The YUV space has a “lightness” component (the Y), which for most videos, doesn’t change very often.
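For reference, here is a minimal sketch of an RGB to YUV conversion using the common BT.601 analogue coefficients. I'm assuming this particular variant for illustration; the format described here may have used different constants. The function name is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (0-255 per channel) to YUV.

    Assumption: BT.601 luma weights and the classic analogue U/V scale factors.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # "lightness" (luma)
    u = 0.492 * (b - y)                    # blue difference
    v = 0.877 * (r - y)                    # red difference
    return y, u, v


# A grey pixel that gets brighter: all three RGB values change,
# but in YUV only Y moves. U and V stay at (practically) zero.
dark_grey = rgb_to_yuv(100, 100, 100)
light_grey = rgb_to_yuv(200, 200, 200)
print(dark_grey)
print(light_grey)
```

This shows one way the colour space choice can matter: a brightness change touches 3 channels in RGB but only 1 in YUV, so the other 2 keep their long runs.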
This means in the ideal case, I store “N of X” for the “lightness” component, where N is the number of times the value X occurs for the “lightness” component of that pixel position. And in the ideal case, N would be approximately the frame rate multiplied by the number of seconds.
So for each pixel position, I store a series of counts of the Y component (could be 10, 50 or ideally just 1). Then I store a series of counts of the U component. And similarly for the V component.
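The per-pixel counting can be sketched roughly like this. It's a minimal illustration and not the original engine code: the function names are made up, and it assumes the frames for one pixel position are already available as a list of (Y, U, V) tuples with at least one frame.

```python
from itertools import groupby


def run_lengths(values):
    """Collapse consecutive equal values into (count, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(values)]


def encode_pixel(yuv_over_time):
    """Encode one pixel position: a separate series of counts for Y, U and V.

    Assumes yuv_over_time is a non-empty list of (Y, U, V) tuples, one per frame.
    """
    ys, us, vs = zip(*yuv_over_time)
    return run_lengths(ys), run_lengths(us), run_lengths(vs)


def decode_runs(runs):
    """Expand (count, value) pairs back into one value per frame."""
    return [value for count, value in runs for _ in range(count)]


# The "white 1st half, black 2nd half" pixel from earlier, over 6 frames.
# White is roughly (255, 0, 0) in YUV and black is (0, 0, 0).
frames = [(255, 0, 0)] * 3 + [(0, 0, 0)] * 3
y_runs, u_runs, v_runs = encode_pixel(frames)
print(y_runs)  # [(3, 255), (3, 0)]
print(u_runs)  # [(6, 0)]
print(v_runs)  # [(6, 0)]
```

Here Y needs 2 entries, while U and V each hit the ideal case of a single entry covering the whole clip.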
Then I move on to the next pixel.
The idea is that the longer the video, the “better” the compression. It’s not that the compression tactic is better. It’s just that the tactic takes advantage of the fact that certain colour components don’t change much. If a pixel is always white, it doesn’t matter if the video is 1 second long or 1 hour long, we just store “30 white” and “108000 white” respectively (assuming 30 frames per second, so 3600 seconds * 30 frames for the hour).
This works great for screencast videos because the screen typically doesn’t change much. It’s usually just the mouse moving around and clicking stuff and sometimes the screen changes. This means most of the screen stays the same. And if you’re like me, you’ve probably watched some screencast videos where nothing on the screen happens for minutes. It’s just the person talking.
Of course, I fudged the actual values a bit so they can compress better. Meaning it’s a lossy compression. But in the end, it was comparable to a moderately compressed QuickTime movie file, so I was ok with that.
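The exact fudging isn't spelled out here, so the following is only one plausible way to do it, assumed for illustration: snap each value to the nearest multiple of a step size, so that small flickers collapse into one long run. The step of 8 and the function names are hypothetical.

```python
from itertools import groupby


def quantize(values, step=8):
    """Snap each 0-255 value to the nearest multiple of step (assumed scheme).

    The result is clamped to 255 so it still fits in 8 bits.
    """
    return [min(255, int(round(v / step)) * step) for v in values]


def count_runs(values):
    """Number of (count, value) entries a run-length encoder would store."""
    return sum(1 for _ in groupby(values))


# A "lightness" series that flickers slightly, then drops.
raw = [200, 201, 199, 202, 200, 90, 91]
fudged = quantize(raw)
print(fudged)            # [200, 200, 200, 200, 200, 88, 88]
print(count_runs(raw))   # 7
print(count_runs(fudged))  # 2
```

The trade-off is the usual lossy one: a bigger step gives longer runs and a smaller file, but the decoded values drift further from the originals.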
But even if the compressed size is still larger than other movie file types, at least it’s not copyrighted. So I can use that. And if you want to try this compression method, go ahead and use it (you have my permission). Let me know how it works out for you.