Wednesday, September 06, 2017

Simulated Annealing for JPEG Quantization

I have a new paper on the arxiv (web page describing results, and full version** available) with two Harvard undergrads, Max Hopkins and Sebastian Wagner-Carrena (who may both be applying for grad school, and who are both great, hint, hint everybody...), titled "Simulated Annealing for JPEG Quantization", on finding better default JPEG quantization tables.  This work started as their class project for my graduate class on "Algorithms at the end of the wire" last year, and they stuck with it through the year and into the summer to hone it into a paper.  I'm always excited when class projects turn into papers, especially when they're done by undergrads.

Here's the story of the paper.  To start, as the Wikipedia page on JPEG explains, after you transform each 8 by 8 block of pixels into 64 coefficients in the frequency domain, you "quantize" the coefficients -- that is, you divide each coefficient by a fixed number (a different number for each coefficient) and then round.  This reduces the amount of information needed to represent the block, at the expense of some loss of fidelity.  In particular, the high frequency coefficients are divided by much larger numbers, so that most of them become 0, further improving compression.  This is generally OK because reducing the high frequency components has less of an effect on how humans perceive the image than reducing the low frequency components.
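To make the quantization step concrete, here's a minimal sketch in Python/numpy: divide each of the 64 coefficients by the corresponding table entry and round.  The table shown is the standard luminance table from Annex K of the JPEG spec; the function names and the example block are just for illustration, not anything from our paper.

```python
import numpy as np

# Standard JPEG luminance quantization table (Annex K of the spec).
LUMA_TABLE = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(dct_block, table):
    """Divide each frequency coefficient by its table entry and round."""
    return np.round(dct_block / table).astype(int)

def dequantize(quantized_block, table):
    """Approximate reconstruction: multiply back by the table entries."""
    return quantized_block * table

# Example: a made-up 8x8 block of DCT coefficients.
block = np.random.randint(-100, 100, size=(8, 8))
q = quantize(block, LUMA_TABLE)
# Note how the large divisors in the lower-right (high frequency)
# corner of the table send most of those coefficients to 0.
print(q)
```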

There are "default" quantization tables built into the JPEG standard.  Back when the powers that be developed the original JPEG quantization tables, they essentially chose them "by hand", with much less to go on than we have today in the way of models of the human visual system, and with much less computing power available.  These days we have better models, or, better said, actual metrics, of how "good" a compressed image is with respect to the original in terms of the human visual system, along with a great deal more computing power.  This suggests we should be able to effectively explore the very large space of possible JPEG quantization matrices using heuristic techniques, such as simulated annealing.  This idea has been considered before, but we seem to be at a much better point in terms of metrics and computation than past attempts were.
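For a sense of how such a heuristic search might look, here's a schematic simulated annealing loop over a 64-entry table.  This is only a sketch: the `score` function is a placeholder for some combination of compressed size and a perceptual quality metric, and the neighbor moves and cooling schedule here are generic choices, not the ones from the paper.

```python
import math
import random

def anneal(initial_table, score, steps=10000, t0=1.0, cooling=0.999):
    """Generic simulated annealing over a 64-entry quantization table.

    `score(table)` should return a number to minimize, e.g. some weighted
    combination of compressed size and perceptual error (placeholder,
    not the paper's actual objective).
    """
    current = list(initial_table)
    best = list(current)
    current_cost = best_cost = score(current)
    temperature = t0

    for _ in range(steps):
        # Neighbor move: perturb one randomly chosen table entry,
        # keeping it in the valid range [1, 255].
        candidate = list(current)
        i = random.randrange(64)
        candidate[i] = min(255, max(1, candidate[i] + random.choice([-1, 1])))

        cost = score(candidate)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature cools.
        if cost < current_cost or random.random() < math.exp((current_cost - cost) / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost

        temperature *= cooling

    return best, best_cost
```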

Of course it's still not so easy (it is a 64-dimensional space, and the metrics have their own issues), and there was a good amount of work and testing to be done (and some insight to be gained).  But we found what we think are clearly better "default" JPEG tables (which we've made available at the web page above), and because of the way JPEG works, you can drop them into your JPEG files in place of the "standard default".  So it's all backward-compatible and ready to use, with no need to change JPEG itself, which is ubiquitous enough that actual changes to the standard are unlikely to happen.  Any experts out there, go ahead and try them out and let us know what you think.  I also think the work opens the door for others to aim for further improvements, using different metrics and/or different heuristic techniques.
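As a concrete illustration of the backward-compatibility point: the quantization tables are stored in the JPEG file itself, so most encoders let you supply your own and any standard decoder will read the result.  For instance, I believe Pillow's JPEG writer accepts a `qtables` argument, roughly as below; the uniform tables here are hypothetical placeholders (in practice you'd substitute the tables from our web page, and you should check Pillow's docs for its exact conventions).

```python
from PIL import Image

# Hypothetical placeholder tables: 64 entries each for luminance and
# chrominance.  Not the tables from the paper.
my_luma_table = [16] * 64
my_chroma_table = [17] * 64

img = Image.open("input.png").convert("RGB")
# Supplying qtables replaces the encoder's built-in defaults; the output
# is still an ordinary JPEG that any decoder can read, because the
# tables travel inside the file.
img.save("output.jpg", "JPEG", qtables=[my_luma_table, my_chroma_table])
```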

** The arxiv version is, strangely, not the full version.  Apparently arxiv has size limits, and because we do side-by-side comparisons of many images, we were above the limit, and they wouldn't give us an exception.  I hadn't known that could happen on arxiv; something to keep in mind for the future.

1 comment:

Anonymous said...

I've gotten an arXiv exception before. It's weird that they didn't give you one. I assume you asked?