Saturday, August 4, 2012

Halide: A New Language for Image Processing

Halide is a new open-source language designed specifically for image processing and computational photography. It makes photo algorithms easy to implement, and it makes them run fast through semi-automatic parallelization.

Algorithms that work with images are ideal for parallel implementation because they usually operate on small, isolated blocks of data, which means the work can be split up without worrying about interactions between the pieces. The problem is that even when an algorithm is ripe for parallelization, converting it from serial code into something that runs well on today's confusing mix of CPU cores and GPUs is difficult.
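To make the "isolated blocks" point concrete, here is a minimal sketch in plain C++ (not Halide). It assumes a row-major 8-bit grayscale image and a simple brighten operation; the names `brighten_rows` and `brighten_parallel` are made up for illustration. Because each output pixel depends only on the matching input pixel, the image can be cut into bands of rows and each band handed to its own thread with no locking.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Brighten rows [y0, y1) of a row-major grayscale image in place.
// Each pixel is updated from its own value only, so bands never interact.
void brighten_rows(std::vector<std::uint8_t>& img, int width,
                   int y0, int y1, int delta) {
    for (int y = y0; y < y1; ++y) {
        for (int x = 0; x < width; ++x) {
            std::uint8_t& p = img[static_cast<std::size_t>(y) * width + x];
            int v = std::max(0, std::min(255, p + delta));  // saturate to 0..255
            p = static_cast<std::uint8_t>(v);
        }
    }
}

// Split the image into horizontal bands and process one band per thread.
void brighten_parallel(std::vector<std::uint8_t>& img, int width, int height,
                       int delta, int num_threads) {
    std::vector<std::thread> workers;
    int band = (height + num_threads - 1) / num_threads;  // rows per band, rounded up
    for (int t = 0; t < num_threads; ++t) {
        int y0 = t * band;
        int y1 = std::min(height, y0 + band);
        if (y0 >= y1) break;  // more threads than rows
        workers.emplace_back([&img, width, y0, y1, delta] {
            brighten_rows(img, width, y0, y1, delta);
        });
    }
    for (auto& w : workers) w.join();
}
```

Even in this easy case the band arithmetic and thread bookkeeping are hand-written and tied to one kind of hardware, which is the chore the paragraph above is complaining about.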

Halide is a new functional programming language from MIT (with help from Stanford and Adobe) that lets you specify image processing algorithms, mostly block convolution methods, without having to worry about how the algorithm is implemented. A second section of the program then gives a general description of how the algorithm should be parallelized. It describes not only how the work should be split among computational elements but also how to organize the data so that the processing pipelines keep running at maximum efficiency by avoiding restarts.
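The split between "what to compute" and "how to walk over it" can be imitated by hand in plain C++. The sketch below is only an analogy, not how Halide works internally: `pixel_value` is a made-up stand-in for the algorithm, and `render_row_major` and `render_tiled` are two hypothetical traversal orders standing in for schedules. The tile sizes are arbitrary parameters.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// The "algorithm": the value of each pixel, with no mention of loop order.
int pixel_value(int x, int y) { return (x * 31 + y * 17) % 256; }

// "Schedule" 1: visit pixels in plain row-major order.
std::vector<int> render_row_major(int width, int height) {
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[static_cast<std::size_t>(y) * width + x] = pixel_value(x, y);
    return out;
}

// "Schedule" 2: visit pixels tile by tile, so each tile's data stays
// close together in cache while it is being worked on.
std::vector<int> render_tiled(int width, int height, int tile_w, int tile_h) {
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int ty = 0; ty < height; ty += tile_h)
        for (int tx = 0; tx < width; tx += tile_w)
            // Inner loops cover one tile, clipped at the image edge.
            for (int y = ty; y < std::min(ty + tile_h, height); ++y)
                for (int x = tx; x < std::min(tx + tile_w, width); ++x)
                    out[static_cast<std::size_t>(y) * width + x] = pixel_value(x, y);
    return out;
}
```

Both functions produce the same image; only the order of the work differs. Written by hand, changing the traversal means rewriting the loops, whereas Halide's aim is for that choice to live in a separate description.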

The easiest way to understand the general idea is to see a simple example (taken from the paper):
Func halide_blur(Func in) {
  Func tmp, blurred;
  Var x, y, xi, yi;
  // The algorithm: a horizontal pass followed by a vertical pass
  tmp(x, y) = (in(x-1, y) +
          in(x, y) + in(x+1, y))/3;
  blurred(x, y) = (tmp(x, y-1) +
          tmp(x, y) + tmp(x, y+1))/3;
  // The schedule: how the two stages are tiled, vectorized and parallelized
  blurred.tile(x, y, xi, yi, 256, 32)
        .vectorize(xi, 8).parallel(y);
  tmp.chunk(x).vectorize(x, 8);
  return blurred;
}

The first part of the program defines a simple 3x3 blur filter, split into a horizontal blur followed by a vertical blur. The last part of the program, the schedule, specifies how the algorithm should be treated in a parallel implementation. The schedule is machine specific and has to be changed to get the best performance out of a particular processor pipeline.
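For comparison, here is a rough hand-written version of the same two-pass blur in plain C++, done with two different evaluation strategies. This is a sketch under assumptions, not what Halide generates: the `Image` struct and the function names are hypothetical, pixels are plain integers, and coordinates are clamped at the border because the listing above does not say how edges are handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal image type: row-major integer pixels.
struct Image {
    int width, height;
    std::vector<int> data;
    Image(int w, int h)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, 0) {}
    // Read with coordinates clamped to the border (an assumed edge policy).
    int at(int x, int y) const {
        x = std::max(0, std::min(width - 1, x));
        y = std::max(0, std::min(height - 1, y));
        return data[static_cast<std::size_t>(y) * width + x];
    }
    // Writable access; the caller must pass in-range coordinates.
    int& ref(int x, int y) {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Horizontal 3-tap average, the counterpart of tmp(x, y) in the listing.
int blur_x(const Image& in, int x, int y) {
    return (in.at(x - 1, y) + in.at(x, y) + in.at(x + 1, y)) / 3;
}

// Strategy A: compute and store the whole horizontal pass, then blur vertically.
Image blur_two_pass(const Image& in) {
    Image tmp(in.width, in.height), out(in.width, in.height);
    for (int y = 0; y < in.height; ++y)
        for (int x = 0; x < in.width; ++x)
            tmp.ref(x, y) = blur_x(in, x, y);
    for (int y = 0; y < in.height; ++y)
        for (int x = 0; x < in.width; ++x)
            out.ref(x, y) = (tmp.at(x, y - 1) + tmp.at(x, y) + tmp.at(x, y + 1)) / 3;
    return out;
}

// Strategy B: no intermediate buffer; recompute the horizontal pass where needed.
Image blur_inline(const Image& in) {
    Image out(in.width, in.height);
    for (int y = 0; y < in.height; ++y)
        for (int x = 0; x < in.width; ++x)
            out.ref(x, y) = (blur_x(in, x, y - 1) + blur_x(in, x, y) +
                             blur_x(in, x, y + 1)) / 3;
    return out;
}
```

Both strategies return identical pixels. Strategy A uses more memory, strategy B does more arithmetic, and which one is faster depends on the machine. That is the kind of trade-off the schedule is meant to express without touching the algorithm.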
