Thursday, January 13, 2011

Histogram Puzzle: A faster solution?

Here is the problem statement. The brute-force solution is O(N^2).

Here I outline a faster O(N log N) solution. There may be an even better way to do it, but usually, when I get to O(N log N), I am happy, and move on to other things.

Given a histogram, the first thing we do is locate any columns that have zero height. Such columns punctuate the overall histogram, and can be used to break a big problem, into a series of smaller ones (smaller 'N').

We then work with a (piece of the) histogram that has no zero height columns, such as the one illustrated before (and below), for concreteness.

Also, for clarity, I will perform some additional steps, which need not be explicitly implemented.
As before, we initialize,
maxRect = max{h(i)}     # max.Rect.area() = 9, here

Next, we quicksort the column heights. This is an O(N log N) operation. I also remember the position of a given column in the original histogram. In the illustration above, I get a sorted array which looks something like:

1  7
2  1
2  5
3  6
3  2
3  3
4  8
9  4

where the first column represents h(i), and the second column denotes the location in the original histogram.

We look at the first row with height 1, and here's the key insight.
The biggest rectangle that can be constructed in which this column (h=1) participates is 1 * N = 8. Since this is smaller than MaxRect.area() (=9), this column is not part of the solution.

So we throw it away.

But we throw it away in a unique fashion.

We delete the magnitude of this column from every other column. This is what happens to the remaining histogram.
We have now introduced additional zero height columns. We use this to break our problem further, and examine the pieces punctuated by the column that just disappeared.

One of those pieces (column N has height 4) is useless, and we throw it away, and focus on the other piece.

Rinse, dry, repeat.

The rest of the strategy should be reasonably clear, from this point on.

During each step, we decide whether a (many) column(s) can be part of the biggest rectangle. This is O(N). In practice, this may be even faster (?), because we break the big problem into smaller problems. The "N" is not the same.

No comments: