# Binary Indexed Trees

## November 27, 2014

“A problem that seems difficult may have a simple, unexpected solution.” Unlike the advanced methods, the aha! insights of algorithms don’t come only after extensive study; they’re available to any programmer willing to think seriously before, during and after coding.

*Jon Bentley (quoting Martin Gardner), in the column Aha! Algorithms from his book Programming Pearls*

#### The Origin of BITs

The Binary Indexed Tree is hands down the most elegant data structure that I’ve had the pleasure of studying. It offers an embarrassingly simple yet highly efficient solution to the problem of maintaining cumulative object frequencies.

Imagine you’re supposed to keep track of the number of times each symbol appears in a long string, knowing the complete universe of possible symbols (or the *symbol alphabet*). Two operations are key - `UPDATE`

the frequency of a symbol as it is read from the string, and `QUERY`

the frequency of a symbol at any given time.

A third kind of operation becomes important when a meaningful order can be imposed on the symbol alphabet, like say if the alphabet was a list of numbers that can be sorted. Here, it might be useful to track the cumulative frequency or `COUNT`

of symbols less than a given symbol *x*.

The Binary Indexed Tree solves each of the `UPDATE`

, `QUERY`

and `COUNT`

problems in O(M) time, and using O(n) space where n is the cardinality of the symbol alphabet, and M is just the number of set bits (1s) in n’s binary representation!

BITs were first described in a 1994 paper titled A New Data Structure for Cumulative Frequency Tables by Peter Fenwick, and are also known by their eponymous name, Fenwick Trees.

#### The Aha! Insight

The unexpected search technique used by BITs can be better understood by drawing parallels with the way many people learn how to convert from binary to decimal notation. We use the following lookup table:

Bit position, i |
0 | 1 | 2 | ... | n |
---|---|---|---|---|---|

Value, v_{i} |
1 | 2 | 4 | ... | 2^{n} |

and the decimal representation is simply the sum of the values related with the positions of the set bits. For 13, for example:

13 = (1101)_{2} = v_{3} + v_{2} + v_{0} = 8 + 4 + 1

The Fenwick Tree is an array analogous to the lookup table above, where each of the array elements si, is a carefully computed partial frequency sum (a sub-frequency). If fi is the actual frequency of the ith element, and r is the position of the least significant bit in i (for example, r1 = 0, r2 = 1, r3 = 0, and so on), then

s_{i} = f_{i - 2r + 1} + f_{i - 2r + 2} + ... + f_{i}

#### Three key observations

- Note that the element frequencies that make up s
_{i}and s_{i - 2r}are disjoint sets. - Note also, that s
_{0}= f_{0} - Finally, note that any frequency f
_{j}appears in each of the sums s_{i}where j ≥ i - 2^{r}+ 1

Talking in terms of bits, i - 2^{r} can be computed by stripping away the least significant bit from i. For more on bit-twiddling, take a look at this useful reference guide.

#### Implementing `COUNT`

and `QUERY`

`COUNT`

The cumulative frequency up until the ith element, COUNT_{i} is defined as the sum of the actual frequencies of all elements less than or equal to the ith. COUNT_{i} can be calculated from the Fenwick Tree by observing that:

COUNT_{i}

= f_{0} + ... + f_{i - 1} + f_{i}

= f_{0} + ... + f_{i - 2r - 1} + f_{i - 2r} + (f_{i - 2r + 1} + f_{i - 2r + 2} + ... + f_{i})

= s_{0} + ... + s_{i - 2r} + s_{i}

In order to compute COUNT_{i}, we start with the sub-frequency at index i and at every step add a new sub-frequency after stripping away i’s least significant bit. This continues till i runs out of set bits, i.e., till i becomes equal to zero. For example,

COUNT_{13} = s_{13} + s_{12} + s_{8} + s_{0}

`QUERY`

Retrieving the actual frequency of the element at index *i* is as simple as calculating COUNT_{i} and COUNT_{i - 1} and computing the difference. Both `QUERY`

and `COUNT`

are thus equally efficient.

Never forget that `QUERY`

can be made even faster by identifying the longest common suffix in the binary representations for *i* and *i - 1*. (How does this help?)