Sep 2, 2022 13 min read bitmaps

A primer on Roaring bitmaps: what they are and how they work

Unfortunately not what this post is about. Credit: katiebcartoons.com

I came across Roaring bitmaps when I learned about this fun hack to do retention analyses at scale with bitmaps. Using Roaring bitmaps instead of traditional bitmaps in that application reduced memory usage from ~125GB to 300MB, an impressive 99.8% savings.

But... how?

You can learn about Roaring bitmaps over two research papers:

In this post I briefly describe what bitmaps are, what they're used for, and what Roaring bitmaps solve that traditional bitmaps don't. Then, I distill the high-level structure of Roaring bitmaps and how they work, one step at a time.

Roaring bitmaps employ a number of algorithms, techniques, and heuristics that I won't go into in detail. They also offer some operations beyond the ones I describe. These details are not critical to understanding the basic internal structure and operation of Roaring bitmaps which is the focus of this post.

Let's begin!

What are bitmaps and what are they used for?

Bitmaps are arrays of bits used to store sets of integers.

They work by setting the Nth bit when an integer N is in the set, as illustrated in Figure 1.

**Figure 1.** An illustration of how a bitmap works.

By storing sets of integers this way, bitmaps can take advantage of extremely fast bitwise-AND and bitwise-OR CPU instructions to compute set intersections and unions.

It turns out that fast set intersections and unions are critical for many search and database applications. Various operations exist in search and database indexes that boil down to having two sets of integers and needing to intersect or union them quickly.

Take an inverted search index, for example:

You've indexed billions of documents. Each document has an integer id.
The index maps terms to a set of documents in which they appear. For example, the term pigeon appears in documents with these ids: {2, 345, 2034, ...}.
Queries that search across terms use set operations. In order to resolve a search query like carrier AND pigeon you need the intersection of the set of documents that contain carrier and the set of documents that contain pigeon.
Bitwise operations can perform these set operations quickly. If you represent sets of document ids as bitmaps, the query above is a bitwise-AND.

Columnar databases use set operations similarly for certain classes of queries.

If you'd like to dive into a use case outside search and databases, a previous post I wrote discusses how bitmaps can be used to analyze user retention for SaaS products.

Unfortunately, bitmaps suffer from awful compression in common cases involving very large sets of integers – scenarios that appear frequently in the use cases I just described.

Recall the figure I cited at the beginning of the post:

Using Roaring bitmaps instead of traditional bitmaps in that application reduced memory usage from ~125GB to 300MB, an impressive 99.8% savings.

That's the problem with bitmaps, and I'll walk you through why this happens shortly.

What are Roaring bitmaps?

From roaringbitmap.org:

Roaring bitmaps are compressed bitmaps. They can be hundreds of times faster.

A great, pithy marketing statement on the Roaring bitmap website! Let's expand on it a wee bit.

Roaring bitmaps are just optimized bitmaps, which I'll refer to henceforth as "traditional bitmaps".

Both traditional and Roaring bitmaps offer a set data structure for integers. You can insert integers, check the existence of an integer, and get the intersection and union of two sets of integers.

Roaring bitmaps offer better compression than traditional bitmaps. Importantly, they do so without significantly sacrificing the performance of set operations.

roaringbitmap.org boasts an impressive list of OLAP databases and search systems that use Roaring bitmaps under the hood. These are all applications that:

... need to store large sets of integers
... in as little memory as possible
... and execute fast set operations.

What problem do Roaring bitmaps solve that traditional bitmaps don't?

When a set is sparse, traditional bitmaps compress poorly.

Recall that traditional bitmaps will set the Nth bit when you add an integer N to it (see Figure 1).

Let's say you have an empty traditional bitmap to which you add the integer 8,000,000. Here's what will happen:

It will allocate 1,000,000 bytes.
It will set the 8,000,000th bit.

This is illustrated in Figure 2.

**Figure 2.** What happens if you allocate the 8 millionth bit outright in an empty bitmap.

Why is this bad?

Your set has 1 integer.
An integer takes up 4 bytes.
Your traditional bitmap has allocated 1 megabyte.

That's 6 orders of magnitude more memory than you need.

Whoops!

Roaring bitmaps solve this problem. Importantly, they do so while maintaining fast set operations. This is what makes Roaring bitmaps special.

Prior research attempts to solve poor compression in bitmaps achieve impressive results too, but at the cost of efficient set operations.

How do Roaring bitmaps work?

There isn't one deep insight that allows Roaring bitmaps to perform well. But there is a lot of stuff going on that is greater than the sum of its parts.

The following builds up an understanding of Roaring bitmaps one concept at a time.

Part 1: how Roaring bitmaps are represented in memory

All 32-bit integers are partitioned into contiguous chunks.

**Figure 3.** How the space of 32-bit integers is partitioned into chunks in Roaring bitmaps.

Each chunk shares the same 16 most significant bits.

As shown in Figure 3, the partitioning scheme used by Roaring bitmaps ensures that an integer will always belong to the same chunk of 2^16, or 65,536 consecutive integers.

Note: there are 64-bit implementations of Roaring bitmaps, but this post does not go into them. See CRoaring, this native Go implementation, and, uh, this paper written in French (if you find a translation, please let me know).

Integers in the same chunk are stored in containers.

Chunks are how integers are logically partitioned in Roaring bitmaps. All integers that belong to a chunk are physically (in memory) stored in the same container.

**Figure 4.** Three contrived examples of containers from the first Roaring bitmap paper.

Figure 4 shows examples of three different containers for three different chunks.

A chunk will always only have one container in a Roaring bitmap, defined by the 16 most significant bits of all the integers in the chunk.

If your program inserted the first 1,000 multiples of 62 into a Roaring bitmap, then they would be end up in the left-most container in Figure 4. That container's cardinality would be 1,000.

If you later inserted the integer 63, it would end up in the same container. The container's cardinality would then be 1,001.

As you'll see next, the cardinality of a container determines how it will be represented in memory.

Sparse containers contain <= 4,096 integers. These are stored as sorted packed arrays.

Two of the containers in in Figure 4 are sparse (with cardinalities 1,000 and 100) so they will be stored as sorted packed arrays of 16-bit integers.

By packed, we mean that we will be packing 32-bit integers into 16-bit integers as shown in Figure 5.

**Figure 5.** Two sparse Roaring bitmap containers from Figure 2 alongside examples of how they are stored in memory.

Packing integers is possible because each container can store at most 2^16 distinct integers. To get the original 32-bit integer in a sparse container, we have to unpack it by combining the 16-bit integer with its 16 most significant bits.

These arrays are dynamically allocated so the memory used by a sparse container grows as it accrues integers.

Dense containers contain > 4,096 integers. These are stored as bitmaps.

One of the containers in Figure 4 is dense (with cardinality 2¹⁵) so it will be stored as a traditional bitmap.

**Figure 6.** A dense Roaring bitmap container from Figure 2 alongside an example of how it is stored in memory.

Dense containers are bitmaps containing 2^16 bits (8 kilobytes), allocated outright. The Nth bit in the bitmap maps to the Nth integer in a chunk.

A first-level index points to all containers. The index is stored as a sorted array.

The first-level index stores the 16 most significant bits for each container in the Roaring bitmap along with a pointer to the corresponding container.

**Figure 7.** The containers described in Figure 2, 3, and 4 with a first-level index pointing to them.

The index is stored as a sorted array and grows dynamically as new containers are added to the Roaring bitmap.

Part 2: how set operations work with Roaring bitmaps

Integer insertion varies by container type and may cause a container's type to change.

To insert an integer, N, get N's 16 most significant bits (N / 2^16) and use it to find N's corresponding container in the Roaring bitmap.

Insertions in array and bitmap containers work differently:

Bitmap: set the bit at N % 2^16.
Array: insert N % 2^16 in its position in the sorted array.

Insertions may change the container type. If an array container has 4,096 integers, first convert it to a bitmap container. Then set the bit at N % 2^16.

If a container doesn't already exist then create a new array container, add it to the Roaring bitmap's first-level index, and add N to the array.

Checking for existence varies by container type.

To check if an integer N exists, get N's 16 most significant bits (N / 2^16) and use it to find N's corresponding container in the Roaring bitmap.

If the container doesn't exist, then N is not in the Roaring bitmap.

Checking for existence in array and bitmap containers works differently:

Bitmap: check if the bit at N % 2^16 is set.
Array: use binary search to find N % 2^16 in the sorted array.

Intersect matching containers to intersect two Roaring bitmaps. Algorithms vary by container type(s), and container types may change.

To intersect Roaring bitmaps A and B, it is sufficient to intersect matching containers in A and B.

This is possible because of how integers are partitioned in Roaring bitmaps: matching containers in A and B store integers with the same 16 most significant bits (the same chunks).

Intersection algorithms vary by the types of the containers involved, as do the resulting container types:

Bitmap / Bitmap: Compute the bitwise AND of the two bitmaps. If the cardinality is <= 4,096, store the result in an array container, otherwise store it in a bitmap container.
Bitmap / Array: Iterate over the array, checking for the existence of each 16-bit integer in the bitmap. If the integer exists, add it to the resulting array container – note that intersections of bitmap and array container types will always create an array container.
Array / Array: Intersections of two array containers always create a new array container. The algorithm used to compute the intersection varies by a cardinality heuristic described at the bottom of page 5 here. It will either be a simple merge (as used in merge sort) or a galloping intersection, described in this paper.

If there is a container in either Roaring bitmap without a corresponding container in the other, it will not exist in the result: the intersection of an empty set and any set is an empty set.

Union matching containers to produce a Roaring bitmap union. Algorithms vary by container type(s), and container types may change.

To union Roaring bitmaps A and B, union all matching containers in A and B.

Union algorithms vary by the container types involved, as do the resulting container types:

Bitmap / Bitmap: Compute the bitwise OR of the two bitmaps. Unions of two bitmap containers will always create another bitmap container.
Bitmap / Array: Copy the bitmap and set corresponding bits for all the integers in the array container. Unions of a bitmap and array container will always create another bitmap container.
Array / Array: If the sum of the cardinalities of the two array containers is <= 4,096, the resulting container will be an array container. In this case, add all integers from both arrays to a new array container. Otherwise, optimistically assume the resulting container will be a bitmap: create a new bitmap container and set all corresponding bits for all integers in both arrays. If the cardinality of the resulting container is <= 4,096, convert the bitmap container back into an array container.

Finally, add all containers in A and B that do not have a matching container to the result. Remember: this is a union, so all integers in Roaring bitmaps A and B must be in the resulting set.

Part 3: how a third and final container type – the "run" container – optimizes long runs of consecutive integers.

Parts 1 and 2 of this post cover most of the internal structure and operation of Roaring bitmaps. This final part covers an important optimization described in the second Roaring bitmap paper.

Run containers represent runs of consecutive integers with two 16-bit integers: the run start and run length.

From page 3 of the second Roaring bitmap paper:

The new container is conceptually simple: given a run (e.g., [10, 1000]), we store the starting point (10) and its length minus one (990) ... packing the starting points and the lengths in pairs, using 16 bits each ...

This technique is known as run-length encoding and seems to back most of the prior art described in the two papers. Run-length encoding can compress bitmaps effectively but degrades performance for set operations in many cases.

Run containers are formed explicitly when a client invokes a runOptimize function or, in some cases, implicitly when a large range is added to the Roaring bitmap.

Unlike sparse and dense containers, run containers generally do not materialize automatically.

Clients can invoke runOptimize to optimize their Roaring bitmap for large runs of consecutive integers. Run containers may replace existing array or bitmap containers in this case.
Roaring bitmaps offer an operation to add a range of values. Run containers may materialize automatically in this case.

The papers don't actually prescribe how or when #2 should happen. I'd guess that if a range of values were added for a chunk that does not yet have a container, it makes sense to create a run container instead of an array or bitmap container.

runOptimize only creates a run container if it will be smaller than the container it would replace.

runOptimize first counts the number of runs in a container.

Then, it decides whether or not to create a run container using a simple heuristic: the run container must be smaller than its equivalent array or bitmap container.

The algorithms used to count runs and a description of how to compute the heuristic above are described on page 6 and 7 in the second Roaring bitmap paper.

If you'd like to work out computing the heuristic yourself, it'll help to recall the following:

... array containers contain no more than 4,096 integers, packed into 16-bits each.
... bitmap containers contain > 4,096 integers in a bitmap with 2^16 bits (8,192 bytes).
... each run in a run container takes up 32 bits (16 bits for the start, 16 bits for the length).

The addition of run containers introduces new algorithms for all set operations.

The Roaring bitmap papers do not describe the algorithms used to insert and check for the existence of integers in run containers: these operations are relatively straightforward.

But the addition of run containers requires that Roaring bitmaps implement performant algorithms for unions and intersections of three new container type pairs:

Run / Run
Run / Array
Run / Bitmap

These algorithms also introduce new heuristics to determine the resulting container type from these operations.

I won't go into the details of these algorithms as I did in Part 2. The algorithms are not significantly more complicated (their descriptions start on page 10 here if you're curious), but the numerous details are beyond the scope of this post and the paper is well-written and makes them extremely accessible.

Roaring bitmaps: a game of performance whack-a-mole!

Roaring bitmaps use a kitchen sink of algorithms and techniques to achieve better compression and faster performance than other bitmap implementations.

They're challenging to implement but they do the job extremely well, especially when used in OLAP workloads. The creators managed to root out inefficiencies in common yet very different scenarios – sparse data, dense data, data with a lot of runs – and address all of them at once.

And they go even further!

A third paper describes an implementation the creators wrote in C that leverages vectorized algorithms they designed to use SIMD (single instruction multiple data) instructions. That implementation, CRoaring, along with bindings and implementations in multiple other languages are available here. They're used in mainstream columnar databases and search applications and are actively maintained, improved, and optimized regularly.

Very cool.