Thursday, November 13, 2008

Oracle Bitmap indexes - implementation

Oracle uses BBC (Byte aligned Bitmap Code) 1-sided algorithm to store Bitmap indexes.

 

What is BBC?

 

Thinking of Bitmap indexes, most of us recall a simple Bitmap Index such as the one mention below:

Columns

Row 1

Row 2

Row 3

Row 4

Row 5

Row 6

Row 7

Row 8

Row 9

Red

0

0

0

0

0

0

0

1

1

Green

1

1

0

0

0

0

1

0

0

Blue

0

0

1

1

1

1

0

0

0


Fig: 1 - Table having 9 rows with data distribution shown as 0 and 1.

 

In the above style of storing bits is uncompressed i.e. all 0s and 1s are stored as it is. So, while doing any bitwise operation (and/or/xor), we use the uncompressed version of bitmaps.

 

Potential drawbacks of using this technique:

  1. high storage cost
  2. high query cost

 

Can compression be achieved?

Yes, we can compress 0s and 1s which are consecutive. Think of compressing seven 0s as one with a special header saying that this 0 is equivalent to seven 0s. 

 

But this has its own disadvantage. For any bit-wise operation, I have to first decompress bitmap indexes in question and do the “and”/”or” operation, which outweighs any savings in storage costs.

 

So, the need was for a compress bitmap technique using which bit-wise operations could be done on the compressed state.

One such example of such techniques is BBC (used by Oracle) – called Byte aligned bitmap code.

 

This technique of bitmap compression is based on run-length encoding. Using this encoding BITMAPs are aligned along the BYTE boundaries. Each aligned BYTE is called a gap-byte (GBYTE) if all the bits of that BYTE store the same logical value (0 or 1). If it’s a mix of bits having different logical values, its called a non-gap-byte or map-byte (MBYTE). Adjacent bytes of the same class are grouped and controlled by a control byte (CBYTE), which keeps the information whether the bytes following it are MBYTE or GBYTE and if its GBYTE what is the length of bytes? And if it’s a MBYTE, which type of MBYTE (is it an offset or not? – offset is a MBYTE having only single bit different from other bits in that byte)

 

In the above explanation, a gap-byte could be for 0 or for 1. so, it is called 2-sided BBC.

Oracle uses 1-sided BBC, in which only 0s are candidate for GBYTE not 1s. All 1s will still make an MBYTE in Oracle. I’m not sure why Oracle uses 1-sided BBC technique. It could be due to integration complexities with other part of RDBMS-engine. 

But going back on the question, why do we need Byte alignment when I can compress without it?

  1. Computers are happy dealing with BYTEs than BITS.
  2. I can do BIT-WISE operations as efficiently (in fact better) as I can do in decompress form.
  3. Control Bytes (CBYTES) are very handy in decoding the compress BITMAP.

 

Can this BITMAP compression and query response get better?

From what I see after doing research on net, the answer is Yes.

 

The technique of handling the bits BYTEs way has gotten better by handling the WORD way.

 

The concepts are similar to the ones followed in BBC, the major difference is aligning the bits using “word” rather than “byte”. Computers today are more friendly dealing in words (multiple bytes) than on byte.

 

 

I will write more on word aligned hybrid compression technique in perspective of oracle in coming days.

 

Thanks for reading!

 

 

Inputs:

Patent# 5363098 – by Antoshenkov, Gennady  (Oracle corp).