Bit Level Operations

In some programming situations it may be necessary to do manipulations at the level of individual bits. This is something of a challenge, because the smallest addressable unit of information in a computer is a byte, which is 8 bits. These lecture notes will present some simple techniques you can use to manipulate the individual bits in a byte or any other data type.

Bit-level logical operators

Both C and C++ offer bit-level logical operators. These operators perform logical operations at the level of individual bits. To describe what these operators do, it suffices to show logic tables that describe what effect these operators have on pairs of bits we apply them to. The following table shows what the bitwise-and, bitwise-or, and bitwise-exclusive-or operators do.

aba & ba | ba ^ b
00000
10011
01011
11110

Because C and C++ do not offer a method for accessing individual bits in a byte, these logical operators are designed to operate in parallel on the 8 pairs of bits in a pair of bytes. The table below shows what results these operators produce when given particular pairs of bytes to operate on.

ab a & b a | b a ^ b
00000000000011110000000000001111 00001111
000000000101010100000000 0101010101010101
0000111110010110 000001101001111110011001
10101010110011001000100011101110 01100110

Using masks to manipulate individual bits

With a basic understanding of these logical operators and some cleverness we can design operations to manipulate individual bits in a byte. In all of the examples below I will assume that the individual bits in a byte are numbered from 0 to 7, with the 0 bit at the right end and the 7 bit at the left end.

To set the kth bit in a byte to 1, we do a bitwise-or of the byte with a mask that has its kth bit set to 1 and all other bits set to 0. Here is an example to demonstrate how to set bit 3 to 1:

11010110 | 00001000 = 11011110

To set the kth bit in a byte to 0, we do a bitwise-and of the byte with a mask that has its kth bit set to 0 and all other bits set to 0. To set bit 3 of a byte to 0 we do

11001100 & 11110111 = 11000100

To flip the kth bit in a byte to its opposite value, we do a bitwise-exclusive-or of the byte with a mask that has its kth bit set to 1 and all other bits set to 0:

11010110 ^ 00001000 = 11011110

11001100 ^ 00001000 = 11000100

To test whether or not a particular bit is set, we can combine the byte with a mask using the bitwise-and operation and then test whether or not the result is 0:

11010110 & 00001000 = 00000000

Representing binary numbers in hexadecimal

Up to this point in your programming career you have been in the habit of using decimal numbers for integers. As the examples above show, to manipulate individual bits we have to switch to thinking in binary. Unfortunately, C++ offers no standard syntax for specifying numbers in binary. Instead, C++ does have syntax that allows you to represent numbers in hexadecimal (base 16) form. The hexadecimal number system uses digits 0-9 and letters a-f to represent the base 16 digits 0-9 and 10-15, respectively. Each hexadecimal digit represents 4 bits, so a pair of hexadecimal digits suffices to represent a byte. In C++ you can write a number in hexadecimal form by prefixing it with 0x - the following examples show how to convert each of the binary examples above to hexadecimal notation:

11010110 | 00001000 = 11011110 => 0xd6 | 0x08 = 0xde

11001100 & 11110111 = 11000100 => 0xcc & 0xf7 = 0xc4

11010110 ^ 00001000 = 11011110 => 0xd6 ^ 0x08 = 0xde

11001100 ^ 00001000 = 11000100 => 0xcc ^ 0x08 = 0xc4

Data types and their sizes

Here are some common C++ data types and their sizes in bytes.

data typesize in bytes
unsigned char1
unsigned short2
unsigned int4
unsigned long4
unsigned long long4 or 8

Bear in mind that if you want to manipulate individual bits in any of these data types by using masks, you need to make sure that your mask is the same size as the thing you are manipulating. For example, if you have a pointer variable and you want to set its third bit to 1, you have to use a mask with 32 bits, or 8 hexadecimal digits. To further confuse things, some data types will change their size when you recompile for a different operating system.

If you working with some data type whose size you don't know, you can always determine its size by using the sizeof() operator. sizeof() takes any data type as its argument and returns its size in bytes.

A convenient trick you can use to guarantee that your mask is the same size as the thing it is manipulating is to not hard code the mask but compute it at run time. For example, to make a byte mask with binary form 00001000 use can use the following trick:

const unsigned char myMask = ((unsigned char) (1 << 3));

This trick uses the bit shift operator to take the initial number 00000001 and shift its bits 3 spaces to the left to make the desired 00001000. Since the number 1 by default has type int, the expression 1 << 3 will also have type int. A type cast is thus required to convert the number to the desired form.

Another convenient trick that is useful for forming masks of the form 11110111 makes use of the bitwise-not operator, ~. This operator flips all the bits in a number to their opposite values. Thus, to quickly construct the mask 11110111 we use the trick above to first construct 00001000 and then apply the ~ operation to it:

const unsigned char myNegativeMask = ~((unsigned char) (1 << 3));

bit fields

One application for the techniques shown above that dates back to the time when C programmers tried to squeeze every possible efficiency out of the memory available in a computer is a space-saving trick. Here is a Wikipedia page that shows an application where it is possible to save space by packing four bools together in to a single byte to save space, and a cleaner alternative syntax that makes use of bit fields.