In some programming situations it may be necessary to do manipulations at the level of individual bits. This is something of a challenge, because the smallest addressable unit of information in a computer is a byte, which is 8 bits. These lecture notes will present some simple techniques you can use to manipulate the individual bits in a byte or any other data type.
Both C and C++ offer bit-level logical operators. These operators perform logical operations at the level of individual bits. To describe what these operators do, it suffices to show logic tables that describe what effect these operators have on pairs of bits we apply them to. The following table shows what the bitwise-and, bitwise-or, and bitwise-exclusive-or operators do.
| a | b | a & b | a | b | a ^ b |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 |
Because C and C++ do not offer a method for accessing individual bits in a byte, these logical operators are designed to operate in parallel on the 8 pairs of bits in a pair of bytes. The table below shows what results these operators produce when given particular pairs of bytes to operate on.
| a | b | a & b | a | b | a ^ b |
|---|---|---|---|---|
| 00000000 | 00001111 | 00000000 | 00001111 | 00001111 |
| 00000000 | 01010101 | 00000000 | 01010101 | 01010101 |
| 00001111 | 10010110 | 00000110 | 10011111 | 10011001 |
| 10101010 | 11001100 | 10001000 | 11101110 | 01100110 |
With a basic understanding of these logical operators and some cleverness we can design operations to manipulate individual bits in a byte. In all of the examples below I will assume that the individual bits in a byte are numbered from 0 to 7, with the 0 bit at the right end and the 7 bit at the left end.
To set the kth bit in a byte to 1, we do a bitwise-or of the byte with a mask that has its kth bit set to 1 and all other bits set to 0. Here is an example to demonstrate how to set bit 3 to 1:
11010110 | 00001000 = 11011110
To set the kth bit in a byte to 0, we do a bitwise-and of the byte with a mask that has its kth bit set to 0 and all other bits set to 0. To set bit 3 of a byte to 0 we do
11001100 & 11110111 = 11000100
To flip the kth bit in a byte to its opposite value, we do a bitwise-exclusive-or of the byte with a mask that has its kth bit set to 1 and all other bits set to 0:
11010110 ^ 00001000 = 11011110
11001100 ^ 00001000 = 11000100
To test whether or not a particular bit is set, we can combine the byte with a mask using the bitwise-and operation and then test whether or not the result is 0:
11010110 & 00001000 = 00000000
Up to this point in your programming career you have been in the habit of using decimal numbers for integers. As the examples above show, to manipulate individual bits we have to switch to thinking in binary. Unfortunately, C++ offers no standard syntax for specifying numbers in binary. Instead, C++ does have syntax that allows you to represent numbers in hexadecimal (base 16) form. The hexadecimal number system uses digits 0-9 and letters a-f to represent the base 16 digits 0-9 and 10-15, respectively. Each hexadecimal digit represents 4 bits, so a pair of hexadecimal digits suffices to represent a byte. In C++ you can write a number in hexadecimal form by prefixing it with 0x - the following examples show how to convert each of the binary examples above to hexadecimal notation:
11010110 | 00001000 = 11011110 => 0xd6 | 0x08 = 0xde
11001100 & 11110111 = 11000100 => 0xcc & 0xf7 = 0xc4
11010110 ^ 00001000 = 11011110 => 0xd6 ^ 0x08 = 0xde
11001100 ^ 00001000 = 11000100 => 0xcc ^ 0x08 = 0xc4
Here are some common C++ data types and their sizes in bytes.
| data type | size in bytes |
|---|---|
| unsigned char | 1 |
| unsigned short | 2 |
| unsigned int | 4 |
| unsigned long | 4 |
| unsigned long long | 4 or 8 |
Bear in mind that if you want to manipulate individual bits in any of these data types by using masks, you need to make sure that your mask is the same size as the thing you are manipulating. For example, if you have a pointer variable and you want to set its third bit to 1, you have to use a mask with 32 bits, or 8 hexadecimal digits. To further confuse things, some data types will change their size when you recompile for a different operating system.
If you working with some data type whose size you don't know, you can always determine its size by using the sizeof() operator. sizeof() takes any data type as its argument and returns its size in bytes.
A convenient trick you can use to guarantee that your mask is the same size as the thing it is manipulating is to not hard code the mask but compute it at run time. For example, to make a byte mask with binary form 00001000 use can use the following trick:
const unsigned char myMask = ((unsigned char) (1 << 3));
This trick uses the bit shift operator to take the initial number 00000001 and shift its bits 3 spaces to the left to make the desired 00001000. Since the number 1 by default has type int, the expression 1 << 3 will also have type int. A type cast is thus required to convert the number to the desired form.
Another convenient trick that is useful for forming masks of the form 11110111 makes use of the bitwise-not operator, ~. This operator flips all the bits in a number to their opposite values. Thus, to quickly construct the mask 11110111 we use the trick above to first construct 00001000 and then apply the ~ operation to it:
const unsigned char myNegativeMask = ~((unsigned char) (1 << 3));
One application for the techniques shown above that dates back to the time when C programmers tried to squeeze every possible efficiency out of the memory available in a computer is a space-saving trick. Here is a Wikipedia page that shows an application where it is possible to save space by packing four bools together in to a single byte to save space, and a cleaner alternative syntax that makes use of bit fields.