Two additional parts of the C language

In these notes I am going to take you through two important parts of the C language that we have not yet had a chance to discuss in detail. The first portion of these notes will cover the C preprocessor. The second portion will cover the bit manipulation operators.

Besides being important in their own right, I am covering these topics now because they will play an important role in the example that I show in the lecture coming up on Monday of next week.

The C preprocessor

Compiling a C source code file is a two-stage process. The first stage is to run the source code file through a special program called the C preprocessor. In the second stage, the output of the preprocessor gets run through the C compiler itself.

The preprocessor is a text substitution program. The preprocessor's job is to look for special preprocessor directives embedded in the source code. These directives are easily identifiable: they all start with the # character.

Includes, defines, and if/else constructs

The most familiar preprocessor directive is the #include command. The purpose of this command is to include the text of header files in source code files.

When the preprocessor encounters an #include directive such as

#include <stdio.h>

it locates the header file in question and copies its entire contents into the source code file in place of the #include statement.

Another familiar directive is the #define command, which is used to define values for use in the program. An example of this is

#define ARRAY_SIZE 100

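The preprocessor uses the #define command to perform text substitutions. After encountering the #define in the example above, the preprocessor will replace the name ARRAY_SIZE with the text 100 wherever it appears in the rest of the source code file. It leaves string literals alone, and it also leaves alone longer names that merely contain ARRAY_SIZE.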

#defines are sometimes used in combination with the #ifdef .. #else .. #endif construct. Here is an example.

#ifdef ARRAY_SIZE
  int A[ARRAY_SIZE];
#else
  int A[20];
#endif

The purpose of this construct is to selectively include code in a source code file. In the example above, if the symbol ARRAY_SIZE is defined, the preprocessor will leave the first declaration for A in the source code file and strip out the second. If ARRAY_SIZE is not defined, the preprocessor will strip out the first declaration for A and put in the second instead.

Macros

Another type of preprocessor directive is the macro. The purpose of macros is to set up something that looks and behaves a little bit like a function declaration.

Here is an example. The #define below sets up a macro that defines something like a squaring function.

#define SQ(x) x*x

This defines the macro SQ, which computes the square of its argument. Macros operate via a text substitution mechanism. Once the macro SQ is defined, the preprocessor will look for any text in the program that looks like SQ(<text>) and replace it with the text <text>*<text>.

Because a macro is actually a text substitution mechanism, you have to exercise a little caution when setting up and using macros. Here is an example of what can go wrong. Suppose you had some code in your program that looks like this:

if(SQ(x+1) > 4)

When the preprocessor encounters this line it will perform the text substitution set up in the definition of SQ to produce something like this:

if(x+1*x+1 > 4)

This result is probably not what you had intended. Because a macro is a text substitution mechanism and not a function definition, it will screw up examples like this one.

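The fix for this is to employ some defensive parentheses in the original macro definition: one pair around each use of the argument, and one more pair around the entire expansion. If we change the definition to

#define SQ(x) ((x)*(x))

the preprocessor will translate

if(SQ(x+1) > 4)

into

if(((x+1)*(x+1)) > 4)

which is correct. The outer pair of parentheses protects the macro when it appears inside a larger expression: without it, 1/SQ(x) would expand to 1/(x)*(x), which divides first and then multiplies.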

Bit manipulation in C

The example program I will be showing on Monday of next week needs to do some bit manipulation operations. For those of you who have never done bit manipulation in C, here is a short tutorial on the C language features needed to do this.

The hexadecimal number system

The first thing you will need to know is how to represent sequences of bits. The most widely used method to represent bit sequences in C is to use hexadecimal numbers. The hexadecimal number system is a base 16 number system, in which each digit represents a sequence of four bits. Here is a table of the digits in the hexadecimal number system, along with the binary equivalent for each of these digits.

Digit   Binary
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001
a       1010
b       1011
c       1100
d       1101
e       1110
f       1111

The C language allows you to write hexadecimal numbers by using the 0x notation. For example, 0xa stands for the hexadecimal number a, whose binary equivalent is 1010.

You can create longer bit sequences by using more hexadecimal digits. For example, if you need to represent the bit sequence

1000010011001110

you first break it into groups of four bits

1000 0100 1100 1110

and then use the table above to convert each bit sequence into its hexadecimal digit equivalent.

0x84ce

C bitwise operators

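Once you have set up a bit sequence you will want to be able to access and manipulate individual bits in the bit sequence. To do this, C offers four bit manipulation operators: &, |, ^, and ~.

The ~ operator is the bitwise negation operator, which flips each bit in a sequence to its opposite. For example, if the low four bits of a variable x are 0xb = 1011, the low four bits of ~x are 0100, which is 0x4. Keep in mind that ~ flips every bit of its operand, not just the ones you wrote down: if x is an unsigned char holding 0x0b = 00001011, storing ~x back into an unsigned char gives 11110100 = 0xf4.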

The & operator is the bitwise logical and operator, which forms the logical and of pairs of bits taken from two numbers. For example, if we have two unsigned char variables x and y containing the values 0xa = 1010 and 0x6 = 0110, x&y will evaluate to (1010)&(0110) = (0010) = 0x2.

Likewise, the | operator forms the bitwise or. x|y would evaluate to (1010)|(0110) = (1110) = 0xe.

The ^ operator forms the exclusive or of bits. The exclusive or of two bits is 0 if the bits are the same and 1 if they are different. x^y would evaluate to (1010)^(0110) = (1100) = 0xc.

Working with individual bits

These bitwise manipulation operators are used to perform bit level manipulations, such as checking the value of individual bits or manipulating individual bits in a bit sequence.

To test the value of a particular bit in a number we use the bitwise and operator &. For example, to determine whether the last (rightmost) bit in a number is a 0 or a 1, we form the bitwise and of the number with a specially constructed mask number. In this case the mask number we would use is the number all of whose digits are 0 except for a 1 in the last position. For example, if we have an unsigned short variable x with the value of 0xa8 = 10101000 and we want to determine whether or not the last bit is a 0, we would use the test

(x&0x01) == 0

The parentheses are required: in C the == operator has higher precedence than &, so without them the test would be read as x&(0x01 == 0).

We can access other bits in the number just by changing the mask that we & against x. For example, to check to see whether or not the third bit from the left of the eight bits shown is a 1 we would use

(x&0x20) == 0x20

To set bits in a number to particular values we use either | or & depending on whether we want to set that bit to be a 1 or a 0. For example, to set the fourth bit from the left in an eight bit variable y to be a 1 we would use | with the mask 0x10 = 00010000:

y = y|0x10

To set that same bit of y to 0 we would use & with the mask 0xef = 11101111:

y = y&0xef

Another way to do this is to use the ~ operator to help construct the mask:

y = y&(~0x10)