Example Project

Base 64 encoding

The next project we are going to work on will involve embedding information in a URL. Specifically, we are going to construct URLs that contain integer code numbers. We could embed the code numbers directly in a URL, but another goal for the project is to make the URLs as short as possible. To make numbers shorter when they appear in a URL we are going to switch from the usual decimal notation for integers to a base 64 representation. The advantage of doing this is that representing a large number in base 64 takes fewer digits than representing it in base 10.

To make all of this possible I have written a couple of functions that can be used to encode unsigned ints in base 64 and decode base 64 character strings back into unsigned integers.

Here is the code for those two functions:

const char table[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

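// Encode n as six base 64 digits, most significant digit first.
// dest must have room for 7 chars: six digits plus the terminating '\0'.
void encode(unsigned int n,char* dest) {
    dest[6] = '\0';
    // Fill positions 5 down to 1 with the low order digits
    for(int k = 5;k > 0;k--) {
      dest[k] = table[n%64];
      n /= 64;
    }
    // What remains is the leading digit (at most 3 for a 32 bit n)
    dest[0] = table[n];
}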

unsigned int charToInt(char ch) {
    if(ch >= 'A' && ch <= 'Z')
        return (int) (ch-'A');
    if(ch >= 'a' && ch <= 'z')
        return 26 + (int) (ch-'a');
    if(ch >= '0' && ch <= '9')
        return 52 + (int) (ch - '0');
    if(ch == '-')
        return 62;
    if(ch == '_')
        return 63;
    return 0;
}

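// Decode six base 64 digits, most significant digit first.
// source must point to at least six characters.
unsigned int decode(char* source) {
    unsigned int n = charToInt(source[0]);
    for(int k = 1;k <= 5;k++) {
      n *= 64;
      n += charToInt(source[k]);
    }
    return n;
}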

In this base 64 encoding we represent each of the possible digits from 0 to 63 with a character. The table array in the code above stores the 64 characters needed for this encoding.

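The encode() function takes an unsigned int and encodes it as a sequence of 6 characters. We need 6 characters because an unsigned int is a 32 bit data type on most current systems, and each base 64 digit carries 6 bits of information. Five digits would cover only 30 bits, so it takes six 6 bit digits (36 bits in total) to cover the full 32 bits. As a result the first character only has to carry the top 2 bits, so it will always be one of A, B, C, or D.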

Likewise, the decode() function takes a pointer to an array containing 6 characters and turns that back into a single unsigned int.

Packaging the code

Since I may want to use this code in multiple projects, I am going to package it as a combination of a C source code file and a C header file. The source code file, base64.c, contains the code shown above. Here is the header file, base64.h, meant to accompany this code:

// Encode an int into six characters
void encode(unsigned int n,char* dest);
// Decode a set of six characters into an int
unsigned int decode(char* src);

This header file contains function prototypes for the two functions defined in the source code file.

To test this code I have also written a short test program contained in a separate file, main.c:

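#include <stdio.h>
#include <string.h>
#include "base64.h"

int main(void) {
  while(1) {
    printf("(E)ncode, (D)ecode, or (Q)uit: ");
    char buffer[16];
    unsigned int n;
    // Read at most 15 characters so the input cannot overflow buffer
    if(scanf("%15s",buffer) != 1)
      break;
    if(buffer[0] == 'E' || buffer[0] == 'e') {
      // 2 << 31 is 2^32, one more than the largest 32 bit unsigned int
      unsigned long long limit = ((unsigned long long) 2) << 31;
      printf("Number must be less than %llu.\n",limit);
      printf("Enter an int: ");
      if(scanf("%u",&n) != 1)
        break;
      encode(n,buffer);
      printf("%s\n",buffer);
    } else if(buffer[0] == 'D' || buffer[0] == 'd') {
      printf("Enter code: ");
      if(scanf("%15s",buffer) != 1)
        break;
      // decode() reads exactly six characters, so reject shorter input
      if(strlen(buffer) < 6) {
        printf("Code must be six characters long.\n");
        continue;
      }
      n = decode(buffer);
      printf("%u\n",n);
    } else
      break;
  }
  return 0;
}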

Compiling the code

This is the first multi-file project we have seen in this course. Up to this point we have been using the build commands in Visual Studio Code to compile our projects. Once a project grows beyond a single source code file, the build commands in Code do not work as well. For multi-file projects we are going to switch to using a makefile for the project. Using the Makefile Tools extension in Code we can easily compile, run, and debug multi-file projects.

Here is the makefile for this project:

# Rebuild test whenever a source file or the header changes
test : base64.c main.c base64.h
	gcc base64.c main.c -g -o test

Since the project consists of two C source code files, we have to list both files in the command line to build the project. Since we may also want to debug this code, I have added the -g switch to the command line. This tells the compiler to include in the executable the debugging information needed by the debugger. Note that make requires the command line in a makefile to begin with a tab character, not spaces.