Example Project

Implementing the final version of the URL shortening service

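In the final version of the URL shortening service I am going to make two major changes: the server will hand requests off to CGI applications, and the URLs will be stored in a SQLite database.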

The server

The server I am going to use for this project is based on the code for the CGI server example I showed in a previous lecture. That previous example demonstrated how to set up a server and a CGI application to handle a POST request. In this version of the server we are going to be working with two CGI applications, one that handles a POST and another that handles a GET.

Prefiltering GET requests

Another small change I am going to introduce is the use of a filter. In a web server a filter is a piece of code that modifies a request in some way before serving the request. In the case of the URL shortening application we will need to make a small change to the URL for one of the use cases. When users use the shortening service they will be sending GET requests with URLs that take the form

http://<server address>/s/XXXXXX

where XXXXXX is the six-character code for the URL they want. Since we are going to pass this GET request along to a CGI application for processing, we have to start by rewriting the URL to put it in a more standard form for GET requests. The modified form will be

http://<server address>/cgi/decode?code=XXXXXX

This is the standard form for a GET request that contains a query parameter. The query string is the part of the URL that appears after the ? character; here it holds a single parameter named code.

To make this modification, I will add the following prefiltering function to the web server.

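void prefilter(char* method,char* url) {
  // Rewrite GET /s/XXXXXX as GET /cgi/decode?code=XXXXXX
  if(strcmp(method,"GET") == 0 && strncmp(url,"/s/",3) == 0) {
    char newURL[128];
    // snprintf() truncates instead of overflowing newURL, so the
    // result always fits back into the caller's 128-byte url buffer
    snprintf(newURL,sizeof(newURL),"/cgi/decode?code=%s",url+3);
    strcpy(url,newURL);
  }
}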
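
To see the rewrite in action, here is a minimal sketch of the same idea with an explicit buffer size. The name rewriteShortPath() and its urlSize parameter are hypothetical and are not part of the server; the sketch assumes the caller passes the capacity of the url buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of prefilter(): rewrites "/s/CODE" to
   "/cgi/decode?code=CODE" in place. Returns 1 if the URL was rewritten,
   0 if it was left alone (wrong method, wrong prefix, or no room). */
int rewriteShortPath(const char* method, char* url, size_t urlSize) {
  if (strcmp(method, "GET") != 0 || strncmp(url, "/s/", 3) != 0)
    return 0;
  char newURL[128];
  int n = snprintf(newURL, sizeof(newURL), "/cgi/decode?code=%s", url + 3);
  if (n < 0 || (size_t) n >= sizeof(newURL) || (size_t) n >= urlSize)
    return 0; /* the rewritten form would not fit, so leave url untouched */
  memcpy(url, newURL, (size_t) n + 1);
  return 1;
}
```

Called with the method "GET" and a buffer holding "/s/AbC123", the function leaves "/cgi/decode?code=AbC123" in the buffer. Any other method or prefix leaves the URL unchanged.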

Handling requests

Here now is the code for the serveRequest() function in the modified server:

void serveRequest(int fd) {
  char lineBuffer[256];

  // Read the first line of the request
  readLine(fd,lineBuffer,255);
  // Grab the method and URL
  char method[16] = "";
  char url[128] = "";
  // The field widths keep sscanf() from overflowing the two buffers
  sscanf(lineBuffer,"%15s %127s",method,url);
  prefilter(method,url);

  if(strcmp(method,"POST") == 0) {
    if(strncmp(url,"/cgi/",5) == 0) {
      // Read everything up to the blank line.
      // Grab the content length as you go.
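      char contentLength[16] = "0";
      while(1) {
        readLine(fd,lineBuffer,255);
        if(strncmp(lineBuffer,"Content-Length:",15) == 0) {
          // Copy only the digits: skip the space after the colon and
          // leave off the trailing \r\n
          sscanf(lineBuffer+15," %15[0-9]",contentLength);
        } else if(lineBuffer[0] == '\r')
          break;
      }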
      if (fork() == 0) {
        char* emptylist[] = { NULL };
        setenv("CONTENT_LENGTH", contentLength, 1);
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
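        execve(url+1, emptylist, environ);
        // execve() only returns on failure; don't let the child fall
        // through into the server's code
        _exit(1);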
      }
      wait(NULL);
    } else {
      handle404(fd);
    }
  } else {
    if(strncmp(url,"/cgi/",5) == 0) {
      if (fork() == 0) {
        char* emptylist[] = { NULL };
        char* queryPtr = strstr(url+5,"?");
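        char query[32];
        if(queryPtr == NULL) {
          query[0] = '\0';
        } else {
          // strncpy() does not terminate the string when it truncates
          strncpy(query,queryPtr+1,sizeof(query)-1);
          query[sizeof(query)-1] = '\0';
          *queryPtr = '\0';
        }
        // The third argument is the overwrite flag, not a length
        setenv("QUERY_STRING", query, 1);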
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
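        execve(url+1, emptylist, environ);
        // execve() only returns on failure; don't let the child fall
        // through into the server's code
        _exit(1);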
      }
      wait(NULL);
    } else {
      // Try to get the file the user wants
      char fileName[132];   // room for "www" plus a 127-character URL
      strcpy(fileName,"www");
      strcat(fileName,url);
      int filed = open(fileName,O_RDONLY);
      if(filed == -1) {
        handle404(fd);
      } else {
        // HTTP header lines end with \r\n
        const char* responseStatus = "HTTP/1.1 200 OK\r\n";
        const char* responseOther = "Connection: close\r\nContent-Type: text/html\r\n";
        // Get the size of the file
        char len[64];
        struct stat st;
        fstat(filed,&st);
        sprintf(len,"Content-Length: %d\r\n\r\n",(int) st.st_size);
        // Send the headers
        write(fd,responseStatus,strlen(responseStatus));
        write(fd,responseOther,strlen(responseOther));
        write(fd,len,strlen(len));
        // Send the file
        char buffer[1024];
        int bytesRead;
        // read() returns 0 at end of file and -1 on error; stop on either
        while((bytesRead = read(filed,buffer,1023)) > 0) {
          write(fd,buffer,bytesRead);
        }
        close(filed);
      }
    }
  }
  close(fd);
}

This is fairly generic code for a web server. The web server will route all requests whose URLs start with /cgi/ to CGI applications, and will try to serve all other requests normally. Note the call to the prefilter() function right after reading the method and the URL from the first line of the request.
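
The routing decision buried in serveRequest() can be summarized as a small pure function. This is only a sketch for illustration; the Route type and classifyRequest() are hypothetical names that do not appear in the server.

```c
#include <string.h>

/* Hypothetical names, used only to illustrate the routing decision. */
typedef enum {
  ROUTE_CGI_POST,
  ROUTE_CGI_GET,
  ROUTE_STATIC_FILE,
  ROUTE_NOT_FOUND
} Route;

Route classifyRequest(const char* method, const char* url) {
  int isCgi = strncmp(url, "/cgi/", 5) == 0;
  if (strcmp(method, "POST") == 0)
    return isCgi ? ROUTE_CGI_POST : ROUTE_NOT_FOUND;
  /* Like serveRequest(), treat every method other than POST as a GET. */
  return isCgi ? ROUTE_CGI_GET : ROUTE_STATIC_FILE;
}
```

Because prefilter() runs first, a request for /s/XXXXXX has already become /cgi/decode?code=XXXXXX by the time this decision is made, so it lands in the CGI GET branch.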

The code for handling POST CGI requests is essentially identical to the code I used in the earlier example. The code for handling GET CGI requests is new here. The only small difference when handling a CGI GET request is that the portion of the URL that appears after the ? character gets stored in a QUERY_STRING environment variable. Since GET requests don't contain bodies, the CGI application will simply read the details of the query from that QUERY_STRING environment variable.
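
As an illustration of the CGI side of this arrangement, here is a minimal sketch of pulling one parameter out of a query string. The helper getQueryParam() is hypothetical; the decode application shown later simply checks for the "code=" prefix instead.

```c
#include <string.h>

/* Hypothetical helper: copies the value of parameter `name` from a query
   string such as "code=AbC123&x=1" into out. Returns 1 on success, 0 if the
   parameter is missing or its value does not fit in out. */
int getQueryParam(const char* query, const char* name, char* out, size_t outSize) {
  size_t nameLen = strlen(name);
  const char* p = query;
  while (p != NULL && *p != '\0') {
    /* Each name=value pair ends at the next '&' or at the end of the string. */
    const char* end = strchr(p, '&');
    size_t pairLen = end ? (size_t)(end - p) : strlen(p);
    if (pairLen > nameLen && strncmp(p, name, nameLen) == 0 && p[nameLen] == '=') {
      size_t valueLen = pairLen - nameLen - 1;
      if (valueLen >= outSize)
        return 0;
      memcpy(out, p + nameLen + 1, valueLen);
      out[valueLen] = '\0';
      return 1;
    }
    p = end ? end + 1 : NULL;
  }
  return 0;
}
```

A CGI application would typically pass the result of getenv("QUERY_STRING") as the first argument, after checking that it is not NULL.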

The database

The database system we are going to be using to store our URLs is SQLite. This is a very simple, lightweight SQL database system implemented as a C library that we can link into our applications. Setting up SQLite on a Debian or Ubuntu Linux system is very simple: all we have to do is run the following commands in a terminal to install the necessary packages:

sudo apt install sqlite3
sudo apt install libsqlite3-dev
sudo apt install sqlitebrowser

The sqlitebrowser package installs an application that will allow you to set up databases and view their contents.

The database we are going to be using for this project is included in the project files linked to at the top of these lecture notes. The database is stored in a SQLite .db file located in the CGI folder. You can open this database file in the sqlite browser application to see the structure and contents of the database. The database consists of a single table, Urls, that has two columns. The id column is an integer id number for each URL. This column is set as the primary key for the table, and also has the autoincrement option set. Whenever we insert a new URL into the table SQLite will automatically generate an integer id for it. The second column in the Urls table is the URL column, which stores the URLs.

The encode CGI application

The first step in the URL shortening service is for users to fill out a form requesting a shortened URL. Here is the HTML code for the page that contains this form:

<!DOCTYPE html>
<head>
    <title>URL shortening</title>
</head>
<body>
    <h1>URL shortening service</h1>
    <form action="http://localhost:8888/cgi/encode" method="Post">
        <label for="url">URL:</label>
        <input type="text" id="url" name="url"><br>
        <input type="submit" value="Submit">
    </form>
</body>

The action attribute of the form will route this request to the encode CGI application. That application is designed to read the requested URL from the form, store the URL in a database, and then respond with a web page that contains the shortened URL.

Here now is the code for the encode CGI application:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sqlite3.h>
#include <base64/base64.h>

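int fromHex(char ch) {
  if(ch >= '0' && ch <= '9')
    return (int) ch - '0';
  if(ch >= 'a' && ch <= 'f')
    return (int) ch - 'a' + 10;   // hex digits in %XX escapes may be lowercase
  return (int) ch - 'A' + 10;
}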

void decodeURL(char* src,char* dest) {
  while(*src != '\0') {
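    // Only treat % as an escape when two more characters follow it
    if(*src == '%' && src[1] != '\0' && src[2] != '\0') {
      ++src;
      int n1 = fromHex(*src++);
      int n2 = fromHex(*src++);
      *dest++ = (char) (n1*16+n2);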
    } else {
      *dest++ = *src++;
    }
  }
  *dest = '\0';
}

int main(void) {
  sqlite3 *db;
  int rc = sqlite3_open("cgi/urls.db", &db);

  char* lengthStr;
  if ((rc == SQLITE_OK) && ((lengthStr = getenv("CONTENT_LENGTH")) != NULL)) {
    // Get the content length
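    int length = 0;
    sscanf(lengthStr,"%d",&length);
    if(length < 0) length = 0;
    if(length > 255) length = 255;   // never read past the end of buffer

    // Read the query from stdin
    char buffer[256];
    int got = (int) read(STDIN_FILENO,buffer,length);
    if(got < 0) got = 0;
    buffer[got] = '\0';
    // Isolate the url from the query string by skipping over "url=".
    char* url = (strncmp(buffer,"url=",4) == 0) ? buffer+4 : buffer+got;
    decodeURL(url,buffer);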

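    // Store the URL in the database. The %Q format of sqlite3_mprintf()
    // quotes the string and escapes any ' characters inside it.
    char *err_msg = 0;
    char *sql = sqlite3_mprintf("INSERT INTO Urls(URL) VALUES(%Q);",buffer);
    rc = sqlite3_exec(db, sql, NULL, 0, &err_msg);
    sqlite3_free(sql);
    sqlite3_free(err_msg);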

    // Get the id number of the inserted entry
    sqlite3_int64 id = sqlite3_last_insert_rowid(db);

    // Encode the id number
    char code[7];
    encode((unsigned int) id,code);

    // Make the body of the response
    char content[1024];
    sprintf(content,"<!DOCTYPE html>\r\n");
    sprintf(content,"%s<head>\r\n",content);
    sprintf(content,"%s<title>URL shortening</title>\r\n",content);
    sprintf(content,"%s</head>\r\n",content);
    sprintf(content,"%s<body>\r\n",content);
    sprintf(content,"%s<h1>Your URL</h1>\r\n",content);
    sprintf(content,"%s<p>Your URL is http://localhost:8888/s/%s<\p>\r\n",content,code);
    sprintf(content,"%s</body>",content);

    // Send the response
    printf("HTTP/1.1 200 OK\r\n");
    printf("Connection: close\r\n");
    printf("Content-length: %d\r\n", (int)strlen(content));
    printf("Content-type: text/html\r\n\r\n");
    printf("%s", content);
    fflush(stdout);
  } else {
    // Send back an error response
    printf("HTTP/1.1 500 Internal Server Error\r\n");
    printf("Connection: close\r\n");
    printf("Content-length: 21\r\n");
    printf("Content-type: text/plain\r\n\r\n");
    printf("Something went wrong.");
    fflush(stdout);
  }
  sqlite3_close(db);
  return 0;
}

Just as in the previous example of a CGI POST application that we saw in an earlier lecture, this application will read the length of the request from the CONTENT_LENGTH environment variable and will then read the request from the standard input. After storing the requested URL in the database, the application will respond by printing the reply page to the standard output.
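
The body of the POST arrives in form-encoded text such as url=https%3A%2F%2Fexample.com, where reserved characters are written as %XX escapes. The following is a minimal sketch of a decoder that checks its output size, accepts both uppercase and lowercase hex digits, and turns + into a space. The name formDecode() is hypothetical; it is a stricter alternative to the decodeURL() function above, not a description of it.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if ch is not a hex digit. */
static int hexValue(char ch) {
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  return -1;
}

/* Hypothetical helper: decodes src into dest (capacity destSize).
   Returns 0 on success, -1 on a malformed escape or if dest is too small. */
int formDecode(const char* src, char* dest, size_t destSize) {
  size_t used = 0;
  while (*src != '\0') {
    char ch;
    if (*src == '%') {
      /* Look at the second digit only if the first one was valid. */
      int hi = hexValue(src[1]);
      int lo = (hi < 0) ? -1 : hexValue(src[2]);
      if (hi < 0 || lo < 0) return -1;
      ch = (char)(hi * 16 + lo);
      src += 3;
    } else {
      ch = (*src == '+') ? ' ' : *src;
      ++src;
    }
    if (used + 1 >= destSize) return -1; /* keep room for the terminator */
    dest[used++] = ch;
  }
  if (destSize == 0) return -1;
  dest[used] = '\0';
  return 0;
}
```

Decoding can only shrink the text, which is why the application above can safely decode the URL back into the same buffer it was read into.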

The new thing going on in this example is the database interaction. To be able to communicate with the SQLite database we need to make use of the SQLite C library. The first step in doing this is including the appropriate header file:

#include <sqlite3.h>

In the makefile for the CGI applications you will also see me linking this library into each of the two CGI applications.

The first step in interacting with the database file is to open a connection to the database:

sqlite3 *db;
int rc = sqlite3_open("cgi/urls.db", &db);

The sqlite3_open() function makes the connection to the database. The first parameter to this function specifies the location of the database file. An obscure detail here is the correct way to set up this location. Although the CGI applications live in the same directory as the database file, we have to use "cgi/urls.db" here and not "urls.db". The reason has to do with the way the CGI applications are run: the server forks, and the child then calls execve() to run the application. A program started with execve() keeps the current working directory of the process that launched it, and a relative path is resolved against that directory, not against the location of the executable. The path from the server's current working directory to the database file is "cgi/urls.db", so this is the path we pass to sqlite3_open().
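
The inheritance of the working directory can be checked directly. The sketch below covers only the fork() half of the story: a child process reports its working directory through a pipe and the parent compares it with its own. The function name childSharesWorkingDirectory() is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { DIR_BUFFER_SIZE = 4096 };

/* Returns 1 if a forked child reports the same working directory as the
   parent, 0 if it differs, -1 on error. */
int childSharesWorkingDirectory(void) {
  char parentDir[DIR_BUFFER_SIZE];
  char childDir[DIR_BUFFER_SIZE];
  int fds[2];

  if (getcwd(parentDir, sizeof(parentDir)) == NULL || pipe(fds) == -1)
    return -1;

  pid_t pid = fork();
  if (pid == -1) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    /* Child: report our working directory through the pipe. */
    char dir[DIR_BUFFER_SIZE];
    close(fds[0]);
    if (getcwd(dir, sizeof(dir)) != NULL) {
      ssize_t ignored = write(fds[1], dir, strlen(dir));
      (void) ignored;
    }
    close(fds[1]);
    _exit(0);
  }

  /* Parent: read what the child wrote, then compare. */
  close(fds[1]);
  size_t total = 0;
  ssize_t n;
  while (total < sizeof(childDir) - 1 &&
         (n = read(fds[0], childDir + total, sizeof(childDir) - 1 - total)) > 0)
    total += (size_t) n;
  childDir[total] = '\0';
  close(fds[0]);
  waitpid(pid, NULL, 0);
  return strcmp(parentDir, childDir) == 0;
}
```

Since execve() replaces the program running in the child without changing its working directory, a CGI application launched this way resolves relative paths against the server's directory.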

The sqlite3_open() function returns a result code that tells us whether or not the connection to the database was successful. If this result code is anything other than SQLITE_OK we simply have the CGI application give up and return an HTTP 500 Internal Server Error response.

The database interaction we are going to perform here is a SQL Insert statement. Here is the code to carry out this request:

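// Store the URL in the database
char *err_msg = 0;
char *sql = sqlite3_mprintf("INSERT INTO Urls(URL) VALUES(%Q);",buffer);
rc = sqlite3_exec(db, sql, NULL, 0, &err_msg);
sqlite3_free(sql);
sqlite3_free(err_msg);

The buffer used here is a character array that stores the URL the user wants stored in the database. Because that text comes straight from the user, we build the statement with sqlite3_mprintf() and its %Q format, which wraps the string in single quotes and doubles any quote characters inside it. We then use the sqlite3_exec() function to execute the SQL Insert statement.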

When we execute this insert statement, SQLite will add a new row to the Urls table for our URL. That row will have an automatically generated integer id. To learn what that id is we use this code:

// Get the id number of the inserted entry
sqlite3_int64 id = sqlite3_last_insert_rowid(db);

Finally, to generate the shortened URL for the user we have to use our base64 library's encode() function:

// Encode the id number
char code[7];
encode((unsigned int) id,code);
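
The base64 library's source is not shown in these notes. Purely as an illustration of why six characters are enough, here is a sketch under two assumptions that may not match the actual library: a URL-safe 64-character alphabet, and most-significant digit first. Each character carries 6 bits, so six characters carry 36 bits, which covers a 32-bit id. The names sketchEncode() and sketchDecode() are hypothetical.

```c
#include <string.h>

/* Assumed alphabet; the course's base64 library may use a different one. */
static const char ALPHABET[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Writes a six-character code for id into code (which needs 7 bytes). */
void sketchEncode(unsigned int id, char* code) {
  for (int i = 5; i >= 0; --i) {
    code[i] = ALPHABET[id & 63u];  /* lowest 6 bits pick one character */
    id >>= 6;
  }
  code[6] = '\0';
}

/* Reverses sketchEncode(). Returns 0 for input that is not a valid code. */
unsigned int sketchDecode(const char* code) {
  unsigned int id = 0;
  if (strlen(code) != 6)
    return 0;
  for (int i = 0; i < 6; ++i) {
    const char* pos = strchr(ALPHABET, code[i]);
    if (pos == NULL)
      return 0;
    id = (id << 6) | (unsigned int)(pos - ALPHABET);
  }
  return id;
}
```

Under these assumptions the id 1 becomes AAAAAB and the id 64 becomes AAAABA. Since autoincremented ids start at 1, the sketch can use 0 to mean "not a valid code".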

When we are done working with the database we close the connection to the database:

sqlite3_close(db);

The decode CGI application

When the server application receives a GET request that takes the form

http://<server address>/s/XXXXXX

it will first rewrite it into the standard form for a CGI GET request URL

http://<server address>/cgi/decode?code=XXXXXX

The server will then copy the query parameter

code=XXXXXX

into the QUERY_STRING environment variable and then invoke the decode CGI application.

Here now is the code for the decode application:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sqlite3.h>
#include <base64/base64.h>

void errorResponse() {
  // Send back an error response
  printf("HTTP/1.1 500 Internal Server Error\r\n");
  printf("Connection: close\r\n");
  printf("Content-length: 21\r\n");
  printf("Content-type: text/plain\r\n\r\n");
  printf("Something went wrong.");
  fflush(stdout);
}

char URL[256];

int callback(void *NotUsed, int argc, char **argv, char **colNames) {
    if(argc > 0 && strcmp(colNames[0],"URL") == 0)
      strncpy(URL,argv[0],255);
    else
      URL[0] = '\0';

    return 0;
}

int main() {
  int status = 0;
  sqlite3 *db;
  int rc = sqlite3_open("cgi/urls.db", &db);

  char* codeStr;
  if ((rc == SQLITE_OK) && ((codeStr = getenv("QUERY_STRING")) != NULL) && (strncmp(codeStr,"code=",5)==0)) {
    unsigned int code = decode(codeStr+5);

    char *err_msg = 0;
    char sql[128];
    sprintf(sql,"SELECT URL from Urls WHERE id = %u;",code);
    rc = sqlite3_exec(db, sql, callback, 0, &err_msg);
    if (rc != SQLITE_OK || URL[0] == '\0') {
      errorResponse();

      if(rc != SQLITE_OK)
        sqlite3_free(err_msg);

      status = 1;
    } else {
      printf("HTTP/1.1 301 Moved Permanently\r\n");
      printf("Location: ");
      printf("%s\r\n\r\n",URL);
      fflush(stdout);
    }
  } else {
    errorResponse();
    status = 1;
  }

  sqlite3_close(db);
  return status;
}

After opening the connection to the database and reading the QUERY_STRING environment variable, the application prepares a SQL select query to fetch the URL from the database. As usual, we will use the SQLite sqlite3_exec() function to run that query. The difference this time around is that we are doing a SQL Select, which will return zero or more rows of data from the database. To read the data that comes back in those rows we have to provide a pointer to a callback function that the SQLite library will call for each row that comes back from the database.

Here is the callback function we will use:

int callback(void *NotUsed, int argc, char **argv, char **colNames) {
    if(argc > 0 && strcmp(colNames[0],"URL") == 0)
      strncpy(URL,argv[0],255);
    else
      URL[0] = '\0';

    return 0;
}

The SQLite library forces us to use a very specific structure for this callback function. The argv parameter is a pointer to an array of strings: this array contains the data for the row we are processing. The colNames parameter is a pointer to an array of strings containing the names of the columns for each of the data values stored in argv. The argc parameter tells us how many strings are in the argv and colNames arrays.

In this case, we are expecting to get back a single row from the database containing a single column. The code for the callback function confirms that the name of the sole column returned is "URL" and then copies the data item from argv[0] into a global array URL that will store the returned URL. Finally, the callback function returns an int: returning 0 tells SQLite to keep going, while any non-zero value makes sqlite3_exec() stop and return SQLITE_ABORT.

With the callback function in place we can run the query to fetch the row the user wants:

unsigned int code = decode(codeStr+5);

char *err_msg = 0;
char sql[128];
sprintf(sql,"SELECT URL from Urls WHERE id = %u;",code);
rc = sqlite3_exec(db, sql, callback, 0, &err_msg);

The third parameter to sqlite3_exec() is where we specify the callback function to use.