Example Project

A web server that supports CGI applications

A widely used design pattern in computer science is the extensible application, which is an application that offers some mechanism for adding new features to the application. The most common way to make an application extensible is by offering a plug-in mechanism which gives end users the ability to customize the application by adding special software components called plug-ins.

Unix has long had a tradition of supporting applications with a modular architecture. The main mechanism underlying modularity in Unix is the ability of one process to launch another process to serve as a helper. Early web servers took advantage of this capability to let end users extend them with special helper applications called CGI applications. CGI stands for "Common Gateway Interface". It is a set of rules that specify how helper applications can take over work from a web server in certain circumstances.

In today's example we are going to modify the web server we have been building to give it the ability to use external CGI applications to help with form processing. Specifically, we are going to modify the web server to hand off certain POST requests to an external program for processing.

A simple form that needs processing

After extending our web server to allow it to work with CGI applications we will put together a simple test example. The example consists of a simple web page containing a form that needs some processing and a CGI application that will take over the processing.

Here is the HTML code for the web page:

<!DOCTYPE html>
<head>
    <title>Welcome to miniweb</title>
</head>
<body>
    <h1>Register</h1>
    <p>Fill out the form below to register for our newsletter.</p>
    <form action="http://localhost:8888/cgi/register" method="post">
      Your name: <input type="text" name="name"><br>
      Your email: <input type="email" name="email"><br>
      <input type="submit">
    </form>
</body>

The action attribute of the form in this page points to a URL whose path begins with /cgi/. We are going to modify the web server so that when it receives a POST request for a path that begins with /cgi/ the web server will pass the request along to the CGI application whose name appears in the URL after the /cgi/.

We are going to add a cgi folder to the directory where the miniweb server application is stored and use that folder to store CGI applications that the server can invoke. In this example the URL in the form request wants to make use of a register CGI application.
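The mapping from a request URL to a program in the cgi folder can be sketched as a small helper. This is only an illustration: the function name cgiPathFromUrl() and its return convention are assumptions made for this example and are not part of the miniweb server. The sketch also refuses nested paths and any name containing "..", so that a request cannot reach programs outside the cgi folder.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of miniweb): turn a request URL such as
   "/cgi/register" into the relative path "cgi/register".
   Returns 0 on success, or -1 if the URL does not name a program that
   sits directly inside the cgi folder. */
int cgiPathFromUrl(const char* url, char* path, size_t pathSize) {
  const char* prefix = "/cgi/";
  size_t prefixLen = strlen(prefix);

  /* The URL has to start with /cgi/ */
  if (strncmp(url, prefix, prefixLen) != 0)
    return -1;

  /* Reject empty names, nested paths, and anything containing ".." */
  const char* name = url + prefixLen;
  if (*name == '\0' || strchr(name, '/') != NULL || strstr(name, "..") != NULL)
    return -1;

  /* Build the path relative to the server's directory */
  int n = snprintf(path, pathSize, "cgi/%s", name);
  if (n < 0 || (size_t)n >= pathSize)
    return -1;
  return 0;
}
```

With this helper, "/cgi/register" becomes "cgi/register", while "/cgi/../secret" and "/index.html" are both rejected.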

The CGI application

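CGI is a specification that sets up rules for how CGI applications should work. Under the full specification the server normally adds the HTTP status line itself; in the simplified arrangement we use here the application writes the complete response. An application that wants to process a POST for our web server should do the following things: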

  1. The application should start by reading the environment variable CONTENT_LENGTH. This variable specifies the size of the body in the POST request.
  2. The application should then read the body of the POST request from the standard input.
  3. The application should construct an HTTP response and print its response to the standard output. Typically the response will consist of a set of headers followed by a body containing HTML code for the response.

Here now is the code for the CGI application for our example. This example CGI application fakes putting some information in a database for the user, and then sends back a response indicating that the action was successful.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
  char* lengthStr = getenv("CONTENT_LENGTH");
  char buffer[256];
  int length = 0;
  int total = 0;

  // Get the content length and make sure the body fits in the buffer
  if (lengthStr != NULL && sscanf(lengthStr,"%d",&length) == 1
      && length > 0 && length < (int) sizeof(buffer)) {
    // Read the query from stdin. read() may return fewer bytes than
    // we asked for, so keep going until we have the whole body.
    while (total < length) {
      int n = (int) read(STDIN_FILENO,buffer+total,length-total);
      if (n <= 0)
        break;
      total += n;
    }
  }
  buffer[total] = '\0';

  if (total > 0 && total == length && strncmp(buffer,"name=",5) == 0) {
    // Isolate the name from the query string.
    char* name = buffer+5;
    char* p = strchr(name,'&');
    if (p != NULL)
      *p = '\0';
    // Form encoding sends each space as a '+'
    for (p = name; *p != '\0'; p++)
      if (*p == '+')
        *p = ' ';

    // Make the body of the response. Passing content to sprintf() as both
    // source and destination is undefined, so build it in a single call.
    char content[1024];
    snprintf(content,sizeof(content),
      "<!DOCTYPE html>\r\n"
      "<head>\r\n"
      "<title>Registration successful</title>\r\n"
      "</head>\r\n"
      "<body>\r\n"
      "<p>Hello, %s. You are now registered.</p>\r\n"
      "</body>",name);

    // Send the response
    printf("HTTP/1.1 200 OK\r\n");
    printf("Connection: close\r\n");
    printf("Content-length: %d\r\n", (int)strlen(content));
    printf("Content-type: text/html\r\n\r\n");
    printf("%s", content);
    fflush(stdout);
  } else {
    // Send back an error response
    printf("HTTP/1.1 500 Internal Server Error\r\n");
    printf("Connection: close\r\n");
    printf("Content-length: 21\r\n");
    printf("Content-type: text/plain\r\n\r\n");
    printf("Something went wrong.");
    fflush(stdout);
  }
  return 0;
}

To read the contents of the CONTENT_LENGTH environment variable the application uses the getenv() function, which returns a pointer to a character array that stores the value of that variable, or NULL if the variable is not set. Once we know the length of the body the browser has sent us, we then use the read() function to read that many bytes from STDIN_FILENO, the file descriptor for the standard input.
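One detail worth keeping in mind is that read() is allowed to return fewer bytes than were requested, especially when the input is a socket. A common way to cope with this is to call read() in a loop. The helper below is a minimal sketch of that idea; the name readFully() is an assumption made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `count` bytes from fd unless the input
   ends or an error occurs first.
   Returns the number of bytes actually read, or -1 on error. */
ssize_t readFully(int fd, char* buffer, size_t count) {
  size_t total = 0;
  while (total < count) {
    ssize_t n = read(fd, buffer + total, count - total);
    if (n == 0)
      break;                 /* end of input: the body was shorter than promised */
    if (n < 0) {
      if (errno == EINTR)
        continue;            /* interrupted by a signal, so try again */
      return -1;
    }
    total += (size_t)n;
  }
  return (ssize_t)total;
}
```

A CGI application could call readFully(STDIN_FILENO, buffer, length) and treat any result other than length as a malformed request.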

Likewise, the response gets written to the standard output. One additional special step here is the call to fflush(). When the standard output is a terminal the C library typically sends output along at the end of every line, so an explicit flush is rarely needed. However, when this application runs its standard output is going to be connected to the socket that connects to the browser. In that situation the library holds the output in a buffer until the buffer fills up or the program exits normally. To make sure that the response is sent across the network to the browser as soon as it is complete we add the call to fflush().

Invoking the CGI application

To add the ability to run CGI applications to our web server we need to make some modifications to the server's serveRequest() function. Here is the code for the updated serveRequest():

void serveRequest(int fd) {
  char lineBuffer[256];

  // Read the first line of the request
  readLine(fd,lineBuffer,255);
  // Grab the method and URL
  char method[16] = "";
  char url[128] = "";
  // The field widths keep sscanf from writing past the ends of the arrays
  sscanf(lineBuffer,"%15s %127s",method,url);

  if(strcmp(method,"POST") == 0) {
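    // Refuse URLs containing ".." so requests cannot escape the cgi folder
    if(strncmp(url,"/cgi/",5) == 0 && strstr(url,"..") == NULL) {
      // Read everything up to the blank line.
      // Grab the content length as you go.
      char contentLength[16] = "0";
      while(1) {
        readLine(fd,lineBuffer,255);
        if(strncmp(lineBuffer,"Content-Length:",15) == 0) {
          // Keep just the number; strtol skips the space after the colon
          snprintf(contentLength,sizeof(contentLength),"%ld",strtol(lineBuffer+15,NULL,10));
        } else if(lineBuffer[0] == '\r' || lineBuffer[0] == '\n' || lineBuffer[0] == '\0')
          break;
      }
      pid_t pid = fork();
      if (pid == 0) {
        // By convention the first argument is the name of the program
        char* args[] = { url+1, NULL };
        setenv("CONTENT_LENGTH", contentLength, 1);
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
        execve(url+1, args, environ);
        // execve only returns if the launch failed
        handle404(fd);
        _exit(127);
      }
      if (pid > 0)
        wait(NULL);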
    } else {
      handle404(fd);
    }
  } else {
    // Try to get the file the user wants
    // Leave room for the "www" prefix in front of the URL
    char fileName[160];
    snprintf(fileName,sizeof(fileName),"www%s",url);
    int filed = open(fileName,O_RDONLY);
    if(filed == -1) {
      handle404(fd);
    } else {
      // HTTP header lines end with \r\n
      const char* responseStatus = "HTTP/1.1 200 OK\r\n";
      const char* responseOther = "Connection: close\r\nContent-Type: text/html\r\n";
      // Get the size of the file
      char len[64];
      struct stat st;
      fstat(filed,&st);
      sprintf(len,"Content-Length: %d\r\n\r\n",(int) st.st_size);
      // Send the headers
      write(fd,responseStatus,strlen(responseStatus));
      write(fd,responseOther,strlen(responseOther));
      write(fd,len,strlen(len));
      // Send the file
      char buffer[1024];
      int bytesRead;
      // Stop at end of file (0) and also on a read error (-1)
      while((bytesRead = read(filed,buffer,sizeof(buffer))) > 0) {
        write(fd,buffer,bytesRead);
      }
      close(filed);
    }
  }
  close(fd);
}

One important change in this code is the way that we read the request from the browser. Since we will want to pass certain requests along to a CGI application we have to be careful to not accidentally read the body of any request we get from a browser. What we are going to do instead is read through the browser's request line by line to determine what needs to be done with the request. By reading the request one line at a time we can stop after reading the blank line that marks the end of the headers. This will leave the body unread so that the CGI application can go ahead and read the body.

To facilitate reading the request one line at a time I have written a simple readLine() function that can read one line of text from an input stream:

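// buffer must have room for maxBytes characters plus the terminator
void readLine(int fd,char* buffer,int maxBytes) {
  int bytesRead = 0;
  // Stop at the newline, at maxBytes, or when the connection closes
  while(bytesRead < maxBytes) {
    if(read(fd,buffer+bytesRead,1) != 1)
      break;
    if(buffer[bytesRead++] == '\n')
      break;
  }
  buffer[bytesRead] = '\0';
}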

This function uses the usual read() function to read one byte at a time from the socket. We read characters one at a time until we encounter the '\n' that marks the end of a line.

serveRequest() starts by reading the first line of the request, which contains the method and the URL for the request. We then set up some if-else statements that handle two simple cases:

  1. If the method is POST and the URL starts with /cgi/ this is a POST request destined for a CGI application. We make arrangements to pass the request along to the CGI application. Any other POST request gets a 404 response.
  2. For any other method, which in practice means GET, we go ahead and try to handle the request in the usual way by serving a file from the www folder.

Here now is the part of serveRequest() that handles CGI POST requests:

// Read everything up to the blank line.
// Grab the content length as you go.
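char contentLength[16] = "0";
while(1) {
  readLine(fd,lineBuffer,255);
  if(strncmp(lineBuffer,"Content-Length:",15) == 0) {
    // Keep just the number; strtol skips the space after the colon
    snprintf(contentLength,sizeof(contentLength),"%ld",strtol(lineBuffer+15,NULL,10));
  } else if(lineBuffer[0] == '\r' || lineBuffer[0] == '\n' || lineBuffer[0] == '\0')
    break;
}
pid_t pid = fork();
if (pid == 0) {
  // By convention the first argument is the name of the program
  char* args[] = { url+1, NULL };
  setenv("CONTENT_LENGTH", contentLength, 1);
  dup2(fd, STDIN_FILENO);
  dup2(fd, STDOUT_FILENO);
  execve(url+1, args, environ);
  // execve only returns if the launch failed
  handle404(fd);
  _exit(127);
}
if (pid > 0)
  wait(NULL);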

The code starts with a loop that reads the headers line by line until it comes to the blank line that marks the end of the headers. Along the way the loop will ignore almost all of the headers. The only header it pays attention to is the Content-Length header: we need that value to pass along to the CGI application.
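Picking the value out of the Content-Length header can also be written as a separate helper. The sketch below is one possible approach, not the server's actual code: the name parseContentLength() is an assumption, and it goes a little further than the loop above by accepting any capitalization of the header name and by rejecting values that are not plain non-negative numbers.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: if `line` is a Content-Length header, store its value
   in *length and return 1. Otherwise return 0 and leave *length alone. */
int parseContentLength(const char* line, long* length) {
  const char* name = "Content-Length:";
  size_t nameLen = strlen(name);

  /* Header names are not case sensitive */
  if (strncasecmp(line, name, nameLen) != 0)
    return 0;

  /* strtol skips the optional whitespace after the colon */
  const char* value = line + nameLen;
  char* end;
  errno = 0;
  long n = strtol(value, &end, 10);
  if (end == value || errno == ERANGE || n < 0)
    return 0;

  /* Only whitespace, including the closing "\r\n", may follow the number */
  while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
    end++;
  if (*end != '\0')
    return 0;

  *length = n;
  return 1;
}
```

For the line "Content-Length: 31\r\n" the helper stores 31 and returns 1. For a line such as "Content-Type: text/html\r\n" it returns 0.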

To launch the CGI application we are going to start by doing a fork(). The child process spawned by this call to fork() will launch the CGI application. Before launching the application we use the setenv() function to write the CONTENT_LENGTH environment variable. We then use the dup2() function to connect the socket from the browser to the standard input and the standard output. We do this because the CGI application will always try to read its input from standard input and write its output to standard output.

Once everything is set up, we launch the CGI application with a call to execve(), which launches the application and has the application take over the currently running child process. Once the CGI application finishes its work and exits, the child process will stop.

Back in the parent process we issue a call to the wait() function, which causes the server to wait until the CGI application finishes its work. Once that happens we can go ahead and close the connection to the browser.
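The whole launch sequence can be collected into one function that also reports how the helper program finished. The following is a sketch under some assumptions: the name runHelper() and the exit codes 126 and 127 are choices made for this example. It uses waitpid() instead of wait() so that the parent waits for this particular child and can inspect its exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char** environ;

/* Hypothetical helper: run the program at `path` with fd connected to both
   its standard input and its standard output.
   Returns the program's exit status, or -1 if the program could not be
   started or did not exit normally. */
int runHelper(const char* path, int fd) {
  pid_t pid = fork();
  if (pid == -1)
    return -1;

  if (pid == 0) {
    /* Child: wire up the descriptors, then become the helper program */
    char* args[] = { (char*)path, NULL };
    if (dup2(fd, STDIN_FILENO) == -1 || dup2(fd, STDOUT_FILENO) == -1)
      _exit(126);
    execve(path, args, environ);
    _exit(127);              /* reached only if execve failed */
  }

  /* Parent: wait for this specific child and look at how it ended */
  int status;
  if (waitpid(pid, &status, 0) == -1)
    return -1;
  if (WIFEXITED(status))
    return WEXITSTATUS(status);
  return -1;
}
```

The child calls _exit() rather than exit() when something fails, so that it does not flush any buffered output it inherited from the parent.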

You can read more about the execve() function in the section titled "Running a New Process" in chapter five of the textbook. You can read more about the wait() function in the section titled "Waiting for Terminated Child Processes".