URL shortening

A URL shortening service allows users to create short URLs that resolve to longer URLs. A typical use case is embedding a link in a tweet. Since tweets have length restrictions, embedding a long URL in a tweet can leave little room for a message. A URL shortening service lets users substitute a much shorter URL for the longer one and save space in the message. When a user clicks on the shortened URL, the service that provides it automatically redirects the user's browser to the original, longer URL.

For example, bitly.com offers a popular URL shortening service. Here is an example of a shortened bitly URL that resolves to the course web page for CMSC 480: https://bit.ly/3dxnRDg. Clicking the link will take your browser briefly to the bitly web server, which will immediately issue a command to your browser to go instead to the original URL for the course web page.

In this programming assignment we are going to construct a network application that implements a simple URL shortening service.

What the application will do

The application will allow users to create shortened URLs and then use those shortened URLs. When a user submits a shortened URL to the server the server will respond with a command to the browser redirecting the user to the original URL that the shortened URL is based on.

To make a shortened URL, the user will start by visiting a web page. Here is the HTML code for that page:

<!DOCTYPE html>
<head>
  <title>URL shortening</title>
</head>
<body>
  <h1>URL shortening service</h1>
  <form action="http://localhost:8888/encode" method="Post">
    <label for="url">URL:</label>
    <input type="text" id="url" name="url"><br>
    <input type="submit" value="Submit">
  </form>
</body>

The page contains a form where the user can type or paste in the URL that they want shortened. Clicking the submit button on the form sends a request to our server to generate a shortened URL for the user's URL. The server will respond by sending back a short web page that has the user's shortened URL embedded in it.

When the user uses this shortened URL in a browser the application will respond with an HTTP 301 response redirecting the user's browser to the original URL that the shortened URL is based on. This will cause the user's browser to immediately go to the page that the original URL points to.

What you need to do

Start by downloading the code for version three of the simple web server I showed in lecture. In this assignment you are going to make several specific modifications to turn the web server into a server that implements the URL shortening service.

You are going to use most of the original code for the web server, with the exception of the serveRequest() function: you will be modifying this function to implement the new service.

Your modified serveRequest() function should retain the code that reads the request and scans the request for the method and url in the first line of the request:

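// Read the request, leaving room for the terminating null character
char buffer[1024];
int bytesRead = read(fd,buffer,1023);
if(bytesRead < 0)
  bytesRead = 0;
buffer[bytesRead] = '\0';

// Grab the method and URL (field widths keep sscanf inside the arrays)
char method[16];
char url[128];
sscanf(buffer,"%15s %127s",method,url);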

After determining the method the browser is requesting we have to handle two broad cases. The first case handles the POST request. The only POST request that we will have to handle is the request to encode a URL. Likewise, the only GET request we will have to serve is a request containing one of our shortened URLs.
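The split into cases can be sketched as a small helper that looks at the method and URL parsed from the request line. The enum and the function name classifyRequest() are hypothetical and are not part of the lecture code; this is only one possible way to organize the branch.

```c
#include <string.h>

/* Hypothetical labels for the cases the server must tell apart. */
enum RequestKind { REQ_ENCODE, REQ_GET, REQ_OTHER };

/* Classify a request from the method and URL found on its first line. */
enum RequestKind classifyRequest(const char *method, const char *url) {
  if (strcmp(method, "POST") == 0 && strcmp(url, "/encode") == 0)
    return REQ_ENCODE;          /* the form submission */
  if (strcmp(method, "GET") == 0)
    return REQ_GET;             /* possibly a shortened URL */
  return REQ_OTHER;             /* anything else is not supported */
}
```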

Handling the POST

Here is what the browser will send us when the user types a URL in the form and clicks submit:

POST /encode HTTP/1.1
Host: localhost:8888
Connection: keep-alive
Content-Length: 64
Cache-Control: max-age=0
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Linux"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36
Origin: null
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: cross-site
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9

url=http%3A%2F%2Fwww.lawrence.edu%2Ffast%2Fgreggj%2Fcmsc480.html

The only part of this POST request that we will need to use is the last line, which contains the body of the request. The body contains a single form parameter, url. The value of this parameter is the URL that we will want to grab and store. To locate this part of the request we can use the strstr() function on the buffer. The strstr() function searches a string for the first occurrence of a pattern. In this case the pattern you should search for is "\r\n\r\nurl=", which includes the blank line before the body and the start of the body.
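As a minimal sketch of that search, the helper below returns a pointer to the first character after the pattern, which is where the encoded URL begins. The name findEncodedURL() is an assumption made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the encoded URL in the request body, or NULL if the
   request has no "url=" parameter immediately after the blank line. */
const char *findEncodedURL(const char *request) {
  const char *pattern = "\r\n\r\nurl=";
  const char *match = strstr(request, pattern);
  if (match == NULL)
    return NULL;
  return match + strlen(pattern);  /* skip past the pattern itself */
}
```

Checking for NULL matters here: a request that lacks the pattern would otherwise lead to pointer arithmetic on a null pointer.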

When the browser sends you a URL it will encode the text of the URL using a special encoding called URL encoding. In this encoding certain special characters such as ':' and '/' will be replaced with a hexadecimal character code such as "%3A" or "%2F". Here is some C code you can use to decode the url string:

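// Convert one hexadecimal digit (upper or lower case) to its value
int fromHex(char ch) {
  if(ch >= '0' && ch <= '9')
    return ch - '0';
  if(ch >= 'a' && ch <= 'f')
    return ch - 'a' + 10;
  return ch - 'A' + 10;
}

void decodeURL(char* src,char* dest) {
  while(*src != '\0') {
    if(*src == '%' && src[1] != '\0' && src[2] != '\0') {
      // A '%' followed by two hex digits encodes one character
      ++src;
      int n1 = fromHex(*src++);
      int n2 = fromHex(*src++);
      *dest++ = (char) (n1*16+n2);
    } else if(*src == '+') {
      // Form encoding uses '+' for a space
      *dest++ = ' ';
      ++src;
    } else {
      *dest++ = *src++;
    }
  }
  *dest = '\0';
}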

The decodeURL() function takes two parameters, a pointer to a source string containing an encoded URL and a pointer to a character array where you want the decoded URL to be written.

Once you have received a URL from the form you should append the URL to a text file that stores all the URLs for your service. Also write a newline character after the URL, as we will want to store one URL per line in the text file. Before writing the URL to the file you should use the ftell() function to determine the current file position.
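The append step might look like the sketch below. The function name appendURL() and the idea of passing the file name as a parameter are assumptions. The explicit fseek() is a precaution: for a file opened in append mode, the position reported before the first write is not guaranteed to be the end of the file on every platform.

```c
#include <stdio.h>

/* Append url plus a newline to the file at path. Returns the byte offset
   at which the URL starts, or -1 if the file could not be opened. */
long appendURL(const char *path, const char *url) {
  FILE *f = fopen(path, "a");
  if (f == NULL)
    return -1;
  fseek(f, 0, SEEK_END);       /* make sure ftell() reports the end */
  long position = ftell(f);
  fprintf(f, "%s\n", url);     /* one URL per line */
  fclose(f);
  return position;
}
```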

To construct the new URL for the user, pass the file position that ftell() gave you to the base64 encoding function that I provided in the lecture on base64 encoding.
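The lecture's base64 functions are not reproduced here. As a hypothetical stand-in, the sketch below writes a file position as exactly six base64 digits, matching the six-character placeholder used in the response template, and converts such a code back to a number. The function names and the URL-safe alphabet (ending in '-' and '_') are assumptions; the functions from lecture may differ.

```c
#include <string.h>

static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Write n as six base64 digits, most significant first. The code array
   must have room for seven chars, including the terminator. */
void encodePosition(unsigned int n, char code[7]) {
  for (int i = 5; i >= 0; --i) {
    code[i] = ALPHABET[n & 63];  /* the low six bits select one digit */
    n >>= 6;
  }
  code[6] = '\0';
}

/* Inverse of encodePosition(). Returns -1 if code is not exactly six
   digits from the alphabet. */
long long decodePosition(const char *code) {
  long long n = 0;
  if (strlen(code) != 6)
    return -1;
  for (int i = 0; i < 6; ++i) {
    const char *p = strchr(ALPHABET, code[i]);
    if (p == NULL)
      return -1;
    n = n * 64 + (p - ALPHABET);
  }
  return n;
}
```

Six digits carry 36 bits, which is enough for any 32-bit file position.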

To construct the response for the POST, read the following template code from a text file that you have prepared ahead of time:

HTTP/1.1 200 OK
Connection: close
Content-Type: text/html
Content-Length: 155

<!DOCTYPE html>
<head>
    <title>URL shortening</title>
</head>
<body>
    <h1>Your URL</h1>
    <p>Your URL is http://localhost:8888/s/XXXXXX</p>
</body>

Load this file into your buffer and then use the strstr() function to locate the XXXXXX placeholder. Replace these placeholder characters with the code you just generated, then send the buffer to the client.
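A minimal sketch of the replacement step follows, assuming the code is exactly as long as the placeholder. The name fillPlaceholder() is hypothetical.

```c
#include <string.h>

/* Overwrite the "XXXXXX" placeholder in response with code. Returns 1 on
   success, 0 if the placeholder is missing or code has the wrong length. */
int fillPlaceholder(char *response, const char *code) {
  const char *placeholder = "XXXXXX";
  size_t length = strlen(placeholder);
  char *spot = strstr(response, placeholder);
  if (spot == NULL || strlen(code) != length)
    return 0;
  memcpy(spot, code, length);  /* copy the digits but not the terminator */
  return 1;
}
```

Because the code and the placeholder have the same length, the Content-Length header in the template stays correct after the substitution.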

Handling GET requests

The only GET requests we will have to serve in this application are requests for shortened URLs of the form

http://localhost:8888/s/XXXXXX

When your server receives a GET request you should check the URL you received to see whether or not it starts with the /s/ pattern. If it does not, you should just return a 404 response.
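The prefix check and the 404 reply could be sketched as follows. The helper names are hypothetical, and the Connection and Content-Length headers are assumptions added so that the browser knows the reply has no body.

```c
#include <string.h>
#include <unistd.h>

static const char NOT_FOUND[] =
    "HTTP/1.1 404 Not Found\r\n"
    "Connection: close\r\n"
    "Content-Length: 0\r\n"
    "\r\n";

/* Return 1 if url begins with the /s/ prefix used by shortened URLs. */
int isShortURL(const char *url) {
  return strncmp(url, "/s/", 3) == 0;
}

/* Send the 404 reply on fd. Returns 0 on success, -1 on a short write. */
int sendNotFound(int fd) {
  ssize_t length = (ssize_t) (sizeof(NOT_FOUND) - 1);
  return write(fd, NOT_FOUND, (size_t) length) == length ? 0 : -1;
}
```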

If you receive a shortened URL request starting with /s/ you should use the base64 decoding function to decode the six-character code sequence that follows /s/ into an int. This int is a position in the text file where you have stored the original URLs. Use the fseek() function to move to that location in the text file and then read the string you find at that location. This is the URL that you will want to redirect the client to.
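The lookup can be sketched as below, assuming the file holds one URL per line. The name lookupURL() and its parameters are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Read the URL stored at byte offset position in the file at path into
   dest (at most size-1 characters). Returns 1 on success, 0 on failure. */
int lookupURL(const char *path, long position, char *dest, int size) {
  FILE *f = fopen(path, "r");
  if (f == NULL)
    return 0;
  int ok = fseek(f, position, SEEK_SET) == 0 &&
           fgets(dest, size, f) != NULL;
  fclose(f);
  if (ok)
    dest[strcspn(dest, "\n")] = '\0';  /* drop the trailing newline */
  return ok;
}
```

A position past the end of the file makes fgets() return NULL, so a made-up code produces a failure that the server can answer with a 404.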

To do the redirect, send the client a reply that contains

HTTP/1.1 301 Moved Permanently
Location: <URL>

where <URL> is the original URL that you read from the URL file. Make sure the URL you send here starts with http:// and remember to include a blank line at the end of the response. This tells the client that your response is complete.
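Building the redirect reply can be sketched with snprintf(). The name buildRedirect() is hypothetical, and the Connection and Content-Length headers are assumptions that go beyond the two lines required above. Each line ends with "\r\n", and the final "\r\n" supplies the blank line.

```c
#include <stdio.h>

/* Format a 301 redirect to url into buffer. Returns the length of the
   response, or -1 if it would not fit in size bytes. */
int buildRedirect(char *buffer, size_t size, const char *url) {
  int n = snprintf(buffer, size,
                   "HTTP/1.1 301 Moved Permanently\r\n"
                   "Location: %s\r\n"
                   "Connection: close\r\n"
                   "Content-Length: 0\r\n"
                   "\r\n",
                   url);
  if (n < 0 || (size_t) n >= size)
    return -1;  /* formatting failed or the output was truncated */
  return n;
}
```

The returned length is the number of bytes to pass to write() when sending the reply.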