CS21 Lab2: Web Server Part 1

Due: Monday Feb 8 before 11:59pm
For this assignment you and your partner will implement a webserver. This lab is designed to give you some practice writing client-server socket programs. For part 1 you will write a single-threaded webserver that will handle one client request at a time. In the next part you will add features so that your webserver can simultaneously handle multiple client requests.

Contents:
Project Requirements
Project Details
Getting Started
Useful Functions and Links to more Resources
Submission and Demo

Project Requirements
Note that if you start your webserver on one of our lab machines, you can only connect to it with clients that are also running on our lab machines.
Project Details

Web server

The basic design of your webserver is the following:
  1. create a listen socket on port 8888
  2. enter an infinite loop:
    1. accept the next connection
    2. call a handler function, passing it the socket returned by accept, that will handle the client's request.
    3. after the client's request is handled, close the socket and return to the main loop.
  3. it should exit its infinite loop and quit when it gets appropriate error return values from accept, send, recv, read, write, ...
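The design above can be sketched roughly as follows. This is only a minimal outline, not the required structure: the names make_listen_socket, serve, and BACKLOG are hypothetical, and the handler is assumed to return a negative value when the server should quit.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define BACKLOG 16  /* assumed pending-connection queue length */

/* Hypothetical helper: create a TCP socket listening on `port`.
 * Returns the listening descriptor, or -1 on error. */
int make_listen_socket(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return -1; }

    int yes = 1;  /* allow rebinding the port right after a restart */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) {
        perror("setsockopt"); close(fd); return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); close(fd); return -1;
    }
    if (listen(fd, BACKLOG) < 0) {
        perror("listen"); close(fd); return -1;
    }
    return fd;
}

/* Hypothetical main loop: accept one connection at a time, pass it to
 * `handler`, then close it. Returns when accept fails or when the
 * handler reports an unrecoverable error by returning a negative value. */
void serve(int listen_fd, int (*handler)(int client_fd)) {
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) { perror("accept"); break; }
        int rc = handler(client_fd);
        close(client_fd);
        if (rc < 0) break;
    }
}
```

A main function would then call make_listen_socket(8888), pass the result to serve along with your handler function, and close the listen socket once serve returns.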

Web clients

You can use multiple programs to connect to your webserver and send it commands:
  1. telnet server_IP port_num, then type in a GET command (make sure to enter a blank line after the GET command). For example:
    $ telnet 130.58.68.62 8888
    
      GET /index.html HTTP/1.0
    
    
    telnet will exit when it detects that your webserver has closed its end of the socket

  2. firefox: Enter the url of the desired page specifying your web server using its IP:port_num (e.g. http://130.58.68.62:8888/index.html)

  3. wget: wget -v 130.58.68.62:8888/index.html

  4. modify the example client program to send http requests to your server. I don't think this is necessary (since the other three clients are already written for you), but you could modify the web_client program given with the starting point code to send GET requests to your webserver and receive the responses.

HTTP

Start by reading HTTP Made Really Easy by Jim Marshall.

It is very important that you can interpret the format of a client request correctly, and that you send correctly formatted responses to clients. Many parts of a correctly formatted message involve sequences of carriage return and newline characters ("\r\n"). These are used to signify the end of all or part of a "message". Here is the general format of an HTTP message (a request or a response):

   initial line
   Header1: value1
   Header2: value2
   Header3: value3

   (optional message body goes here)
For example, a GET response for a very simple page may look like:
   HTTP/1.1 200 OK
   Date: Sun, 10 Jan 2010 18:17:43 GMT
   Content-Type: text/html
   Content-Length: 53

   <html>
   <body>
   <h1>CS 87 Test Page</h1>
   </body></html>

It is very important that each header line ends with a "\r\n" and that there is a blank line (another "\r\n") between the headers and the message body. The message body, however, is sent without a trailing "\r\n". Instead, the Content-Length header is used to tell the client the size of the message body.
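To make the "\r\n" placement concrete, here is one possible way to build the initial line and headers with snprintf. It is only a sketch: build_header is a hypothetical helper name, the Date header is left out to keep it short, and the HTTP/1.1 version string is simply copied from the example response above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the initial line and headers of a response
 * into buf. Every line ends with "\r\n", and one extra "\r\n" marks the
 * end of the headers. The body itself is NOT written here; it is sent
 * separately, and body_len must be its exact size in bytes.
 * Returns the header length, or -1 if it does not fit in buf. */
int build_header(char *buf, size_t cap, int code, const char *reason,
                 const char *content_type, size_t body_len) {
    int n = snprintf(buf, cap,
                     "HTTP/1.1 %d %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     code, reason, content_type, body_len);
    if (n < 0 || (size_t)n >= cap) {
        return -1;  /* formatting error or output truncated */
    }
    return n;
}
```

For the test page above, the body is 53 bytes when it is sent without a trailing newline, which is where the Content-Length value in the example comes from.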

GET requests and mapping urls to files

There is one format of url that you do not need to handle for this assignment: urls for which the server would respond with a "301 Moved Permanently" response rather than responding with OK and the file contents. This case is described below in more detail.

Here are some example GET requests that you need to handle, and their corresponding file name(s) ("directory names" correspond to files named either index.html or index.php in the named directory):

GET  /   HTTP/1.0                           /scratch/cs87webpages/index.html 
                                            /scratch/cs87webpages/index.php 

GET /index.html  HTTP/1.0                   /scratch/cs87webpages/index.html 

GET /index.php   HTTP/1.0                   /scratch/cs87webpages/index.php 

GET /search.html HTTP/1.0                   /scratch/cs87webpages/search.html

GET /courses/ HTTP/1.0                      /scratch/cs87webpages/courses/index.html 
                                            /scratch/cs87webpages/courses/index.php 

GET /~newhall/  HTTP/1.0                    /home/newhall/public_html/index.html
                                            /home/newhall/public_html/index.php

GET /~newhall/newcluster.jpg  HTTP/1.0      /home/newhall/public_html/newcluster.jpg
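The mapping in the table above could be sketched as a small string function. This is one possible approach, not the required one: url_to_path and DOC_ROOT are hypothetical names, the caller is assumed to try "index.html" first and then "index.php" for directory urls, and rejecting urls that contain ".." is an extra precaution that the lab does not ask for.

```c
#include <stdio.h>
#include <string.h>

#define DOC_ROOT "/scratch/cs87webpages"

/* Hypothetical helper: translate the url from a GET request into a file
 * path. If the url ends in '/', index_name (e.g. "index.html") is
 * appended. Returns 0 on success, or -1 for a url this sketch does not
 * handle (including "/~user" with no trailing '/'). */
int url_to_path(const char *url, const char *index_name,
                char *out, size_t cap) {
    if (url == NULL || url[0] != '/' || strstr(url, "..") != NULL) {
        return -1;
    }

    int n;
    if (url[1] == '~') {
        /* "/~user/rest" maps to "/home/user/public_html/rest" */
        const char *user = url + 2;
        const char *slash = strchr(user, '/');
        if (slash == NULL || slash == user) {
            return -1;  /* no '/' after the user name, or empty name */
        }
        n = snprintf(out, cap, "/home/%.*s/public_html%s",
                     (int)(slash - user), user, slash);
    } else {
        n = snprintf(out, cap, "%s%s", DOC_ROOT, url);
    }
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }

    /* A trailing '/' names a directory: append the index file name. */
    size_t len = (size_t)n;
    if (out[len - 1] == '/') {
        int m = snprintf(out + len, cap - len, "%s", index_name);
        if (m < 0 || (size_t)m >= cap - len) {
            return -1;
        }
    }
    return 0;
}
```

For a directory url, a caller might call this once with "index.html", check whether that file can be opened, and if not call it again with "index.php".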

You do not need to correctly handle GET requests of the following format (i.e. GET requests with no trailing '/' when the last name corresponds to a directory):
GET /~newhall  HTTP/1.1
GET /courses  HTTP/1.1
The way a web server would handle requests like this is to send a "301 Moved Permanently" response to the client with the real url of the page ("Location: http://IP:portnum/~newhall/"). The client would resend the GET request using the url returned by the server:
GET /~newhall/  HTTP/1.1
When your webserver receives a request of this form, you can choose to have it respond with either an error response or OK. If your webserver sends an OK response, then the client may make subsequent GET requests for any files included in the page, and these GET requests will not have the correct url: the client doesn't know that newhall is a directory, so if my homepage includes the file foo.jpg, the client will request /foo.jpg instead of /~newhall/foo.jpg. Just handle these as you would any bad url (there is no file associated with /foo.jpg).

You do, however, need to correctly handle GET requests with the trailing '/' (e.g. /~newhall/).

You are welcome to add support for 301 responses if you'd like, but you are not required to do so for this assignment, so I'd suggest only adding this after the rest of your webserver works.
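If you do decide to add 301 support, the response might be built roughly like this. It is only a sketch of the optional feature: build_redirect is a hypothetical helper name, and the "Content-Length: 0" header (an empty body) is my assumption, not something the assignment specifies.

```c
#include <stdio.h>

/* Hypothetical helper: build a "301 Moved Permanently" response that
 * points the client at the same url with a trailing '/' added.
 * `url` is the url from the request, with no trailing '/'.
 * Returns the response length, or -1 if it does not fit in buf. */
int build_redirect(char *buf, size_t cap, const char *host, int port,
                   const char *url) {
    int n = snprintf(buf, cap,
                     "HTTP/1.1 301 Moved Permanently\r\n"
                     "Location: http://%s:%d%s/\r\n"
                     "Content-Length: 0\r\n"
                     "\r\n",
                     host, port, url);
    if (n < 0 || (size_t)n >= cap) {
        return -1;  /* formatting error or output truncated */
    }
    return n;
}
```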

Getting Started
You can grab a copy of my starting point files for this lab. They are in ~newhall/public/cs87/lab2. The starting point contains a sample makefile for building the web_client and web_server executables, and the very beginnings of both (mostly just #includes).

I strongly encourage you to implement and test incrementally. Also, it is very important to check return values from all functions and to handle error return values correctly. For example, if a call to read on a socket returns before the requested number of bytes have been read, this could mean that the other end of the socket was closed (in which case the next call to read returns 0). When this is the case, stop trying to read from the socket; otherwise your server can end up in an infinite loop.
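As an illustration of checking read's return value, here is a sketch of a loop that reads a request until the blank line that ends the headers. The name read_request is hypothetical, and treating a full buffer as an error is an assumption made to keep the example short.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read from fd into buf until the "\r\n\r\n" that
 * ends the request headers has arrived. buf is always left
 * '\0'-terminated after a successful read.
 * Returns the number of bytes read (> 0) once the blank line is seen,
 * 0 if the other end closed the socket before that, and
 * -1 on a read error or if buf fills up first. */
ssize_t read_request(int fd, char *buf, size_t cap) {
    size_t used = 0;
    while (used + 1 < cap) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  /* interrupted by a signal: just retry */
            }
            return -1;     /* real error */
        }
        if (n == 0) {
            return 0;      /* other end closed: stop reading */
        }
        used += (size_t)n;
        buf[used] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL) {
            return (ssize_t)used;
        }
    }
    return -1;             /* request did not fit in buf */
}
```

The important part is that a return value of 0 ends the loop. A loop that only checks for negative values would keep calling read on a closed connection forever.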

Here is one suggestion for proceeding:

  1. Starting with the starting point code, finish a simple client and server program where the client connects to the server and sends it a simple message and waits for a response. The server should receive the message, print it out, and close the socket. The client should exit when it detects the server has closed its end of the socket.
  2. See if you can connect to your webserver from wget, firefox and telnet and send it an http request (in the correct format). Your server could just print out the message and close the socket.
  3. Next, modify your server to send a fake response to a client GET request (don't really parse the requested page and fetch the corresponding file, but send a 200 response with a very short web page message body). If all goes well, firefox should display your bogus web page after receiving your response. If things don't go well, connect to your server using telnet, as you can more easily see what the client is receiving from your server.
  4. Next, add support for finding the correct web page to return for a GET response. Add support for handling different errors (file not found, etc.).
  5. Make sure your program is free of valgrind errors (it would not hurt to run it under valgrind as you develop different parts too).
  6. Remove (or comment out) any debug output before submitting your solution.
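For the error handling in step 4, one possible approach is to translate the errno value left by a failed open or stat call into an HTTP status code. The function names below are hypothetical, and the choice of which status code goes with which error is my assumption; the assignment does not prescribe a particular mapping.

```c
#include <errno.h>

/* Hypothetical helper: pick an HTTP status code for the errno value
 * left by a failed open() or stat() call (0 means the call succeeded). */
int status_for_errno(int err) {
    switch (err) {
    case 0:
        return 200;
    case ENOENT:   /* no such file */
    case ENOTDIR:  /* part of the path is not a directory */
        return 404;
    case EACCES:   /* file exists but is not readable by the server */
        return 403;
    default:
        return 500;
    }
}

/* Hypothetical helper: reason phrase for the codes used above. */
const char *reason_for_status(int code) {
    switch (code) {
    case 200: return "OK";
    case 403: return "Forbidden";
    case 404: return "Not Found";
    default:  return "Internal Server Error";
    }
}
```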
Your program should use good modular design, be well-commented, robust, and correct. See my C Style Guide off my C resource page.

Useful Functions and Resources
Submission and Demo
Create a tar file containing:
  1. All your webserver source files, and makefile to build server (and client if applicable).
  2. A README file with: (1) your and your partner's names; (2) an example of how to run your webserver (a command line); and (3) a description of any features you have not fully supported and/or any errors you were unable to fix.
I'd suggest creating a handin directory and copying all these things into it. It is good to check that you have your full solution in the handin directory (type 'make' to check that everything builds, try running it, then type 'make clean' to remove executables and .o's from what you submit). Then tar up your handin directory.

Either you or your partner should submit your tar file by running cs87handin.

Demo

You and your partner will sign up for a 15 minute demo slot to demo your web server. Think about, and practice, different scenarios to demonstrate both correctness and good error handling.