CS21 Lab3: A Multithreaded Web Server

Due: Thursday Feb 16 before 1 am (late Wednesday night)

Lab 3 Partners
  Luis Ramirez and Nick Felt   Elliot Weiser and Steven Hwang
  Jordan Singleton and Phil Koonce   Ames Bielenberg and Niels Verosky
  Kyle Erf and Sam White   Choloe Stevens and Katherine Bertaut
See the git howto for information about how you can set up a git repository for your lab 3 project.


Project Introduction
For this assignment you and your partner will implement a web server. This lab is designed to give you some practice writing client-server socket programs, writing a multi-threaded server, using signals, and learning about the HTTP protocol.

This is a larger and more involved programming assignment than the first two labs. I strongly encourage you to get started on it right away.

There is a lot of information about getting started and about helpful resources on this page (including information about where to get starting point code and sample code). Read through this entire page before you get started, and refer back to it as you go...if you have a question about how to do something, there may be an answer or hint here.

Contents:
Project Requirements
Project Details
Getting Started
Useful Functions and Links to more Resources
Submission and Demo

Project Requirements
Note if you start your web server on one of our lab machines, you can only connect to it with clients that are also running on our lab machines.

Project Details

Web server

The basic design of your web server is the following:
  1. create a listen socket on port 8888
  2. enter an infinite loop:
    1. accept the next connection
    2. if there are already max connections, kill the oldest thread by sending it a SIGALRM signal.
    3. create a new thread to handle the new client's connection, passing it the socket returned by accept.
    4. the worker thread main function should be an infinite loop that only exits if there is an error condition returned by a system call, or if the thread receives a SIGALRM from the main thread and kills itself. Otherwise, the worker threads continue to handle HTTP requests from the client.

      Before a thread dies, it should close its end of the socket and clean up any other global state necessary for correct functioning of your web server.

The main server thread should be in an infinite loop, waiting to accept the next client connection. It should exit only when it gets appropriate error return values from accept, send, recv, read, write, ...
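
The basic design above can be sketched roughly as follows. This is only a minimal sketch, not the starting point code: make_listen_socket, worker_main, and accept_loop are hypothetical names, the listen backlog of 16 is an arbitrary choice, and the max-connections check and SIGALRM step are left out entirely.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std flags */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on the given port.
 * Returns the socket descriptor, or -1 on error. */
int make_listen_socket(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int yes = 1;  /* lets the server be restarted quickly on the same port */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical worker: a placeholder that only closes the connection. */
void *worker_main(void *arg) {
    assert(arg != NULL);
    int client_fd = *(int *)arg;
    free(arg);  /* the argument was heap-allocated by accept_loop */
    /* ... handle HTTP requests on client_fd here ... */
    close(client_fd);
    return NULL;
}

/* Accept loop: one thread per connection. Returns only if accept,
 * malloc, or pthread_create fails. */
void accept_loop(int listen_fd) {
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) { perror("accept"); return; }

        /* Pass the socket on the heap, not via a pointer into this stack frame. */
        int *arg = malloc(sizeof(*arg));
        if (arg == NULL) { close(client_fd); return; }
        *arg = client_fd;

        pthread_t tid;
        if (pthread_create(&tid, NULL, worker_main, arg) != 0) {
            free(arg);
            close(client_fd);
            return;
        }
        /* Detached only because this sketch never joins or signals workers. */
        pthread_detach(tid);
    }
}
```

A server built this way would call make_listen_socket(8888) once and then accept_loop on the returned descriptor.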

Signals and Sockets and Threads

Threads share the same address space so they can coordinate using shared memory, and synchronize using locks, barriers, or semaphores. Threads also share the same copy of open files and the signal table associated with the process in which they are contained. This means that if one thread opens a file, all threads can read or write to it using the file descriptor returned by open. Similarly, if one thread closes a file, it is closed for all threads in the process.

In Unix, sockets have a file interface and threads can close sockets just like they would close a file by calling close:

int fd = socket(AF_INET, SOCK_STREAM, 0);
...
close(fd); 

Your web server will use signals as a way to notify a worker thread that it should die when there are too many open connections. A signal is a software interrupt that can be synchronous or asynchronous. One process or thread can send (or post) a signal to another one, and when the other one receives the signal it stops doing what it is currently doing and runs a special signal handler function. Processes (and threads) can block some signals, register their own handler functions on some signals, or just use the operating system's signal handler functions (this is the default). For example, when you type CNTL-C in the terminal that is running a program, the running process is sent a SIGINT signal, whose default action is to terminate the process. SIGKILL, in contrast, is an example of a non-blockable signal, meaning that a process cannot choose to block, handle, or ignore a SIGKILL...it must die.

Your web server's main thread will send a worker thread a SIGALRM signal when it wants the worker thread to exit (and close its connection to the client). To do this, do the following:

  1. The main thread will register a signal handler function on the SIGALRM signal before entering its main loop (this sets up the signal handler on SIGALRM for all threads):
      struct sigaction sa;
    
      // set all field values in sa to zero using memset:
      memset(&sa, 0, sizeof(sa));
      sigemptyset(&sa.sa_mask);
      // name of my signal handler function:
      sa.sa_handler = my_sigalrm_handler;  
      sa.sa_flags = 0;
    
      // register my signal handler with the SIGALRM signal: 
      val = sigaction(SIGALRM, &sa, NULL);
    
  2. When the main listener thread receives a new connection, it will check to see if there are already a maximum number of connections, and if so, it will send the oldest thread a SIGALRM signal by calling:
    pthread_kill(workers_pthread_tid, SIGALRM);
    
  3. The signaled worker thread will call the handler function registered on SIGALRM:
    void my_sigalrm_handler(int s) {
      // clean up any shared state associated with me
      // close my socket 
      // and call pthread_exit to die
    }
    
You will have to determine how a signaled thread knows which socket is its own to close.
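
To see how registering a handler with sigaction and signaling one thread with pthread_kill fit together, here is a small sketch. Note that it is a variation on the steps above, not the same design: the handler here does nothing, and the worker exits on its own when its blocking read fails with EINTR (which happens because sa_flags does not include SA_RESTART). The names on_sigalrm, install_sigalrm_handler, worker_done, and stop_worker are hypothetical, the single global flag only supports one worker, and a pipe or socket descriptor is assumed as the worker's argument.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction, pthread_kill, nanosleep */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical flag: set by the single demo worker just before it returns. */
atomic_int worker_done;

/* The handler does nothing; its only job is to interrupt a blocking call. */
static void on_sigalrm(int sig) { (void)sig; }

/* Register the handler for all threads. Returns 0 on success, -1 on error. */
int install_sigalrm_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigalrm;
    sa.sa_flags = 0;  /* no SA_RESTART: an interrupted read fails with EINTR */
    return sigaction(SIGALRM, &sa, NULL);
}

/* Worker: blocks in read until data arrives or it is signaled.
 * Returns (void *)1 if it stopped because a signal interrupted the read. */
void *worker_main(void *arg) {
    assert(arg != NULL);
    int fd = *(int *)arg;
    char buf[256];
    ssize_t n = read(fd, buf, sizeof(buf));
    int interrupted = (n < 0 && errno == EINTR);
    close(fd);  /* clean up this thread's descriptor before exiting */
    atomic_store(&worker_done, 1);
    return (void *)(intptr_t)interrupted;
}

/* Main-thread side: keep signaling until the worker reports it is done.
 * A signal that arrives before the worker blocks in read would otherwise
 * be handled and then forgotten. */
void stop_worker(pthread_t tid) {
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };  /* 1 ms */
    while (!atomic_load(&worker_done)) {
        (void)pthread_kill(tid, SIGALRM);
        nanosleep(&pause, NULL);
    }
}
```

The retry loop in stop_worker is one way to cope with the race between sending the signal and the worker actually blocking; the design in the steps above avoids that race differently, by exiting from inside the handler.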

Threads will also need to detect and handle other cases when they should exit, and clean up any global state associated with them, including closing their socket before exiting. One place where this may occur is if the client side disconnects and closes its end of the socket.

HTTP 1.1 and multiple simultaneous connections

You should use the pthread library to spawn a new server thread each time a client connects to your server. The server thread has a dedicated connection to this client and will keep this connection open and continue to handle GET and HEAD requests from the client. Your main server thread should return back to its accept loop after spawning the server thread so that it can handle a connection from another client. This way your server can simultaneously handle requests from different clients. Test that this works by connecting to your server from different clients simultaneously and sending multiple requests from these clients.

Remember to link in the pthreads library to compile a pthreads program. If you are using the Makefile from my client server example code, it is already included here:

LIBS =  $(LIBDIRS) -pthread
If you aren't using my Makefile, include -pthread at the end of the gcc or g++ command line in your makefile.

The main listener thread should repeat its main loop after spawning a new worker thread (and perhaps killing an old one), and call accept on the listener socket to wait for another client connection.

If your solution requires any use of shared state among threads, make sure to use a pthread synchronization primitive (likely a pthread_mutex_t) to synchronize the accesses to this shared state. Also, think about scope very carefully: threads can only share memory associated with global variables or that is on the heap. Technically, a thread can share state on another thread's stack too (if they have a pointer to it) but I strongly suggest not doing this because the state can be overwritten and modified by the other thread's execution.
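
As an illustration of mutex-protected shared state, here is a sketch of a small global table of open connections. Everything in it is an assumption made for the example: the names conn_add, conn_remove, and conn_oldest, the limit MAX_CONNS of 4, and the choice of a sequence number to track which connection is oldest. Your own design may need different state.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CONNS 4  /* hypothetical limit, chosen only for this example */

struct conn {
    pthread_t tid;      /* worker thread handling this connection */
    int fd;             /* that worker's socket */
    int in_use;         /* 1 if this slot holds a live connection */
    unsigned long seq;  /* smaller value means older connection */
};

/* Global table shared by all threads; every access holds conns_lock. */
static struct conn conns[MAX_CONNS];
static unsigned long next_seq = 1;
static pthread_mutex_t conns_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record a new connection. Returns its slot, or -1 if the table is full. */
int conn_add(pthread_t tid, int fd) {
    int slot = -1;
    pthread_mutex_lock(&conns_lock);
    for (int i = 0; i < MAX_CONNS; i++) {
        if (!conns[i].in_use) {
            conns[i].tid = tid;
            conns[i].fd = fd;
            conns[i].in_use = 1;
            conns[i].seq = next_seq++;
            slot = i;
            break;
        }
    }
    pthread_mutex_unlock(&conns_lock);
    return slot;
}

/* Free a slot; a worker would call this as part of its cleanup. */
void conn_remove(int slot) {
    assert(slot >= 0 && slot < MAX_CONNS);
    pthread_mutex_lock(&conns_lock);
    conns[slot].in_use = 0;
    pthread_mutex_unlock(&conns_lock);
}

/* Find the oldest live connection. Returns its slot, or -1 if there is
 * none. If tid_out is not NULL, the worker's thread id is copied there. */
int conn_oldest(pthread_t *tid_out) {
    int oldest = -1;
    pthread_mutex_lock(&conns_lock);
    for (int i = 0; i < MAX_CONNS; i++) {
        if (conns[i].in_use &&
            (oldest < 0 || conns[i].seq < conns[oldest].seq)) {
            oldest = i;
        }
    }
    if (oldest >= 0 && tid_out != NULL) *tid_out = conns[oldest].tid;
    pthread_mutex_unlock(&conns_lock);
    return oldest;
}
```

Because the table is a global rather than a stack variable, every thread sees the same copy, and the lock keeps a lookup from observing a half-updated slot.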

Web clients

You can use multiple programs to connect to your web server and send it HTTP commands:
  1. telnet server_IP port_num, then type in a GET command (make sure to enter a blank line after the GET command). For example:
    $ telnet 130.58.68.62 8888
    
      GET /index.html HTTP/1.0
    
    
    telnet will exit when it detects that your web server has closed its end of the socket (or you can kill it with CNTL^C, or if that doesn't work use kill or pkill: pkill telnet). Use ifconfig to get a machine's IP address (described in Useful Utilities section).

  2. firefox: Enter the url of the desired page specifying your web server using its IP:port_num (e.g. http://130.58.68.62:8888/index.php)

    You can also just use localhost or the host name on our system:

    localhost:8888/index.php
    tomato:8888/~cfk/
    

  3. wget: wget -v 130.58.68.62:8888/index.html
    wget copies the html file returned by your web server into a file with a matching name (index.html) in the directory from which you call wget.

  4. modify the example client program to send http requests to your server. I don't think this is necessary (since the other three clients are already written for you), but you could modify the web_client program given with the starting point code to send GET requests to your web server and receive the responses.

HTTP

Start by reading HTTP Made Really Easy by Jim Marshall.

It is very important that you can interpret the format of a client request correctly, and that you send correctly formatted responses to clients. Many parts of a correctly formatted message involve sequences of carriage return and newline characters ("\r\n"). These are used to signify the end of all or part of a "message". Here is the general format of an HTTP message (a client request or a server response):

   initial line
   Header1: value1
   Header2: value2
   Header3: value3

   (optional message body goes here)
For example, a GET response for a very simple page may look like:
   HTTP/1.1 200 OK
   Date: Sun, 10 Jan 2010 18:17:43 GMT
   Content-Type: text/html
   Content-Length: 53

   <html>
   <body>
   <h1>CS 87 Test Page</h1>
   </body></html>

It is very important that each header line ends with a "\r\n" and that there is a blank line (another "\r\n") between the headers and the message body. The message body, however, is sent without a trailing "\r\n". Instead, the header Content-Length is used to tell the client the size of the message body.
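
One way to get the "\r\n" sequences and Content-Length right is to build the status line and headers in a single snprintf call. The sketch below assumes a hypothetical helper named format_response_header; it omits the Date header for brevity, and the body is expected to be sent separately as exactly body_len bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the status line, headers, and the blank line
 * that ends the headers into buf. Returns the number of bytes written
 * (not counting the terminating '\0'), or -1 if buf is too small. */
int format_response_header(char *buf, size_t cap, int status,
                           const char *reason, const char *content_type,
                           size_t body_len) {
    int n = snprintf(buf, cap,
                     "HTTP/1.1 %d %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",  /* blank line between headers and body */
                     status, reason, content_type, body_len);
    if (n < 0 || (size_t)n >= cap) return -1;
    return n;
}
```

For the example page above, the body is 53 bytes long, so calling format_response_header(buf, sizeof(buf), 200, "OK", "text/html", 53) produces the status line, the two headers, and the blank line, after which the server would write the 53 body bytes with no trailing "\r\n".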

GET requests and mapping urls to files

There is one format of url that you do not need to handle for this assignment. These are ones where the server would respond with a "301 Moved Permanently" response vs. responding with OK and the file contents. This case is described below in more detail.

Directory names in urls correspond to files named either index.html or index.php in the named directory. Your web server should first look for a file named index.html and if that doesn't exist look for index.php when handling these requests.

Here are some example GET requests that you need to handle, and their corresponding file name(s):

GET  /   HTTP/1.1                           /scratch/cs87/cs/index.html 
                                       or   /scratch/cs87/cs/index.php 

GET /index.html  HTTP/1.1                   /scratch/cs87/cs/index.html 

GET /index.php   HTTP/1.1                   /scratch/cs87/cs/index.php 

GET /search.html HTTP/1.1                   /scratch/cs87/cs/search.html

GET /courses/ HTTP/1.1                      /scratch/cs87/cs/courses/index.html 
                                            /scratch/cs87/cs/courses/index.php 

GET /~newhall/  HTTP/1.1                    /home/newhall/public_html/index.html
                                            /home/newhall/public_html/index.php

GET /~newhall/newcluster.jpg  HTTP/1.1      /home/newhall/public_html/newcluster.jpg
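
The mappings in the table above could be implemented along these lines. This is a sketch under several assumptions: parse_request_line and map_url_to_path are hypothetical names, the buffer sizes are arbitrary, the caller chooses which index file name to try for directory urls, and the rejection of any url containing ".." is an extra safety check that the assignment does not ask for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DOC_ROOT "/scratch/cs87/cs"  /* document root used in the table above */

/* Split an initial line such as "GET /courses/ HTTP/1.1" into its three
 * parts. Returns 0 on success, -1 if any part is missing. */
int parse_request_line(const char *line, char method[8], char url[1024],
                       char version[16]) {
    return sscanf(line, "%7s %1023s %15s", method, url, version) == 3 ? 0 : -1;
}

/* Map a url to a file path in out. For urls ending in '/', index
 * ("index.html" or "index.php") is appended. Returns 0 on success, or -1
 * for a url this sketch does not handle or a buffer that is too small. */
int map_url_to_path(const char *url, const char *index, char *out, size_t cap) {
    if (url[0] != '/' || strstr(url, "..") != NULL) return -1;

    int n;
    if (url[1] == '~') {
        /* "/~user/rest" maps to "/home/user/public_html/rest" */
        const char *user = url + 2;
        const char *rest = strchr(user, '/');
        if (rest == NULL || rest == user) return -1;  /* e.g. "/~newhall" */
        n = snprintf(out, cap, "/home/%.*s/public_html%s",
                     (int)(rest - user), user, rest);
    } else {
        n = snprintf(out, cap, "%s%s", DOC_ROOT, url);
    }
    if (n < 0 || (size_t)n >= cap) return -1;

    size_t len = (size_t)n;
    if (out[len - 1] == '/') {  /* directory url: add the index file name */
        int m = snprintf(out + len, cap - len, "%s", index);
        if (m < 0 || (size_t)m >= cap - len) return -1;
    }
    return 0;
}
```

A caller handling a directory url would try the path built with "index.html" first and, if that file does not exist, call map_url_to_path again with "index.php".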

You do not need to correctly handle GET requests of the following format (i.e. GET requests with no trailing '/' when the last name corresponds to a directory):
GET /~newhall  HTTP/1.1
GET /courses  HTTP/1.1
The way a web server would handle requests like this is to send a "301 Moved Permanently" response to the client with the real url of the page ("Location: http://IP:portnum/~newhall/"). The client would resend the GET request using the url returned by the server:
GET /~newhall/  HTTP/1.1
When your web server receives a request of this form, you can choose to either have it respond with an error response or with OK. If your web server sends an OK response, then the client may make subsequent GET requests for any files included in the page, and these GET requests will not have the correct url (the client doesn't know that newhall is a directory and instead of requesting /~newhall/foo.jpg will request /foo.jpg, if my homepage includes the foo.jpg file). Just handle these as you would any bad url (there is no file associated with /foo.jpg).

You do, however, need to correctly handle GET requests with the trailing '/' (e.g. /~newhall/).

You are welcome to add support for 301 responses if you'd like, but you are not required to do so for this assignment, so I'd suggest only adding this after the rest of your web server works.

Getting Started
You can grab a copy of my starting point files for client and server TCP/IP socket programs in C. They are in ~newhall/public/cs87/socket_startingpt/. The starting point contains a sample Makefile for building the web_client and web_server executables, and the very beginnings of both implementations (mostly just #includes for the server).

In addition, I have an example program for sending and handling signals in pthread programs. It is available here: ~newhall/public/cs87/pthreads_signals_example/

I strongly encourage you to implement and test incrementally. Also, it is very important to check return values from all functions and to handle error return values correctly. For example, if a call to read on a socket returns before the requested number of bytes have been read, this could mean that the other end of the socket was closed (read returns 0 once the other end has closed and no data remains). When this is the case, you want to stop trying to read from this socket; otherwise your thread can end up in an infinite loop.
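
Here is a sketch of a read loop that checks every return value and stops when the other end has closed. The name read_n is hypothetical. It treats every error, including an interrupted call, as a reason to stop and leaves the decision about what to do next to the caller.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std flags */
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to read exactly want bytes from fd into buf.
 * Returns the number of bytes actually read, which is less than want if
 * the other end closed first, or -1 if read reported an error. */
ssize_t read_n(int fd, void *buf, size_t want) {
    assert(buf != NULL);
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);
        if (n == 0) break;      /* other end closed: stop, do not retry */
        if (n < 0) return -1;   /* error: let the caller inspect errno */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A return value smaller than want tells the caller that the connection is gone, so the worker can clean up and exit instead of calling read again.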

Here is one suggestion for proceeding:

  1. Starting with the starting point code, finish a simple client and server program where the client connects to the server and sends it a simple message and waits for a response. The server should receive the message, print it out, and close the socket. The client should exit when it detects the server has closed its end of the socket.
  2. See if you can connect to your web server from wget, firefox and telnet and send it an http request (in the correct format). Your server could just spawn a worker thread whose main function just prints out the message, closes the socket and calls pthread_exit (no infinite worker thread loop, and no response sent to the client).
  3. Next, modify your server to send a fake response to a client GET request (don't really parse the requested page and fetch the corresponding file, but send a 200 response with a very short web page message body). If all goes well, firefox should display your bogus web page after receiving your response. If things don't go well, connect to your server using telnet, as you can more easily see what the client is receiving from your server.
  4. Next, add support for finding the correct web page to return for a GET response. Add support for handling different errors (file not found, etc.).
  5. Next, add in full support for multiple pthread worker threads that keep the connection open until they are killed or detect an error and kill themselves. Add support for the main thread killing the oldest connections when max connections are reached and a new connection comes in.
  6. Make sure your program is free of valgrind errors (it would not hurt to run it under valgrind as you develop different parts too).
  7. Remove (or comment out) any debug output before submitting your solution.
Your program should use good modular design, be well-commented, robust, and correct. See my C Style Guide off my C resource page.

Useful Functions and Resources
Submission and Demo
Create a tar file containing:
  1. All your web server source files, and makefile to build server (and client if applicable).
  2. A README file with: (1) you and your partner's names; (2) an example of how to run your web server (a command line); and (3) a description of any features you have not fully supported and/or any errors you were unable to fix.
I'd suggest creating a handin directory and copying all these things into it. It is good to check that you have your full solution in the handin directory (type 'make' to check that everything builds, try running it, then type 'make clean' to remove executables and .o's from what you submit). Then tar up your handin directory.

Either you or your partner should submit your tar file by running cs87handin.

Demo

You and your partner will sign up for a 15 minute demo slot to demo your web server. Think about, and practice, different scenarios to demonstrate both correctness and good error handling. You will want to demonstrate concurrent client connections, persistent connections, what happens when the client side closes its end of the socket (maybe via killing the client), and show that older server connections are closed when the max number of connections has been reached.