CS 43 — Lab 2: A Concurrent Web Server
Due: Thursday, February 24 @ 11:59 PM
1. Overview
Having built a web client, for this lab we’ll look at the other end of the HTTP protocol — the web server. As real web clients (e.g., browsers like Firefox) send requests to your server, you’ll be finding the requested files and serving them back to the clients.
1.1. Goals
- Implement the server side of (non-persistent) HTTP over a TCP connection.
- Apply socket system calls (bind, listen, and accept) on a server process to interact with clients.
- Use threading to serve multiple concurrent clients.
- Get more practice with sockets, send, and recv.
1.2. Handy References:
- RFC 1945: HTTP 1.0 Specification. Sections 4, 5, and 6 are probably the most helpful.
- Manual pages for pthread_create and pthread_detach.
2. Requirements
Your server program, lab2, will receive two arguments:
- the port number it should listen on for incoming connections, and
- the directory out of which it will serve files (typically called the document root).
For example:
./lab2 8080 test_documents
This command will tell your web server to listen for connections on port 8080
and serve files out of the test_documents directory. That is, the
test_documents directory is considered / when responding to requests. If
you’re asked for /index.html, you should respond with the file that resides
in test_documents/index.html. If you’re asked for /dir1/dir2/file.ext, you
should respond with the file test_documents/dir1/dir2/file.ext.
Note: On most UNIX systems, only users with administrative (root) privileges are allowed to bind to ports below 1024. Users without such privileges often test web services on ports 8080 or 8000 because they sound "close" to port 80. When connecting your web browser to your lab2 server, you’ll need to explicitly specify the port number in the URL with a colon (http://localhost:8080/index.html or, equivalently, http://127.0.0.1:8080/index.html).
You may find the chdir system call helpful
when dealing with file paths. It will change your process’s "working
directory", and making your working directory the document root will help in
locating files within it.
2.1. Workflow of Your Program
Roughly, your server should follow this sequence:
- Read the arguments, find your document root, bind to the specified port, and begin listening for incoming connections.
- Accept a connection, and:
  - week 1: hand the socket off to a function that handles the remaining steps.
  - week 2: pass the socket to a new thread for concurrent processing.
- Receive and parse a request from the client.
- Look for the path that was requested, starting from your document root (the second argument to your program). One of four things should happen (you might want to make each of these cases a separate function!):
  - If the path exists and it’s a regular file, formulate a response (with the Content-Type header set) and send it back to the client.
  - If the path exists and it’s a directory that contains an index.html file, respond with that file.
  - week 2: If the path exists and it’s a directory that does NOT contain an index.html file, respond with a directory listing.
  - If the path does not exist, respond with a 404 code and a basic HTML error page. The 404 HTML page can be static and very simple; it just needs to be enough for a user to see a 404 message in a real browser.
- Close the connection and continue serving other clients.
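The four cases in the "look for the path" step map naturally onto a single stat-based check. Here is a minimal sketch, assuming the process has already changed into the document root and that directory paths end in / (see 2.3). The names classify_path and enum path_kind are hypothetical, not part of the starter code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical names for the four outcomes in the workflow above. */
enum path_kind {
    PATH_FILE,           /* regular file: serve it */
    PATH_DIR_WITH_INDEX, /* directory containing index.html: serve that */
    PATH_DIR_LISTING,    /* directory without index.html: list it (week 2) */
    PATH_NOT_FOUND       /* nothing usable there: send a 404 */
};

/* Classify a path relative to the document root. For a directory, the
 * candidate index path ("<path>index.html") is written to index_out. */
enum path_kind classify_path(const char *path, char *index_out, size_t index_len)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return PATH_NOT_FOUND;
    }
    if (S_ISREG(st.st_mode)) {
        return PATH_FILE;
    }
    if (S_ISDIR(st.st_mode)) {
        /* Assumes path already ends in '/', e.g. "./dir1/". */
        int n = snprintf(index_out, index_len, "%sindex.html", path);
        if (n > 0 && (size_t)n < index_len &&
            stat(index_out, &st) == 0 && S_ISREG(st.st_mode)) {
            return PATH_DIR_WITH_INDEX;
        }
        return PATH_DIR_LISTING;
    }
    return PATH_NOT_FOUND; /* devices, sockets, etc.: treat as missing */
}
```

Keeping this decision in one function lets each case call its own handler, which matches the suggestion above to make each case a separate function.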
2.2. Server Behavior Expectations
For full credit:
- Your server should send byte-for-byte identical copies of files to clients. Use wget or curl to fetch files and md5sum or diff to compare the fetched file with the original. I will do this when grading!
- A variety of file formats should display properly in a real web browser (e.g., firefox), including both text and binary formats. You’ll need to return the proper HTTP Content-Type header in your response. You don’t need to handle every possible content type, but you should at least be able to handle files with .html, .txt, .jpeg, .jpg, .gif, .png, .pdf, and .ico extensions. You may assume that the file extension is correct (e.g., I’m not going to name a PDF file with a .txt suffix).
- If asked for a file that does not exist, you should respond with a 404 error code and a readable error page, just like a real web server would. It doesn’t need to be fancy, but it should contain some basic HTML so that the browser renders something and makes the error clear.
- Some clients may be slow to complete a connection or send a request. Your server should be able to serve multiple clients concurrently, not just back-to-back. For this lab, use multithreading with pthreads to handle concurrent connections. (We’ll try an alternative to threads, event-based concurrency, in a future lab assignment.)
- If the path requested by the client is a directory, you should handle the request as if it were for the file index.html inside that directory, if such a file exists. Hint: use the stat system call to determine whether a path is a directory or a file. Applying the S_ISDIR macro to the st_mode field of the stat struct will help you identify directories.
- The web server should respond with a list of files when the user requests a directory that does not contain an index.html file. You can read the contents of a directory using the opendir and readdir calls. Together they behave like an iterator: open a DIR * with opendir, then keep calling readdir on that DIR * (each call returns info for one file) until it returns NULL. Your server should not create any additional files on disk to respond to the request. The response should mimic the result of running python -m SimpleHTTPServer (in Python 3, python3 -m http.server).
- Your program should generate no warnings from valgrind. If valgrind ever tells you something is wrong, DON’T IGNORE IT! Fix it before moving on.
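Every response starts with a status line and headers, ending with a blank line (CRLF CRLF). Below is one possible sketch of a header formatter plus a static 404 body. format_header and NOT_FOUND_BODY are hypothetical names, and only the Content-Type header is emitted because that is the one this lab requires.

```c
#include <stdio.h>
#include <string.h>

/* A simple, static 404 page; any small HTML document works. */
static const char NOT_FOUND_BODY[] =
    "<html><head><title>404 Not Found</title></head>"
    "<body><h1>404 Not Found</h1>"
    "<p>The requested file does not exist.</p></body></html>";

/* Write an HTTP/1.0 response header into buf. The trailing "\r\n\r\n"
 * ends the header section. Returns the header length, or -1 if buf is
 * too small to hold it. */
int format_header(char *buf, size_t buflen, int status, const char *reason,
                  const char *content_type)
{
    int n = snprintf(buf, buflen,
                     "HTTP/1.0 %d %s\r\n"
                     "Content-Type: %s\r\n"
                     "\r\n",
                     status, reason, content_type);
    if (n < 0 || (size_t)n >= buflen) {
        return -1;
    }
    return n;
}
```

For a missing file you would send the header built with 404 and "Not Found", followed by NOT_FOUND_BODY; for a regular file, send the 200 header followed by the file's bytes.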
2.3. Assumptions
- You may assume that file suffixes correctly correspond to their type (e.g., if a file ends in ".pdf", it really is a PDF file).
- You may assume that requests sent to your server are at most 4 KB in length.
- You may assume that if the user requests a path that is a directory, the path will end in a trailing /. When generating the list of files in a directory, make sure your server also sends back URLs that end in / for directories. This is for the benefit of your browser, which keeps track of its current location based on the absence or presence of slashes.
- You may assume that you will only receive GET requests from clients.
- If you receive an HTTP/1.1 request, you should respond with an HTTP/1.0 response.
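Given these assumptions, extracting the path only requires looking at the request line ("GET /path HTTP/1.x"). A minimal sketch follows; parse_request_path is a hypothetical helper, and the version token is ignored since the reply is always HTTP/1.0.

```c
#include <stdio.h>
#include <string.h>

/* Copy the path from a request such as "GET /dir1/file.ext HTTP/1.1\r\n..."
 * into path. Returns 0 on success, or -1 if the request is not a
 * well-formed GET or the path does not fit in pathlen bytes. */
int parse_request_path(const char *request, char *path, size_t pathlen)
{
    if (strncmp(request, "GET ", 4) != 0) {
        return -1; /* only GET is expected */
    }
    const char *p = request + 4;

    /* The path runs up to the space before the version; hitting a CR/LF
     * first means the request line is malformed. */
    size_t len = strcspn(p, " \r\n");
    if (len == 0 || p[len] != ' ' || p[0] != '/') {
        return -1;
    }
    if (len >= pathlen) {
        return -1; /* no room for the path plus its '\0' */
    }
    memcpy(path, p, len);
    path[len] = '\0';
    return 0;
}
```

This sketch does not decode %-escapes (e.g., %20 for a space) in the path.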
Note: You should NOT assume anything about the size of the file that a client requests. Rather than trying to read the entire file into memory at once, read a chunk of the file (e.g., 4096 bytes) and send just that chunk before reading the next one, repeating in a loop until the whole file has been sent.
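The chunked approach can be sketched as below. Note that send may transmit fewer bytes than requested, so each chunk gets its own inner loop. send_all and send_file are hypothetical names, and the 4096-byte chunk size is just the example value from above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send all len bytes of buf, looping because send() may send fewer. */
int send_all(int sock, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t sent = send(sock, buf, len, 0);
        if (sent < 0) {
            return -1;
        }
        buf += sent;
        len -= (size_t)sent;
    }
    return 0;
}

/* Stream the file at path to sock one chunk at a time, so files of any
 * size can be served without loading them fully into memory. */
int send_file(int sock, const char *path)
{
    char chunk[4096];
    size_t n;
    FILE *fp = fopen(path, "r"); /* "r" and "rb" behave the same on POSIX */

    if (fp == NULL) {
        return -1;
    }
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        if (send_all(sock, chunk, n) < 0) {
            fclose(fp);
            return -1;
        }
    }
    int read_error = ferror(fp); /* fread returns 0 on EOF and on error */
    fclose(fp);
    return read_error ? -1 : 0;
}
```

If a client disconnects mid-transfer, send may raise SIGPIPE, which terminates the process by default. Calling signal(SIGPIPE, SIG_IGN) once at startup makes send return -1 (with errno set to EPIPE) instead.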
2.4. Checkpoint
To be on track, by the start of the next lab session, you should have finished:
- Your server can accept client connections and hand them to a function for further processing.
- Your processing function can:
  - receive a full request from the client, using the presence of a double CRLF ("\r\n\r\n") to determine that it has received the full request.
  - parse the request and extract the requested path.
  - generate a response (both header and body) for requested regular files and for directories that contain an index.html file.

A good stretch goal is:
- Sending back a simple, static HTML document for 404 errors (requested file not found).

It’s fine to defer until next week:
- Handling multiple clients concurrently, with threading.
- Producing directory listings for directories that do not contain an index.html file.
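One way to receive a complete request is to keep calling recv, appending to a buffer, until the double CRLF appears. The sketch below assumes the 4 KB request limit from 2.3; recv_request and MAX_REQUEST are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define MAX_REQUEST 4096 /* requests are assumed to be at most 4 KB */

/* Receive from sock until the header-ending "\r\n\r\n" arrives. buf must
 * hold MAX_REQUEST + 1 bytes; the data is kept NUL-terminated so the str*
 * functions can parse it. Returns the number of bytes received, or -1 on
 * error, early disconnect, or an over-long request. */
ssize_t recv_request(int sock, char *buf)
{
    size_t total = 0;

    buf[0] = '\0';
    while (strstr(buf, "\r\n\r\n") == NULL) {
        if (total == MAX_REQUEST) {
            return -1; /* buffer full and still no end of headers */
        }
        ssize_t n = recv(sock, buf + total, MAX_REQUEST - total, 0);
        if (n <= 0) {
            return -1; /* error (n < 0) or client closed early (n == 0) */
        }
        total += (size_t)n;
        buf[total] = '\0';
    }
    return (ssize_t)total;
}
```

A single recv is not guaranteed to return the whole request, even a short one, which is why the loop checks for the terminator after every call.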
3. Examples and Testing
You should test your server in two ways:
- Using a real web browser like firefox, request files and ensure that they render properly. Note: browsers are very forgiving in what they receive and will do their best to render content, even when they aren’t given correct data.
- To verify correctness, use a tool like wget to request and save copies of files from your server. You can then use tools like diff and md5sum, which we used to verify correctness in lab 1.
4. Tips & FAQ
- Use HTTP version 1.0; version 1.1 can get a lot more complicated. The subset of the HTTP 1.0 protocol you’ll need to implement for this assignment is quite small, but you may find the full protocol specification to be helpful.
- HTTP headers consist of ASCII text, so you can use the str family of functions to manipulate them safely.
- Always, always, always check the return value of any system calls you make!
4.1. File types
When setting the Content-Type header, use the following file suffix to
content type mappings:
- html: text/html
- txt: text/plain
- jpeg: image/jpeg
- jpg: image/jpg
- gif: image/gif
- png: image/png
- pdf: application/pdf
- ico: image/x-icon
It’s fine to hard-code knowledge of these specific types into your server.
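A hard-coded lookup table is one straightforward way to do this. The sketch below (content_type_for is a hypothetical name) matches the text after the last "." in the final path component against the mappings listed above.

```c
#include <stddef.h>
#include <string.h>

/* Return the Content-Type for path based on its suffix, or NULL if the
 * suffix is missing or not in the table. */
const char *content_type_for(const char *path)
{
    static const struct {
        const char *ext;
        const char *type;
    } types[] = {
        { "html", "text/html" },
        { "txt",  "text/plain" },
        { "jpeg", "image/jpeg" },
        { "jpg",  "image/jpg" },  /* as listed; image/jpeg is the registered type */
        { "gif",  "image/gif" },
        { "png",  "image/png" },
        { "pdf",  "application/pdf" },
        { "ico",  "image/x-icon" },
    };
    const char *dot = strrchr(path, '.');
    const char *slash = strrchr(path, '/');

    /* A '.' before the last '/' belongs to a directory name, not a suffix. */
    if (dot == NULL || (slash != NULL && dot < slash)) {
        return NULL;
    }
    for (size_t i = 0; i < sizeof types / sizeof types[0]; i++) {
        if (strcmp(dot + 1, types[i].ext) == 0) {
            return types[i].type;
        }
    }
    return NULL;
}
```

What to do on NULL is up to you; application/octet-stream is a common generic fallback.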
4.2. File paths
- chdir: use this function to change your server process’s "current working directory" to the document root (e.g., test_documents). You probably want to do this at the very beginning of your program so that all paths can be relative to the document root.
- stat: use this system call to determine whether a path is a directory or a file. Declare a variable of type struct stat and pass its address to stat (along with the path string). On success, stat returns 0 and fills in the struct, and you can access its fields:
  - Use the macro S_ISDIR() and pass in the st_mode field of your struct stat variable. S_ISDIR() returns true (non-zero) if the path is a directory and false (zero) otherwise.
  - You don’t need to worry about the other fields of struct stat or the other file-type macros (S_ISCHR, S_ISBLK, etc.).
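After chdir, a request path like /dir1/file.ext still begins with /, so passing it straight to stat or fopen would resolve it from the filesystem root rather than the document root. A minimal sketch of both steps follows; enter_document_root and request_to_local_path are hypothetical helpers, and prefixing "." is just one way to make the path relative.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Call once at startup (e.g., with argv[2]) so that relative paths
 * resolve inside the document root. */
int enter_document_root(const char *docroot)
{
    if (chdir(docroot) != 0) {
        perror("chdir");
        return -1;
    }
    return 0;
}

/* Turn a request path such as "/dir1/file.ext" into "./dir1/file.ext",
 * which is relative to the current (document root) directory. Returns 0
 * on success or -1 if out is too small. */
int request_to_local_path(const char *req_path, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, ".%s", req_path);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this mapping, a request for / becomes ./, which still ends in a slash and so fits the directory handling described in 2.3.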
4.3. String Parsing and File I/O
- Many of the tools you used in lab 1 for manipulating strings will also be helpful in lab 2.
- If you need to copy a specific number of bytes from one buffer to another, and you’re not 100% sure that the data will be entirely text, use memcpy rather than strncpy. The latter stops copying at the first null terminator (\0) it finds, whereas memcpy always copies the requested number of bytes.
- Similar to lab 1, you will likely find fopen helpful for opening files. This time, use a mode of "r", since you’ll only be reading files. Afterward, you can read the contents with fread. Don’t forget to fclose when done.
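To see why memcpy matters for binary data, consider appending bytes to a buffer whose contents include zero bytes, such as the start of a PNG file. The sketch below uses a hypothetical append_bytes helper; because memcpy copies by length, the embedded \0 bytes are copied along with everything else.

```c
#include <stddef.h>
#include <string.h>

/* Append n bytes from src to buf (capacity cap) at offset *used, if they
 * fit. Returns 0 on success or -1 if there is not enough room. Works for
 * binary data because memcpy ignores '\0' bytes. */
int append_bytes(char *buf, size_t cap, size_t *used, const void *src, size_t n)
{
    if (n > cap - *used) {
        return -1;
    }
    memcpy(buf + *used, src, n);
    *used += n;
    return 0;
}
```

The test data is the 8-byte PNG signature followed by two zero bytes; strncpy would have stopped copying at the first of those zeros.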
5. Other Reference Material
5.1. Socket Programming
The server side of socket programming has a few more system calls than a
client. Use man bind, man listen, and man accept to read through each of
these functions. Look through your starter code on GitHub, and follow along
with the description of each of the system calls.
- socket(): Like the client side, first create a socket. This time, we name it server_sock since it’s going to serve a special purpose. Use server_sock only to accept new connections. Never use server_sock with calls to send or recv.
- setsockopt(): The default behavior of TCP (implemented by the OS) is that if you bind to a port and terminate your program, the OS makes you wait for about a minute before anyone else can bind to that port again. Setting the SO_REUSEADDR socket option disables the waiting, which makes rapid debugging easier.
- bind(): Associate a socket with the IP address and port on which it should listen for incoming connections. A machine can have more than one network interface or IP address, typically because it connects to two different networks. Assign the INADDR_ANY macro to the sockaddr's address to serve content on all of the server’s IP interfaces.
- listen(): After binding to an address and port, use listen to begin allowing client connections. This function essentially opens the socket for business. The backlog parameter defines how many clients are allowed to wait in a queue for your server to accept them.
- while(1): A server is always on, so enter an infinite loop where the main body of the work happens. Declare a second integer, sock, that will represent each new client connection.
- accept(): Finally, call accept to connect to a new client. You pass the server socket as a parameter to accept. On success, it returns a new socket that represents your connection to the new client. Use that newly returned socket to communicate with the client via send and recv.
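The setup steps above (socket, setsockopt, bind, listen) can be collected into one function. This is a minimal sketch under the assumption of IPv4 only; create_server_socket is a hypothetical name and your starter code may organize this differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Create server_sock, bind it to port on every local interface, and start
 * listening. Returns the listening socket, or -1 on failure. */
int create_server_socket(unsigned short port, int backlog)
{
    int server_sock = socket(AF_INET, SOCK_STREAM, 0);
    if (server_sock < 0) {
        perror("socket");
        return -1;
    }

    /* Allow re-binding the port immediately after the server restarts. */
    int reuse = 1;
    if (setsockopt(server_sock, SOL_SOCKET, SO_REUSEADDR,
                   &reuse, sizeof reuse) < 0) {
        perror("setsockopt");
        close(server_sock);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* all local interfaces */
    addr.sin_port = htons(port);              /* network byte order */

    if (bind(server_sock, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("bind");
        close(server_sock);
        return -1;
    }
    if (listen(server_sock, backlog) < 0) {
        perror("listen");
        close(server_sock);
        return -1;
    }
    return server_sock;
}
```

In main, you would then loop forever calling accept(server_sock, NULL, NULL), checking for a negative return, and handing each new socket to your per-connection function (week 1) or to a new thread (week 2).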
5.2. Threading
Some clients may be slow to complete a connection or send a request. To prevent
all other clients from waiting on one slow client, your server should be able to
serve multiple clients concurrently, not just back-to-back. For this part of
the lab, we’ll use multithreading with pthreads to handle concurrent
connections.
- Call pthread_create and pthread_detach after calling accept for each new client.
- Unlike many of your prior experiences with threading (e.g., parallel GOL in CS 31), the threads in this assignment don’t need to coordinate their actions. This makes the threading relatively easy, and it’s something that can be added after the main serving functionality is implemented. When starting out, organize your code so that it calls a function on each newly accepted client socket, and let that function do all the work for that connection. This will make adding pthread support quite simple!
- In your starter code you should see a thread_detach_example.c. This is very similar to what you will be implementing. The example takes the number of threads as an input argument, and then it creates and detaches each thread. Each thread independently runs thread_function. The example passes one argument to each thread, an integer pointer. In your server, this will be the socket descriptor (integer) for a newly accepted client.
  - Inside thread_function, you just have to cast the input back from a generic void * pointer to an integer pointer. Then you can dereference that pointer to get the value, after which you can free it. This is the main complexity in this part of the lab: wrangling pointers!
- Finally, there is a call to the pthread_detach function. This basically says: I am creating a thread that will do something in the background, and I don’t need it to return a result; it should just exit once it’s done executing. Therefore, thread_function returns NULL to satisfy its void * return type. By detaching a thread, we tell the system to clean it up once it finishes executing thread_function, without needing to call pthread_join.
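The malloc, cast, dereference, and free pattern described above might look like the following sketch. handle_client here is a hypothetical placeholder that only sends a fixed reply; in your server it would be the function that does all the work for one connection. Each thread gets its own heap-allocated copy of the descriptor because passing the address of the loop's sock variable would race with the next accept overwriting it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical placeholder for the per-connection work. */
static void handle_client(int sock)
{
    const char msg[] = "hello\n";
    send(sock, msg, sizeof msg - 1, 0);
    close(sock);
}

/* Thread entry point: recover the socket descriptor from the void *. */
static void *thread_function(void *arg)
{
    int sock = *(int *)arg; /* cast back to int *, then dereference */
    free(arg);              /* this thread's copy is no longer needed */
    handle_client(sock);
    return NULL;            /* detached: nobody collects this value */
}

/* Start a detached thread to serve a newly accepted client socket. */
int spawn_client_thread(int sock)
{
    pthread_t tid;
    int *arg = malloc(sizeof *arg);
    if (arg == NULL) {
        return -1;
    }
    *arg = sock;

    int rc = pthread_create(&tid, NULL, thread_function, arg);
    if (rc != 0) { /* pthread functions return an error number */
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        free(arg);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}
```

Remember to link with -pthread when compiling.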
5.3. Providing a directory listing
- Your web server should respond with a list of files when the user requests a directory that does not contain an index.html file.
- Similar to opening a file with fopen and reading from a file with fread, you can read the contents of a directory using the opendir, readdir, and closedir calls.
- That is, if you have a valid directory path, you can pass it to opendir and store the result in a (DIR *) pointer. Just like with a file pointer, every directory you open should be closed with closedir.
- Next, you can keep calling readdir, which returns info for one file, on that (DIR *) pointer until it returns NULL. See man readdir for details. DO NOT attempt to free the struct dirent pointer that readdir returns; the man page makes it very clear that you should not free that pointer!
- You can use the following HTML format to create your directory listing (substitute /path/ with the actual path):

      <html>
      Directory listing for: /path/ <br/>
      <ul>
      <li><a href="your_dir_listing_with_slash/">"dir_name"</a></li>
      ....
      </ul>
      </html>
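Putting opendir, readdir, closedir, and stat together, a listing in the format above could be built as in this sketch. build_dir_listing is a hypothetical name. It writes into a caller-supplied buffer, so no files are created on disk. It skips "." and ".." (Python’s listing omits them too) and appends / to subdirectory names so the browser’s relative links keep working.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Build the listing for directory fs_path (e.g., "./dir1/"), shown to the
 * user as url_path (e.g., "/dir1/"), into out. Returns the HTML length,
 * or -1 if the directory can't be opened or out is too small. */
int build_dir_listing(const char *fs_path, const char *url_path,
                      char *out, size_t outlen)
{
    DIR *dir = opendir(fs_path);
    if (dir == NULL) {
        return -1;
    }

    int n = snprintf(out, outlen,
                     "<html> Directory listing for: %s <br/> <ul>\n", url_path);
    size_t used = (n < 0) ? outlen : (size_t)n;
    struct dirent *entry; /* owned by readdir: never free it */

    while (used < outlen && (entry = readdir(dir)) != NULL) {
        const char *name = entry->d_name;
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
            continue;
        }
        /* stat the entry to decide whether it needs a trailing '/'; if the
         * path gets truncated, stat fails and it is listed as a file. */
        char full[4096];
        struct stat st;
        const char *slash = "";
        snprintf(full, sizeof full, "%s/%s", fs_path, name);
        if (stat(full, &st) == 0 && S_ISDIR(st.st_mode)) {
            slash = "/";
        }
        n = snprintf(out + used, outlen - used,
                     "<li><a href=\"%s%s\">%s%s</a></li>\n",
                     name, slash, name, slash);
        used = (n < 0) ? outlen : used + (size_t)n;
    }
    closedir(dir);

    if (used < outlen) {
        n = snprintf(out + used, outlen - used, "</ul> </html>\n");
        used = (n < 0) ? outlen : used + (size_t)n;
    }
    return (used < outlen) ? (int)used : -1;
}
```

File names are not HTML-escaped here; a name containing characters such as < or & would need escaping to display correctly.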
6. Submitting
Please remove any excessive debugging output prior to submitting.
To submit your code, commit your changes locally using git add and git
commit. Then run git push while in your lab directory.