CS43 Lab 4: Designing a Jukebox Protocol

Due: Tuesday, October 29, 11:59 PM

1. Handy references:

Server side:
- select() manual
- select() tutorial manual page
- Beej’s select() info.
- byte order in C (for binary protocols)
Client side:

2. Lab 4 Goals

Design and implement your own application-layer client-server Jukebox protocol.
Develop both the client and server protocol communication formats.
Develop a persistent server to connect with multiple interactive clients using event-based concurrency.
Write a client that uses a producer-consumer threading model to receive and play song data.

3. Overview

For this lab, you will be designing your very own music-streaming application! Since the protocol is your own, this time, you will be developing the functionality of both the client and the server.

Figure 1. The figure shows a client-server architecture, with multiple clients connected to a server. Both clients and server "speak" your jukebox protocol. Clients can make the following requests of the server: list, info <num>, play <num>, stop and exit. The server is single-threaded and maintains persistent connections with each of the incoming clients (until a client calls exit). The server keeps a list of song files, song info and client state associated with connected clients. In the figure, clients (1-4) are requesting "info 3", "play 1", "stop" and "list" respectively. The server is shown to keep this client state list as a table. Row 1 has client 1; info 3, Row 2 has client 2; play song 1, and so on.

#run your server on a different machine from the client
$ ssh lab_machine_1
$ ./server 5000 /home/chaganti/music/
$ #Song Listing#
# Display on the server side:
New connection (4)!
Client 4 request file list.
Client 4 disconnected:
  recv: Success
New connection (5)!
Client 5 now playing Cowboy Junkies - Sweet Jane.mp3
Client 5 requested stop
Ctrl+C # kill the server

# separately ssh into a client. PLEASE BE AT THIS LAB COMPUTER IF YOU PLAN TO PLAY SONG DATA
$ ssh lab_machine_2
$ python client.py lab_machine_1.cs.swarthmore.edu 5000
$ >> list  #enter command here
$ >> exit
$ python client.py lab_machine_1.cs.swarthmore.edu 5000
$ >>
$ >> play 1
$ >> stop

4. Requirements

Protocol Design: Thus far, you’ve seen a few different types of application-layer protocols that follow a client-server architecture: SMTP and HTTP (both text-based) and DNS (a binary protocol). Having seen these protocol design trade-offs, consider the following when designing your protocol:
- using text vs. binary formats
- header information (what information should the client-server exchange for correctness?)
- header delimiters, format and size of each field
- client-state maintained by the server, to handle persistent connections
Client Commands: Your client should be interactive and it should know how to handle at least the following commands:
- list: Retrieve a list of songs that are available on the server, along with their ID numbers.
- info [song number]: Retrieve information from the server about the song with the specified ID number.
- play [song number]: Begin playing the song with the specified ID number. If another song is already playing, the client should switch immediately to the new one.
- stop: Stops playing the current song, if there is one playing.
Client Requirements:
- You may use any language you wish for the client, and you may use threads in the client. I strongly recommend Python for the client, as it makes playing audio much simpler.
- The client should not cache data. In other words, every time a client would like to either play/list/info, the item needs to be requested from the server. Each retrieved item should not be stored on the client side and repeated on subsequent requests!
Server: Event-driven Concurrency:
- You must use C to implement your server, and it must use select() (rather than threading) to provide concurrency.
- To simplify the file I/O, it’s fine for the server to keep its data, including the audio files, in memory, but the client should not store data that it isn’t actively using.
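
As a rough illustration of the client state mentioned above, here is a minimal C sketch of a per-client record. The struct layout, the field names, and the helpers `start_song`, `stop_song`, and `next_chunk` are all hypothetical and are not part of the starter code. The sketch assumes the server holds each song in memory and streams it in pieces, so it must remember how far each client has gotten.

```c
#include <stddef.h>

/* Hypothetical per-client record; every name here is illustrative. */
enum client_mode { MODE_IDLE, MODE_PLAYING };

struct client_state {
    int sock;              /* socket descriptor returned by accept() */
    enum client_mode mode; /* what the client last asked for */
    int song_id;           /* song being streamed, -1 if none */
    size_t offset;         /* bytes of the song already sent */
};

/* Start (or switch to) a song: streaming restarts from byte 0. */
void start_song(struct client_state *c, int song_id)
{
    c->mode = MODE_PLAYING;
    c->song_id = song_id;
    c->offset = 0;
}

/* Stop streaming; the connection itself stays open. */
void stop_song(struct client_state *c)
{
    c->mode = MODE_IDLE;
    c->song_id = -1;
    c->offset = 0;
}

/*
 * Size of the next piece to send: at most max_chunk bytes, and never
 * past the end of the song (song_len bytes long). Returns 0 when the
 * client is idle or the whole song has been sent.
 */
size_t next_chunk(const struct client_state *c, size_t song_len, size_t max_chunk)
{
    if (c->mode != MODE_PLAYING || c->offset >= song_len)
        return 0;
    size_t remaining = song_len - c->offset;
    return remaining < max_chunk ? remaining : max_chunk;
}
```

Keeping an offset per client is one possible design: it lets a single-threaded server send one piece to a client, return to its event loop, and pick up where it left off the next time that client's socket is writable.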

5. Server Design

Thread-based vs. Event-based Concurrency

There is a continuing debate about thread-based vs. event-based programming in the systems community. To process a single client request, a server is mostly doing I/O rather than using CPU cycles. To keep the server busy (i.e., improve performance), we want to handle as many simultaneous client requests as possible. Fundamentally, the debate is about performance and scalability, along with ease of programming (synchronization across threads is generally harder to reason about than single-threaded, event-based code).

5.1. Event-based programming and `select()`

Event-based concurrency uses:

- a single thread for all client requests.
- select() to watch every client socket at once. With select(), the program is only notified when a socket is ready: there is data (or a new connection) waiting to be received, or there is space in the socket's send buffer.
To see an example of select(), please copy the following file into your GitHub folder.
```
$ cp /home/chaganti/public/cs43/select-example.docx <your_github_folder>
```

Look up the man page on select() and take a look at the input parameters:

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

nfds: The first argument to the select function is (the largest numerical socket descriptor value in any of the subsequent FD_SET parameters) plus one. For example, if you’re populating your FD_SETs with socket descriptors 7, 9, and 10, your first argument to select should be 11 (10 + 1).
fd_set: select() takes as input variables of type fd_set, which by convention are named rfds and wfds. Each fd_set is a set of file or socket descriptors (fd stands for file descriptor). As far as the OS is concerned, file and socket descriptors are equivalent.
FD_ZERO(input_var): clears out the set (input_var is usually either rfds - read/recv file descriptors or wfds - write/send file descriptors). Because select() overwrites the sets it is given, clear and repopulate them before every call.
FD_SET: allows you to put a socket in the file descriptor set. For example, FD_SET(serv_sock, &rfds) puts the server socket (the one on which we will call bind, listen and accept) into the read set. We will always put the server socket in this set.
When select() returns, make sure that as you check the fd_sets (using FD_ISSET), you differentiate between your server socket (that you accept connections on) and client sockets. If select() tells you that your server socket is ready for reading, it means you can safely call accept.
When select() leaves a socket descriptor in your read/write FD_SET after it returns, it means that you can safely (without blocking) recv/send on that socket once, but not more than once.
If a client closes a connection while your server is blocked on select(), select() will return that client’s socket in the set of file descriptors that are available for reading (usually named rfds). Then, when your server goes to recv() on that socket, it will get a return value of 0, which, as we saw in Lab 1, is recv()'s way of telling us that the connection is closed and no more data is coming (we’ve reached the EOF).
- Thus, as I’ve been harping on all semester, you should always check the return value of your system calls to check for these types of conditions so that your server can detect and account for disconnecting clients.
Please look at the Piazza posts for more information on select.
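
The points above can be put together into one pass of an event loop. The following is a minimal sketch, not the starter code: `MAX_CLIENTS`, `build_read_set`, and `event_loop_pass` are hypothetical names, the client table is assumed to be a fixed-size array with -1 marking an empty slot, and only the read set is used (a full server would also need a write set for clients it is streaming to).

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_CLIENTS 16   /* illustrative limit, not from the lab */

/*
 * Rebuild the read set from scratch and return the nfds argument
 * for select(): the largest descriptor in the set, plus one.
 */
int build_read_set(int serv_sock, const int clients[MAX_CLIENTS], fd_set *rfds)
{
    int max_fd = serv_sock;
    FD_ZERO(rfds);
    FD_SET(serv_sock, rfds);              /* always watch the server socket */
    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (clients[i] < 0)
            continue;                     /* empty slot */
        FD_SET(clients[i], rfds);
        if (clients[i] > max_fd)
            max_fd = clients[i];
    }
    return max_fd + 1;
}

/*
 * One pass of the loop: block in select(), then service what is ready.
 * Returns the number of clients that disconnected, or -1 if select() failed.
 */
int event_loop_pass(int serv_sock, int clients[MAX_CLIENTS])
{
    fd_set rfds;
    char buf[512];
    int closed = 0;
    int nfds = build_read_set(serv_sock, clients, &rfds);

    if (select(nfds, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (clients[i] < 0 || !FD_ISSET(clients[i], &rfds))
            continue;
        /* Exactly one recv() per notification, so we never block. */
        ssize_t n = recv(clients[i], buf, sizeof buf, 0);
        if (n <= 0) {                     /* 0: client closed; < 0: error */
            close(clients[i]);
            clients[i] = -1;
            closed++;
        }
        /* n > 0: a real server would parse the n received bytes here. */
    }

    if (FD_ISSET(serv_sock, &rfds)) {     /* accept() will not block now */
        int fd = accept(serv_sock, NULL, NULL);
        for (int i = 0; fd >= 0 && i < MAX_CLIENTS; i++) {
            if (clients[i] < 0) {
                clients[i] = fd;          /* store in first free slot */
                fd = -1;
            }
        }
        if (fd >= 0)
            close(fd);                    /* table full: refuse the client */
    }
    return closed;
}
```

A server built this way would call `event_loop_pass` in a loop after bind and listen. With descriptors 7, 9, and 10 in the set, `build_read_set` returns 11, matching the nfds example above.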

5.2. Packing/Unpacking in C

To pack your data, you can create a buffer like you normally would: char data[] or char * data = malloc(#num bytes).

To pack or unpack a single byte you can just index into the array using data[idx].

To pack or unpack more than one byte, use the byte-ordering functions below. Look up their man pages for more info.

htons() - convert a short (2-byte int) from host byte order to network byte order

ntohs() - convert a short (2-byte int) from network byte order to host byte order

htonl() - convert a long (4-byte int) from host byte order to network byte order

ntohl() - convert a long (4-byte int) from network byte order to host byte order

For example, if you know that you have a short contained in positions 3 and 4 of a character array, you can call ntohs() on it as:
```
short result = ntohs(*((short *) &buf[3]));
```
This says: take the address of offset 3 in buf, cast it to a pointer to a short, and then dereference that pointer to get the 2-byte value starting at position 3. ntohs() then converts that value from network byte order to host byte order.
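
The pointer cast above relies on the machine tolerating a 2-byte read from an odd address. A more portable alternative is to copy the bytes with memcpy() before converting. The sketch below shows both directions for 2-byte and 4-byte fields; the helper names `pack_u16`, `unpack_u16`, `pack_u32`, and `unpack_u32` are hypothetical and not part of the starter code.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value at buf[offset] in network byte order. */
void pack_u16(unsigned char *buf, size_t offset, uint16_t value)
{
    uint16_t net = htons(value);
    memcpy(buf + offset, &net, sizeof net);
}

/* Read a 16-bit network-order value at buf[offset] into host order. */
uint16_t unpack_u16(const unsigned char *buf, size_t offset)
{
    uint16_t net;
    memcpy(&net, buf + offset, sizeof net);
    return ntohs(net);
}

/* Same idea for 32-bit fields, using htonl()/ntohl(). */
void pack_u32(unsigned char *buf, size_t offset, uint32_t value)
{
    uint32_t net = htonl(value);
    memcpy(buf + offset, &net, sizeof net);
}

uint32_t unpack_u32(const unsigned char *buf, size_t offset)
{
    uint32_t net;
    memcpy(&net, buf + offset, sizeof net);
    return ntohl(net);
}
```

For example, `pack_u16(buf, 3, 0x1234)` leaves 0x12 in buf[3] and 0x34 in buf[4] regardless of the host's byte order, and `unpack_u16(buf, 3)` recovers 0x1234.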

5.3. Sending and Receiving Data

Use send() and recv() in as few places as possible. Never call either one on a socket unless select() has told you that it’s safe to do so; otherwise, you’ll block (and potentially deadlock!).
Set your sockets to non-blocking mode for debugging: You can do this by calling set_non_blocking(int sock) in your starter code. This is not required (you should never block regardless, because you should never try to send, recv, or accept unless select() tells you it’s ok), but it will prevent deadlocking during your testing. Check the return values of send, recv, and accept, and check errno for EAGAIN/EWOULDBLOCK, which indicate that you made a syscall that you shouldn’t have. That’s send/recv/accept's way of telling you "I would have blocked on this call had this socket not been set for non-blocking mode."
Client closes a connection just before your server sends data: In this case, by default, two things will happen:
1. your process will receive the SIGPIPE signal, which by default, kills your process (you don’t want this to occur, since you still want to service other clients)
2. send will return an error and set errno to EPIPE to indicate the connection (pipe) was broken.
Luckily, we can easily prevent the SIGPIPE signal by using the extra flags parameter in send that we’ve been ignoring thus far. By setting the flags to MSG_NOSIGNAL, the kernel will only do (2), which is a much more convenient way for you to detect and handle a client disconnection.
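
The error cases described in this section can be folded into one small wrapper around send(). This is a minimal sketch under stated assumptions: `make_non_blocking`, `send_once`, and the `send_result` values are hypothetical names, and `make_non_blocking` shows a common fcntl()-based approach that may differ from how the starter code's set_non_blocking() is written.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* A common way to put a socket in non-blocking mode (illustrative). */
int make_non_blocking(int sock)
{
    int flags = fcntl(sock, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(sock, F_SETFL, flags | O_NONBLOCK);
}

enum send_result { SEND_OK, SEND_WOULD_BLOCK, SEND_CLIENT_GONE, SEND_ERROR };

/*
 * Call send() once with MSG_NOSIGNAL so a vanished client produces an
 * error return instead of a SIGPIPE that would kill the server.
 * On SEND_OK, *sent holds the byte count, which may be less than len.
 */
enum send_result send_once(int sock, const void *buf, size_t len, size_t *sent)
{
    *sent = 0;
    ssize_t n = send(sock, buf, len, MSG_NOSIGNAL);
    if (n >= 0) {
        *sent = (size_t)n;
        return SEND_OK;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return SEND_WOULD_BLOCK;      /* called without select()'s go-ahead */
    if (errno == EPIPE || errno == ECONNRESET)
        return SEND_CLIENT_GONE;      /* close the socket, drop client state */
    return SEND_ERROR;
}
```

On SEND_CLIENT_GONE, the server would typically close that socket and remove the client from its table while continuing to serve everyone else. Because send() may accept fewer bytes than requested, the caller should advance its position by `*sent` rather than by `len`.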

6. Client Design

6.1. Sending requests to the server

You can use the Python readline module to get user-friendly command line behavior for reading commands from the user. (Done for you in provided client example.)

6.2. Producer-Consumer Model: Receiving data and Playing Music

At the client, you will have multiple forms of I/O going on at once: 1) receiving data from the server, 2) reading commands from user input, and 3) writing to the sound card (playing audio). Python has a threading module that will allow you to do these all at once. The module gives you threads, locks, and condition variables. I recommend using three threads in the client:

1. one thread for calling raw_input in a loop to get commands from the user (this is already there in the example code)
2. one thread for receiving data from the server. This thread sits in a tight loop that receives until it has gotten a full message, processes the message, and then goes back to receiving.
3. one thread for playing received music.

Threads 2 and 3 will need to share a buffer (the song data). The receiver thread will be appending to the end of that buffer, while the player thread will be removing and playing data from the front of it.

6.3. Python `ao` and `mad` audio libraries

If you want your client to begin playing a new file, you need to create a new MadFile object. This tells mad (our audio library) to interpret the next bytes as the beginning of a new file (which includes some metadata) rather than the middle of the previously playing file.

7. Grading Rubric

This assignment is worth six points.

1 point for designing a reasonable protocol to solve the problem. Please include a very brief protocol reference in your submission.
1 point for successfully streaming an audio file over the network.
1.5 points for the server handling multiple concurrent connections, without blocking, via select(). Your server should NOT use threading. Using threads in your client is fine.
1.5 points for interleaving different messages (play, list, info, stop) between the client and server. That is, a client should be able to request and receive info while it is currently playing a song.
1 point for server resiliency - the server should be able to cope with clients joining and departing at any time. Be sure to check that your server does not crash when trying to both receive from and send to a disconnected client.

8. Tips / FAQ

START EARLY! The earlier you start, the sooner you can ask questions if you get stuck. Test your code in small increments. It’s much easier to localize a bug when you’ve only changed a few lines.
Wireshark will not help you for this lab, since you’re designing the protocol this time. Wireshark knows nothing about how to decode your protocol!

9. Submitting

Please remove any debugging output prior to submitting.

Please do not submit audio files to GitHub!

To submit your code, simply commit your changes locally using git add and git commit. Then run git push while in your lab directory.