CS45 Lab 1: Unix Shell Redux

Due Sunday Feb. 2 before 11:59pm
Checkpoint: due Monday Jan 27, demo'ed during lab

This lab will be done with your assigned lab partner: Lab 1 partners

For the checkpoint: you should have a running shell program that can execute simple commands run in the foreground and background and that implements the built-in commands cd and exit. You also should have support for parsing at least one of: (1) commands with I/O re-direction (handling at least parsing one of the three, but all three would be better) or (2) commands with a pipe. You will demo your checkpoint to me during lab week 2. Prior to your demo create a test suite of a few simple commands that you can use to demo your shell's meeting the checkpoint requirements (you can just cut and past commands from your test suite during your demo). You only get 3 minutes for your demo, so plan it out and practice it before lab. I also encourage you to get farther than this by the checkpoint due date, but I only want to see the required checkpoint functionality during your short demo.


Content:


Lab 1 Introduction
This lab is an more advanced version of the shell program you wrote in CS31. In particular, it can be viewed as an extension to the Fall 2013 version of the CS31 Unix shell lab. In this version, you will add some advanced shell features (pipes, I/O redirection) and a richer command line parsing interface using environment variables.

Required Features:
Your shell will supports the following required features:

  1. running simple commands in the foreground (e.g. ls -l -A)
  2. running a command in the background (using &)
  3. the built-in commands cd, and exit
  4. the built-in command history, and !! and !num
  5. commands with a single pipe
  6. commands with I/O re-direction
    (using this re-direction syntax: 1>, 2>, <).
  7. using the environment variable PATH to search for the command executable in the user's path
  8. using a signal handler on SIGCHLD to reap zombie children

Additional Requirements:
In addition to implementing the required shell functionality:


Getting Started
Setting up a shared git repo with starting point code:
First, one of you and your partner should set up a git repo for your lab 1 assignment and copy over the lab 1 starting point code. Follow these directions: setting up starting point for lab 1

After setting up your git repo and copying over my starting point files (which you are welcome to use or not), I suggest revisiting your CS31 shell solution and using it as a starting point or using parts of it as a guide for how to implement similar parts of this shell (be very careful about how this shell and the 31 shell differ). If you took 31 in Fall 2013, you can more easily use your CS31 shell program as a starting point for this one. If you took CS31 last year, use your shell program more as a guide for this one; start the design of this from scratch to match the specifications of this lab assignment, applying top-down design rather than trying to follow the structure of the CS31 shell program from last year and cramming in these new features. In either case you already have written shell main loop code for running commands in the foreground and background and some signal handling code that you may make use of in this shell.

Main Parts of this Lab:

There are two well defined pieces of code to write. The first is a set of command line parsing routines that take as input the command line entered by the user and parse it, and among other things create the argv string list for execv. The parsing routines must be able to identify and correctly parse command lines that include I/O redirection, running in the background, pipes, and built-in commands. The second is a set of functions for executing commands starting with a simple command, then incrementally adding and testing more features: built-in commands, commands with I/O redirection, commands with pipes.

Even though there are two well defined pieces of functionality to write, dividing the work up into these two pieces is not suggested; these are not necessarily both the same amount of work, and it is important that you both know how to write shell code to execute a command with a pipe, for example, and that you both know how to parse a complicated command line and use environment variables to find executable files in the user's path. The most effective way to work on this lab is for you and your partner to work together in the lab on both pieces.

You and your partner should start by sketching out the design of your solution (use top-down design, use good modular design, and design function prototypes). Implement and test your code incrementally, and test individual functions in isolation as much as possible. For example, start with exec'ing a simple command with command path and argv strings for a specific command hard-coded into your shell program. Once this works, move on to adding the next piece of functionality, test and debug it, then move on to adding the next piece, and so on.

You may want to add assert statements during testing to test pre and post conditions of your functions (see the man page for assert), and make use of gdb and valgrind to help you find and fix bugs. Be sure that any printf's or assert statements you add for debugging purposes are removed from (or commented out of) the code you submit.


Implementation Details

Shell Overview

(See the CS31 lab 7 assignment for more details on anything described in this section) A shell executes a loop:
  1. print the shell prompt
  2. read the next command line (terminated with '\n')
  3. parse the command line (get each command name and its command line arguments, determine if it is a built-in, and if there is I/O redirection, running in the background, or a pipe in the command line)
  4. create the argv lists for execv if it is not a built-in
  5. execute the command(s) in the command line. For a built-in execute the command directly. For non-built-ins, fork one or more child processes to execute the command and for wait for it/them to finish (unless it is run in the background.)
  6. repeat, until the user enters the exit command

In general commands are in the form:

	commandname arg1 arg2 arg3 ...
To execute this command, the shell first checks if it is one of its built-ins, and if so invokes a functions to execute it. If it is not a built-in, the shell parses the command line and creates a child process(es) to execute the command.

Creating a new child process is done using the fork system call, and waiting for the child process to exit is done using the waitpid system call. fork creates a new process that shares its parent's address space (both the child and parent process continue at the instruction immediately following the call to fork. In the child process, fork returns 0, in the parent process, fork returns the pid of the child. The child process will call execv to execute the command. For example:

  int child_pid = fork();
  if(child_pid == -1) {
    // fork failed...handle this error

  } else  if(child_pid == 0) { 
    // child process will execute code here to exec the command 
    ...
    execv(command_path, command_argv_list);

  } else {	
    // parent process will execute this code 
    ...
  }
The execv system call overlays the calling process's image with a new process image and begins execution at the starting point in the new process image (e.g. the main function in a C program). As a result, exec does not return unless there is an error (do you understand why this is the case?). The parent can call waitpid to block until a child exits:
  pid = waitpid(childpid, &status, 0);
or the parent can call waitpid in a signal handler on SIGCHLD to reap zombie children run in the background.

Parsing the command line and finding the command path name

Reading and Parsing the Command line

You are welcome to use any valid C method for reading in and parsing the command entered by the user. However, it is important that your method is robust to bad user input such as entering an empty command (your shell should just print the prompt again) or entering a badly formed command (your shell should print out an error message).

Your parsing code should be implemented as a separate library (in .h and .c files) that is used by the main shell program. See the CS31 lab 8 assignment for some information on how to structure this (your parsing functions for this lab will be different then these).

Part of the parsing process involves creating the arguments to execv:

    execv(command_path, command_argv_list);
The first argument is a string containing the path name to the command, the second is a list of strings, one for each command line argument, the list terminated by NULL. For example, the arguments to execute the command ls -l would look like this: execv("/bin/ls", arg_list); , where arg_list is:
  arg[0]: "/bin/ls"
  arg[1]: "-l"
  arg[2]: NULL

See the "File I/O" and "strings" parts of my C help pages for some basic information about C strings and input functions. A couple functions that may be useful are readline and strtok. If you use readline, you need to link with the readline library:

gcc -g -o myshell myshell.c -lreadline
Here is some information about using the readline library. Also, look at the man pages for C library functions and system calls, and be careful about who is responsible for allocating and freeing memory space passed to (and returned by) these routines.

Finding the Path Name to entered commands

If the user enters a command using its path name, like any of the following examples:
./a.out   # a relative path name 
/bin/ls   # an absolute path name
../foo    # a relative path name
then the path name to the command (that is the first argument to execv) is part of the command line that you read in.

However, if the user enters a command using only its name:

ls
foo
then you need to search the user's path to find the command executable file and construct the a full path name string to pass to execv:
    execv(full_path_to_command_executable, command_argv_list);
To do this, you will use user's PATH environment variable to try to locate the executable file in the user's path, and once found you will also test that the file is executable by the user before constructing the full path name string to pass to execv. Here are some details:

Shell Built-ins

Before implementing a built-in function, first test out the command in bash to see what bash does: this will help you figure out functionality and correctness for your implementation.

The only shell built-in functions you need to handle are cd, exit, and history (and !! and !num syntax for running commands from the history). Shell built-in functions are not executed by forking and exec'ing an executable, instead the shell process implements the command itself.

cd

To implement the cd command, your shell needs to change the value of the process' current working directory (cwd). Here are some examples:
cd /usr/bin  # should change cwd to /usr/bin
cd           # should change cwd to /home/you/
cd ..        # should change cwd to the parent of current directory
After your shell executes a cd command, subsequent executions of pwd and ls by your shell should reflect the change in the cwd that is the result of executing the cd command.

Here are some functions you may find helpful in implementing cd (see their man pages for more details):

history, !!, and !num

  myshell$ history   # list the n most previous commands  (10 in this example)
     4  14:56   ls
     5  14:56   cd texts/
     6  14:57   ls
     7  14:57   ls
     8  14:57   cat hamlet.txt
     9  14:57   cat hamlet.txt | grep off
    10  14:57   pwd
    11  14:57   whoami
    12  14:57   ls
    13  14:57   history

  myshell$ !8  # executes command 8 from the history (cat hamlet.txt)
  myshell$ !!  # executes the last command again (cat hamlet.txt)
Implement history for a reasonable sized, but smallish, number of previous commands (10-20 would be good, but design your program so that you can easily change this size), and note that the command number is always increasing. Commands from the history list could be any type (built-ins, run in the foreground, commands with pipe, ...), thus running a command using !num or !! syntax can result in your shell running any of these different types of commands.

Note that when !num !! are entered, the command line matching that command from the history is added as the most recent command line in the history list (e.g. the string "!9" would never show up in a listing when the user runs history). Try these out in bash to see how they work.

You may not use the readline library to implement history. Instead, implement a data structure for storing a command history, and use it when implementing the built-in history command and !num syntax to execute previous commands. Look at the CS31 shell lab for more information and hints on implementing the history command.


I/O Redirection and Pipes

I/O Redirection

Your shell needs to handle I/O re-direction using the 1>, 2>, and < syntax for specifying sdout, sderr, and stin re-direction. Here are some examples:
cat foo 1> foo.out       # re-direct cat's stdout to file foo.out
cat foo 2> foo.err       # re-direct cat's stderr to file foo.err
cat foo 1> out1 2> out2  # re-direct cat's stdout to out1 and stderr to out2
cat foo <  foo.in                 # re-direct cat's stdin from file foo.out
cat foo <  foo.in 1> foo.out      # re-direct cat's stdin and stdout
I/O re-direction using '>' or '>&' need not be supported. For example, the following command can be an error in your shell even though it is a valid Unix command:
  $ cat foo > foo.out2 

Each process that is created (forked), gets a copy of its parent's file descriptor table. Every process has three default file identifiers in its file descriptor table, stdin (file descriptor 0), stout (file descriptor 1), and stderr (file descriptor 2). The default values for each are the keyboard for stdin, and the terminal display for stdout and stderr. A shell re-directs its child process's I/O by manipulating the child's file identifiers (think carefully about at which point in the fork-exec process this needs to be done). You will need to use the open, close and dup system calls to redirect I/O.

Pipes

A pipe is an interprocess communication (IPC) mechanism for two Unix process running on the same machine. A pipe is a one-way communication channel between the two processes (one process always writes to one end of the pipe, the other process always reads from the other end of the pipe). A pipe is like a temporary file with two open ends, writes to one end by one process can be read from the other end by the another process. However, unlike a regular file, a read from a pipe results in the removal of data that was written by the other process. The pipe system call is used to create a new pipe. It creates two file descriptors, one for the read end of the pipe, the other for the write end of the pipe. One of the communicating processes use the write end of the pipe to send a message to the other communication process who reads from the read end:

When your shell program executes a command with a single pipe like the following:

	cat foo.txt | grep -i blah   
cat's output will be pipe'ed to grep's input. The shell process will fork two process (one that will exec cat the other that will exec grep), and will wait for both to finish before printing the next shell prompt. Use pipe and I/O redirection to set up communication between the two child processes.

The process reading from the pipe will automatically exit after it reads EOF on its input; any process blocked on a read will unblock when the write end is closed and "read" EOF. If multiple processes have the same file open, only the last close to the file will trigger this EOF behavior. Therefore, when you write programs that create pipes (or open files), you need to be very careful about how many processes have the files open, so that EOF conditions will be triggered.

Useful Unix System Calls
Below are some Unix system calls that you may find useful, and some examples of how to use them (note: the examples may not match exactly how you need to use them in your shell program). See their man pages for more details.
Submit

You will submit a single tar file containing your lab 1 solution code and the other specific contents listed below, using cs45handin Only one of you or your partner should submit your tar file via cs45handin. If you accidentally both submit it, send me email right away letting me know which of the two solutions I should keep and which I should discard. You can run cs45handin as many times as you like, and only the most recent submission will be recorded.

Your lab1 tar file should include the following (see Unix Tools for more information on script, dos2unix, make, and tar):

  1. A README file with:
    1. Your name and your partner's name
    2. If you have not fully implemented all shell functionality then list the parts that work (and how to test them if it is not obvious) so that you can be sure to receive credit for the parts you do have working.
    3. Tell me how to find comments in your sample output file (ex. They are prefixed by the character string "###")

  2. All the source files needed to compile, run and test your code (Makefile, .c files, .h files, and optional test scripts). Please do not submit object (.o) or executable files (I can build these using your Makefile).

  3. A file containing the output from your testing of your shell program. Make sure to demonstrate:
    1. simple Unix commands
    2. built-in commands
    3. I/O redirection
    4. pipes
    5. error conditions
    You can capture the screen output of shell program using script (script takes an optional filename arg. Without it, script's output will go to a file named typescript)
    $  script			
    Script started, file is typescript
     % ./myshell
      myshell$  ls
      foo.txt        myshell   
      myshell$  exit
      good bye
     % exit
    exit
    Script done, file is typescript
    
    Then clean-up the typescript file by running dos2unix
    $  dos2unix -f typescript 		
    # you may need to run dos2unix more than once to remove all control chars
    $   mv typescript outputfile      
    
    Finally, edit the output file by inserting comments around the specific parts that you tested that are easy for me to find and that explain what you are testing. The idea is for you to ensure that if you shell correctly implements some feature, that I can test it for that feature. By showing me an example of how you tested it for a feature and making sure that I can easily find your test it will make it more likely that I am able to verify that a feature works in your shell program. For example, you could put special characters before your comments so that they are easy for me to find (like using '#' in the following example):
    
    #
    # Here I am showing how my shell handles the built-in commands 
    #
    myshell$  cd
    ...
    
    #
    # Here I am showing how my shell handles commands with pipes
    #
    myshell$  cat foo.txt | grep blah
    ...