CS45 Lab 1

Lab 1: Shell Program
Due: Monday, September 19, before 1 am (late Sunday night)

This, and all, lab assignments should be done with a partner. See the Wed lab page for information about how you can set up a git repository for your joint lab 1 project.

Lab 1 Partners
Peter Ballen and Greg Rawson	Katherine Bertaut and Greg Taschuk	Sam Clark and Elliot Padgett
Chritina Duron and Kevin Li	Nick Felt and Nick Rhinehart	Jonathan Gluck and Kevin Pytlar
Phil Koonce and Steven Hwang	Jacob Lewin and Luis Ramirez	Ben Lipton and Allen Welkie
Niels Verosky and Steve Dini	Catie Meador and Becca Roelofs	Andrew Stromme and Sam White

Content:

Problem Introduction
Getting Started
Implementation Details
Useful Unix System Calls useful system calls for implementing your shell
Extra Credit
What to Hand in
Project Demo

Introduction

For this lab, you will implement a Unix shell that supports:

simple commands
the built-in commands cd, and exit (not all built-in commands)
commands with a single pipe
running a command in the background (using &)
commands with I/O re-direction (see below for details of which I/O re-direction syntax you should implement).

In addition to implementing the required shell functionality:

your shell must be implemented in C (no C++)
you must use the execvp exec function
you must use a makefile to build your shell program
your shell should be robust to bad user input and gracefully handle errors
your shell must be free of all memory access errors (run valgrind to verify this)
your code should use good modular design, be efficient, and well commented.

See my C language links for information about C programming style, debugging tools, etc.

Getting Started

There are two well defined pieces of code to write. The first is a set of command line parsing routines that take as input the command line entered by the user and parses its, and among other things creates the argv string list for execvp. The parsing routines must be able to identify and correctly parse command lines that include I/O redirection, running in the background, pipes, or built-in commands. The second piece is a set of functions for executing commands starting with a simple command, then incrementally adding and testing more features: built-in commands, commands with I/O redirection, commands with pipes.

Even though there are two well defined pieces of functionality to write, dividing the work up into these two pieces is not suggested; these are not necessarily both the same amount of work, and it is important that you both know how to write shell code to execute a command with a pipe, for example, and that you both know how to parse a command line. I think the most effective way to work on this lab is for you and your partner to work together in the lab on both pieces. You may go off and do some parts independently, but come back together often to work on figuring out portions of the task, and to do incremental testing and debugging (two pairs of eyes are better than one.)

You and your partner should start by sketching out the design of your solution (use top-down design, use good modular design, and design function prototypes). Implement and test your code incrementally, and test individual functions in isolation as much as possible. For example, start with exec'ing a simple command with command path and argv strings for a specific command hard-coded into your shell program. Once this works, move on to adding the next piece of functionality, test and debug it, then move on to adding the next piece, and so on.

You may want to add assert statements during testing to test pre and post conditions of your functions (see the man page for assert), and make use of gdb and valgrind to help you find and fix bugs. Be sure that any printf's or assert statements you add for debugging purposes are removed from (or commented out of) the code you submit.

You could use the .c, .h and Makefile from Wednesday's lab as a starting point for your shell program:
~newhall/public/cs45/lab1

Implementation Details

A shell executes a loop:

print the shell prompt
read the next command line (terminated with '\n')
parse the command line (get each command name and its command line arguments, determine if it is a built-in, and if there is I/O redirection, running in the background, or a pipe in the command line)
create the argv lists for exec
execute the command(s) in the command line and wait for it to finish (unless it is run in the background.)
repeat, until the user enters the exit command

In most cases, the shell forks (creates) one or more child processes to execute the command entered by the user, and waits waits for the child processes to finish before printing the next command prompt A shell also has support for several built-in functions that it executes directly instead of forking a child process to execute them. In the following sequence of shell commands (myshell$ is my shell prompt):

	myshell$  ls -la		# long listing of curr directory
		-rw-------    1 newhall  users         628 Aug 14 11:25 Makefile
		-rw-------    1 newhall  users          34 Aug 14 11:21 foo.txt
		-rw-------    1 newhall  users       16499 Aug 14 11:26 main.c

	myshell$  cat foo 1>  foo.out	# cat foo's contents to file foo.out
	myshell$  pwd			# print current working directory
		/home/newhall/public
	myshell$  cd			# cd to HOME directory
	myshell$  pwd
		/home/newhall
	myshell$  firefox &		# run firefox in the background
	myshell$

cd is a built-in commands, and pwd, ls, and cat are commands executed by a child forked by the shell process.

In general commands are in the form:

	commandname arg1 arg2 arg3 ...

To execute this command, the shell first checks if it is one of its built-ins, and if so invokes a functions to execute it. If it is not a built-in, the shell parses the command line creates a child process to execute the command, and waits for the child to complete the execution of the command.

Creating a new child process is done using the fork system call, and waiting for the child process to exit is done using the wait system call. fork creates a new process that shares its parent's address space (both the child and parent process continue at the instruction immediately following the call to fork. In the child process, fork returns 0, in the parent process, fork returns the pid of the child. The child process will call execvp to execute the command. For example:

  int child_pid = fork();
  if(child_pid == -1) {
    // fork failed...handle this error

  } else  if(child_pid == 0) { 
    // child process will execute code here to exec the command 
    ...
    execvp(command_name, command_argv_list);

  } else {	
    // parent process will execute this code 
    ...
  }

The parent can call wait or waitpid to block until a child exits:

  // block until one of my child processes exits (any one):
  pid = wait(&status);	  // block until child process exits

  // OR to wait for a specific child process to exit:
  pid = waitpid(childpid, &status, 0);

The execvp system call overlays the calling process's image with a new process image and begins execution at the starting point in the new process image (e.g. the main function in a C program). As a result, exec does not return unless there is an error (do you understand why this is the case?).

Reading and Parsing the Command line

You are welcome to use any valid C method for reading in and parsing the command entered by the user. However, it is important that your method is robust to bad user input such as entering an empty command (your shell should just print the prompt again) or entering a badly formed command (your shell should print out an error message).

Part of the parsing process involves creating the arguments to exec:

    execvp(command_name, command_argv_list);

The first argument is a string containing the command name, the second is a list of strings. The first string int the list is the command name, followed by one string per command line argument, followed by the NULL string. For example, the arguments to execute the command ls -l would look like this: execvp("ls", arg_list); , where arg_list is:

  arg[0]: "ls"
  arg[1]: "-l"
  arg[2]: NULL

See the "File I/O" and "strings" parts of my C help pages for some basic information about C strings and input functions. A couple functions that may be useful are readline and strtok. If you use readline, you need to link with the readline library:

gcc -g -o myshell myshell.c -lreadline

Here is some information about using the readline library. Also, look at the man pages for C library functions and system calls, and be careful about who is responsible for allocating and freeing memory space passed to (and returned by) these routines.

Shell Built-ins

The only shell built-in functions you need to handle are cd, and exit. For more information on shell built-in functions look at the builtins man page. Shell built-in functions are not executed by forking and exec'ing an executable. Instead, the shell process performs the command itself.

To implement the cd command, your shell should get the value of its current working directory (cwd) by calling getcwd on start-up. When the user enters the cd command, you must change the current working directory by calling chdir(). Subsequent calls to pwd or ls should reflect the change in the cwd as a result of executing cd.

I/O Redirection

Your shell needs to handle I/O re-direction using the 1>, 2>, and < syntax for specifying sdout, sderr, and stin re-direction:

  foo 1> foo.out       # re-direct foo's stdout to file foo.out
  foo 2> foo.err       # re-direct foo's stderr to file foo.err
  foo 1> foo.out2 2> foo.out2   # re-direct foo's stdout & stderr to foo.out2
  foo <  foo.in                 # re-direct foo's stdin from file foo.out
  foo <  foo.in 1> foo.out      # re-direct foo's stdin and stdout

I/O re-direction using '>' or '>&' need not be supported. For example, the following command can be an error in your shell even though it is a valid Unix command:

  $ cat foo > foo.out2

Each process that is created (forked), gets a copy of its parent's file descriptor table. Every process has three default file identifiers in its file descriptor table, stdin (file descriptor 0), stout (file descriptor 1), and stderr (file descriptor 2). The default values for each are the keyboard for stdin, and the terminal display for stdout and stderr. A shell re-directs its child process's I/O by manipulating the child's file identifiers (think carefully about at which point in the fork-exec process this needs to be done). You will need to use the open, close and dup system calls to redirect I/O.

Pipes

A pipe is an interprocess communication (IPC) mechanism for two Unix process running on the same machine. A pipe is a one-way communication channel between the two processes (one process always writes to one end of the pipe, the other process always reads from the other end of the pipe). A pipe is like a temporary file with two open ends, writes to one end by one process can be read from the other end by the another process. However, unlike a regular file, a read from a pipe results in the removal of data that was written by the other process. The pipe system call is used to create a new pipe. It creates two file descriptors, one for the read end of the pipe, the other for the write end of the pipe. One of the communicating processes use the write end of the pipe to send a message to the other communication process who reads from the read end:

When your shell program executes a command with a single pipe like the following:

	cat foo.txt | grep -i blah

cat's output will be pipe'ed to grep's input. The shell process will fork two process (one that will exec cat the other that will exec grep), and will wait for both to finish before printing the next shell prompt. Use pipe and I/O redirection to set up communication between the two child processes.

The pipe reading process knows when to exit when it reads EOF on its input; any process blocked on a read will unblock when the file is closed. However, if multiple processes have the same file open, only the last close to the file will trigger an EOF. Therefore, when you write programs that create pipes (or open files), you need to be very careful about how many processes have the files open, so that EOF conditions will be triggered.

Useful Unix System Calls

Here are some Unix system calls that you may find useful, and some examples of how to use them (note: the examples may not match exactly how you need to use them in your shell program):

chdir: change the current working directory (use this to implement cd)

fork-join: create a new child process and wait for it to exit:

  int cpid = fork();
  if (cpid == -1) { // fork failed, handle the error 

  } else if (cpid == 0) { // the child process 

  } else { // the parent process 
    pid = waitpid(cpid, &status, 0); 
  }

execvp: overlay a new process image on calling process
```
    execvp( command_name, command_argv_list);
```

open, close, dup: for I/O Redirection For example to re-direct stdout to file foo:

    int fid = open("foo", O_WRONLY | O_CREAT, 0666);  // open file foo
		close(1);  // close stdout's file descriptor (slot 1 in open file table)
		dup(fid);  // dup file descriptor fid, the duplicate open entry will go
               // into the first free slot in the open file table (slot 1)
		close(fid); // we don't need fid open any more

Now when the process writes to stdout (to file descriptor 1), the output will go to the file foo instead of to the terminal.

pipe: IPC mechanism for iter-machine processes

    int pipe_id[2];
    pipe(pipe_id);    
    read(pipe[0], in_buf, len);
    write(pipe[1], out_buf, len);

Test Commands

In testing your shell, if you are ever unsure about the output of a command line, try running the same command line in bash and see what it does.

Here are some examples of commands that your shell should handle (this is not a complete test suite; you should come up with a much more complete test suite for your shell):

myshell$ 
myshell$  ls 
myshell$  ls -al 
myshell$  cat foo.txt 
myshell$  cd /usr/bin 
myshell$  ls 
myshell$  cd ../ 
myshell$  pwd 
myshell$  cd 
myshell$  find . -name foo.txt  
myshell$  wc foo.txt
myshell$  wc blah.txt
myshell$  /usr/bin/ps
myshell$  ps
myshell$  firefox 
myshell$  exit

myshell$  cat foo.txt | more 
myshell$  cat foo.txt | grep blah 
myshell$  cat foo.txt blah.txt 1> out.txt 2> out.txt  
myshell$  wc out.txt    
myshell$  cat < foo.txt 1> out2.txt   
myshell$  diff out.txt out2.txt   
myshell$  ls -la yeeha.txt 2> errorout.txt   
myshell$  exit 

## test some error conditions
## your shell should gracefully handle 
## errors by printing a useful error message and not crash or exit (it
## should just restart its main loop: print shell prompt, ...)
myshell$  | 
myshell$  ls 1 >  out 
myshell$  cat foo1> out   
myshell$  1> < 2>

Extra Credit

Try these mostly for fun and for a few extra credit points. However, do not try these until your basic shell program is complete, correct, robust, and bug free; an incomplete program with extra credit features will be worth much less than a complete program with no extra credit features.

Here are some additional features to try adding if you have time (some are much more difficult than others):

Modify your shell to use the execv instead of execvp
```
    execv(full_path_to_command_executable, command_argv_list);
```
execv's first argument is the full path name to the command's executable file. This change requires that your shell searches for the executable file using the user's PATH environment variable, and that it tests permissions on the executable file to ensure that it is executable by this user. Some details:
- You will need to change command line parsing in your shell to find the command executable in the user's path. For commands entered using the path name to the executable (e.g. /bin/cat or ../myprog ), locating the command is easy (e.g. it is just /bin/cat or ../myprog), for commands entered without the pathname (e.g. cat), you need to find the full path name to the cat executable using the user's PATH environment variable. For example, if a user enters a command such as the following:
```
  % cat foo.txt
```
  the shell program needs to locate the cat executable file in the user's path.
- Use the getenv system call to get the value of the PATH environment variable:
```
    path = getenv("PATH");
```
  path is an ordered list of directory paths in which to search for the command. It is in the form:
```
    "first_path:second_path:third_path: ..."
```
  For example, if the user's path is:
```
   /usr/swat/bin:/usr/local/bin:/usr/bin:/usr/sbin
```
  the shell should first look for the cat executable file in /usr/swat/bin/cat. If it is not found there, then it should try /usr/local/bin/cat next, and so on.
  To see your path: echo $PATH. To list the value of all your environment variables: env.
- access: check to see if a file is accessible in some way.
```
  // check if is this file exists and is executable
  access(full_path_name_of_file, X_OK);  
```
- Your shell needs to detect and handle the error of a non-existing command
Add support for the built-in command 'history' and for executing previous commands using '!num' syntax:
```
  myshell< history   # list the n most previous commands  (10 in this example)
     4  14:56   ls
     5  14:56   cd texts/
     6  14:57   ls
     7  14:57   ls
     8  14:57   cat hamlet.txt
     9  14:57   cat hamlet.txt | grep off
    10  14:57   pwd
    11  14:57   whoami
    12  14:57   ls
    13  14:57   history
  myshell< !8                # will execute command 8 from the history  
	
```
Implement history for a reasonable sized, but smallish, number of previous commands (50 would be good). And note that the command number is always increasing. Don't use the readline functionality to do this. Instead, implement a data structure for storing a command history, and use it when implementing the built-in history command and !num syntax to execute previous commands.
Add support for tab completion or other nice windowing features. This likely will involve using the readline and/or ncurses libraries and linking them in when you build your shell:
```
   gcc -g -o myshell myshell.c -lncurses -lreadline
```
Add support to kill all child processes started in the background on your shell's exit (remember the default in Unix is orphaned children become children of init). Look at the man page for atexit.

add support for any number of pipes in a command

   myshell$ cat foo | grep blah | grep grrr | grep yee_ha

Come up with other additional shell features to add. Try out tcsh or bash functionality and see if you can add it to your shell.

What to Hand in

Submit a single tar file with the following contents using cs45handin (see Unix Tools for more information on script, dos2unix, make, and tar):

A README file with:
1. Your name and your partner's name
2. The number of late days you have used so far
3. If you have not fully implemented all shell functionality then list the parts that work (and how to test them if it is not obvious) so that you can be sure to receive credit for the parts you do have working.
4. Tell me how to find comments in your sample output file (ex. They are prefixed by the character string "###")
All the source files needed to compile, run and test your code (Makefile, .c files, .h files, and optional test scripts). Please do not submit object (.o) or executable files (I can build these using your Makefile).
A file containing the output from your testing of your shell program. Make sure to demonstrate:
1. simple Unix commands
2. built-in commands
3. I/O redirection
4. pipes
5. error conditions
You can capture the screen output of shell program using script (script takes an optional filename arg. Without it, script's output will go to a file named typescript)
```
$  script			
Script started, file is typescript
 % ./myshell
  myshell$  ls
  foo.txt        myshell   
  myshell$  exit
  good bye
 % exit
exit
Script done, file is typescript
```
Then clean-up the typescript file by running dos2unix
```
$  dos2unix typescript 		
# you may need to run dos2unix more than once to remove all control chars
$   mv typescript outputfile      
```
Finally, edit the output file by inserting comments around the specific parts that you tested that are easy for me to find and that explain what you are testing. The idea is for you to ensure that if you shell correctly implements some feature, that I can test it for that feature. By showing me an example of how you tested it for a feature and making sure that I can easily find your test it will make it more likely that I am able to verify that a feature works in your shell program. For example, you could put special characters before your comments so that they are easy for me to find (like using '#' in the following example):
```
#
# Here I am showing how my shell handles the built-in commands 
#
myshell$  cd
...

#
# Here I am showing how my shell handles commands with pipes
#
myshell$  cat foo.txt | grep blah
...
```

Project Demo

You and your partner will sign-up for a 15 minute demo of your shell program. The purpose of your demo is to:

show me what your shell can do (i.e. that it is complete and correct) and to show me any special features you added.
answer questions about the implementation of your shell.

It is up to you to determine how to organize your demo. You and your partner should practice your demo before you give it; come up with a list of commands that you will run that demonstrate that your shell is correct, complete, and robust and make sure to do a practice run. At the beginning of your demo, you should be logged in and ready to go.

Please come see me if you have questions about preparing for your demo.

Lab 1: Shell Program Due: Monday, September 19, before 1 am (late Sunday night)