Lab 1: Shell Program
Due: Monday, September 22, before 1 am (late Sunday night)

Problem Introduction
Getting Started
Implementation Details
Useful Unix System Calls: useful system calls for implementing your shell
Extra Credit
What to Hand in
Project Demo

This, and all, programming assignments in this class should be done with a partner. See the "Safe File Sharing" link under the Collaboration Tools section of my help pages for information about how you and your partner can safely share code.

Introduction

For this assignment, you will implement a Unix shell program. Your shell should support:

simple commands
the built-in commands cd and exit (not all built-in commands)
commands with a single pipe
running a command in the background (using &)
commands with I/O re-direction (see below for details of which I/O re-direction syntax you should implement).

In addition, your shell must gracefully handle errors (i.e. badly formed commands should not crash your shell program).

You must implement your shell in C (no C++). See C language links for information about C programming style, debugging tools, etc. You should use valgrind to help you remove all memory access errors from your program; I will check that your code is free of memory access errors and memory leaks when I grade it.

Getting Started

There are two well-defined pieces of code to write. The first is a set of command line parsing routines that take as input the command line entered by the user and parse the command line into argv lists for exec. This includes using the value of the PATH environment variable to locate the command if it is not entered using its absolute path name, and identifying and/or handling special cases of command line parsing (commands with I/O redirection, run in the background, pipes, or built-in commands). The second is a set of functions for executing commands, starting with a simple command entered as an absolute path name, then incrementally adding and testing more features: commands without an absolute path name, pipes, I/O redirection, and built-in commands.

You and your partner should start by sketching out the design of your solution (use top-down design and think good modular design). Implement and test your code incrementally, and test individual functions in isolation as much as possible. For example, start with exec'ing a simple command given an absolute path name to the command. Once this works, move on to adding the next piece of functionality, test and debug it, then move on to adding the next piece, and so on. Use assert statements to test pre and post conditions of functions, and use gdb and valgrind to help you find and fix bugs.

In addition to being correct and robust to bad user input, your shell code should use good modular design, be efficient, and be well commented.

Implementation Details

A shell executes a loop:

print the shell prompt
read the next command line (terminates with '\n')
parse the command line (get each command name and its command line arguments, determine if there is I/O redirection, background, or a pipe in the command line)
locate the command's executable file
For commands entered using their absolute path name (eg. /bin/cat) locating the command is easy. Otherwise, you need to get the value of the PATH environment variable to get a list of locations in which to search for the command.
create the argv lists (for exec) for each command in the command line
execute the command(s) in the command line and wait for it to finish (unless it is run in the background)
repeat, until the user enters the exit command
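The outer structure of this loop can be sketched as follows. This is only a minimal skeleton under some assumptions: the function name shell_loop and its FILE parameters are hypothetical (they make the loop easy to test in isolation), getline is used here instead of readline, and the parse/locate/execute steps are left as a placeholder.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical skeleton of the main loop: print a prompt, read a line,
 * skip blank lines, stop on "exit" or end of input.
 * Returns the number of non-blank command lines seen before exiting.
 */
int shell_loop(FILE *in, FILE *out) {
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;
    int count = 0;

    assert(in != NULL && out != NULL);
    for (;;) {
        fputs("myshell> ", out);
        fflush(out);
        if (getline(&line, &cap, in) == -1) {
            break;                            /* EOF or read error */
        }
        line[strcspn(line, "\n")] = '\0';     /* strip the trailing newline */
        if (line[strspn(line, " \t")] == '\0') {
            continue;                         /* empty command: just re-prompt */
        }
        if (strcmp(line, "exit") == 0) {
            break;
        }
        /* placeholder: parse the line, locate the command, execute it */
        count++;
    }
    free(line);          /* the caller of getline owns the buffer */
    return count;
}
```

A real shell would call something like shell_loop(stdin, stdout) from main and would recognize exit during parsing rather than with a plain strcmp.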

In most cases, the shell program forks (creates) one or more child processes to execute the command line entered by the user. The shell program then waits for the child processes to finish before printing the next command prompt (if the command is run in the background, then the shell doesn't wait for its child process to finish executing the command). A shell also has support for several built-in functions that it executes directly instead of forking a child process to execute them. An example of a typical sequence of shell commands might be:

	myshell>  ls -la		# long listing of curr directory
		-rw-------    1 newhall  users         628 Aug 14 11:25 Makefile
		-rw-------    1 newhall  users          34 Aug 14 11:21 foo.txt
		-rw-------    1 newhall  users       16499 Aug 14 11:26 main.c

	myshell> cat foo 1>  foo.out	# cat foo's contents to file foo.out
	myshell> pwd			# print current working directory
		/home/newhall/public
	myshell> cd			# cd to HOME directory
	myshell> pwd
		/home/newhall
	myshell> firefox &		# run firefox in the background
	myshell>

Here cd is a built in command, and ls, pwd and cat are commands executed by a child of the shell process.

In general a shell command is in the form:

	commandname arg1 arg2 arg3 ...

To execute this command, the shell first checks if it is one of its built-ins, and if so invokes a function to execute it. If it is not a built-in, the shell parses the command line into the command_argv_list, creates a child process to execute the command, and waits for the child to complete the execution of the command.

Creating a new child process is done using the fork system call, and waiting for the child process to exit is done using the wait system call. fork creates a new process that gets its own copy of its parent's address space; both the child and parent process continue at the instruction immediately following the call to fork. In the child process, fork returns 0; in the parent process, fork returns the pid of the child. The child process will call execv to execute the command. For example:

  int child_pid = fork();
  if(child_pid == -1) {
    // fork failed...handle this error

  } else  if(child_pid == 0) { 
    // child process will execute code here to exec the command 
    ...
    execv(full_path_name, command_argv_list);

  } else {	
    // parent process will execute this code 
    ...
  }

The parent can call wait or waitpid to block until a child exits:

  // block until one of my child processes exits (any one):
  pid = wait(&status);	  // block until child process exits

  // OR to wait for a specific child process to exit:
  pid = waitpid(childpid, &status, 0);
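
For a command run in the background the parent should not block. One possible approach, sketched here as an assumption rather than a required design, is to poll for finished children with the WNOHANG option to waitpid, for example once per iteration of the main loop. The helper name reap_background is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: collect every child that has already exited,
 * without blocking. Returns the number of children reaped.
 * waitpid returns 0 when children exist but none has exited yet, and
 * -1 when there are no children at all; both cases end the loop.
 */
int reap_background(void) {
    int count = 0;
    int status;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        printf("[done] %d\n", (int)pid);
        count++;
    }
    return count;
}
```

Reaping finished background children this way also keeps them from lingering as zombie processes.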

The execv system call overlays the calling process's image with a new process image and begins execution at the starting point in the new process image (e.g. the main function in a C program). As a result, exec does not return unless there is an error (do you understand why this is the case?).
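
Putting fork, execv, and waitpid together for a foreground command might look like the sketch below. The function name run_and_wait and the choice of 127 as the exit status when execv fails are assumptions for illustration (127 mirrors what common shells report for a command that cannot be run).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, exec the program at full_path with
 * the NULL-terminated argv list, and wait for it to finish.
 * Returns the child's exit status, or -1 if fork or waitpid failed or
 * the child was killed by a signal.
 */
int run_and_wait(const char *full_path, char *const argv[]) {
    assert(full_path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execv(full_path, argv);
        /* Only reached if execv failed: the old image is still running. */
        perror("execv");
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit rather than returning after a failed execv so that it never falls back into the shell's own loop.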

Reading and Parsing the Command line

You are welcome to use any valid C method for reading in and parsing the command entered by the user. However, it is important that your method is robust to bad input entered by the user such as entering an empty command (your shell should just print the prompt again) or an invalid command (your shell should print out an error message); crashing on bad input is not okay.

See the "File I/O" and "strings" parts of my C help pages for some basic information about C strings and input functions. In addition, a couple functions that may be useful are readline and strtok. See their man pages for more information about how to use these (be careful about who is responsible for allocating and freeing memory used by these routines). If you use readline, you need to link with the readline library:

gcc -g -o myshell myshell.c -lreadline
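
As one illustration of strtok, the sketch below splits a command line on whitespace into a NULL-terminated argv list. The names split_line and MAX_ARGS are hypothetical, and the sketch does not handle the special tokens (|, &, 1>, 2>, <), which your parser still needs to recognize.

```c
#include <assert.h>
#include <string.h>

#define MAX_ARGS 64   /* hypothetical limit chosen for this sketch */

/*
 * Split line in place on spaces, tabs, and newlines. Each argv entry
 * points into line itself (strtok overwrites delimiters with '\0'), so
 * line must stay allocated for as long as argv is in use.
 * Returns the number of tokens, or -1 if there are too many to fit.
 */
int split_line(char *line, char *argv[MAX_ARGS]) {
    int argc = 0;

    assert(line != NULL && argv != NULL);
    for (char *tok = strtok(line, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (argc == MAX_ARGS - 1) {
            return -1;            /* leave room for the NULL terminator */
        }
        argv[argc++] = tok;
    }
    argv[argc] = NULL;            /* execv expects a NULL-terminated list */
    return argc;
}
```

A return value of 0 corresponds to the empty command, in which case the shell should just print the prompt again.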

Getting the Value of Environment Variables

If a user enters a command such as the following:

	% cat foo.txt

the shell program needs to locate the cat executable file in the user's path.

Use the getenv library function to get the value of the PATH environment variable. PATH is an ordered list of paths in which you should search for the command. It is in the form:

    first_path:second_path:third_path: ...

For example, if the user's path is:

	 /usr/swat/bin:/usr/local/bin:/usr/bin:/usr/sbin

the shell should first look for the cat command file in /usr/swat/bin/cat. If it is not found there, then it should try /usr/local/bin/cat next, and so on.

To see your path: echo $PATH. To list the value of all your environment variables: env.

Note: There are versions of exec that do the shell path search for you. I do not want you to use these. Instead, your shell code should get the PATH environment variable and search for the command, and then construct the full path name of the command that it will pass to execv.
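
The search itself can be sketched as below, assuming a hypothetical helper named find_command that receives the PATH string (for example from getenv("PATH")) and a caller-supplied buffer. It walks the colon-separated list without modifying it and uses access to test each candidate. Empty PATH entries are simply skipped in this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: look for an executable named cmd in each directory
 * of the colon-separated path_list, in order. On success the full path is
 * written to out and 0 is returned. On failure -1 is returned and the
 * contents of out are unspecified.
 */
int find_command(const char *cmd, const char *path_list,
                 char *out, size_t out_len) {
    assert(cmd != NULL && path_list != NULL && out != NULL);

    const char *dir = path_list;
    while (*dir != '\0') {
        const char *end = strchr(dir, ':');
        size_t dir_len = (end != NULL) ? (size_t)(end - dir) : strlen(dir);

        if (dir_len > 0) {
            /* Build "dir/cmd"; skip candidates that do not fit in out. */
            int n = snprintf(out, out_len, "%.*s/%s", (int)dir_len, dir, cmd);
            if (n > 0 && (size_t)n < out_len && access(out, X_OK) == 0) {
                return 0;
            }
        }
        if (end == NULL) {
            break;                /* that was the last entry */
        }
        dir = end + 1;
    }
    return -1;
}
```

Note that access with X_OK also succeeds for a directory with search permission, so a more careful version might check the file type as well.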

Shell Built-ins

The only shell built-in functions you need to handle are cd and exit. For more information on shell built-in functions look at the man page for builtins. Shell built-in functions are not executed by forking and exec'ing an executable. Instead, the shell process executes them itself.

To implement the cd command, your shell should get the value of its current working directory (cwd) by calling getcwd() on start-up. When the user enters the cd command, you must change the current working directory by calling chdir(). Subsequent calls to pwd or ls should reflect the change in the cwd as a result of executing cd.
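
A minimal sketch of the cd built-in is shown below. The function name builtin_cd is hypothetical, and falling back to the HOME environment variable when no argument is given is an assumption based on the example session above, where a bare cd goes to the home directory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: change the shell's own working directory.
 * target is the argument to cd, or NULL if the user typed cd alone.
 * Returns 0 on success, -1 on failure (after printing an error).
 */
int builtin_cd(const char *target) {
    if (target == NULL) {
        target = getenv("HOME");
        if (target == NULL) {
            fprintf(stderr, "cd: HOME not set\n");
            return -1;
        }
    }
    if (chdir(target) == -1) {
        perror("cd");         /* e.g. no such directory */
        return -1;
    }
    return 0;
}
```

Because chdir is called in the shell process itself, children forked afterwards (such as pwd or ls) inherit the new working directory.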

I/O Redirection

Your shell needs to handle I/O re-direction using the 1>, 2>, and < syntax for specifying stdout, stderr, and stdin re-direction:

  myshell> foo 1> foo.out       # re-direct foo's stdout to file foo.out
  myshell> foo 2> foo.err       # re-direct foo's stderr to file foo.err
  myshell> foo 1> foo.out2 2> foo.out2   # re-direct foo's stdout & stderr to foo.out2
  myshell> foo <  foo.in                 # re-direct foo's stdin from file foo.in
  myshell> foo <  foo.in 1> foo.out      # re-direct foo's stdin and stdout

I/O re-direction using '>' or '>&' need not be supported. For example, the following command can be an error in your shell even though it is a valid Unix command:

  myshell> foo < foo.in >& foo.out2

Each process that is created (forked) gets a copy of its parent's file descriptor table. Every process has three default file identifiers in its file descriptor table: stdin (file descriptor 0), stdout (file descriptor 1), and stderr (file descriptor 2). The default values for each are the keyboard for stdin, and the terminal display for stdout and stderr. A shell re-directs its child process's I/O by manipulating the child's file identifiers (think carefully about at which point in the fork-exec process this needs to be done). You will need to use the open, close and dup system calls to redirect I/O. For example, to re-direct a process's stdout to a file named foo.out, I'd do the following:

	int fid = open("foo.out", O_WRONLY | O_CREAT, 0666);	// open output file
	close(1);		// close stdout
	dup(fid);		// duplicate file descriptor fid, the duplicate
				// will go in the 1st free slot in the process's
				// file descriptor table (i.e. slot 1, the one
				// just closed, which is the file descriptor 
				// to stdout) 
	close(fid);		// we don't need fid file descriptor for this file

Now when the process writes to stdout (file descriptor 1), the output will go to the file foo.out instead of to the terminal.
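
One way to fit these calls into the fork-exec sequence is sketched below, under some assumptions: the function name run_redirected is hypothetical, the redirection is done in the child after fork and before execv so that the shell's own stdout is untouched, and O_TRUNC is added so that an existing output file is overwritten.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the program at full_path with its stdout
 * re-directed to out_file, and wait for it. Returns the child's exit
 * status, or -1 if fork or waitpid failed or a signal killed the child.
 */
int run_redirected(const char *full_path, char *const argv[],
                   const char *out_file) {
    assert(full_path != NULL && argv != NULL && out_file != NULL);

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child only: the parent's descriptor table is not affected. */
        int fd = open(out_file, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd == -1) {
            perror(out_file);
            _exit(1);
        }
        close(1);                 /* free slot 1 (stdout) */
        if (dup(fd) != 1) {       /* dup takes the lowest free slot */
            perror("dup");
            _exit(1);
        }
        close(fd);                /* the extra descriptor is not needed */
        execv(full_path, argv);
        perror("execv");
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The same pattern applies to stderr (slot 2) and, with O_RDONLY, to stdin (slot 0).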

Pipes

A pipe is an interprocess communication (IPC) mechanism for two Unix processes running on the same machine. It is a one-way communication channel between the two processes (one process always writes to one end of the pipe, the other process always reads from the other end of the pipe). Unix implements a file interface to pipes. A pipe is like a temporary file with two open ends: writes to one end by one process can be read from the other end by the other process. However, unlike a regular file, a read from a pipe results in the removal of data that was written to the pipe. The pipe system call is used to create a new pipe. It creates two file descriptors, one for the read end of the pipe, the other for the write end of the pipe. One of the communicating processes uses the write end of the pipe to send a message to the other communicating process, which reads from the read end.

When your shell program executes a command with a single pipe like the following:

	myshell>  cat foo.txt | grep -i blah

cat's output will be piped to grep's input. The shell process will fork two processes (one that will exec cat, the other that will exec grep), and will wait for both to finish before printing the next shell prompt. Use pipe and I/O redirection to set up communication between the two child processes.

The second process knows when to exit when it reads EOF on its input; any process blocked on a read will unblock when the file is closed. However, if multiple processes have the same file open, only the last close to the file will unblock processes blocked reading the file; only the last close really closes the file. Any time a process exits, all its open files are closed.

When you write programs that create pipes (or open files) and that fork processes, you need to be very careful about how many processes have the files open, so that EOF conditions can be detected by processes.
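
The sketch below shows one possible arrangement for a single pipe, with the closes that let the reader see EOF. The function name run_pipeline is hypothetical, and returning the exit status of the right-hand command is an assumption of this sketch, not a requirement of the assignment.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: connect pipe end `from` to slot `to`, then exec. */
static void child_exec(int fds[2], int from, int to,
                       const char *path, char *const argv[]) {
    close(to);
    if (dup(from) != to) {        /* dup takes the lowest free slot */
        perror("dup");
        _exit(1);
    }
    close(fds[0]);                /* neither original end is needed now */
    close(fds[1]);
    execv(path, argv);
    perror("execv");
    _exit(127);
}

/*
 * Hypothetical helper: run "path1 | path2" and wait for both children.
 * Returns the exit status of the right-hand command, or -1 on failure.
 */
int run_pipeline(const char *path1, char *const argv1[],
                 const char *path2, char *const argv2[]) {
    int fds[2];
    int status = 0;

    assert(path1 != NULL && argv1 != NULL && path2 != NULL && argv2 != NULL);
    if (pipe(fds) == -1) {
        perror("pipe");
        return -1;
    }

    pid_t left = fork();
    if (left == 0) {
        child_exec(fds, fds[1], 1, path1, argv1);   /* stdout -> write end */
    }
    pid_t right = (left == -1) ? -1 : fork();
    if (right == 0) {
        child_exec(fds, fds[0], 0, path2, argv2);   /* stdin <- read end */
    }

    /* The parent must close both ends, or the reader never sees EOF. */
    close(fds[0]);
    close(fds[1]);

    if (left > 0) {
        waitpid(left, NULL, 0);
    }
    if (right == -1) {
        perror("fork");
        return -1;
    }
    if (waitpid(right, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the parent kept its copy of the write end open, the right-hand process would block forever waiting for more input.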

Useful Unix System Calls

getting the value of an environment variable

    path = getenv("PATH");
    cwd  = getenv("PWD");

chdir: change the current working directory (use this to implement cd)

fork-join: create a new child process and wait for it to exit:

    int cpid = fork();
    if (cpid == 0) { // the child process

    } else { // the parent process

        pid = waitpid(cpid, &status, 0);
    }

execv: overlay a new process image on calling process
```
    execv( full_path_name, command_argv_list);
```
access: check to see if a file is accessible in some way.
```
    access(full_path_name_of_file, X_OK | F_OK);
```

open, close, dup: for I/O Redirection

    // to re-direct stdout to file foo
    int fid = open("foo", O_WRONLY|O_CREAT, 0666);
    close(1);
    dup(fid);
    close(fid);

pipe: IPC mechanism for processes on the same machine

    int pipe_id[2];
    pipe(pipe_id);    
    read(pipe_id[0], in_buf, len);
    write(pipe_id[1], out_buf, len);

Test Commands

In testing your shell, if you are ever unsure about the output of a command line, try running the same command line in bash or tcsh and see what it does.

Here are some examples of commands that your shell should handle. I will likely test your shell programs using a more complete test suite, so I recommend testing your shell with more than these commands.

myshell> 
myshell> ls 
myshell> ls -al 
myshell> cat foo.txt 
myshell> cd /usr/bin 
myshell> ls 
myshell> cd ../ 
myshell> pwd 
myshell> cd 
myshell> find . -name foo.txt  
myshell> wc foo.txt
myshell> wc blah.txt
myshell> /usr/bin/ps
myshell> /usr/bin/../bin/ps
myshell> firefox 
myshell> exit

myshell> cat foo.txt | more 
myshell> cat foo.txt | grep blah 
myshell> cat foo.txt blah.txt 1> out.txt 2> out.txt  
myshell> wc out.txt    
myshell> cat < foo.txt 1> out2.txt   
myshell> diff out.txt out2.txt   
myshell> ls -la yeeha.txt 2> errorout.txt   
myshell> exit 

## test some error conditions
## your shell should gracefully handle 
## errors by printing a useful error message and not crash or exit (it
## should just restart its main loop: print shell prompt, ...)
myshell> | 
myshell> ./hello 	# assuming there is no hello executable
myshell> hello 
myshell> cat foo1> out   
myshell> 1> < 2>

Extra Credit

Try these mostly for fun and for a few extra credit points. However, do not try these until your basic shell program is complete, correct, robust, and bug free; an incomplete program with extra credit features will be worth much less than a complete program with no extra credit features.

Here are some additional features to try adding if you have time (some are much more difficult than others):

add support for the built-in command 'history' and for executing previous commands using '!num' syntax:
```
  myshell> history   # list the n most recent commands (10 in this example)
     4  14:56   ls
     5  14:56   cd texts/
     6  14:57   ls
     7  14:57   ls
     8  14:57   cat hamlet.txt
     9  14:57   cat hamlet.txt | grep off
    10  14:57   pwd
    11  14:57   whoami
    12  14:57   ls
    13  14:57   history
  myshell> !8                # will execute command 8 from the history  
	
```
Implement history for a reasonably sized, but smallish, number of previous commands (50 would be good). Note that the command number is always increasing. Don't use the readline functionality to do this. Instead, implement a data structure for storing a command history, and use it when implementing the built-in history command and the !num syntax to execute previous commands.
Add support for tab completion or other nice windowing features. This likely will involve using the readline and/or ncurses libraries and linking them in when you build your shell:
```
   gcc -g -o myshell myshell.c -lncurses -lreadline
```
I have no idea how difficult it may be to add this feature.
add support to kill all child processes started in the background on your shell's exit (remember the default in Unix is orphaned children become children of init). Look at the man page for on_exit.

add support for any number of pipes in a command

   myshell> cat foo | grep blah | grep grrr | grep yee_ha

What to Hand in

Submit a single tar file with the following contents using cs45handin (see Unix Tools for more information on script, dos2unix, make, and tar):

A README file with:
1. Your name and your partner's name
2. The number of late days you have used so far
3. If you have not fully implemented all shell functionality then list the parts that work (and how to test them if it is not obvious) so that you can be sure to receive credit for the parts you do have working.
4. Tell me how to find comments in your sample output file (ex. They are prefixed by the character string "###")
All the source files needed to compile, run and test your code (Makefile, .c files, .h files, optional test scripts). Do not submit object or executable files. You should use a Makefile. See makefile howto for some examples.
A file containing the output from your testing of your shell program. Make sure to demonstrate:
1. simple Unix commands
2. built-in commands
3. I/O redirection
4. pipes
5. error conditions
You can capture the screen output of your shell program using script (script takes an optional filename argument; without it, script's output goes to a file named typescript):
```
> script			
Script started, file is typescript
 % ./myshell
  myshell> ls
  foo.txt        myshell   
  myshell> exit
  good bye
 % exit
exit
Script done, file is typescript
```
Then clean up the typescript file by running dos2unix:
```
> dos2unix typescript 		# you may need to run dos2unix more than
				# one time to remove all control chars
> mv typescript outputfile      
```
Finally, edit the output file by inserting comments around the specific parts that you tested. The comments should be easy for me to find and should explain what you are testing. The idea is to ensure that if your shell correctly implements some feature, I can test it for that feature. Showing me an example of how you tested a feature, and making sure that I can easily find your test, makes it more likely that I am able to verify that the feature works in your shell program. For example, you could put special characters before your comments so that they are easy for me to find (like using '#' in the following example):
```
#
# Here I am showing how my shell handles the built-in commands 
#
myshell> cd
...

#
# Here I am showing how my shell handles commands with pipes
#
myshell> cat foo.txt | grep blah
...
```

Project Demo

You and your partner will sign up for a 15 minute demo of your shell program. The purpose of your demo is to:

show me what your shell can do (i.e. that it is complete and correct) and to show me any special features you added.
answer questions about the implementation of your shell.

It is up to you to determine how to organize your demo. You and your partner should practice your demo before you give it; come up with a list of commands that you will run that demonstrate that your shell is correct, complete, and robust and make sure to do a practice run. At the beginning of your demo, you should be logged in and ready to go.

Please come see me if you have questions about preparing for your demo.
