CS 45 — Lab 1: Shell Redux

Checkpoint: Friday, January 31 (3-minute demos during lab)

Due: Thursday, February 6 @ 11:59 PM

1. Overview

For this lab, we’ll be revisiting your CS 31 shell assignment to add some new features like I/O redirection, pipes, and reading from the PATH environment variable. Rather than providing starter code, you’ll start from a previous implementation that meets the CS 31 requirements (e.g., your old submission).

1.1. Goals

  • Implement inter-process communication using pipes and manipulate process file descriptor resources.

  • Read from and make use of environment variables.

  • Work with a variety of system calls and adopt best practices for their use.

  • Practice/refresh C programming skills, particularly memory management.

2. Requirements

2.1. Parser

Similar to CS 31, you should implement a command line parser as a separate library that your shell will #include. The parser should build an ARGV array of tokens and identify any non-token elements of a command line. Note that you WILL need a more complex parser than you used for CS 31 to identify commands that are separated by |'s and to record I/O redirections without returning them as ARGV tokens.

For example, if the user inputs:

ls -l 2> error.txt | sort &

Your parser should identify that there are two commands, separated by a pipe, which should be run in the background:

  1. ARGV array: ["ls", "-l", NULL]. Meaning: redirect standard error to file error.txt, pipe standard out to command #2.

  2. ARGV array: ["sort", NULL]. Meaning: pipe standard in from command #1.

Note that the 2> error.txt | and & portions of the command are instructions to the shell and are NOT command ARGV tokens.

To simplify parsing, you may assume that nothing other than white space (spaces, newlines, etc.) will appear after an ampersand (&), there will be at most one ampersand, and that ampersands will not appear for any reason other than to specify background tasks.

2.2. Command Execution

Your shell should be able to execute simple commands in the foreground and background by reading from the PATH environment variable and using execv(), which expects a full path to the program as its first argument. If the user’s command starts with /, ./, or ../ (e.g., "/bin/ls"), you can pass it to execv() as-is. Otherwise (e.g., "ls"), you need to search the PATH environment variable to find and construct the full path. A call to getenv("PATH") returns a string of directories separated by ':' characters. Look through those directories, in order, until you find a matching program name. You can use the access() function to check whether a path contains an executable program (e.g., access("/bin/ls", X_OK)).

You should NOT modify the string result of getenv. If you want to change it (e.g., to chop it up on ':' characters), make a copy first and hack up the copy. If you modify getenv’s result directly, you may be changing the process’s environment! (See the NOTES section of man getenv.)

2.3. Built-in Commands: cd and exit

If the user executes "cd path", your shell should change the working directory to the specified path using chdir(). You can execute the pwd command to print the current working directory (to test if cd is working). Your shell should also terminate if the user inputs exit.
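A minimal sketch of the cd built-in (builtin_cd is an illustrative name, and the error-message wording is up to you):

```c
#include <stdio.h>
#include <unistd.h>

/* argv is the parsed ARGV array: argv[0] is "cd", argv[1] the target.
 * (Real shells fall back to $HOME with no argument; that's optional here.) */
void builtin_cd(char **argv) {
    if (argv[1] == NULL) {
        fprintf(stderr, "cd: missing directory argument\n");
        return;
    }
    if (chdir(argv[1]) == -1) {
        perror("cd");   /* report, but keep the shell running */
    }
}
```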

2.4. Built-in Commands: history and !num

Your shell should keep a history of the most recent commands the user has entered using a circular queue (same as CS 31). If the user enters a command of "history", you should print the stored history of commands, each preceded by a number. If the user enters a command of !num (where num is an integer), it should re-execute the command that corresponds to that number.

Note that if the user inputs a !num command, the original command should be stored in the history, NOT "!num". For example, if I execute this sequence:

> ls
Makefile  cs45shell  cs45shell.c  parsecmd.c  parsecmd.h  parsecmd.o  sleeper  sleeper.c  tester  tester.c

> history
0 ls
1 history

> !0
Makefile  cs45shell  cs45shell.c  parsecmd.c  parsecmd.h  parsecmd.o  sleeper  sleeper.c  tester  tester.c

> history
0 ls
1 history
2 ls              <-- This is NOT "!0"
3 history

You may assume that built-in commands (history, !num, exit, and cd) will NOT appear in the same command as a pipe or an I/O redirect. That is, despite real shells handling commands like "history | grep …​", your shell will not be asked to mix built-in commands with those features.

If the user enters a command that uses I/O redirection or pipes, the entire command should be saved in the history (the same as it was typed).

2.5. Piping

Your shell should support piping the output (standard out) from one command to the input (standard in) of another. A pipe is an inter-process communication (IPC) mechanism for two Unix processes running on the same machine. It’s a one-way communication channel between the two processes (one process always writes to one end of the pipe, the other process always reads from the other end). The interface to a pipe is like a temporary file with two open ends: writes to one end by one process can be read from the other end by another process. However, unlike a regular file, a read from a pipe removes the data that was written by the other process.

The pipe() system call will create a pipe when given an array of two integers. It creates two file descriptors, one for the read end of the pipe (array element 0), the other for the write end of the pipe (array element 1).

Child processes inherit file descriptors from their parent, so you’ll want to create a pipe before calling fork() so that any newly-created processes will get a copy of the pipe when they’re forked. After the fork, a child can set up its file descriptors before calling execv() by using dup2() to associate the read or write end of the pipe with the appropriate file descriptor.

Placing a pipe between two commands causes the output of the first to feed into the input of the second. For example:

ls | sort

Says to take the output from ls and, rather than printing it, send it as input to sort, which sorts it and then prints the result.

cat file.txt | grep blah

Says to take the output from cat (the contents of file.txt) and send it to the input of grep, which searches for all lines containing the string "blah". This is a silly way to run grep, since grep can read files on its own without cat, but it works well for testing pipes.

You are not required to support multiple pipes in a single command line, although I would encourage you to think about how you might make that happen (it’s not that different).

2.5.1. Creating a Pipe

Calling the pipe() system call creates two new file descriptors for a process: a read-end and a write-end. Any data written to the write-end will be available for reading from the read-end. When all of the FDs representing the write-end of a pipe are closed, any attempt to read from the pipe will return end-of-file (EOF) to the reader, signifying that no more data is coming.

When working with pipes in your shell, it’s important to close any parts of the pipe that you aren’t using. For example, if a child process is planning to write into a pipe, it should close the read-end, since the read-end isn’t useful to it. Similarly, a pipe reader should close the write-end.

Your parent shell will also need to close both ends after it has forked all of the child processes that need access to a shared pipe. If you don’t close all the write ends of a pipe, your shell will hang because the pipe’s reader will never get an EOF!

2.6. I/O Redirection

Your shell should support I/O redirection whereby files replace a process’s standard in (0), standard out (1), or standard error (2) file descriptor streams. The user can ask to replace standard in by supplying "< filename". For example:

grep blah < input_file.txt 1> output_file.txt

Says to use the file "input_file.txt" as file descriptor 0 rather than reading from terminal input and the file "output_file.txt" as file descriptor 1 rather than printing to standard out.

For output streams, the user must specify which I/O stream(s) to replace:

  • 1> filename redirects the standard out stream to a file.

  • 2> filename redirects the standard error stream to a file.

The user may also choose to redirect both streams, either to separate files or to the same one. To implement this functionality, most of the complexity is in the parser, since you’ll need to recognize that an I/O redirect is happening and treat the next token as the file name rather than an ARGV token. After parsing is complete and you’ve created a child process, you can use open() and dup2() to place the file at the appropriate file descriptor.

When opening files for writing, pay attention to the flags and modes you pass to open(), since you may need to create files that don’t already exist and give them usable access permissions. See the Tips section below.

Note that most shells will assume you mean "1>" if you just say ">", but you are not required to do that. For output streams, you may assume there will be an explicit 1 or 2 preceding the ">". You may also assume that the user won’t give you nonsense numbers (e.g., 3>).
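A sketch of the open()/dup2() sequence in a child process. To keep it self-contained, the child writes a line itself where your shell would call execv(); the file path and helper name are illustrative:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child whose stdout is redirected to path; returns the child's
 * exit status (0 on success). */
int redirect_demo(const char *path) {
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Flags/mode from the Tips section: create/truncate, owner rw. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
        if (fd == -1) {
            perror("open");
            _exit(1);
        }
        dup2(fd, STDOUT_FILENO);   /* fd 1 now refers to the file */
        close(fd);                 /* the dup'd copy at fd 1 is enough */
        /* In your shell, this is where you'd call execv(). */
        printf("redirected!\n");
        fflush(stdout);            /* _exit() won't flush stdio for us */
        _exit(0);
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```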

2.7. Error Reporting

Your shell should report errors using perror() and continue executing whenever possible. If the main shell process (parent) makes a system call that fails (e.g., signal, fork, pipe), it’s fine to report the error and terminate, since you can’t really recover from those. For minor errors, though (e.g., exec fails because the user’s command is invalid), the child process should terminate, but the shell should continue.

Always always always check the return value of any system call you make!
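A sketch of that division of labor: the exec failure is reported with perror() and terminates only the child, never the shell (the bogus program path here is intentional):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec a deliberately bogus path; returns the child's exit status. */
int run_bogus(void) {
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");   /* parent-side failure: OK to give up */
        exit(1);
    }
    if (pid == 0) {
        char *args[] = {"nonexistent", NULL};
        execv("/no/such/program", args);   /* only returns on failure */
        perror("execv");  /* e.g., "execv: No such file or directory" */
        _exit(1);         /* terminate the child, not the shell */
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```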

2.8. Cleanup

Your shell should reap all child processes when they terminate, close all file descriptors it’s no longer using, and get a clean bill of health from valgrind: no invalid memory accesses, uninitialized variables, or memory leaks.
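For reaping background children, one non-blocking approach is to poll with waitpid() and WNOHANG each trip through your shell's main loop (a SIGCHLD handler is the other common approach); a sketch:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking.
 * Returns how many children were reaped by this call. */
int reap_finished(void) {
    int reaped = 0;
    int status;
    pid_t pid;
    /* waitpid returns 0 when children exist but none have exited yet,
     * and -1 when there are no children left to wait for. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        printf("reaped background pid %d\n", (int)pid);
        reaped++;
    }
    return reaped;
}
```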

3. Checkpoint

For the checkpoint, you should have a running shell program that can execute simple commands in the foreground and background and that implements the built-in commands cd and exit. You also should have support for parsing:

  • Commands with I/O re-direction.

  • Commands with a pipe.

You will demo your checkpoint to me during your lab section on Friday, January 31. Prior to your demo, create a list of a few simple commands that you can use to demo your shell’s meeting the checkpoint requirements (you can just cut and paste commands from your test suite during your demo). You only get a few minutes for your demo, so please plan it before lab. I also encourage you to go beyond just the checkpoint requirements in the first week, but I only need to see the required checkpoint functionality during your short demo.

4. Examples & Testing

4.1. Parser

During the lab meeting, I suggested a struct that looks like:

struct cmd {
    char **cmd1_args;
    char **cmd2_args;
    char *cmd1_fds[3];
    char *cmd2_fds[3];
};

This struct is just a suggestion, and it resembles what I chose to do in my implementation. You’re welcome to add extra fields or change the way you record parsing information. Ultimately, what you need to know is:

  • How many commands are there: just one, or two commands separated by a pipe?

  • For each command, which file descriptors are we asking to redirect, and what are the names of the files we’re redirecting to/from?

  • For each command, what is the ARGV array? That is, the same thing your CS 31 shell’s parser was returning: an array whose first element is the name of the program to execute, the middle elements are the arguments to that program, and the last element is NULL to signify the end.

  • Should we be running in the foreground or background?

If you find it helpful, it would be reasonable for you to add a flag to your struct that you can set to 0/1 depending on whether or not you find a pipe. Alternatively, you can simply set cmd2_args to NULL if you don’t find a pipe. It doesn’t really matter how you signal it as long as you know what you’ve chosen so that your shell can check for it later. You may also prefer to add a background flag to the struct rather than passing it in as a special argument. It doesn’t matter to me, and you have some flexibility in the design.

As far as the checkpoint goes, I’ll suggest something that will be useful for your debugging and for convincing me that your parser works correctly: add a function that prints a summary of the contents of your struct, whatever it happens to contain. I’ll give you some examples of what that might look like using the example struct above:

  1. Plain command: ls -l

    cmd1_args: ["ls", "-l", NULL]  /* three-element dynamically-allocated array */
    
    cmd2_args: NULL  /* There is no second command here. */
    
    cmd1_fds[0]: NULL  /* With no I/O redirects, all descriptors are NULL */
    cmd1_fds[1]: NULL
    cmd1_fds[2]: NULL
    
    cmd2_fds[0]: NULL
    cmd2_fds[1]: NULL
    cmd2_fds[2]: NULL
  2. Pipe only command: ls | sort

    cmd1_args: ["ls", NULL]  /* two-element dynamically-allocated array */
    
    cmd2_args: ["sort", NULL]  /* two-element dynamically-allocated array */
    
    cmd1_fds[0]: NULL  /* With no I/O redirects, all descriptors are NULL */
    cmd1_fds[1]: NULL
    cmd1_fds[2]: NULL
    
    cmd2_fds[0]: NULL
    cmd2_fds[1]: NULL
    cmd2_fds[2]: NULL
  3. Redirects only: ls -l 1> out.txt 2> error.txt

    cmd1_args: ["ls", "-l", NULL]  /* three-element dynamically-allocated array */
    
    cmd2_args: NULL  /* There is no second command here. */
    
    cmd1_fds[0]: NULL
    cmd1_fds[1]: "out.txt"
    cmd1_fds[2]: "error.txt"
    
    cmd2_fds[0]: NULL  /* There is no second command here. */
    cmd2_fds[1]: NULL
    cmd2_fds[2]: NULL
  4. Combined: grep -i blah < input.txt | sort 1> output.txt

    cmd1_args: ["grep", "-i", "blah", NULL]  /* four-element dynamically-allocated array */
    
    cmd2_args: ["sort", NULL]  /* two-element dynamically-allocated array */
    
    cmd1_fds[0]: "input.txt"
    cmd1_fds[1]: NULL
    cmd1_fds[2]: NULL
    
    cmd2_fds[0]: NULL
    cmd2_fds[1]: "output.txt"
    cmd2_fds[2]: NULL

    Note: you can replace "blah" with the string you’d like to search for in the text.

You don’t need to worry about printing the "dynamically-allocated array" stuff; I just added that to make it clearer what those things are (ARGV arrays). If you can print your structs this way, it will make it MUCH easier for you to debug your parser and convince yourself/me that it’s working correctly.

4.2. Shell Testing

You may find it helpful to test with a program that writes to both stdout and stderr (already included in your repo). You can compile it with:

gcc -Wall -o standard_out_error standard_out_error.c

If you run it normally (with no pipes or redirects), you’ll see that all of the output goes to your terminal, regardless of which stream is specified in fprintf. For more interesting tests, you can do things like:

./standard_out_error 1> out.txt 2> err.txt

The above should give you two files, each containing only one of the output streams.

You can use pipes too:

./standard_out_error 2> err.txt | grep 5

The above should print only the standard-out lines containing "5" to the terminal (due to grep’s filtering), while all of standard error should be captured in the file.

Note that you can also ask for things that don’t really make sense:

./standard_out_error 1> out.txt 2> err.txt | grep 5

It’s not well-defined what should happen in the above case, since you’re giving two conflicting instructions about what to do with the standard out stream. This example is a user error and is not a good test case.

5. Tips & FAQ

  • Spend a little time thinking about the design requirements before you dive into coding. In particular, what sort of helper functions might you want to have available? In the parser, for example, I found it very helpful to have a function to get the start of the next token and the token’s length. Designing with abstract functions like that in mind and then implementing them as small modules later is typically MUCH easier than trying to build one massive chunk of code all at once.

  • Run valgrind as you go, rather than waiting until the end. It will help you identify problems sooner!

  • If a system call fails, the perror() function will typically tell you why, in a nice, human-readable way. Take advantage of it, and don’t assume why a system call might be failing!

  • In C, you can only return one value from a function, but you can pass in as many pointers as you want. You can use pointer arguments to simulate multiple return values by having a function dereference an input pointer to set its value for the caller.

  • Your parser will need to distinguish between command line tokens that belong in the ARGV array vs. meta characters that are telling your shell what to do (e.g., |, &, and I/O redirects). I would suggest dealing with those special meta characters (and their arguments in the case of I/O redirection) first. Then, once you’ve learned what you need to know from them, you can overwrite them to be blank spaces and use the old code you had from CS 31 to parse ARGV lists.

    In the case of pipes, you might consider overwriting '|' with a null-terminator ('\0') character, so that it splits the string into two halves, with each representing one of the two commands on either side of the pipe. Then, you might find it easier to re-use your existing CS 31 parser on each sub-command.
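A sketch of that splitting idea (split_on_pipe is an illustrative name; note that it leaves the surrounding spaces in place for your tokenizer to skip):

```c
#include <stdio.h>
#include <string.h>

/* If cmdline contains a pipe, split it in place by overwriting '|'
 * with '\0' and return a pointer to the second half; otherwise NULL. */
char *split_on_pipe(char *cmdline) {
    char *bar = strchr(cmdline, '|');
    if (bar == NULL) {
        return NULL;     /* no pipe: a single command */
    }
    *bar = '\0';         /* first command's string now ends here */
    return bar + 1;      /* second command starts after the old '|' */
}
```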

  • When opening files for I/O redirection, the flags and mode parameters are important. For input redirects, the file is read-only, so you can set flags to O_RDONLY and omit mode. For output redirects, you’ll need to OR together several flags:

    O_WRONLY | O_CREAT | O_TRUNC

    This collection says to open the file for writing only, create a new file if it doesn’t exist, and truncate it (overwrite) if it does exist. You’ll also need to set a mode when writing:

    S_IRUSR | S_IWUSR

    Setting the mode that way will grant you (the owner of the file) read and write permissions. For more details, run man 2 open.

  • If you end up with processes that aren’t going away, you can terminate them with the kill or pkill command. kill expects a process ID, whereas pkill searches for a process by program name. If need be, you can pass the -9 argument to send SIGKILL rather than the default SIGTERM.

  • If you’re adapting your CS 31 solution and you’re told that important functions or constants aren’t defined, make sure that you have all the necessary header files included. The CS 45 shell requires a few extra headers. The full list I used is:

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "parsecmd.h"

6. Submitting

Please remove any excessive debugging output prior to submitting.

To submit your code, commit your changes locally using git add and git commit. Then run git push while in your lab directory.