CS31 Lab 8:
Implementing a library in C

Due before 11:59pm, Thursday Dec. 4
Note: Due to Thanksgiving break, this lab is not due until the Thursday after break.

This lab should be done with a partner of your choosing.

Lab 8 Goals:

Lab 8 Introduction
In this assignment you and your partner will implement a C library (the .c and .h parts) and code to test the functions in your library. You will be implementing the parsecmd library, one function of which you used in your shell lab program. Your compiled library (parsecmd.o) should be usable in place of the one I gave you in lab 7, and should link into your cs31shell executable file.

The setup procedure for this lab will be similar to previous labs. First, both you and your partner should run setup31 to grab the starting point code for this assignment. Suppose users molly and tejas wish to work together. Molly (mdanner1) can start by running

  [~]$ setup31 labs/08 tdanner1
Once the script finishes, Tejas (tdanner1) should run
  [~]$ setup31 labs/08 mdanner1

For the next step only one partner should copy over the starting code

  [~]$ cd ~/cs31/labs/08
  [08]$ cp -r ~lammert/public/cs31/labs/08/* ./
  [08]$ ls
  Makefile  parsecmd.c  parsecmd.h  tester.c
Now push the changes to your partner
[08]$ git add *
[08]$ git commit -m "lab 8 start"
[08]$ git push
Your partner can now pull the changes. In this case if Tejas wishes to get files Molly pushed, he would run
[~]$ cd ~/cs31/labs/08
[08]$ git pull

Starting Point Code:


Project Details, Requirements and Hints
You will implement a parsecmd library that contains functions to parse a command line string into its individual command line arguments, and construct an argv list of strings from the command line args. Your library functions can then be used by other programs by #including your parsecmd.h file and linking in your parsecmd.o binary on the gcc command line:
gcc -g -o tester tester.c parsecmd.o

Information on building and using C libraries

Read over the "CREATING AND USING YOUR OWN LIBRARY CODE" section of the following (this is also available off Prof. Newhall's C help pages): Building and Using libraries in C. This gives an introduction to writing .h files, implementing library code .c files, and compiling and linking library code into C application code that uses the library. For this assignment, you will build your library code as a single object file (.o file). The Makefile provided with the starting point code already does this for you.

Using the parsecmd library (ex. tester.c)

The parsecmd.h file contains the interface to your library. Applications using the parsecmd library should #include it:
#include "parsecmd.h"
parsecmd.h contains two function prototypes for the functions you will implement in parsecmd.c.

Implementing the parsecmd library (parsecmd.c)

Both functions in the parsecmd library take in a command line string (like in the shell lab), and parse it into an argv list (an array of strings, one per command line argument). They both test for an ampersand in the command line, indicating that the command should run in the background, and "return" a value indicating whether the command should be run in the background or not. For example, if the user enters the following command line string:
$ cat foo.tex
These functions will be passed the string:
"cat foo.tex\n"   
And will parse the command line string into the argv array:
argv [0] ---->"cat"
argv [1] ---->"foo.tex"
argv [2] ----|  (NULL)
The main difference between the two functions is that the first uses a single statically declared char array into which each argv[i] pointer will point, and the second dynamically allocates space for both the argv array and each command line argument string.

The parse_cmd function

/* 
 * parse_cmd - Parse the command line and build the argv array.
 *    cmdline: the command line string entered at the shell prompt
 *             (const means that the function will not modify the cmdline string)
 *    argv:  an array of size MAXARGS of char *
 *           parse_cmd will initialize its contents from the passed
 *           cmdline string.
 *    returns: non-zero if the command line includes &, to
 *             run in the background, or zero if not 
 */
int parse_cmd(const char *cmdline, char *argv[]);
This function will initialize the passed argv array to point into substrings that it creates in a global char buffer (initialized to a copy of the passed command line string). The buffer is already declared as a static global char array in parsecmd.c:
static char cmdline_copy[MAXLINE];
The parse_cmd function will:
  1. make a copy of the cmdline string in its copy buffer
  2. process its copy of the string to find tokens, modifying the cmdline_copy buffer to create substrings for each token. A token is a sequence of non-white space chars, each separated by at least one whitespace character (or by & ). Tokens should not include &, which has special meaning in command lines.
  3. assign each argv[i] bucket to point to its corresponding substring token in the buffer. Remember that a NULL value in an argv[i] bucket is used to signify the end of the list of argv strings.

For example, if the command line entered is the following

  ls  -l  -a &
The command line string associated with this entered line is:
"  ls  -l  -a &\n"
the copy of it in the cmdline_copy buffer looks like:
cmdline_copy 0 | ' ' | 
             1 | ' ' | 
             2 | 'l' | 
             3 | 's' | 
             4 | ' ' | 
             5 | ' ' | 
             6 | '-' | 
             7 | 'l' | 
             8 | ' ' | 
             9 | ' ' | 
            10 | '-' | 
            11 | 'a' | 
            12 | ' ' | 
            13 | '&' | 
            14 | '\n'| 
            15 | '\0'| 
Your function will TOKENIZE this string and set each argv array bucket to point into the start of its associated token string in the char buffer (cmdline_copy array):
                                0     1     2     3  
                             ------------------------
                        argv |  *  |  *  |  *  |  * |
                             ---|-----|-----|-----|--
cmdline_copy 0 | ' ' |          |     |     |     | 
             1 | ' ' |          |     |     |     | 
             2 | 'l' |<----------     |     |    ---- 
             3 | 's' |                |     |    (NULL)  
             4 | '\0'|                |     | 
             5 | ' ' |                |     | 
             6 | '-' |<----------------     | 
             7 | 'l' |                      | 
             8 | '\0'|                      | 
             9 | ' ' |                      | 
            10 | '-' |<----------------------- 
            11 | 'a' | 
            12 | '\0'| 
            13 | '&' | 
            14 | '\n'| 
            15 | '\0'| 
Note the changes to the cmdline_copy string contents and the assignment of argv bucket values into different starting points in the char buffer. Printing out the argv strings in order will list:
ls
-l
-a
The function should return 1 if there is an ampersand in the command line, or 0 otherwise (so, 1 in the above example).
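As a rough illustration of the three steps above, here is a minimal sketch of in-place tokenizing. It is one possible shape, not the required solution. The names sketch_parse, sketch_buf, SKETCH_MAXLINE, and SKETCH_MAXARGS are made up for this example (your code should use parse_cmd, cmdline_copy, MAXLINE, and MAXARGS from the starting point code), and the limit values are assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAXLINE 1024  /* assumed limit; the lab defines MAXLINE */
#define SKETCH_MAXARGS 128   /* assumed limit; the lab defines MAXARGS */

/* hypothetical stand-in for the lab's cmdline_copy buffer */
static char sketch_buf[SKETCH_MAXLINE];

/* Tokenize cmdline inside sketch_buf. argv must have room for
 * SKETCH_MAXARGS pointers. Returns 1 if an '&' ends the command, else 0. */
int sketch_parse(const char *cmdline, char *argv[]) {
    int argc = 0, bg = 0;
    size_t i = 0;

    /* step 1: bounded copy that is always '\0' terminated */
    strncpy(sketch_buf, cmdline, SKETCH_MAXLINE - 1);
    sketch_buf[SKETCH_MAXLINE - 1] = '\0';

    while (sketch_buf[i] != '\0' && argc < SKETCH_MAXARGS - 1) {
        /* skip whitespace in front of the next token */
        while (isspace((unsigned char)sketch_buf[i])) { i++; }
        if (sketch_buf[i] == '\0') { break; }
        if (sketch_buf[i] == '&') { bg = 1; break; }

        /* step 3: this bucket points at the first char of the token */
        argv[argc++] = &sketch_buf[i];

        /* step 2: walk to the end of the token and terminate it */
        while (sketch_buf[i] != '\0' && sketch_buf[i] != '&'
               && !isspace((unsigned char)sketch_buf[i])) {
            i++;
        }
        if (sketch_buf[i] == '&') {      /* '&' directly after a token */
            bg = 1;
            sketch_buf[i] = '\0';
            break;
        }
        if (sketch_buf[i] != '\0') { sketch_buf[i++] = '\0'; }
    }
    argv[argc] = NULL;  /* end-of-list marker */
    return bg;
}
```

This sketch silently truncates input longer than the buffer and stops looking for & once the argv array is full. Think about how you want your own version to behave in those cases.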

The parse_cmd_dynamic function

There are two main problems with the previous function:
  1. It assumes fixed-size max values for the command line string and the argv list. If a user enters a longer command line string than MAXLINE or with more than MAXARGS, bad memory access errors will ensue.
  2. It uses a single global character buffer into which the tokenized version of the command line string is parsed. This means that the caller has to use the argv return strings before another call to the parse_cmd function is made (since it will overwrite the buffer with the new command line string that it tokenizes). For use in the shell program this version is okay (do you understand why?), but it limits the "general purpose-ness" of this function.
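To see the second problem in isolation, here is a tiny sketch of a function that returns a pointer into one shared static buffer. The names copy_to_shared and shared_buf and the size 64 are made up for this illustration; they are not part of the parsecmd library.

```c
#include <string.h>

/* hypothetical stand-in for a single global buffer */
static char shared_buf[64];

/* Copy s into the shared buffer and return a pointer into that buffer.
 * Every call overwrites whatever the previous call stored there. */
char *copy_to_shared(const char *s) {
    strncpy(shared_buf, s, sizeof(shared_buf) - 1);
    shared_buf[sizeof(shared_buf) - 1] = '\0';
    return shared_buf;
}
```

If a caller does a = copy_to_shared("ls") and then b = copy_to_shared("cat"), both a and b point at the same memory, so a now reads "cat" as well. The argv strings from parse_cmd behave the same way across calls.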
The parse_cmd_dynamic function solves these two problems by dynamically allocating and returning the argv array of strings, one for each command line argument.
/*
 * parse_cmd_dynamic - parse the passed command line into an argv array
 *
 *    cmdline: the command line string entered at the shell prompt
 *             (const means that this function cannot modify cmdline)
 *         bg: sets the value pointed to by bg to 1 if command line is run in
 *             background, 0 otherwise (a pass-by-reference parameter)
 *
 *    returns: a dynamically allocated array of strings, each element
 *             stores a string corresponding to a command line argument
 *             (the caller is responsible for freeing the returned
 *             argv list).
 */
char **parse_cmd_dynamic(const char *cmdline, int *bg);
This function will find tokens much like the previous version. However, it must also determine how many tokens are in the cmdline string and malloc EXACTLY the right number of argv buckets for the particular cmdline string (remember an extra bucket at the end for NULL). Then, for each token, it will malloc up exactly enough space for a char array to store the string corresponding to a command line argument (remember an extra bucket for the terminating '\0' character).

For example, if the cmdline string is:

"   ls   -l   -a   \n"
This function will malloc up an argv array of char * values, and then malloc up three arrays of char values, one for each command line string (each of exactly the right size to store the string)
   // local var to store dynamically allocated args array of strings
   char **args;


      args --------->[0]-----> "ls"
                     [1]-----> "-l"
                     [2]-----> "-a"
                     [3]-----|  (NULL)
Your function cannot modify the cmdline string that is passed in to it. However, you may malloc up space for a local copy of the cmdline string to tokenize if this helps. If you do this, your function must free this copy before it returns. The returned args list should not point into this copy the way parse_cmd's argv points into its buffer; instead, each command line argument should be malloced up separately as a distinct string of exactly the correct size.

This function is more complicated to implement and will likely require more than one pass through the chars of the command line string.
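One way to picture the multi-pass idea is sketched below: a first pass counts tokens, and a second pass mallocs and fills one exactly-sized string per token. This is only an illustration under assumptions, not the required design. The names is_sep, count_tokens, and sketch_parse_dynamic are hypothetical, and the sketch reads cmdline directly rather than making a local copy.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* True for any char that ends a token. */
static int is_sep(char c) {
    return c == '\0' || c == '&' || isspace((unsigned char)c);
}

/* Pass 1: count the tokens that appear before the first '&' (or the end). */
static size_t count_tokens(const char *s) {
    size_t n = 0;
    while (*s != '\0' && *s != '&') {
        while (isspace((unsigned char)*s)) { s++; }
        if (*s == '\0' || *s == '&') { break; }
        n++;
        while (!is_sep(*s)) { s++; }
    }
    return n;
}

/* Pass 2: allocate n+1 buckets, then one exactly-sized string per token.
 * Returns NULL if any malloc fails (after freeing what was allocated). */
char **sketch_parse_dynamic(const char *cmdline, int *bg) {
    size_t n = count_tokens(cmdline);
    size_t k = 0;
    const char *s = cmdline;
    char **args = malloc((n + 1) * sizeof(char *));
    if (args == NULL) { return NULL; }

    *bg = (strchr(cmdline, '&') != NULL);

    while (k < n) {
        const char *start;
        size_t len;
        while (isspace((unsigned char)*s)) { s++; }
        start = s;
        while (!is_sep(*s)) { s++; }
        len = (size_t)(s - start);
        args[k] = malloc(len + 1);       /* +1 for the '\0' */
        if (args[k] == NULL) {
            while (k > 0) { free(args[--k]); }
            free(args);
            return NULL;
        }
        memcpy(args[k], start, len);
        args[k][len] = '\0';
        k++;
    }
    args[n] = NULL;                      /* end-of-list marker */
    return args;
}
```

Because nothing in the returned list points into cmdline or into any shared buffer, the caller owns every string and must free each one, then the array itself.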

Requirements

  1. Your two functions should meet the specifications described above.

  2. You may only have the single global variable already defined for you in parsecmd.c. All other variables should be local, and values should be passed to functions.

  3. You may not change any of the function prototypes in the parsecmd library. Your library code must work with our test code that makes calls to these functions as they are defined above. You really should not need to make any changes to the .h file.

  4. You should use good modular code. The two library functions should not be static, but you can add helper functions that are private to the .c file, and thus should be declared static.

  5. All system and function calls that return values should have code that detects and handles error return values.

  6. Your functions should work for command lines entered with any amount of whitespace between command line options (but there should be at least one whitespace char between each). For example, all these should yield identical argv lists returned by your functions:
    cat foo.txt  blah.txt      &
    cat foo.txt  blah.txt&
                 cat          foo.txt           blah.txt          &
    
    TEST that your code works for command lines with any amount of whitespace between command line arguments

  7. Your code should be well commented. See Prof. Newhall's C style guide for examples of what this means.

  8. Your code should be free of valgrind errors. You will need to add code to tester.c to free the space allocated and returned by the dynamic version of the function. Any other space your library functions malloc internally (and do not explicitly return to the caller) should be freed by those functions.
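For the last requirement, the cleanup you add to tester.c might look something like the sketch below. The function name free_argv_list is hypothetical, and returning a count is just a convenience for this example. It assumes the list is NULL-terminated and that every string and the array itself came from malloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Free each string in a NULL-terminated argv list, then the list itself.
 * Returns the number of strings freed; passing NULL is allowed. */
size_t free_argv_list(char **args) {
    size_t i;
    if (args == NULL) { return 0; }
    for (i = 0; args[i] != NULL; i++) {
        free(args[i]);   /* one free per malloced string */
    }
    free(args);          /* then the array of buckets */
    return i;
}
```

The strings have to be freed before the array, since the array holds the only pointers to them.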

Useful C functions and Hints

  • Implement and test incrementally! Start with the parse_cmd function before trying parse_cmd_dynamic, and break its functionality into parts that you implement and test incrementally. Use valgrind as you go to catch memory access errors as you make them.

  • Review strings, char, and pointers in C. Here are some C programming references. See Prof. Newhall's "char in C", "strings in C", and "pointers in C" in particular.

  • Use string library and ctype functions (see Prof. Newhall's string and char documentation for some examples, and look at their man pages for how to call and use). Some that may be useful include:
    strlen, strcpy, strchr, strstr, isspace
    
    Here is an example of using strstr and modifying a string to create a substring:
      int i;
      char *ptr, *str;
      str = malloc(sizeof(char)*64);
      if(!str) { exit(1); }
      ptr = strcpy(str, "hello there, how are you?");
      if(!ptr) { exit(1); }
      ptr = strstr(str, "how");
      if(ptr) {
         printf("%s\n", ptr);  // prints: how are you?
         ptr[3] = '\0';
         printf("%s\n", ptr);  // prints: how 
      } else {
        printf("no how in str\n");
      }
    
    strstr may or may not be useful in this assignment, but you will need to create token strings in a way that has some similarities to this example.

  • Command lines with ampersands in the middle can be handled like bash handles them (bash ignores everything after the &):
    "hello there & how are you?"
    
    gets parsed into an argv list as:
    argv[0]---->"hello"
    argv[1]---->"there"
    argv[2]----|   (NULL)
    

  • You do not need to implement a solution using pointer arithmetic, but if you'd like to use it, look at the pointer arithmetic examples from this week's lab for some examples.

  • Use gdb (or ddd) and valgrind. Here are some C debugging guides.

  • Writing this type of string processing code can be very tricky. Use the debugger to help you see what your code is doing. Stepping through individual C statement execution using next may be helpful. If you do this and want to see the results of instructions on program variables, you can use the display command to get gdb to automatically print out values every time it gains control. Here is an example of printing out three variables (ptr, i, buffer):
    (gdb) display ptr
    (gdb) display i
    (gdb) display buffer
    
  • Think very carefully about type. Draw some pictures to help you figure out what you need to access, and what type it is.
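For the hint above about ampersands in the middle of a command line, one possible approach is to cut a writable copy of the string at the first & before tokenizing it. The helper name truncate_at_ampersand is made up for this sketch; strchr is the standard C library function listed earlier.

```c
#include <stddef.h>
#include <string.h>

/* If buf contains '&', end the string there and return 1; otherwise
 * leave buf alone and return 0. buf must be a writable copy, never the
 * caller's const cmdline string. */
int truncate_at_ampersand(char *buf) {
    char *amp = strchr(buf, '&');
    if (amp == NULL) { return 0; }
    *amp = '\0';   /* everything after '&' is now ignored */
    return 1;
}
```

With this helper, a buffer holding "hello there & how are you?" becomes "hello there " and the return value doubles as the background flag.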


Sample Output
Here is Sample Output from a run of my solution. Notice how it handles whitespace chars and parsing commands with & in them. Also note that each argv string is printed between # characters so that you can see if you are incorrectly including any whitespace characters in an argument string result.

Submit

To submit your code, simply commit your changes locally using git add and git commit. Then run git push while in the labs/08 directory. Only one partner needs to run the final push, but make sure both partners have pulled and merged each other's changes. See the section on using a shared repo on the git help page.