CS31 Lab 9:
  Implementing the parsecmd library

Due before 11:59pm Tuesday, Nov. 28
This is an Extra Credit Lab Assignment (no late days are allowed on this)

For this lab, you will continue working with your shell lab partner: the partner list

Expectations for Working with Partners


Extra Credit Rules

NOTE: Even if you do not submit Lab 9 for extra credit, we strongly encourage you to try it out. You should also complete the in-lab string programs from Wednesday of week 11. Having practice manipulating C strings will be useful in upper-level CS courses, particularly Group 2 courses.

Lab Goals:

Lab 9 Starting Point Code

Both you and your partner should do:

  1. Get your Lab09 ssh-URL from the GitHub server for our class: CS31-F17
  2. On the CS system, cd into your cs31/labs subdirectory
    cd ~/cs31/labs
    
  3. Clone a local copy of your shared repo in your private cs31/labs subdirectory:
    git clone [your_Lab09_URL]
    
    Then cd into your Lab09-you-partner subdirectory.
If all was successful, you should see the following files when you run ls:
Makefile  parsecmd.c  parsecmd.h  tester.c  QUESTIONNAIRE

If this didn't work, or for more detailed instructions, see the Using Git page.

As you and your partner work on your joint solution, you should be frequently pushing changes to and pulling changes from the master into your local repos.


Assignment Overview

In this assignment you and your partner will implement a C library and write code to test the functions in your library. You will be implementing the parsecmd library, one function of which you used in your shell lab program. You should be able to use your compiled library (parsecmd.o) in place of the one I gave you in lab 7.

About the Starting Point Code:

Project Details, Requirements and Hints

You will implement a parsecmd library that contains functions to parse a command line string into its individual command line arguments, and construct an argv list of strings from the command line args. Your library functions can then be used by other programs by #including your parsecmd.h file and linking in your parsecmd.o binary on the gcc command line:

gcc -g -o tester tester.c parsecmd.o

Information on building and using C libraries

Read over the "CREATING AND USING YOUR OWN LIBRARY CODE" section of the following (also available off my C help pages): Building and Using libraries in C.

Using the parsecmd library (ex. tester.c)

The parsecmd.h file contains the interface to your library. Applications using the parsecmd library should #include it:
#include "parsecmd.h"

Implementing the parsecmd library (parsecmd.c)

Both functions in the parsecmd library take in a command line string (like in the shell lab), and parse it into an argv list (an array of strings, one per command line argument). They both test for an ampersand in the command line, which, when present, indicates a background command. For example, if the user enters the following command line string:
$ cat foo.tex
These functions will be passed the string:
"cat foo.tex\n"
And will parse the command line string into the argv array:
argv [0] ---->"cat"
argv [1] ---->"foo.tex"
argv [2] ----|  (NULL)
The main difference between the two functions is that the first uses a single statically declared char array, while the second dynamically allocates space for both the argv array and the strings it points to.

The parse_cmd function

/*
 * parse_cmd - Parse the command line and build the argv array.
 *
 *    cmdline: the command line string entered at the shell prompt
 *    argv:  an array of size MAXARGS of char *
 *           parse_cmd will initialize its contents from the passed
 *           cmdline string.
 *           The caller should pass in a variable declared as:
 *              char *argv[MAXARGS];
 *              (ex) int ret = parse_cmd(commandLine, argv, &bg);
 *
 *           argv will be filled with one string per command line
 *           argument.  The first argv[i] value following the last
 *           command line string will be NULL.  For example:
 *              ls -l     argv[0] will be "ls"  argv[1] will be "-l"
 *                        argv[2] will be NULL
 *          for an empty command line, argv[0] will be set to NULL
 *    bg:   pointer to an int that will be set to 1 if the last
 *          argument is '&', 0 otherwise. So bg will be set to 1
 *          if the command is meant to be run in the background.
 *
 *    returns: -2 if cmdline is NULL
 *             -1 if the command line is empty
 *              0 to indicate success
 */
int parse_cmd(const char *cmdline, char *argv[], int *bg);
This function will initialize the passed-in argv array to point to substrings that it creates in a global char array (initialized to a copy of the passed-in command line string). This global array is already declared in parsecmd.c:
static char cmdline_copy[MAXLINE];
The parse_cmd function will:
  1. make a copy of the cmdline string in its cmdline_copy array
  2. process its copy of the string to find tokens, modifying cmdline_copy to create substrings for each token. A token is a maximal sequence of non-whitespace chars; tokens are separated from one another by at least one whitespace character (or by &). Tokens should not include &, which has special meaning in command lines.
  3. assign each argv[i] bucket to point to its corresponding substring token in cmdline_copy. Remember that a NULL value in an argv[i] bucket is used to signify the end of the list of argv strings.

For example, if the command line entered is the following:

  ls  -l  -a &
The command line string associated with this entered line is:
"  ls  -l  -a &\n"
And the copy of it in cmdline_copy looks like:
cmdline_copy 0 | ' ' | 
             1 | ' ' | 
             2 | 'l' | 
             3 | 's' | 
             4 | ' ' | 
             5 | ' ' | 
             6 | '-' | 
             7 | 'l' | 
             8 | ' ' | 
             9 | ' ' | 
            10 | '-' | 
            11 | 'a' | 
            12 | ' ' | 
            13 | '&' | 
            14 | '\n'| 
            15 | '\0'| 
Your function will TOKENIZE this string and set each argv array bucket to point to the start of its associated token string in cmdline_copy:
                                0     1     2     3  
                             ------------------------
                        argv |  *  |  *  |  *  |  * |
                             ---|-----|-----|-----|--
cmdline_copy 0 | ' ' |          |     |     |     | 
             1 | ' ' |          |     |     |     | 
             2 | 'l' |<----------     |     |    ---- 
             3 | 's' |                |     |    (NULL)  
             4 | '\0'|                |     | 
             5 | ' ' |                |     | 
             6 | '-' |<----------------     | 
             7 | 'l' |                      | 
             8 | '\0'|                      | 
             9 | ' ' |                      | 
            10 | '-' |<----------------------- 
            11 | 'a' | 
            12 | '\0'| 
            13 | '&' | 
            14 | '\n'| 
            15 | '\0'| 
Note the changes to the cmdline_copy string contents and the assignment of argv bucket values into different starting points in the char array.
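
The tokenizing step pictured above can be sketched roughly as follows. This is only one possible approach, not the reference solution: tokenize_in_place is a hypothetical helper name, and the sketch assumes the caller has already copied the command line into a writable buffer (such as cmdline_copy). It leaves out the return codes that parse_cmd itself must produce.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * tokenize_in_place (hypothetical helper, not part of the starter code):
 * split buf into tokens by overwriting the character that follows each
 * token with '\0', and point argv[i] at the start of token i.
 * Parsing stops at the first '&', which sets *bg to 1.
 *
 * max_args is the capacity of argv, including the slot for NULL.
 * Returns the number of tokens stored in argv.
 */
static int tokenize_in_place(char *buf, char *argv[], int max_args, int *bg) {
    int argc = 0;
    char *p = buf;

    assert(buf != NULL && argv != NULL && bg != NULL && max_args > 0);
    *bg = 0;

    while (*p != '\0' && argc < max_args - 1) {
        /* skip whitespace in front of the next token */
        while (isspace((unsigned char)*p)) {
            p++;
        }
        if (*p == '\0') {
            break;                      /* only whitespace was left */
        }
        if (*p == '&') {
            *bg = 1;                    /* ignore everything after '&' */
            break;
        }
        argv[argc++] = p;               /* token starts here */

        /* advance to the first char that is not part of the token */
        while (*p != '\0' && *p != '&' && !isspace((unsigned char)*p)) {
            p++;
        }
        if (*p == '&') {
            *bg = 1;
            *p = '\0';                  /* '&' directly after a token */
            break;
        }
        if (*p != '\0') {
            *p++ = '\0';                /* terminate token, step past it */
        }
    }
    argv[argc] = NULL;                  /* marks the end of the argv list */
    return argc;
}
```

Note that the sketch silently stops once argv is full; a complete solution would need to decide how to handle that case.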

The parse_cmd_dynamic function

There are two main problems with the previous function:
  1. The user is limited to command line strings that are at most MAXLINE characters and have at most MAXARGS arguments.
  2. The function uses a single global character array. This means that the caller must finish using the returned argv strings before making another call to parse_cmd (since that call will overwrite cmdline_copy with the new command line string that it tokenizes). For use in the shell program this version is okay, but it limits the "general purpose-ness" of this function.
The parse_cmd_dynamic function solves these two problems by dynamically allocating and returning the argv array of strings, one string for each command line argument.
/*
 * parse_cmd_dynamic - parse the passed command line into an argv array
 *
 *    cmdline: the command line string entered at the shell prompt
 *         bg: pointer to an int that is set to 1 if the command line
 *             should run in the background, 0 otherwise
 *             (a pass-by-reference parameter)
 *
 *    returns: a dynamically allocated array of strings, exactly one
 *             bucket value per argument in the passed command line
 *             the last bucket value is set to NULL to signify the
 *             end of the list of argument values.
 *             or NULL on an error
 *
 *             The caller is responsible for freeing the returned
 *             argv list.
 */
char **parse_cmd_dynamic(const char *cmdline, int *bg);
This function will find tokens much like the previous version does. However, it must also determine how many tokens are in the cmdline string and malloc EXACTLY the right number of argv buckets for the particular cmdline string (remember an extra bucket at the end for NULL). For each token, it must malloc exactly enough space for a char array to store the token as a string (remember an extra bucket for the terminating '\0' character).

For example, if the cmdline string is:

"   ls   -l   -a   \n"
This function will malloc an argv array of char * values, and then malloc three arrays of char values, one for each command line string.
   // local var to store dynamically allocated args array of strings
   char **argv;


      argv --------->[0]-----> "ls"
                     [1]-----> "-l"
                     [2]-----> "-a"
                     [3]-----|  (NULL)
Your function cannot modify the cmdline string that is passed in to it. But you may malloc space for a local copy of the cmdline string to tokenize if this helps. This may allow you to reuse some of the code you wrote for the parse_cmd function. If you use this approach, your function must free this copy before it returns; the returned args list should not point into this copy like with parse_cmd. Instead each command line argument should be malloced separately as a distinct string of exactly the correct size.

This function is more complicated to implement and will likely require more than a single pass through the chars of the command line string.
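
One way to organize the multiple passes is sketched below, under some assumptions: count_tokens, copy_token, and build_argv are hypothetical helper names, not functions from the starter code, and setting bg is left out to keep the sketch short. The first pass counts tokens so that the argv array can be malloced at exactly the right size; the second pass mallocs one exactly sized string per token.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* count_tokens (hypothetical): first pass, counts tokens before any '&' */
static size_t count_tokens(const char *s) {
    size_t count = 0;

    assert(s != NULL);
    while (*s != '\0' && *s != '&') {
        if (isspace((unsigned char)*s)) {
            s++;
            continue;
        }
        count++;                        /* found the start of a token */
        while (*s != '\0' && *s != '&' && !isspace((unsigned char)*s)) {
            s++;
        }
    }
    return count;
}

/* copy_token (hypothetical): malloc exactly len + 1 chars for one token */
static char *copy_token(const char *start, size_t len) {
    char *tok = malloc(len + 1);        /* + 1 for the terminating '\0' */

    if (tok == NULL) {
        return NULL;
    }
    memcpy(tok, start, len);
    tok[len] = '\0';
    return tok;
}

/* build_argv (hypothetical): second pass, returns NULL if malloc fails */
static char **build_argv(const char *s) {
    size_t n = count_tokens(s);
    size_t i = 0;
    char **argv = malloc((n + 1) * sizeof(char *));  /* + 1 for NULL */

    if (argv == NULL) {
        return NULL;
    }
    while (i < n) {
        const char *start;

        while (isspace((unsigned char)*s)) {
            s++;
        }
        start = s;
        while (*s != '\0' && *s != '&' && !isspace((unsigned char)*s)) {
            s++;
        }
        argv[i] = copy_token(start, (size_t)(s - start));
        if (argv[i] == NULL) {          /* undo earlier mallocs on failure */
            while (i > 0) {
                free(argv[--i]);
            }
            free(argv);
            return NULL;
        }
        i++;
    }
    argv[n] = NULL;
    return argv;
}
```

Because this version never writes to the string it is given, it does not need a local copy of cmdline at all; the approach of tokenizing a malloced copy described above is an equally valid design.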


Lab Requirements
  1. Your two functions should meet the specifications described above.

  2. You may only have the single global variable already defined for you in parsecmd.c. All other variables should be local, and values should be passed to functions.

  3. You may not change any of the function prototypes in the parsecmd library. Your library code must work with our test code that makes calls to these functions as they are defined above. You really should not need to make any changes to the .h file.

  4. You should use good modular code. The two library functions should not be static, but you can add helper functions that are private to the .c file, and thus should be declared static.

  5. All system and function calls that return values should have code that detects and handles error return values.

  6. Your functions should work for command lines entered with any amount of whitespace between command line options (but there should be at least one whitespace char between each). For example, all these should yield identical argv lists:
    cat foo.txt  blah.txt      &
    cat foo.txt  blah.txt&
                 cat          foo.txt           blah.txt          &
    
    TEST that your code works for command lines with any amount of whitespace between command line arguments

  7. Your code should be well commented. See my C style guide for examples of what this means.

  8. Your code should be free of valgrind errors. You will need to add code to tester.c to free the space allocated and returned by the dynamic version of the function. Any other space you malloc internally in your library functions (that it does not explicitly return to the caller) should be freed before the function returns.
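
For the freeing code that requirement 8 asks you to add to tester.c, a minimal sketch might look like the following. The name free_argv is hypothetical. The sketch assumes the list was built the way parse_cmd_dynamic is specified: one malloced string per bucket, with a NULL bucket marking the end.

```c
#include <stdlib.h>

/*
 * free_argv (hypothetical helper for tester.c): free every string in a
 * NULL-terminated argv list, then free the array of pointers itself.
 * Passing NULL is allowed and does nothing, which is convenient because
 * parse_cmd_dynamic returns NULL on an error.
 */
static void free_argv(char **argv) {
    size_t i;

    if (argv == NULL) {
        return;
    }
    for (i = 0; argv[i] != NULL; i++) {
        free(argv[i]);                  /* free each argument string */
    }
    free(argv);                         /* then the array of char * */
}
```

The order matters: the strings must be freed before the array, since the array holds the only pointers to them.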
Hints, Tips, Resources, Useful C functions
  • Implement and test incrementally! Start with the parse_cmd function first before trying parse_cmd_dynamic. Break its functionality into parts that you implement and test incrementally. Use valgrind as you go to catch memory access errors as you make them.

  • Review strings, char, and pointers in C. Here are some C programming references. See my "char in C", "strings in C", and "pointers in C" in particular.

  • Use string library and ctype functions. (For more info see my string and char documentation and the man pages.) Some that may be useful include:
    strlen, strcpy, strchr, strstr, isspace
    
    Here is an example of using strstr and modifying a string to create a substring:
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      int main(void) {
         char *ptr, *str;
         str = malloc(sizeof(char)*64);
         if(!str) { exit(1); }
         strcpy(str, "hello there, how are you?");
         ptr = strstr(str, "how");  // points to the 'h' in "how", or NULL
         if(ptr) {
            printf("%s\n", ptr);  // prints: how are you?
            ptr[3] = '\0';        // overwrite the space after "how"
            printf("%s\n", ptr);  // prints: how
         } else {
            printf("no how in str\n");
         }
         free(str);
         return 0;
      }
    
    strstr may or may not be useful in this assignment, but you will need to create token strings in a way that has some similarities to this example.

  • Command lines with ampersands in the middle can be handled like bash handles them (bash ignores everything after the &):
    "hello there & how are you?"
    
    gets parsed into an argv list as:
    argv[0]---->"hello"
    argv[1]---->"there"
    argv[2]----|   (NULL)
    

  • Use gdb (or ddd) and valgrind. Here is my C debugging guide.

  • Writing string processing code can be very tricky. Use the debugger to help you see what your code is doing. It may be helpful to step through each C statement using next. If you do this and want to see the results of instructions on program variables you can use the display command to get gdb to automatically print out values every time it gains control. Here is an example of displaying two variables (ptr and i):
    (gdb) display ptr
    (gdb) display i
    
  • Think very carefully about type. Draw pictures to help you figure out what values you need to access and what their types are.


Sample Output

Here is Sample Output from a run of my solution. Notice how it handles whitespace chars and parsing commands with & in them. Also note that each argv string is printed between # characters so that you can see if you are incorrectly including any whitespace characters in an argument string result.
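
A printing helper along these lines could be added to tester.c to produce that kind of output. This is a sketch only: print_argv is a hypothetical name, and the exact line format here is an assumption rather than the format of the sample output.

```c
#include <assert.h>
#include <stdio.h>

/*
 * print_argv (hypothetical helper for tester.c): print each string in a
 * NULL-terminated argv list on its own line, wrapped in '#' characters so
 * that stray whitespace inside an argument is easy to spot.
 * Returns the number of arguments printed.
 */
static int print_argv(FILE *out, char *argv[]) {
    int i;

    assert(out != NULL && argv != NULL);
    for (i = 0; argv[i] != NULL; i++) {
        fprintf(out, "%d: #%s#\n", i, argv[i]);
    }
    return i;
}
```

With this format, a correctly parsed argument shows up as `0: #ls#`, while an argument that still contains the trailing newline would have its closing # pushed onto the next line.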


Lab Questionnaire

Every lab assignment includes a file named QUESTIONNAIRE, in which you answer some questions about the lab assignment. Fill it out and submit it with your lab solution.


Submit

NOTE: also email your professors after you submit to let them know that you have submitted a solution to this extra credit lab.

Before the Due Date

Only one of you or your partner needs to push your solution from your local repo to the GitHub remote repo. (It doesn't hurt if you both push, but the last pushed version before the due date is the one we will grade, so be careful that you are pushing the version you want to submit for grading.)

From one of your local repos (in your ~you/cs31/labs/Lab09-partner1-partner2 subdirectory):

git push

Troubleshooting

If git push fails, then there are likely local changes you haven't committed. Commit those first, then try pushing again:
git add parsecmd.c
git add QUESTIONNAIRE
git commit
git push
Another likely source of a failed push is that your partner pushed, and you have not pulled their changes. Do a git pull. Compile and test that your code still works. Then you can add, commit, and push.

If that doesn't work, take a look at the "Troubleshooting" section of the Using git page. You may need to pull and merge some changes from master into your local. If so, this indicates that your partner pushed changes that you have not yet merged into your local. Anytime you pull into your local, you need to check that the result is that your code still compiles and runs before submitting.