Introduction to C Programming for CS31 Students

Part 1: variables, functions, arrays, strings


Contents:

  1. Getting Started: simple program, compiling
  2. Variables
  3. Input/Output
  4. Branching
  5. Loops
  6. Functions
  7. Arrays
  8. Strings
  9. Intro to Structs
Part 2 contains information about structs and pointers (it will be covered later in the semester)
Links to more C programming Resources
Overview and Resources

This page includes a brief overview of C programming for students who have taken CS21 or an equivalent introductory CS course. We will start with some of the C basics, which is much of the C programming language, and then will add more C programming features as the semester progresses. As you are implementing C programs for lab assignments, make use of:

  1. The information on this page
  2. My C programming Documentation and C Programming Resources. This contains all kinds of C programming documentation including:
    • How to compile and run C programs on our system
    • The C Code Style Guide (read this and follow it)
    • Documentation about different C types and C data structures (char, strings, file I/O, pointers, arrays, linked lists, ...).
    • Links to C programming tutorials and C language documentation.
    • How to use C debugging tools: gdb and valgrind
  3. Example C code that I give you in class and weekly lab.
Code examples on this page can be copied over from my public/cs31/C_examples directory:
   # if you don't have a cs31 subdirectory, create one first:
   mkdir cs31
   # copy over my C example files into your cs31 subdirectory:
   cd cs31
   cp -r /home/newhall/public/cs31/C_examples  .
   # cd into your copy, run make to compile
   cd C_examples
   ls
   make

Getting Started Programming in C
Below is the hello world program in C with a lot of comments. I would put it in a file named hello.c (.c is the suffix convention used for C source code files).
/* 
  The Hello World Program in C
  (this is also an example of a multi-line comment)
*/
#include <stdio.h>   // include the C standard I/O library

// Any executable program must have exactly one function called main
int main() {
  printf("Hello World\n");
  return 0;
}
Note the following features of the basic program:

To run a program, we must first save the code using vim or another editor on our system, then compile the source to an executable form and run the executable form of our program. The syntax for compiling is

 $ gcc -o <output_executable_file> <input_source_file> 
for example, gcc compiles hello.c into an executable file named hello:
 $ gcc -o hello hello.c
We run the executable program using ./hello:
 $ ./hello
If we change the source (hello.c file), we must recompile with gcc before running ./hello. If there are any errors, the ./hello file will not be created/recreated (but beware, an older version of the file from a previous successful compile may still exist). If you do not include the -o outputfile, gcc creates the executable in a file named a.out.

Variables
Variables are named containers for holding data. In C all variables must be declared before use. To declare a variable, use the following syntax:
type_name variable_name;
A variable can only have a single type. Valid basic types include int, float, double, char. Examples for declaring variables are shown below. In this course, declare variables at the beginning of their scope (top of a { } block) before any C statements in that scope. Older versions of C (C89) require this; newer versions (C99 and later) and C++ do not, so if you are coming from CS35, be sure to follow this C variable declaration convention.
{
 /* DECLARE ALL VARIABLES OF THIS SCOPE AT THE TOP OF THE BLOCK { */
 int x;         // declaring x to be an int type variable
 int i,j,k;     // can declare multiple variables of the same type on one line
 char letter;   // a char stores a single ASCII value 
                // a char in C is a different type than a string in C
 float winpct;  // winpct is declared to be a float type 
 double pi;     // the double type is more precise than float

 /* AFTER DECLARING ALL VARIABLES YOU CAN USE THEM IN C STATEMENTS */
 x = 7;         // x stores 7, initialize all variables before using them
 k = x + 2;     // use x's value in an expression

 letter = 'A';      // a single quote is used for single character value
 letter = letter+1; // letter stores 'B' (its ascii value is one more than 'A's)

 pi = 3.1415926; // pi was declared above, so only assign to it here

 winpct = 11/2.0; // winpct gets 5.5: dividing by 2.0 gives a floating point result
 j = 11/2;       // j gets 5: int division truncates anything after the decimal
 x = k%2;        // % is C's mod operator, so x gets 9 mod 2 (1)
}
Note the semicolons galore. C expects one after every statement. You'll forget them. gcc almost never says "You missed a semicolon" even though that might be the only thing wrong with your program. As you program more in C, you will learn to translate gcc errors to the error in your program.

On most variable types, you may use the following operators. Some may not apply depending on the operand type.



Input/Output (printf and scanf)
C uses the printf function for printing to standard out (the terminal), and scanf is one function for reading in values (usually from the keyboard). scanf is similar to printf, and it is the first way we will do program input. However, it is not very resilient to users entering bad values, so later we will learn better ways to read in values.

printf and scanf are part of the stdio.h library that needs to be #included at the top of the .c file using them.

printf is very similar to formatted print statements in Python, where you provide a format string to print and then values to fill the placeholders in the format string. Here are some printf examples:

  int x = 5, y = 10;
  float pi = 3.14;

  // print the values of x and y followed by a newline character:
  printf("x is %d and y is %d\n", x, y);  

  // print a float value (%g) a string value (%s) and an int value (%d)
  // separated by tab characters (\t) followed by a new line character (\n):
  printf("%g \t %s \t %d\n", pi, "hello", y); 
Different types in C occupy different numbers of bytes, and there are signed and unsigned versions of the "integer" types. The sizes below are typical; the exact sizes can vary by system.
 1 byte:          char        unsigned char
 2 bytes:         short       unsigned short
 4 bytes:         int         unsigned int           float
 4 or 8 bytes*:   long        unsigned long 
 8 bytes:         long long   unsigned long long     double

 *number of bytes for long depends on the architecture

printf formatting placeholders:

Placeholders for specifying different types
--------------------------------------------
 %f,%g:  placeholders for a float or double value
 %d:     placeholder for a decimal value (for printing char, short, int values)
 %u:     placeholder for an unsigned decimal
 %c:     placeholder for a single character
 %s:     placeholder for a string value
 %p:     placeholder to print an address value

 To print long or long long values, add an l (or ll) prefix:
   %lu: print an unsigned long value
   %lld: print a long long  value

Placeholders for specifying the numeric representation
-------------------------------------------------------
 %x:     print value in hexadecimal (base 16)
 %o:     print value in octal (base 8)
 %d:     print value in signed decimal  (base 10)
 %u:     print value in unsigned decimal (unsigned base 10)
 %e:     print float or double in scientific notation
 (there is no formatting option to display the value in binary)

The following are special formatting characters:
-----------------------------------------------
\t: print a tab character
\n: print a newline character

You can also specify field width for the values:
------------------------------------------------
%5.3f: print float value in space 5 chars wide, with 3 places beyond decimal
%20s:  print the string value in a field of 20 chars wide, right justified 
%-20s: print the string value in a field of 20 chars wide, left justified 
%8d:   print the int value in a field of 8 chars wide, right justified 
%-8d:  print the int value in a field of 8 chars wide, left justified 

Here is an example full program using a lot of formatting:
#include <stdio.h> // library needed for printf

int main() {
  float x=4.50001;
  float y=5.199999;
  char ch = 'a';
  printf("%.1f %.1f\n", x, y); // prints x and y with one digit after the decimal point
  // nice tabular output
  printf("%6.1f \t %6.1f \t %c\n", x, y, ch);  
  printf("%6.1f \t %6.1f \t %c\n", x+1, y+1, ch+1);  
  printf("%6.1f \t %6.1f \t %c\n", x*20, y*20, ch+2);  
  return 0;
}

scanf

scanf is one way in which your program can read in input values entered by a user. It is very picky about the exact format in which the user enters data, which makes it not very robust to badly formed user input. For now we will use it; later we will use a more robust way of reading input values from the user. Just remember that if your program gets into an infinite loop due to badly formed user input, you can always type CTRL-C to kill it.

A scanf call looks a lot like a printf call: it has a format string followed by variable locations into which the values read in should be stored. To specify the location of a variable, you need to use the & operator, which evaluates to "the memory location (or address) of the associated variable". Here are some examples:

  int x;
  float pi;

  // read in an int value followed by a float value ("%d%g")
  // store the int value at the memory location of x (&x)
  // store the float value at the memory location of pi (&pi)
  scanf("%d%g", &x, &pi);
scanf skips over leading whitespace characters (e.g. ' ', '\t', '\n') before each numeric value, and it stops reading a value at the first character that cannot be part of that value. Thus, a user could enter the values 8 and 3.4 in any of the three ways listed below and the call to scanf above would assign 8 to x and 3.4 to pi:
8 3.4
         8             3.4
8
3.4
The format string for scanf is a bit different than for printf in that you often do not need to specify white space chars in the format string for reading in consecutive numeric values:
// reads in an int and a float separated by at least one white space character
  scanf("%d%g",&x, &pi);  
scanf can seem to behave very strangely for format strings with placeholders of different types, so if you get some odd behavior, play around with the format string a bit and try different types. My documentation about file I/O has some example scanf format strings.

Branching with if/else
The syntax for branching in C is very similar to Python's. The main difference is that where Python uses indenting to indicate "body" statements, C uses curly braces (but you should also use good indenting in your C code). Here is the basic if-else syntax (the else part is optional):
//a one way branch
if ( <Boolean expression> ){
  <true body>
}

// a two way branch
if ( <Boolean expression> ){
  <true body>
}
else{
  <false body>
}

// a multibranch:
if ( <Boolean expression 1> ){
  <true body>
}
else if( <Boolean expression  2>){
  //first expression is false, second is true
  <true 2 body>
}
// can have more else if's here 
// ...
else{
  // if all previous expressions are false
  <false body>
}

Boolean Values in C

C does not have a Boolean type with true or false values. Instead, int values are used to represent true or false in conditional statements: zero (0) is false, and any non-zero value is true.

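The operators you can use in constructing boolean expressions are the following (listed in precedence order, highest first):

  !                 logical not
  <, <=, >, >=      relational comparisons
  ==, !=            equality and inequality
  &&                logical and
  ||                logical or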

Here is an example conditional statement in C (it is always good to use parens around complex boolean expressions to make them easy to read):
if (y == 10) {
  printf("y is 10\n");
} else if((x > 10) && (y > x)) {
  printf("y is bigger than x and 10\n");
  x = 13;
} else if ((x == 10) || (y > x+20)) {
  printf("y might be bigger than x\n");
  x = y*x;
} else {
  printf("I have no idea what the relationship between x and y is\n");
}
Loops

for loops

For loops are different in C than they are in Python. In Python, for loops are iterations over sequences; in C, for loops are more general looping constructs. The C for loop syntax is:
for( <initialization>; <boolean expression>; <step> ){
 <body>
}
The rules for evaluation are:
  1. Evaluate the initialization one time, when the for loop is first reached.
  2. Evaluate the boolean expression; if it is false (0), drop out of the for loop (you are done repeating the loop body statements).
  3. Evaluate the statements inside the loop body.
  4. Evaluate the step expression.
  5. Go to step (2).

Here is a simple example for loop to print out the values 0 through 9:

int i;   // declared at the top of the enclosing block

for (i=0; i<10; i++){
   printf("%d\n", i);
}
See forLoop1.c and forLoop2.c for more examples.

while loops

While loop syntax in C is similar to in Python, and is evaluated similarly:
while ( <Boolean expression> ){
  <true body>
}
The while loop checks the Boolean expression first and executes the body if true. A similar do-while loop executes the body first, then checks a condition and runs the loop again if the condition is true:
do{
  <body>
} while ( <Boolean expression> );
In C, for loops and while loops are equivalent in power (this is not true in Python), thus C would only need to provide one of these looping constructs. However, for loops tend to be a more natural language construct for definite loops (like iterating over values in a list), and while loops tend to be a more natural language construct for indefinite loops (like repeating until the user enters an even number). Therefore, C provides both.

See whileLoop1.c and whileLoop2.c for examples.

Functions
Use functions to break code into manageable pieces and reduce code duplication. Functions may take parameters as input and return a single value of a specific type. A function declaration specifies the function's name, return type, and the parameter list (the number and type of all parameters). A function definition includes the code to be executed when the function is called. All functions in C must be declared before they are called. This is typically done using a function prototype, but it can also be accomplished by having the function definition appear before it is called in a file.
function definition format:
---------------------------
<return type> <function name> (<parameter list>)
{
  <function body>
}

parameter list format:
---------------------
<type> <parm1 name>, <type> <parm2 name>, ...,  <type> <last parm name> 

A function that does not return a value has a void return type.

Arguments are passed to C functions by value. Thus a copy of the variable's value is made before the body of the function executes. Any modifications to the parameters in the function are not visible to the caller.

Here is an example function definition followed by a call to it:

#include <stdio.h>

// returns the larger of x and y
int max(int x, int y) {
  int bigger;
  bigger = x;
  if(y > x) {
    bigger = y;
  }
  return bigger; 
}

int main() {
   int a, b;
   printf("Enter two integer values: ");
   scanf("%d%d", &a, &b);
   printf("The larger value is %d\n", max(a,b));
   return 0;
}

See function1.c for this and another example.

Exercise: Implement and test a power function (for positive integer exponents only).

Arrays
Arrays are like C's version of lists. Python provides a high-level list interface to the programmer that hides much of the low-level implementation details. In C, however, the programmer has to implement this low-level list functionality; arrays are just the low-level data storage without higher-level functionality like size, insert, append, etc.

Arrays can store multiple items of the same type. For now, we will use only statically declared arrays, meaning we must know the total capacity (number of buckets) of the array at compile time, and we declare the array to be of that capacity. We cannot shrink or grow the array at run time (at least not yet).

To declare an array, specify its type, name and total capacity (number of buckets):

int  arr[10];  // an array of 10 ints
char str[20]; // an array of 20 char...could be a C-style string
Individual array elements may be accessed by indexing:
int i, num;

num = 5;
for(i=0; i < num; i++) {  // initialize the first 5 buckets of arr
   arr[i] = i;
} 
arr[5] = 100;
num++;
Notice that we declared the array to have 10 buckets, but we are only using 6 of them (our current list is of size 6 not 10). It is often the case when using statically declared arrays that there is unused capacity. Thus, we need to have a program variable that keeps track of the actual size of the list (num in this example).

Arrays and Functions

To declare an array function parameter we must use the syntax int a[] (or int *a, but we will use this syntax later). Note we do not specify the capacity of the array parameter in the parameter list (the function can accept an int array of any capacity). Arrays also do not know their size, so if we want the function to know how many buckets are in use, we should also pass the size value as a parameter. For example:

 
void printArray(int a[], int size) { 
  int i;
  for(i=0; i < size; i++) {
      printf("%d\n", a[i]);
  }
}
To call a function with an array parameter, pass only the name of the array as the argument, omitting the brackets. For example:
printArray(arr, num);
The name of the array variable is equivalent to the base address of the array (the memory location of its 0th bucket). This means that the argument's array buckets are NOT passed by value to the function (i.e. the function's parameter DOES NOT get a copy of every array bucket of its argument). Instead, the parameter gets the value of the memory location of the first bucket in the argument array (the base address of the array). The implications of this are that when array buckets are modified inside the called function (e.g. a[2] = 8), they also modify the contents of the corresponding bucket in the argument (i.e. arr[2] is now 8). This is because the parameter REFERS TO the same array storage locations as its argument.

Question: What happens if you go beyond the bounds of an array in C?

int array[10];
array[10] = 100;  // 10 is not a valid index into the array of 10 int buckets
Answer: Unexpected program behavior. It could lead to your program crashing, it could change another variable's value, or it could have no effect on your program's behavior; it is a program bug that may or may not show up as buggy program behavior. It is up to the C programmer to ensure that index values are valid and to avoid accessing array buckets beyond the bounds of an array.

The files array1.c and array2.c have some example uses of arrays.

Exercise: complete and test the function minimum in array2.c.

Example: Here is an example function call and a stack drawing showing an example of an array parameter.



Strings
Strings in C are just arrays of characters terminated by a special null character value '\0'. Not every array of char is used as a C string, but every string is an array of char. C has a string library that contains functions for manipulating C strings. One thing to keep in mind as you use the string library is that you are responsible for allocating the space for the underlying char array, and that the terminating '\0' character needs to be included in that space. For example, to store the string "hi", you need an array of at least 3 chars (one to store 'h', one to store 'i', and one to store '\0'). The string library functions determine the end of a string by searching for the '\0' character. They also add that character to the end of any string they initialize for you (e.g. strcpy will null terminate the destination string). Here is a very simple example:
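#include <stdio.h>    // for printf
#include <string.h>   // for strlen and strcpy

int main() {
  char str1[10];
  char str2[10];
  str1[0] = 'h';
  str1[1] = 'i'; 
  str1[2] = '\0'; 
  // strlen returns an unsigned integer type (size_t); cast it to
  // unsigned long to match the %lu placeholder
  printf("%s %lu\n", str1, (unsigned long) strlen(str1));  // prints hi 2 to stdout
  strcpy(str2, str1);    // strcpy copies the contents of str1 to str2  
  printf("%s\n", str2);  // prints hi to stdout
  return 0;
}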
See my Strings in C documentation for more string and string library examples. In particular look at the string library functions strlen, strcpy and strcmp. (note: some of the example code here uses dynamically allocated strings, which we have not yet learned).

Structs Part 1
C is not an object-oriented language, and thus does not have support for classes. It does, however, have support for defining structured types (like the data part of classes).

A struct is a type used to represent a heterogeneous collection of data; it is a mechanism for treating a set of different types as a single, coherent unit. For example, a student may have a name, age, gpa, and graduation year. A struct type can be defined to store these four different types of data associated with a student.

In general, there are three steps to using structured types in C programs:

  1. Define a new struct type representing the structure.
  2. Declare variables of the struct type
  3. Use DOT notation to access individual field values

Defining a struct type

struct type definitions should appear near the top of a program file, outside of any function definition. There are several different ways to define a struct type but we will use the following:
struct <struct name> {
  <field 1 type> <field 1 name>;
  <field 2 type> <field 2 name>;
  <field 3 type> <field 3 name>;
  ...
};
Here is an example of defining a new type 'struct studentT' for storing student data:
struct studentT {
   char name[64];
   int  age;
   int  grad_yr;
   float gpa;
};

// with structs, we often use typedef to define a shorter type name
// for the struct; typedef defines an alias for a defined type
// ('studentT' is an alias for 'struct studentT')
typedef struct studentT studentT; 

Declaring variables of struct types

Once the type has been defined, you can declare variables of the structured type:
struct studentT  student1;   // student1 is a struct studentT
studentT  student2;          // student2 is also a struct studentT
                             // (we are just using the typedef alias name)

studentT cs31[50];           // an array of studentT structs: each bucket
                             // stores a studentT struct

Accessing field values

To access field values in a struct, use dot notation:
<variable name>.<field name>
It is important to think very carefully about type when you use structs to ensure you are accessing field values correctly based on their type. Here are some examples:
student1.grad_yr = 2017;
student1.age = 18 + 2;
strcpy(student1.name, "Joseph Schmoe");
student2.grad_yr = student1.grad_yr;
cs31[0].age = student1.age;
cs31[5].gpa = 3.56;
structs are lvalues, meaning that you can use them on the left-hand side of an assignment statement. Thus, you can assign all of the field values of one struct to another like this:
student2 = student1;  // student2 field values initialized to the value of
                      // student1's corresponding field values
cs31[i] = student2;
Question: For each expression below, what is its type? Are any invalid? (here are the answers)
   (1) student1
   (2) student1.grad_yr
   (3) student1.name
   (4) student1.name[2]
   (5) cs31
   (6) cs31[4]
   (7) cs31[4].name
   (8) cs31[4].name[5]

Passing structs to functions

When structs are passed to functions, they are passed BY VALUE. That means that the function will receive a COPY OF the struct, and that copy is what is manipulated from within the function. All field values in the copy will have the exact same values as the field values of the original struct - but the original struct and the copy occupy different locations in memory. As such, changing a field value within the function will NOT change the corresponding field value in the original struct that was passed in.

If one of the fields in a struct is a statically declared array (like the name field in the studentT struct), the parameter gets a copy of the entire array (every bucket value). This is because the complete statically declared array resides within the struct, and the entire struct is copied over as a unit. You can think of the struct as a chunk of memory (0's and 1's) that is copied over to the parameter without anything being added to it or taken out. So, a function passed student1 CANNOT change any of the contents of the student1 variable (because the function is working with a COPY of student1, and thus the name array in the copy starts at a different memory location than the name array of the original struct).

This may seem odd given how arrays are passed to functions (an array parameter does not get a copy of every array bucket of its argument; instead it REFERS to the same array as the argument array). This seemingly different behavior is actually consistent with the rule that a parameter gets THE VALUE of its argument. It is just that the value of an array argument (the base address of the array) is different than the value of an int, float, struct, ..., argument. For example, here are some expressions and their values:

 
Argument Expression      Expression's Value (Parameter gets this value)
--------------------     --------------------------------------------
student1                 {"Joseph Schmoe", 20, 2017, 3.56}
student1.gpa             3.56
cs31                     base address of the cs31 array     
student1.name            base address of the name field array
student1.name[2]         's'
Only when the value passed to a function is an address of a memory location can the function modify the contents of the memory location at that address: a function passed student1 (a struct value) CANNOT change any of the contents of the student1 variable; but a function passed student1.name (the base address of an array) CAN change the contents of the buckets of the name field - because when student1.name is passed in, what is being passed in is the memory location of the array, NOT a copy of the entire array.

Example: Here is an example function call with a stack drawing showing how different types are passed.

See struct.c for more examples.
Exercise: implement and test two functions in this file: printStudent and initStudent.

lvalues

An lvalue is an expression that can appear on the left hand side of an assignment statement. In C, single variables or array elements are lvalues. The following example illustrates valid and invalid C assignment statements based on lvalue status:
struct studentT  student1;   
studentT  student2;          
int x;
char arr[10], ch;

x = 10;                         // valid C: x is an lvalue
ch = 'm';                       // valid C: ch is an lvalue
student1 = student2;            // valid C: student1 is an lvalue
arr[3] = ch;                    // valid C: arr[3] is an lvalue
x + 1 = 8;                      // invalid C: x+1 is not an lvalue
arr = "hello there";            // invalid C: arr is not an lvalue
arr = student1.name;            // invalid C: arr is not an lvalue
student1.name = student2.name;  // invalid C: name (an array of char) is not an lvalue