CS35 Lab 4: The linked lists of life

Due by 11:59pm Wednesday, September 28, 2011
In this lab you will implement a C++ linked list and then use that linked list to simulate the behavior of DNA molecules. Your program will represent DNA as a linked list and perform common molecular operations using the linked list interface. At the end, you will discuss the performance of your implementation and suggest improvements. To get started, run update35 from a computer science lab computer. The program update35 will create a cs35/labs/04 directory in your CS home directory, and the program handin35 will subsequently submit files from that directory.

This lab will require much more time than the earlier labs in the course. Because you will use many pointers in this lab, you will probably have more trouble debugging your program than in labs 01 or 02.

You may work with one partner on this lab. Only one of you should submit your solution, but it is important that both of your names appear at the top of your program. As always, you must follow the academic integrity rules for the course. In particular, sharing/dicussing code with other groups is strictly prohibited. In addition, it is also a violation of academic integrity if both students are not equal contributors in this lab. You should be working together and both partners are responsible for understanding/verifying all code.

You will earn reasonable partial credit for this lab if you complete and test your linked list implementation. So be sure to work in a modular fashion, testing and completing your linked list first before implementing the main program



Introduction

DNA (Deoxyribonucleic acid) contains the blueprint, or genetic instructions, for the functioning of living organisms. DNA is a polymer chain consisting of four building blocks called nucleobases (or bases): adenine, cytosine, guanine, thymine. Typically, they are abbreviated A, C, G, and T respectively. Thus, we can represent the chain of bases as a sequence of characters:

ATTGTCATCCGGGAATC...

There are many behaviors/reactions that DNA undergo that we may want to simulate using a computer to study their behavior. While this lab won't win you a nobel prize, it will provide a simple computer simulation of a few interesting properties of DNA and see the linked-chain of bases matches fairly well with the linked nature of the linked list implementation of lists. (Say that 5 times fast).

Your program will simulate the following molecular reactions. Feel free to add more!



Implementation details

I have provided a DNAList interface, very similar to the StringList interface given in lecture. The difference is that you will represent a DNA base as a char instead of a string. You must define and implement a DNALinkedList class that implements the DNAList interface, without altering the DNAList interface.

I've also given the declaration and implementation of the BaseNode class to help you get started. The BaseNode represents one node in your linked list and stores the value of a base. You should not alter this definition, but feel free to analyze how pointers are handled in this class. You should first begin your lab by testing and working with the BaseNode class. Can you create a few nodes? Link them together? Print all the items? I have provided the beginning of a program to do this, testNode.cpp. Run make testNode to get started. This file will not be graded, but should be your starting point to understand how pointers and nodes work.

After testing BaseNode, you should complete and test your DNALinkedList class. HINT: you may want to start by copying and pasting the DNAList interface declaration into the DNALinkedList header file. The interface is very similar to the one in class with the addtion of a replace() method. Some requirements to note:

Once you have thoroughly tested your DNALinkedList implementation, you should implement the dnaSimulate main program. The program has been started for you. Use make to compile the program. When you run it, you will see the basic flow of the program has been implemented, it is your job to add functionality. Specifically, you should implement functions to simulate the features described in the introduction. Some tips and requirements

Here are a few example executions (this will be up soon):


An implementation strategy

You should always program incrementally such that you can test your work as you go. Here is one suggested strategy:

  1. Start by testing the DNABaseNode class -- which consists of a constructor, destructor and simple access and update functions
  2. Implement the DNALinkedList constructor, then either implement a print() method in your test file or use gdb to verify that your list is correctly constructed.
  3. Implement one of insertAtHead or insertAtTail. Then implement getSize, isEmpty, peekHead and peekTail. Use print or gdb to verify correctness.
  4. Implement getItem and verify.
  5. Implement removeHead. Then implement removeAtTail and replace and verify.
  6. Implement the destructor. Carefully consider every possible way that your class might allocate memory on the heap, and be sure that there is a corresponding call to delete for each call to new.
At this point you should have a complete working DNALinkedList class whose behavior matches the ArrayList from lecture (except that the example ArrayList has a capacity of just 10 items, whereas your LinkedList should have no artificial capacity limits). Write a small test program to verify that your DNALinkedList implementation.

You should not under any circumstances begin the DNA simulation until you have thoroughly verified the correctness of your DNALinkedList class. Any bugs in your DNALinkedList class will hinder progress on the DNA simulation.



Written Portion

Individually, each of you are responsible for also handing in the answer to the following questions. Your submissions can be hand or type-written; please bring a hard copy to class on Thursday, February 16. Your answers should be brief, explanations are not necessary except for #4.

  1. In class, we discussed the advantages and disadvantages of array lists. After having implemented linked lists, please list which methods of the DNALinkedList class are "fast" (i.e., O(1)) and which are not efficient (i.e., O(N) for a linked list of size N).

  2. Draw a memory diagram for the dnaLL variable immediately after returning from getStrandInput in your program. You can ignore all other variables in your program, but your diagram should clearly show what parts of dnaLL are in the heap and which are in the i main() stack frame. Assume you've read in the sequence TGC

  3. For the DNA simulation methods mutate, telomereDegrade, reverseSequence, cellDeath, and printSequence, identify which are more efficiently implemented using ArrayLists, which are more efficient with Linked Lists, and which would be the same.

  4. You have implemented a specific linked list called a singly-linked list. An alternative implementation, called a doubly-linked list, uses a node implementation that has a pointer to the previous node in addition to a pointer to the next node. Hypothesize which, if any, methods in the dnaList.h interface would be more efficient with this alternate representation.


Tips and Hints

Throwing and Handling Exceptions

Your linked list should handle invalid usage by throwing a runtime_error. There is an example in the BaseNode constructor. For example, if you may want to handle at attempt to get an invalid item:

throw std::runtime_error("Attempt to getItem out of bounds");

Here is an example of handling a runtime_error (not required in this lab, but suggested):

try{
  //do something dangerous
} catch (runtime_error& exc){
  cerr << "Warning: " << exc.what() << endl;
  cerr << "Ignoring illegal behavior." << endl;
}
This is similar to the example in class, but this catches a runtime_error pointer and stores the exception as a variable, exc. The what() method returns the actual string message that was thrown. Lastly, cerr is standard error output and works very similarly to cout except that a programmer usually uses it specifically to print warnings and errors. Don't forget to include the stdexcept library when handling or throwing exceptions.

Obtaining random values in C++

Technically, there is no such thing as a random value in programming. You can, at best, obtain pseudo-random numbers that seem random over a long period. C++ has a few libraries that try to be as random as possible, and we saw one of them in Lab 2 with the default Player. To use random, I suggest using the drand48() method. First, add the following libraries to your main source file:

#include <stdlib.h>
#include <time.h>
Second, seed the random method at the top of your main() function. If you use the same seed (e.g., srand48(0)), your random number generator will actually give you the same sequence of values every time you run your program. This can be helpful if you want to get predictable behavior, but if you want something closer to "random", put the following line at the top of main() instead:
srand48(time(0)); //bases the seed value off of the current time
To generate a random number in the range [0.0, 1.0) (i.e., upto but not including one), call drand48(). Let's say you want to get an integer value between 0 upto but not including 5, you can try:
int index = (int)(drand48()*5);
This multiplies the random value, so the range is [0.0, 5.0) and then truncates to an int, giving either 0, 1, 2, 3, or 4. You will need to use this for a few of your methods, including mutations (pick a base to mutate, and also pick a new random value for the base) as well as selecting which telomere end to degrade.
Extensions

NOTE: you are not required to implement any extensions, and no extra credit will be awarded if you do. You should only attempt this if you want to get extra experience with templates.

Implementing a specific list interface just to handle DNA molecules is not efficient. We could easily want to represent integers, strings, or more complex objects such as an MP3 library. The underlying data structues and algorithms are the same across each of these problems, only the type of the data changes. In the labs/04/template/ subdirectory, I have placed the equivalent starting point files for this lab but using templates to create a generic interface. Instead of a BaseNode class, there is a Node close that can take any type for a value.

Submit

Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt.

You may run handin35 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them. Only one partner should hand in the code for the lab.

The written portion is due in class on Thursday and should be done individually