WG2 ITICSE99 under construction

Sorting, interfaces, and polymorphism
by Charles F. Kelemen

Under construction. Still rough.

This consists of a series of examples and exercises to introduce a simple interface and the power of abstraction and polymorphism in what is hopefully a small but not trivial example.

Suppose we are asked to help a college with a small part of its data processing needs. The college has a file containing records on students, staff, and faculty. Each record has fields as follows:

student fields: name, ID number, graduation year, advisor, dorm
staff fields: name, ID number, months employed at college
faculty fields: name, ID number, department, phone extension

Here is a small portion of the file:

stu William 183 1999 Kelemen Mertz stf Don 640 12 stf Bill 643 12 stu Gabriel 180 2002 Kelemen Parrish fac Kelemen 520 CS 8123 stu Brandon 176 2000 Kelemen ML

Each record in the file begins with a 3 letter code indicating whether it is a student(stu), staff(stf), or faculty(fac) record. The file is in no particular order.

We would like to be able to sort these records in different ways. For example, we might want them all in ascending order by name for a phone directory. we might want them in ascending order by ID number for some other purpose.

Our goal will be to develop a sort method that we have confidence in; and then to show how we can use it to achieve many different orderings without changing the sort method at all. We will introduce student objects (one for each student), staff objects, and faculty objects. We will put them all in the same array of objects. Since these objects have different shape (e.g. some have advisor fields-some do not), the array will be heterogeneous and many of the methods invoked will be polymorphic (that is a certain method name and signature may be bound to different definitons at execution time).

Why does a module like this belong in a next generation CS1 course?

I use this module to help students learn the following:

the importance of scaffolding and coding in small increments
the use of loop invariants to prove an algorithm correct
the analysis of the running time of a non-linear algorithm
the power of abstraction
the use of interfaces, subclasses, and inheritance
the idea that by the use of careful specification, interfaces, and stubs, a project can be broken down into independent modules to be coded in parallel by independent teams

I believe that this module is best taught in a class where each student is sitting at a workstation and the instructor has a workstation which can be projected onto a screen so that every student can read computer source code on the projection. There should be a white board next to the projection screen so that the instructor can explicate projected code. As I read the Web pages illustrating the idea of starting small and working in small increments, I am struck by how boring reading about this is. When I teach this, my students are making the changes and compiling, debugging and executing. I am guiding the overall flow and walking around the room helping those who are stuck. Occasionally I give mini-lectures but most of the time the students are working on their own or along with me. That is much more exciting than reading or listening to a lecture. For many this may not be new, but it is an appropriate approach to a next generation course. The key idea is to get students to take an active role both outside of class and inside class.

Depending upon your teaching facilities and curriculum this 'guide on the side' versus 'sage on the stage' may not be new. In fact, it is possible that for many, none of the bulleted points above are new. Nonetheless, they are appropriate material to include in a next generation CS1.

When given a problem for computer solution, many students (especailly smart students) rush to the computer and start entering code. They write the whole program; then try to compile and run the program. This works for easy exercises at the beginning of a CS1 course. For some students it may even work for programs that require 50-100 lines of code. But at some point it fails for all. At that point (let's say a program of 100 lines of code) programming becomes very frustrating. With 100 lines of code, even syntax errors may be difficult to find. But the more insideous logical errors could be anywhere in 100 lines. It becomes very frustrating to think you have discovered an error only to have the program fail over and over. Here is what i tell my students.

I do not code programs in one step. The first thing I do is think about the overall design of a solution away from the keyboard. I consider both top-down and bottom-up design. Once you know about object-oriented ideas, you should ask, "Wat are reasonable objects?", "What kind of communication should take place between objects?", "What should be public, private, protected?", "How can I exploit inheritance?", "What are good superclasses?", etc. Once I have an outline of the overall design, I can begin to think about coding. You can code in small steps whether you choose to code in a top-down manner (use stubs) or in a bottom up manner (use simple 'driver' programs). I take many small steps compiling and testing at each step. Each time a small step works I get a little Computer Science High (these are absolutely legal in every state). To avoid gigantic downers, develop your programs in small steps so you get lots of highs. I recommended that you start with a very small (like 5-15 lines of code) program that you compile and run. Then add at most 5-15 lines; compile debug and run. At each iteration, you should be certain that yoaur code so far is correct. In order to accomplish this you need to add 'scaffolding'. This is code that may not appear in the final product, but allows you to be certain that your current code is correct so far. Figuring out good scaffolding goes hand in hand with loop invariants and really understanding how your program works. You should never have to spend hours until you get something to compile. You should always have a program that is no more than 15 lines different from your previous version ( that you know runs perfectly and that you understand). Then when you have an error in your latest attempt, you can be pretty certain that your error is within 15 lines of code. When you get an error you must find out why? If it won't compile, try paying attention to the error messages (although they can sometimes be misleading). If necessary 'comment out' part of your code until you get something that does compile and then add things a bit at a time. If the program does compile but does not run correctly, your scaffolding should let you know where the problem occurs. If not, you need to add more scaffolding. Understand what is wrong before you try to fix it.

Periodically, leave the keyboard and think again about the overall design. Frequently, as you code, you may discover ommissions or flaws in your original design. You may also think of a new improved approach to the whole problem.

The bottom line is to get lots of computer science highs by coding in increments of 5-15 lines. You may have to make corrections but the bugs will be easy to find and patch. You get frequent highs by seeing your own code compile and work. The name 'pedagogical pattern' is new to me (the idea is not). So let's call this the HIGH 5 (to 15) pattern.

There are a number of topics that students should be exposed to in the next generation CS1 course even if they are covered in considerably more detail in later courses. In order for the coverage in later courses to 'take', some exposure in early courses is beneficial. Two of these topics are program verification and analysis of algorithms. I use loop invariants in insertion sort for ints to prove that the program is correct. In CS1, I do this in a semi-formal manner. That is, more carefully than hand waving, but less formally than might be done in an algorithms course. I point out that program verification should include both reasoning about the correctness of a program and careful testing of the program. Loop invariants almost always help one decide on appropriate scaffolding to help underestand the operation of a program (evidence of what is really happening -- not what you hope is happening). Insertion sort is also a simple but not trivial example to use to help students begin to analyze running time. I point out that a complete analysis should include both a 'theoretical' analysis (big-Oh type) and experimental timing tests. Students actually enjoy this.

Once we have understood insertion sort for ints, we can ask, "What do ints really need in order to allow this kind of sort?" This is a process of abstraction. We discover that all that is needed is a notion of precedence between any pair of ints. We can then write the sort using a method precedes and introduce interfaces and the idea of 'implementing' an interface to enforce the existence of a precedes method. While students may have already used more complicated interfaces or abstract types, the Sortable interface has only one method. By writing it and working with it themselves, students come away with a good understanding of what a formal interface can contributre to a project.

On with the module.

Before we launch into sorting a heterogeneous array, let's begin by developing a solid sorting method for ints. After we understand this, we will deal with extending it to more general objects.

Click on Insertion Sort for ints to get started.

Once you understand insertion sort for ints, click on Insertion Sort for Objects to extend our sort to more general Objects.

After you have mastered insertion sort for Sortable Objects like MyInts, click on Heterogeneous arrays and how to sort them .

WG2 ITICSE99 under construction

Sorting, interfaces, and polymorphism by Charles F. Kelemen

Why does a module like this belong in a next generation CS1 course?

On with the module.

Sorting, interfaces, and polymorphism
by Charles F. Kelemen