CS68 - Bioinformatics
Spring 2013



  • The Final Exam study guide is available. There is a review session on Friday from 2-4pm in Science Center 240.
  • Please complete the course evaluation on Moodle by Wednesday, May 8.
  • Review Paper Guidelines are available. A one-paragraph abstract is due Monday, April 29, and the final paper is due by the end of exam week.

Provide anonymous course feedback here. Please be constructive in any comments.

This syllabus is a living document; please be aware that many elements on this page will change throughout the semester, including the course schedule. It is the student's responsibility to review this page periodically for updates.

Course information

Course Title: CPSC 68 - Bioinformatics (cross-listed as BIO 68). 1 credit. Satisfies Group 3 major requirement.
Lecture: TR 1:15pm - 2:30pm Science Center 183
Lab: F 1:00-2:30pm Science Center 240

Required textbook: Biological Sequence Analysis by Durbin, Eddy, Krogh, and Mitchison.

Prerequisites: Data Structures and Algorithms (CS 35); interest in learning basic molecular biology and probability theory

Instructor information

Professor: Ameet Soni
Office: Science Center 253
Phone: 610-957-6288
Office hours: Mondays, 2pm - 4pm OR by appointment

Course Description

Welcome to CS68. This course is an introduction to the fields of bioinformatics and computational biology, with a central focus on algorithms and their application to a diverse set of computational problems in molecular biology.

Computational themes will include dynamic programming, greedy algorithms, supervised learning and classification, data clustering, trees, graphical models, data management, and structured data representation. Applications will include genetic sequence analysis, pairwise-sequence alignment, phylogenetic trees, motif finding, gene-expression analysis, and protein-structure prediction.

While significant time will be spent exploring the biological significance of problems, the central focus in this course will be on understanding how to develop algorithms for complex problems. In particular, the general question we will answer is "How does one reason about large amounts of complex data?" That is, how do we uncover underlying phenomena and draw conclusions in the face of large data sets with noisy, intricate relationships? While this question is presented in the context of problems in molecular biology, it applies to open problems across all of the sciences. We will see that many of the algorithms we cover have applications and foundations in far-reaching domains including natural language, social network analysis, security, and search.

Course Promises

By the end of the course, you will understand:

  • the wide diversity of data produced by biological experiments
  • the inherent difficulties in analyzing/understanding "real-world" data sets that are large, complex, and noisy
  • the computational problems that arise in an effort to store, process, and analyze these data sets
  • the core set of algorithms utilized in computational biology to handle many of these problems
  • the biological and societal impact of these algorithms in being able to uncover biological phenomena and/or generate novel hypotheses
  • the basics of probabilistic modeling for handling uncertainty of information, and the idea of inference to draw conclusions under uncertainty
  • several categories of algorithms seen across computer science including dynamic programming, search, greedy algorithms, and approximate algorithms
  • the connection between the theory of algorithms covered and the real-world practical problems that arise with large data sets and limited computational resources
  • the connection from theory to bench - the bioinformatic toolkits currently used by biologists

Student Responsibilities

I have outlined the skills and objectives this course promises to provide you. For these promises to be upheld, you will need to commit to the policies outlined below.

Lecture attendance is required. While I am more than happy to help with any material in office hours, priority will be given to students who show up and participate regularly in class. Office hours are not a substitute for a missed lecture.

Lab attendance is required. New lab assignments will be introduced in our Friday lab sessions, and lab sessions will sometimes contain new course material, required practice exercises, and written quizzes.

Merely showing up to class is not sufficient for success in this course. Students are expected to be active in the learning process. This includes asking and answering questions as well as working with classmates during small group break sessions. Students are expected to review the previous lecture's notes and the reading prior to showing up to class. Studies have shown that active involvement is the number one determinant of student success. Besides, your participation grade is based on your involvement in classroom discussion!

Lab assignments are meant to help you learn the material. It is very important that students approach assignments with this view. This means that labs should be started early and completed with academic integrity (see policy below). If you work with a lab partner on an assignment, you must design and complete most of the program together; failing to do so will only hurt you on exams. You are expected to contribute equally to the assignment; otherwise, I will intervene and reassign lab partners.

Assessment (Grading)

5%   Class participation
35%  Lab assignments
25%  Final exam

Academic accommodations

If you believe that you need accommodations for a disability, please contact Leslie Hempling in the Office of Student Disability Services, located in Parrish 130, or e-mail lhempli1 to set up an appointment to discuss your needs and the process for requesting accommodations. Leslie Hempling is responsible for reviewing and approving disability-related accommodation requests and, as appropriate, she will issue students with documented disabilities an Accommodation Authorization Letter. Since accommodations may require early planning and are not retroactive, please contact her as soon as possible. For details about Student Disability Services and the accommodations process, visit here.

You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged through Leslie Hempling in the Office of Student Disability Services.

To receive an accommodation for a course activity, you must have an Accommodation Authorization Letter from Leslie Hempling, and you need to meet with me to work out the details of your accommodation at least two weeks prior to any activity requiring accommodations.

Academic integrity

Academic honesty is required in all work you submit to be graded. With the exception of your lab partner on approved lab assignments, you may not submit work done with (or by) someone else, or examine or use work done by others to complete your own work. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes sharing solutions after the due date of the assignment.

All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate on which parts of the assignment you received help, and what your sources were.

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. You may discuss assignment specifications and requirements with others in the class to be sure you understand the problem. In addition, you are allowed to work with others to help learn the course material. However, with the exception of your lab partner, you may not work with others on your assignments in any capacity.

"It is the opinion of the faculty that for an intentional first offense, failure in the course is normally appropriate. Suspension for a semester or deprivation of the degree in that year may also be appropriate when warranted by the seriousness of the offense." - Swarthmore College Bulletin (2008-2009, Section 7.1.2)

Please see me if there are any questions about what is permissible.


Note: this is a tentative schedule. As this is the first time the course is being taught at Swarthmore, the schedule will be adjusted to account for the pacing of the course and students' interest in later topics.

1 Jan 22   Introduction to Bioinformatics; Molecular Biology
Lab 1 - Databases and Central Dogma
Jan 24  
2 Jan 29   Pairwise Sequence Alignment - Detecting homology
  • Durbin et al., 2.1-2.4
Lab 2 - Dynamic Programming and Pairwise Seq. Alignment
Jan 31 Drop/Add ends (Feb 01)
3 Feb 05   Heuristic Alignment Methods
Feb 07  
4 Feb 12   Multiple Sequence Alignment
  • Durbin et al., 6-6.4
Lab 3 - Multiple Sequence Alignment
Feb 14  
5 Feb 19   Phylogenetic Trees - Inferring evolutionary relationships
  • Durbin et al., Chapter 7
Lab 4 - Inferring the Tree of Life
Feb 21  
6 Feb 26 Midterm 1 study guide
Feb 28 Midterm 1 (in lab)
7 Mar 05   Introduction to Probability and Statistics
  • Durbin et al., Chapter 1
  • Durbin et al., Chapter 11 (at a high level)
Mar 07  

Mar 12

Spring break

Mar 14

8 Mar 19   Probabilistic Sequence Models - Finding patterns using Markov Models
  • Durbin et al., Chapter 3
Lab 5 - Gene Finding using Markov Models
Mar 21  
9 Mar 26   Probabilistic Sequence Models - Hidden Markov Models and their applications
Lab 6 - Problem Set for HMMs
Mar 28 CR/NC and Withdraw deadline (Mar 29)
10 Apr 02   Lab 7 - Profile HMMs for Multiple Sequence Alignment
Apr 04  
11 Apr 09 Midterm 2 study guide Introduction to Functional Genomics - Gene Expression Data
Apr 11 Midterm 2 (in lab)
12 Apr 16   Clustering Algorithms
Apr 18  
13 Apr 23   More clustering; Supervised Learning
Lab 8 - Classification and Clustering of Gene Expression Data
Apr 25  
14 Apr 30   Protein Structure Prediction
Review paper
May 02 Final Exam study guide

May 14

Final 2:00pm–5:00pm Sci 183