Machine Learning

Please be aware that elements on this page will change throughout the semester, particularly the course schedule. It is your responsibility to review this page periodically for updates.

Quick Links

Class Schedule
EdStem for course discussion/questions
Lecture Notes and Readings Folder on EdStem
CPSC 66 Github Organization
CIML Textbook
Guide to Remote Tools
Final Project

Course Basics

Instructor: Ben Mitchell, Sci 252C

Lecture: TTh 2:40pm-3:55pm, Sci 204

Lab:

Lab Section A

Wednesday 1:15pm-2:45pm

Science Center 240

Lab Section B

Wednesday 3:00pm-4:30pm

Science Center 240

Office hours:

Please note that the times listed here are the ones when I'll definitely be available, but you are welcome to stop by my office any time the door is open. Additionally, you can always contact me to schedule a meeting at a different time.

Tuesday

12:30pm-2:30pm

Thursday

4:00pm-5:00pm

Waitlist Procedure

Due to an additional hire over the summer, we’ve been able to add an extra lab section; this section still has some open slots, which you can apply for using this special waitlist form: https://docs.google.com/forms/d/e/1FAIpQLSdRBngq_hwR0dRivAlZcbSeQYuyK6wAOdPLio36H7UIPprorw/viewform

Our hope is to fill these slots before the semester starts; once the semester has begun, we’ll go back to the normal waitlist procedure outlined below.

For Fall 2023, please use the following waitlist procedure to stay active in the course and to be considered for open positions:

See the general department lottery & waitlist information page
If you haven’t already, be sure to fill out the Department waitlist request form.
Stay current with the course lectures and labs by attending lecture and a lab for week 1. The Instructor(s) will stay in contact to provide updates. There are no guarantees that following the waitlist procedure will result in obtaining a seat; we will replace students that drop based on the department lottery priorities and with students who follow the above steps and who can fit the opened lab spot into their schedule.

Required Course Textbook

We will use A Course in Machine Learning by Hal Daumé III as our primary text. It is a free online ebook. We will also include additional reading material on the course schedule, most of which will also be available online for free.

Note that this book is only available in PDF form; if you want a hard copy, you should be able to get printouts made using TAP funds. Please let me know if you have any difficulties with this.

Additional References

These are all excellent books that I have read. However, some are geared more towards graduate students and researchers, so I did not choose them for our primary course textbook; we will be doing supplementary readings from some of them, but those will be freely available online. If you are looking to get deeper into the material, here are some suggestions.

Note that all of these books should be available on reserve in the library as well.

Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. We will be using this book (available for free online) for several supplementary readings on topics that CIML doesn’t cover in as much detail as I would like.
Machine Learning by Tom Mitchell (Amazon link). Historically this is the gold standard; however, it is too expensive to be the required textbook, particularly given that its age means it does not cover several important topics. You may be able to find used versions for a reasonable price; there is also a reserved copy in the Cornell Library.
Introduction to Machine Learning by Ethem Alpaydin (Amazon link). If you are looking to purchase a hard copy, this is the one I recommend as it is reasonably priced and a good textbook. It is also available as a free ebook through the library's ProQuest account.
Pattern Recognition and Machine Learning by Christopher Bishop
Machine Learning: A Probabilistic Perspective by Kevin Murphy
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Course Description

Welcome to CPSC 66. Machine learning is the study of algorithms that "learn" by finding patterns in examples provided to them. This course will introduce you to various frameworks (e.g., supervised learning) and associated algorithms for these frameworks (e.g., decision trees). The major aim of this course, however, is to develop an understanding of the entire machine learning pipeline rather than focus on the algorithm du jour. We will also spend a significant amount of time inspecting core concepts (e.g., generalization) from statistical and theoretical perspectives. With each topic, we will consider both the practical and open research questions at the heart of the field. You will be expected to implement solutions through lab assignments, but also digest and discuss readings that build off of lecture topics.

In addition to the technical side of ML, we will also be studying AI Ethics and the ways in which ML systems interact with society, with a focus on understanding the potential pitfalls of trying to generalize from real-world data. You will be expected to read, synthesize, and discuss ideas relating to ethics, data bias, and the possibility for ML systems to do harm (both through malice and through inattention).

To enroll in this course you must have completed CPSC 35. There are no other requirements, though familiarity with the basics of linear algebra, probability, discrete math, and calculus will be useful. The course will also cover a good deal of probability theory and ethical frameworks, but much of this can be picked up from provided readings and lectures. This course is designated as a natural sciences and engineering practicum (NSEP) and qualifies as a Group 3: Applications course for the CS major/minor requirements.

Course Learning Goals

By the end of the course, you will understand:

several machine learning frameworks, including supervised learning, unsupervised learning, and hybrid approaches
various algorithms for the frameworks we explore, including the variation in data representation
how to choose and apply an appropriate framework and algorithm for a new problem
the core concept of generalization, and the associated theoretical tools for inspecting both our data and models
theoretical and empirical evaluation of performance
practical considerations for using data, including data preprocessing, feature engineering, understanding bias, and resource constraints
ethical considerations for the design, creation, and deployment of ML tools

Student Responsibilities

I have outlined the skills and objectives this course promises to provide you. For these promises to be upheld, you will need to commit to the policies outlined below. To succeed you should:

Attend class and lab. The primary introduction to course material is through class lecture. Additionally, we often do learning exercises during class, which give you immediate experience with the material we are covering. Class and lab attendance is mandatory. While I am more than happy to help with any material in office hours, priority will be given to students who attend and participate in class. Office hours are not a substitute for missed lectures.
Participate actively in the learning process. Showing up is necessary, but not sufficient, for success in the course. To fully develop your analytical skills, you are expected to participate in class discussion. This includes active listening, asking questions during lecture portions, and engaging your peers during short class exercises. Studies show active involvement is the number one determinant of student success.
Prepare for lecture and lab. You are expected to have done the pre-reading before each meeting and to have reviewed your notes from prior meetings. If you have not done so, you will be unprepared to follow the day's material and participate in group discussions.
Start the lab assignments early. If you get in the habit of doing this, you will be much better off. As the labs get longer and more difficult, starting early will give you plenty of time to mull over the lab problems even when you aren't actively writing your solution.
Practice, practice, practice. The only effective way to learn the material and pass the exams is to consistently do the labs, and to practice example problems presented in class and in the book. Forming study groups to go over practice problems and to review lecture and reading notes is a great way to prepare for exams.
Seek help early and often. Because course material builds on previous material, it is essential to your success in this class that you keep up with the course material. There are many sources of help: ask questions during lecture; ask your classmates (make sure you have read the Academic Integrity section for restrictions); get help during lab sessions; and come to office hours.

Resources

Class Resources

Help using git for lab assignments
CS and Unix Help Pages and Links (make, tar, git, debugging tools, editors, programming guides, screen and tmux, …)
CS Department’s Help Pages
Tools for remote lab assignments, ssh, Jupyter, …

Course Work

Assessment (Grading)

This breakdown is a rough estimate and is subject to change.

30% Labs (3 to 4 total)
5% Homework assignments (3 to 4, lightly graded)
25% Final Project (with writeup and presentation)
10% Participation
30% Exams (2 Midterms, no final)
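As a rough illustration of how the estimated weights above combine, the following minimal sketch computes an overall percentage from per-category percentages. The function name and the example scores are hypothetical, and the weights are copied from the tentative breakdown, so the actual calculation used in the course may differ.

```python
# Weights copied from the estimated breakdown above (subject to change).
WEIGHTS = {
    "labs": 0.30,
    "homework": 0.05,
    "final_project": 0.25,
    "participation": 0.10,
    "exams": 0.30,
}


def weighted_course_score(scores):
    """Combine per-category percentages (0-100) into one overall percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "labs": 90,
    "homework": 100,
    "final_project": 85,
    "participation": 95,
    "exams": 80,
}
print(round(weighted_course_score(example), 2))
```

Because the weights sum to 100%, a category such as homework (5%) moves the overall score far less than labs or exams (30% each).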

Class Participation

As discussed in responsibilities, your participation involves:

Required attendance to lecture and lab
Active participation in lecture
Active engagement in the class discussion groups
Concept quizzes and/or class prompts if and when they are used

Working With Partners

Partnerships where partners work mostly independently rarely work out well and frequently fail to produce complete, correct and robust solutions. More importantly, they don’t result in both partners learning well, which is ultimately the point of doing the work in the first place. Partnerships where partners work side-by-side for all or most of the time tend to work out very well, with both partners learning (and accomplishing) more than they would alone.

You and your partner are equally responsible for scheduling times to meet and work together, and for making time available in your schedules to do so.

For partnered lab assignments, you should follow these guidelines:

The expectation is that you and your partner are working together side by side in the lab for most, if not all, of the time you work on partnered lab assignments.
You and your partner should work on all aspects of the project together: initial top-down design, incremental testing and debugging, and final testing and code review.
When working together, follow the practices of pair programming (where one of you types and one of you watches and assists, ensuring you swap roles periodically, taking turns doing each part).
If meeting in person is not possible, pair programming can be done via screen-sharing and voice chat (e.g. Zoom); working together remotely isn’t as good as working together in person, but it’s better than working alone.
There may be short periods of time where you each go off and implement some small part independently. However, you should frequently come back together, talk through your changes, push and pull each other’s code from the git repository, and test your merged code together.
You should not delete or significantly alter code written by your partner when they are not present. If there is a problem in the code, then meet together to resolve it.
If there is any issue with the partnership, contact the professor.

Taking time up front to design a plan for your solution together, and to do incremental implementation and testing together, may seem like a waste of time, but in the long run it will save you a lot of time: you are less likely to have design or logic errors in your solution, and you will have a partner to help track down bugs and come up with solutions to problems.

Policies

Academic Integrity

The general ethos of this policy is that actions which shortcut or avoid the learning process are forbidden, while actions which promote learning are encouraged.

Studying lecture materials together, for example, provides an additional avenue for learning and is encouraged. Using a classmate's solution, however, is prohibited because it avoids the process of doing the work; since doing the work is how much of the learning takes place, avoiding the work inherently means avoiding the learning as well. Note that this applies to generative AI tools (e.g. ChatGPT, GitHub Copilot) in just the same way as it does to any other resource.

If you have any questions about what is or is not permissible, please contact your instructor.

Academic honesty is required in all your work. Under no circumstances may you hand in work done with or by someone else, or produced by an algorithm, without making the source of the work explicitly clear.

Discussing ideas and approaches to problems with others on a general level is encouraged, but you should never share your solutions with anyone else nor allow others to share solutions with you. You may not examine solutions belonging to someone else, nor may you let anyone else look at or make a copy of your solutions. This includes, but is not limited to, obtaining solutions from students who previously took the course or solutions that can be found online. You may not share information about your solution in such a manner that a student could reconstruct your solution in a meaningful way (such as by dictation, providing a detailed outline, or discussing specific aspects of the solution). You may not share your solutions even after the due date of the assignment.

In your solutions, you are permitted to include material which was distributed in class, material which is found in the course textbook, and material developed by or with an assigned partner. In these cases, you should always include detailed comments indicating on which parts of the assignment you received help and what your sources were.

The use of generative AI tools (e.g. ChatGPT, GitHub Copilot) without permission is also considered to be unauthorized collaboration with an outside source and is a violation of our academic integrity policy. If you feel these tools would be appropriate in a given context, feel free to ask the instructor. Note that even if permission is granted, these sources must be properly attributed.

When working on tests, exams, or similar assessments, you are not permitted to communicate with anyone about the exam during the entire examination period (even if you have already submitted your work). You are not permitted to use any resources to complete the exam other than those explicitly permitted by course policy. (For instance, you may not look at the course website during the exam unless explicitly permitted by the instructor when the exam is distributed.)

Failure to abide by these rules constitutes academic dishonesty and will lead to a hearing of the College Judiciary Committee. According to the Faculty Handbook:

Because plagiarism is considered to be so serious a transgression, it is the opinion of the faculty that for the first offense, failure in the course and, as appropriate, suspension for a semester or deprivation of the degree in that year is suitable; for a second offense, the penalty should normally be expulsion.

This policy applies to all course work, including but not limited to code, written solutions (e.g. proofs, analyses, reports, etc.), exams, and so on. This is not meant to be an enumeration of all possible violations; students are responsible for seeking clarification if there is any doubt about the level of permissible communication.

Absence / Late Work

For lab assignments, each student may use up to a total of 5 late days across the entire semester, no questions asked. Late days do not apply to homework, quizzes, or other forms of assessment. A late day is considered a full 24 hours (i.e., 15 minutes late is the same as 23 hours late), and late days are counted against all lab partners.
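The late-day counting rule above can be sketched as a simple ceiling calculation. This is only an illustration under stated assumptions: the function name, the budget constant, and the example timestamps are hypothetical, and how the instructor actually records submission times is not specified here.

```python
import math
from datetime import datetime

LATE_DAY_BUDGET = 5  # total late days per student for the semester


def late_days_used(deadline, submitted):
    """Late days charged: any fraction of a 24-hour period counts as a full day."""
    seconds_late = (submitted - deadline).total_seconds()
    if seconds_late <= 0:
        return 0  # on time or early
    return math.ceil(seconds_late / 86400)  # 86400 seconds = 24 hours


# Hypothetical deadline and submission times, for illustration only.
deadline = datetime(2023, 9, 20, 23, 59)
fifteen_minutes_late = datetime(2023, 9, 21, 0, 14)
twenty_three_hours_late = datetime(2023, 9, 21, 22, 59)

print(late_days_used(deadline, fifteen_minutes_late))
print(late_days_used(deadline, twenty_three_hours_late))
```

Both example submissions are charged the same single late day, which matches the rule that 15 minutes late is treated the same as 23 hours late.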

To use your late days, you must make a private Ed post to Instructors after you have completed the lab and pushed to your repository. You do not need to inform anyone ahead of time, and you do not need to provide a reason. When you use late days, you should still expect to work on the newly-released lab during the following lab section meeting. The professor will always prioritize answering questions related to the current lab assignment during lab meetings and office hours. In the rare case in which only one partner has unused late days, the partnership can use the late days, barring a consistent pattern of abuse.

If you feel that you need an extension on an assignment or that you are unable to attend class for two or more meetings due to a medical condition (e.g., extended illness, concussion, hospitalization) or other emergency, you must contact the dean's office and your instructors. Faculty will coordinate with the deans to determine and provide the appropriate accommodations. Note that for illnesses, the College's medical excuse policy states that you must be seen and diagnosed by the Worth Health Center if you would like them to contact your class dean with corroborating medical information.

Academic Accommodations

If you believe you need accommodations for a disability or a chronic medical condition, please contact Student Disability Services via email at studentdisabilityservices@swarthmore.edu to arrange an appointment to discuss your needs. As appropriate, the office will issue students with documented disabilities or medical conditions a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact Student Disability Services as soon as possible. For details about the accommodations process, visit the Student Disability Services website.

You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged, in advance, through Student Disability Services.

To receive an accommodation for a course activity you must have an official Accommodations Letter and you need to meet with me to work out the details of your accommodation at least two weeks prior to any activity requiring accommodations.

Final Project

Detailed information about the final project can be found here:

Final Project Page

Class Schedule

This is a tentative schedule. It will be updated as we go. We recommend that you review Tips for reading CS textbooks to help you determine what to focus on and how to get the most out of required readings.

WEEK	DAY	ANNOUNCEMENTS	TOPIC	ASSIGNMENT
1	Sep 05		Introduction to Machine Learning CIML: Ch. 1 (but don't worry about the details in 1.3) Supplement: Machine Learning by Tom Dietterich in Nature Encyclopedia of Cognitive Science, 2003 (skim Sect 4-8)	Lab 0 - Python and Jupyter Warmup
1	Sep 07		Nearest-Neighbor Classifiers CIML: Ch 3 (except 3.4)	Lab 0 - Python and Jupyter Warmup
2	Sep 12		Decision Trees AI: Myth & Reality article (on EdStem) CIML: Ch. 1 (focus on 1.3)	Lab 1 - KNN implementation Reading: AI Myth and Reality
2	Sep 14	Drop/add ends (Sep 18)		Lab 1 - KNN implementation Reading: AI Myth and Reality
3	Sep 19		Ensemble Learning Methods CIML: Ch. 13 Ensemble Methods in Machine Learning by Tom Dietterich Optional: Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers by Wyner, Olson, Bleich, and Mease, JMLR, 2017	Lab 2 - Decision Trees/Ensembles implementation Reading: Machine Bias article
3	Sep 21
4	Sep 26		Linear and Logistic Regression Linear Algebra Primer Vectors, what even are they? Linear transformations and matrices Dot products and duality (focus on first 4 minutes) Notation in ISL Linear Regression: Ch 3 - 3.1.1 and 3.2 - 3.2.1 (inclusive), ISL Logistic Regression: Ch 4-4.3, ISL Optional: Sec. 3 of Naive Bayes And Logistic Regression Supplement: Sec. 2 of Andrew Ng's Notes
4	Sep 28
5	Oct 03		Regularization; Bias-Variance Tradeoff Ch 2.2.1 and 2.2.2. ISL CIML: Ch. 5.9	Lab 3 - Linear & Logistic Regression implementation Reading: Ethical Frameworks
5	Oct 05
6	Oct 10		Support Vector Machines and Kernels; Probabilistic Models Ch 9 ISL A User's Guide to Support Vector Machines by Asa Ben-Hur and Jason Weston CIML: Ch. 9 An Introduction to Graphical Models by Kevin Murphy Optional: Ch. 8 Graphical Models in Pattern Recognition and Machine Learning by Christopher Bishop. A more technical treatment of the subject.
6	Oct 12		Midterm 1
	Oct 17	Fall Break
	Oct 19	Fall Break
7	Oct 24		Evaluation Methodology & Practical Considerations CIML: Ch. 2 and Ch. 5 Recommended: 8.3 and 8.4 of Manning et al. for more depth on PR, ROC, F1, etc.	Lab 4 - Algorithm Tuning, Evaluation & Comparison Reading: History of Statistics
7	Oct 26
8	Oct 31		Real-world Data and Applications CIML: Ch. 6
8	Nov 02		Anomaly Detection; Recommender Systems
9	Nov 07		Unsupervised Learning; Dimensionality Reduction; Semi-Supervised Learning CIML: Ch. 3.4, Ch. 15, Ch. 16 Semi-Supervised Learning Literature Survey by Jerry Zhu.	Final Project Reading: TBD
9	Nov 09	CR/NC and Withdraw deadline (Nov 10)
10	Nov 14		Deep Learning Readings: Deep Learning: A Critical Appraisal by Gary Marcus Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
10	Nov 16
11	Nov 21
11	Nov 23	Thanksgiving Break
12	Nov 28		Deep Learning Readings: Deep Learning: A Critical Appraisal by Gary Marcus Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (continued)
12	Nov 30
13	Dec 05
13	Dec 07		Midterm 2
14	Dec 12		Course Wrapup

CPSC 66: Machine Learning — Fall 2023
