CS 91.3 Lab 4: Research Datasets

Due Tuesday, February 15, by midnight (23:59, EST)

Goals

The goals for this lab assignment are:

  • Learn how to find a research dataset

  • Learn how to find the related code

  • Get familiar with code for LDA (Linear Discriminant Analysis)

  • Get familiar with code for SVM (Support Vector Machines)

1. Download Datasets (10 min)

  • Download 'Data sets 2a' from the official website, following the procedure shown in class.

  • Download the 'description' file from the same page.

  • Download the pre-processed dataset.

2. Run the time() function (10 min)

  • Get your programming environment settings

  • Get the execution time of a Python program

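import time

start = time.time()  # wall-clock timestamp before the work
print("hello")       # the code being timed
end = time.time()    # wall-clock timestamp after the work
print(end - start)   # elapsed time in seconds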
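For the first bullet in this section, one possible way to collect your programming environment settings is sketched below. It uses only the standard `platform` and `sys` modules; the function name `environment_summary` and the choice of fields are assumptions made for illustration, not a required format.

```python
import platform
import sys


def environment_summary():
    """Return a dict describing the current Python environment.

    The set of fields is an illustrative choice, not a course requirement.
    """
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    # Print one "key: value" line per setting so it can be pasted into notes.
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

The printed lines can be copied into your notes file next to the timing results.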

3. Reproducibility (50 min)

  • Paper 1: 'Fast and Accurate Multiclass …​',

    1. Follow the instructions on the GitHub page.

  • Paper 2: 'Exploring Embedding Methods …​',

    1. Follow the instructions on the GitHub page.

  • For both papers from Papers with Code (PWC) above:

    1. Download the existing code

    2. Set up the coding environment

    3. Run the existing code

    4. Record the execution time in your notes.txt file. For example, 'Total running time of the script: ( 0 minutes 4.159 seconds)'.

    5. Take screenshots of your results after running the code.
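The timing step above can be sketched as a small helper that measures a call and formats the result like the example line. This is a minimal sketch: the helper names `time_call` and `format_running_time` are hypothetical, and `time.perf_counter()` is used here instead of `time.time()` because it is intended for measuring short durations.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def format_running_time(seconds):
    """Format a duration in the style of the example line in the lab text."""
    minutes, secs = divmod(seconds, 60)
    return (
        "Total running time of the script: "
        f"( {int(minutes)} minutes {secs:.3f} seconds)"
    )


if __name__ == "__main__":
    # Time a trivial computation and print a line suitable for notes.txt.
    _, elapsed = time_call(sum, range(1_000_000))
    print(format_running_time(elapsed))
```

To time a whole downloaded script, one option is `time_call(runpy.run_path, "some_script.py")` after `import runpy`, where the script name is a placeholder for whichever file you downloaded.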

4. LDA (10 min)

  • Example 1: 'Normal, Ledoit-Wolf …​'

    1. Download Python source code: plot_lda.py

  • Example 2: 'Comparison of LDA and PCA ..'

    1. Download Python source code: plot_pca_vs_lda.py

  • For both examples above:

    1. Download the existing code

    2. Set up the coding environment

    3. Run the existing code

    4. Record the execution time in your notes.txt file.

    5. Take screenshots of your results after running the code.

  • Write in your own words what LDA is, in four to five sentences, in your notes.txt file.
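As a starting point for thinking about what LDA does, the sketch below computes a two-class Fisher discriminant direction for 2-D points in plain Python. It is not the code from the examples above. The function names, the toy data, and the midpoint threshold (which assumes equal class priors and similar class spread) are all assumptions made for illustration.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """2x2 scatter matrix: sum over points of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in points:
        dx, dy = x - m[0], y - m[1]
        s[0][0] += dx * dx
        s[0][1] += dx * dy
        s[1][0] += dy * dx
        s[1][1] += dy * dy
    return s


def fisher_direction(class0, class1):
    """Return (w, threshold) with w = Sw^{-1} (m1 - m0).

    The threshold is the midpoint of the projected class means, which
    assumes equal priors (an illustrative simplification).
    """
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    d = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * d[0] + inv[0][1] * d[1],
         inv[1][0] * d[0] + inv[1][1] * d[1]]
    proj0 = w[0] * m0[0] + w[1] * m0[1]
    proj1 = w[0] * m1[0] + w[1] * m1[1]
    return w, 0.5 * (proj0 + proj1)


def lda_predict(point, w, threshold):
    """Project the point onto w and compare with the threshold."""
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0


if __name__ == "__main__":
    # Toy data, invented for this sketch.
    class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
    class1 = [(6, 5), (7, 6), (7, 4), (8, 5)]
    w, threshold = fisher_direction(class0, class1)
    print("direction:", w, "threshold:", threshold)
    print("prediction for (7, 5):", lda_predict((7, 5), w, threshold))
```

The key idea to notice is that the direction `w` separates the class means while accounting for how spread out each class is, so every point is reduced to a single projected number before classification.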

5. SVM (10 min)

  • Example 1: 'SVM: Maximum margin …​'

    1. Download Python source code: plot_separating_hyperplane.py

  • Example 2: 'Plot different SVM …​'

    1. Download Python source code: plot_iris_svc.py

  • For both examples above:

    1. Download the existing code

    2. Set up the coding environment

    3. Run the existing code

    4. Record the execution time in your notes.txt file.

    5. Take screenshots of your results after running the code.

  • Write in your own words what SVM is, in four to five sentences, in your notes.txt file.
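To make the maximum-margin idea concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss, in plain Python. It is not how the examples above train their models (scikit-learn's SVC wraps libsvm). The function names, hyperparameter defaults, and toy data are assumptions chosen for illustration.

```python
def train_linear_svm(points, labels, lr=0.01, reg=0.01, epochs=200):
    """Fit w and b by per-sample sub-gradient steps on the hinge loss.

    labels must be +1 or -1. lr, reg and epochs are illustrative defaults.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization: shrink w slightly on every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # Point is inside the margin or misclassified: move the
                # boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def svm_predict(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy, linearly separable data invented for this sketch.
    points = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
    labels = [-1, -1, -1, 1, 1, 1]
    w, b = train_linear_svm(points, labels)
    print("w:", w, "b:", b)
    print("prediction for (3, 3):", svm_predict((3, 3), w, b))
```

Only points with a margin below 1 trigger an update, which mirrors the role of support vectors: samples far from the boundary do not influence where it ends up.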

6. Your research project (4 hours)

7. Submission Guide

  • Each team submits only one file, lab_4_lastname1_lastname2.zip, containing:

    1. lab_4_lastname1_lastname2.PDF for your poster and paper draft, including 'Introduction' and 'Dataset' sections.

    2. notes_lab_4_lastname1_lastname2.txt for your notes.

    3. A screenshots folder containing all screenshot files (PNG or JPEG), with a total size under 10 MB.
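The packaging step above can be sketched with the standard `zipfile` module. This is one possible approach, not a required tool: the function names are hypothetical, the archive folder name `screenshots` is an assumption, and the size limit is interpreted here as 10 MiB.

```python
import os
import zipfile

# Assumption: the stated limit is interpreted as 10 * 1024 * 1024 bytes.
MAX_SCREENSHOT_BYTES = 10 * 1024 * 1024


def folder_size(folder):
    """Total size in bytes of all files under folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def build_submission(zip_path, pdf_path, notes_path, screenshot_dir):
    """Write the PDF, the notes file and all screenshots into one ZIP."""
    if folder_size(screenshot_dir) >= MAX_SCREENSHOT_BYTES:
        raise ValueError("screenshots exceed the size limit")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(pdf_path, os.path.basename(pdf_path))
        zf.write(notes_path, os.path.basename(notes_path))
        for root, _dirs, files in os.walk(screenshot_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, screenshot_dir)
                # Store screenshots under a "screenshots/" folder in the ZIP.
                zf.write(full, os.path.join("screenshots", rel))
    return zip_path
```

A call might look like `build_submission("lab_4_lastname1_lastname2.zip", "lab_4_lastname1_lastname2.pdf", "notes_lab_4_lastname1_lastname2.txt", "screenshots")`, with the placeholder last names replaced by your own.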

8. Notes

  • Each team only needs to submit one ZIP file, with both names on it.

  • Email 'xqu1@swarthmore.edu' your Lab 4 files as lab_4_lastname1_lastname2.zip.

  • Members of the same team may receive the same score.

  • Lab assignments will typically be released on Wednesday and will be due by midnight on the following Tuesday. This lab was released on 02/09 and will be due by midnight on 02/15.