1. Introduce Lab 5

Today we are going talk about writing a large SQL program for Lab 5: Movie Database.

It may be helpful to refer to past information about SQLite:

In addition, you will want to examine these resources before starting:

The goal of this lab is to learn how to create a large database application using Embedded SQL.

2. Implementation Strategy

We will preview and suggest the following implementation order:

  1. Begin by implementing createDB.py.

    1. Review the provided code. Refer to SQLite in Python tutorials to better understand the main steps (for example, how to connect and send instructions to an SQLite instance). The main method has been provided.

    2. Begin implementing createTables(). Look for TODO or pass statements. pass is a useful placeholder for code that needs to be written, so be sure to remove these statements once you have written the methods.

    3. We will step through how to create the Actor table in lab which will follow this format:

      db.execute("""CREATE TABLE Actor(
                    id INTEGER,
                    # define more attributes and constraints
                    )""")

      db.execute() takes a query as a string. Note that the string will match exactly with the query you would enter into SQLite. Above, we show an example of multi-line strings in Python. While this is not required, it allows you to type the query on multiple lines of code and thus makes your code easier to read. This is highly recommended.

    4. Implement the rest of the 5 schemas on your own.

    5. To verify your work, periodically open your database in SQLite or by using the sqlitebrowser application to provide a visual GUI representation. Check that your key constraints are correct.

      python3 createDB.py /local/userID/movieDB
      sqlite3 /local/userID/movieDB # open the DB you created in sqlite for testing
      sqlitebrowser /local/userID/movieDB # another option, but may not work well for slow connections
    6. Implement insertAll() to read in the provided files and populate the tables. If you haven’t done so already, you will want to read the page on tips for speeding up performance. You will not want to use your home directory to create your database - it will be very slow and eat up your quota.

    7. Insert one relation at a time and check your results to make sure everything looks correct. This method will require file parsing (use the main writeup and this page for tips).

    8. We will look at an example of inserting many tuples at once.

      tuples = []
      tuples.append(["111", "Robert", "DeNiro", "M"])
      tuples.append(["222", "Carl", "Weathers", "M"])
      db.executemany("INSERT INTO Actor VALUES(?,?,?,?)", tuples)

      Note that tuples above uses toy examples; in actuality you will create your array by parsing the provided data files. But this will allow you to insert all of the tuples for a relation at once, rather than one a time.

    9. Implement the rest of the inserts, one relation at a time. Once this is done you should not have to run createDB.py again except to (re)build indexes.

  2. You can now shift to working on queryDB.py.

    1. Read the provided code - you will need to utilize all of the provided methods at some point.

    2. Follow the same advice as before to start implementing (look for todo and pass statements to find methods you need to implement).

    3. You will need to return to createDB.py to build (B+ Tree) indexes. To reconstruct just the indexes, first add code to create an index in the buildIndexes() method and then run createDB.py while selection option 2 to rebuild indexes. This will keep all existing tables and tuples intact.

      python3 createDB.py /local/userID/movieDB
      File already exists.  Would you like to:
        0) Exit the program
        1) Remove the file and rebuild the entire DB
        2) Keep the file and rebuild the indexes only
      Enter choice: 2
      ...Removing Indexes..
      ...Building Indexes...
    4. We recommend writing your queries in SQLite directly and then copying the query to your Python code when you are satisfied with the results.

    5. Use EXPLAIN QUERY to print out the query plan for your queries. If you built an index but it is not being uses here, that means the query optimizer did not find your index to be helpful. You should only keep indexes that are being used by at least one query. An example where an index named MovieIndex on Movie(id) is used would be:

SEARCH TABLE Movie AS M USING COVERING INDEX MovieIndex (id=?)