Today we are going to continue working on Lab 5: Movie Database. Be sure to examine MovieDB Introduction for tips on speeding up your program and on helpful tips.

It may be helpful to refer to past information about SQLite:

as well as these resources:

1. Saving Query Output as a Relational Table

For each of the required queries, there exists (at least one) single query solution. However, as computer scientists, we learn that it is good to break a hard problem into multiple parts. For some of the more difficult queries, it may be useful to break the problem into multiple parts and solve the smaller subproblems. One useful strategy is to write multiple queries and use the output of one query as an input to the next (i.e., compose your queries). Below, we show an example of this can be done.

Let’s say we want to satisfy the following query: "List all actors who have been cast in at more than 4 Quentin Tarantino films." Take a minute and think about how you might solve this problem before moving on.

Waiting

Okay, here is one potential solution where we break the query into two parts. First, we want to identify all of the movies that Quentin Tarantino has directed:

queryPart1 = """
              SELECT DM.movieID
              FROM DirectsMovie DM, Director D
              WHERE DM.directorID = D.id AND D.lname = 'Tarantino' and D.fname = 'Quentin'
             """

We join the DirectsMovie and Director tables and only keep films where the director is named "Quentin Tarantino". Now we’d like to use this as the input to the second part of our solution: "find the actors that were in those movies".

First, we need to save the results from part 1. We can do this using the CREATE TABLE X AS command which defines a table based on a query result:

queryPart1 = """
              CREATE TABLE TarantinoFilms AS
                SELECT DM.movieID
                FROM DirectsMovie DM, Director D
                WHERE DM.directorID = D.id AND D.lname = 'Tarantino' and D.fname = 'Quentin'
             """
db.execute(queryPart1) #at this point, you should check the table in sqlite3

2. Using the Saved Query Output

You should load up sqlite3 and verify that the table exists and the results look reasonable (here is his IMDB page. Note that our dataset is a bit outdated so newer movies won’t show up in your query result). Now, we can use TarantinoFilms just like any other relation in our database. Here is how we can find all actors who have been in one of the Tarantino films from Part 1:

queryPart2 = """
              SELECT A.id, A.fname, A.lname, COUNT(*)
              FROM TarantinoFilms T, Casts C, Actor A
              WHERE T.movieID = C.movieID and A.id = C.actorID
              GROUP BY A.id
              HAVING COUNT(*)>3
             """

We join Casts with Actors and TarantinoFilms which gives us information about all Actors who have starred in a Tarantino movie. Then, we add a GROUP BY and HAVING command to group all of the instances for each actor and only keep actors with more than 3 instances. We can now execute, fetch results, and print results for this query.

id                   fname                lname                COUNT(*)
--------------------------------------------------------------------------------
51688                Michael              Bacall               4
82772                Lawrence             Bender               5
496647               Samuel L.            Jackson              4
647382               Michael              Madsen               4
806512               Michael              Parks                4
840200               Stevo                Polyi                4
1038507              Quentin              Tarantino            8
1141844              Bruce                Willis               4
1343554              Julie                Dreyfus              4
1794091              Uma                  Thurman              4
1810514              Venessia             Valentino            4

3. Cleaning Up Temporary Tables

At this point, you’ve done the hard work. But you will need to drop the temporary table so it doesn’t linger in the database (the DB should be in the same state as it was before you ran the query).

After your query is done, you can run a DROP TABLE command

db.execute("DROP TABLE TarantinoFilm")

Also, since your query may crash during development, you will want to preemptively drop any tables so that you don’t have to recreate the whole database. Add IF EXISTS to drop any tables that need to be cleaned up before starting your query:

db.execute("DROP TABLE IF EXISTS TarantinoFilm")

4. Result

Bringing it all together, this is the complete solution:

db.execute("DROP TABLE IF EXISTS TarantinoFilm") # in case your program crashed before cleaning this up last time


queryPart1 = """CREATE TABLE TarantinoFilms AS
                 SELECT DM.movieID
                 FROM DirectsMovie DM, Director D
                 WHERE DM.directorID = D.id AND D.lname = 'Tarantino' and D.fname = 'Quentin'
              """


queryPart2 = """
              SELECT A.id, A.fname, A.lname, COUNT(*)
              FROM TarantinoFilms T, Casts C, Actor A
              WHERE T.movieID = C.movieID and A.id = C.actorID
              GROUP BY A.id
              HAVING COUNT(*)>3
            """

db.execute(queryPart1) #create the table
db.execute(queryPart2) #process the query
# todo, fetch and print results.  This will only output results from the most
#  recent call to db.execute()

db.execute("DROP TABLE TarantinoFilm") # delete the temporary table