CS 63 - Homework Five

Due 11/17/09


1. Probability Basics (25 pts.)

(R&N 13.9) Using only the basic laws of probability theory (the three axioms of probability, the definition of conditional probability, the product rule, and/or Bayes' rule), prove the following theorems:
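
For reference, standard statements of the permitted laws (the proofs should appeal only to these; written here in LaTeX notation):

    % The three axioms of probability
    0 \le P(A) \le 1, \qquad P(\mathit{true}) = 1, \qquad
    P(A \lor B) = P(A) + P(B) - P(A \land B)

    % Definition of conditional probability (for P(B) > 0)
    P(A \mid B) = \frac{P(A \land B)}{P(B)}

    % Product rule
    P(A \land B) = P(A \mid B)\, P(B)

    % Bayes' rule
    P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}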

(a) (12 pts.) Prove the conditionalized version of the general product rule:

P(A ^ B | E) = P(A | B ^ E) P(B | E)
(b) (13 pts.) Prove the conditionalized version of Bayes' rule:

P(A | B ^ C) = P(B | A ^ C) P(A | C) / P(B | C)
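
Before writing the proofs, both identities can be sanity-checked numerically. The sketch below (Python; the random joint distribution is purely illustrative and not part of the assignment) builds an arbitrary joint distribution over three binary variables and confirms that both equations hold to within floating-point error, with E playing the role of C in part (b):

    import itertools
    import random

    # Arbitrary normalized joint distribution over three binary variables (A, B, E).
    random.seed(0)
    weights = {abe: random.random() for abe in itertools.product([True, False], repeat=3)}
    total = sum(weights.values())
    joint = {abe: w / total for abe, w in weights.items()}

    def p(a=None, b=None, e=None):
        """Joint/marginal probability; None means 'sum that variable out'."""
        return sum(pr for (av, bv, ev), pr in joint.items()
                   if (a is None or av == a)
                   and (b is None or bv == b)
                   and (e is None or ev == e))

    # (a) P(A ^ B | E) = P(A | B ^ E) P(B | E)
    lhs = p(a=True, b=True, e=True) / p(e=True)
    rhs = (p(a=True, b=True, e=True) / p(b=True, e=True)) * (p(b=True, e=True) / p(e=True))
    assert abs(lhs - rhs) < 1e-12

    # (b) P(A | B ^ E) = P(B | A ^ E) P(A | E) / P(B | E)
    lhs = p(a=True, b=True, e=True) / p(b=True, e=True)
    rhs = ((p(a=True, b=True, e=True) / p(a=True, e=True))
           * (p(a=True, e=True) / p(e=True))
           / (p(b=True, e=True) / p(e=True)))
    assert abs(lhs - rhs) < 1e-12
    print("both identities hold on this joint distribution")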


2. Bayesian networks and probability (45 pts.)

A CS 63 student notices that people who drive SUVs (S) consume large amounts of gas (G) and are involved in more accidents (A) than the national average.  In this problem, uppercase letters denote variable names, and lowercase letters denote assignments of values to those variables; for example, P(a, ~s) is the same thing as P(A = t, S = f).  She has constructed the following Bayesian network:

(a) (5 pts) Compute P(a, ~s, g) using the chain rule.

(b) (10 pts.) Compute P(a) using inference by enumeration.

(c) (10 pts.) Using conditional independence, compute P(~g, a | s) and P(~g, a | ~s).  Then use Bayes' rule to compute P(s | ~g, a).
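
The network figure is not reproduced here; parts (a)-(c) are consistent with S being the sole parent of both G and A. Under that assumed structure, a minimal inference-by-enumeration sketch in Python is below. The CPT numbers are invented placeholders, so the figure's values must be substituted before reading anything off the output:

    # Assumed structure for parts (a)-(c): S -> G and S -> A.
    # All CPT numbers below are invented placeholders; use the figure's values.
    P_s = 0.3                              # P(S = t)
    P_g_given = {True: 0.8, False: 0.2}    # P(G = t | S)
    P_a_given = {True: 0.6, False: 0.1}    # P(A = t | S)

    def joint(s, g, a):
        """Chain rule for this structure: P(s, g, a) = P(s) P(g|s) P(a|s)."""
        ps = P_s if s else 1 - P_s
        pg = P_g_given[s] if g else 1 - P_g_given[s]
        pa = P_a_given[s] if a else 1 - P_a_given[s]
        return ps * pg * pa

    # (a) P(a, ~s, g) is a single chain-rule product.
    print("P(a, ~s, g) =", joint(s=False, g=True, a=True))

    # (b) P(a) by enumeration: sum the joint over the hidden variables S and G.
    p_a = sum(joint(s, g, True) for s in (True, False) for g in (True, False))
    print("P(a) =", p_a)

    # (c) Bayes' rule: P(s | ~g, a) = P(~g, a | s) P(s) / P(~g, a), where
    # P(~g, a | s) = P(~g | s) P(a | s) because G and A are conditionally
    # independent given their common parent S.
    p_nga_given_s = (1 - P_g_given[True]) * P_a_given[True]
    p_nga = sum(joint(s, False, True) for s in (True, False))
    print("P(s | ~g, a) =", p_nga_given_s * P_s / p_nga)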

The enterprising student notices that two types of people drive SUVs: people from California (C) and people with large families (F). After collecting some statistics, the student arrives at the following form for the BN:

(d) (5 pts.) Using the chain rule, compute the probability P(~g, a, s, c, ~f).

(e) (5 pts) Write, in summation form, the formula for computing P(a, ~f) using inference by enumeration.  (You do not need to actually compute the probability.)
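
As a reminder of the general pattern (this is the shape the answer to (e) should take, not the answer itself): for a query assignment q and hidden variables H, inference by enumeration sums the chain-rule factorization over all joint assignments h to H:

    P(q) = \sum_{h} \prod_{i} P\big(x_i \mid \mathrm{parents}(X_i)\big)

where each value x_i is fixed by the combined assignment (q, h).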

(f) (2 pts) What is the conditional independence assumption for a node in a BN?

(g) (3 pts) When are two variables conditionally independent of each other in a BN?

(h) (5 pts) Using the rules for determining when two variables are (conditionally) independent of each other in a BN, answer true or false for each of the following statements about the BN given in part (d) above:

  1. I(C,G)
  2. I(F,A | S)
  3. I(C,F)
  4. I(A,G)
  5. I(C,F | A)
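
One way to check answers like these is to build the full joint for the assumed structure (C and F as parents of S; S the sole parent of G and A) with arbitrary CPT values and test the factorization numerically. A failed check refutes a claimed independence; a passing check with generic CPT values strongly suggests (but does not by itself prove) independence. A Python sketch for statement 1 follows; the CPT numbers are invented placeholders, not the figure's values:

    import itertools

    # Assumed structure for parts (d)-(h): C -> S <- F, S -> G, S -> A.
    P_c, P_f = 0.2, 0.4
    P_s_given = {(True, True): 0.9, (True, False): 0.7,
                 (False, True): 0.6, (False, False): 0.1}   # P(S = t | C, F)
    P_g_given = {True: 0.8, False: 0.2}                     # P(G = t | S)
    P_a_given = {True: 0.6, False: 0.1}                     # P(A = t | S)

    def pr(val, p_true):
        """Probability of one boolean value, given P(var = t)."""
        return p_true if val else 1 - p_true

    def joint(c, f, s, g, a):
        """Chain rule: P(c) P(f) P(s|c,f) P(g|s) P(a|s)."""
        return (pr(c, P_c) * pr(f, P_f) * pr(s, P_s_given[(c, f)])
                * pr(g, P_g_given[s]) * pr(a, P_a_given[s]))

    NAMES = ("c", "f", "s", "g", "a")

    def marg(**fixed):
        """Sum the joint over every variable not pinned in `fixed`."""
        total = 0.0
        for vals in itertools.product((True, False), repeat=5):
            assign = dict(zip(NAMES, vals))
            if all(assign[k] == v for k, v in fixed.items()):
                total += joint(**assign)
        return total

    # I(C, G) holds iff P(c, g) = P(c) P(g) for every value combination.
    print(all(abs(marg(c=cv, g=gv) - marg(c=cv) * marg(g=gv)) < 1e-12
              for cv in (True, False) for gv in (True, False)))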

3. Decision tree learning (30 pts.)

The following table gives a data set for deciding whether to play or cancel a ball game, depending on the weather conditions.

Outlook   Temp (F)  Humidity (%)  Windy?  Class
sunny     75        70            true    Play
sunny     80        90            true    Don't Play
sunny     85        85            false   Don't Play
sunny     72        95            false   Don't Play
sunny     69        70            false   Play
overcast  72        90            true    Play
overcast  83        78            false   Play
overcast  64        65            true    Play
overcast  81        75            false   Play
rain      71        80            true    Don't Play
rain      65        70            true    Don't Play
rain      75        80            false   Play
rain      68        80            false   Play
rain      70        96            false   Play

(a) (10 pts.) At the root node for a decision tree in this domain, what are the information gains associated with the Outlook and Humidity attributes? (Use a threshold of 75 for Humidity; i.e., assume the binary split Humidity <= 75 vs. Humidity > 75.) Be sure to show your computations.
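
For reference, the standard definitions these computations use, where p_k is the fraction of examples in S with class k and S_v is the subset of S with A = v:

    H(S) = -\sum_{k} p_k \log_2 p_k, \qquad
    \mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)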

(b) (10 pts.) Again at the root node, what are the gain ratios associated with the Outlook and Humidity attributes (using the same threshold as in (a))? Be sure to show your computations.
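
Gain ratio, as defined in C4.5, normalizes the information gain by the split information of the attribute itself:

    \mathrm{SplitInfo}(S, A) = -\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}, \qquad
    \mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}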

(c) (10 pts.) Suppose you build a decision tree that splits on the Outlook attribute at the root node. How many child nodes are there at the first level of the decision tree? Which branches require a further split in order to create leaf nodes with instances belonging to a single class? For each of these branches, which attribute can you split on to complete the decision tree building process at the next level (i.e., so that at level 2, there are only leaf nodes)? Draw the resulting decision tree, showing the decisions (class predictions) at the leaves.
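
To double-check the hand computations for (a) and (b), the sketch below (Python) encodes the table above, using only the Outlook, Humidity, and Class columns, and evaluates the same quantities. It is a verification aid, not a substitute for showing the work:

    from math import log2

    # (outlook, humidity, class) rows from the table; Temp and Windy? are unused here.
    data = [
        ("sunny", 70, "Play"), ("sunny", 90, "Don't Play"), ("sunny", 85, "Don't Play"),
        ("sunny", 95, "Don't Play"), ("sunny", 69 and 70, "Play"), ("overcast", 90, "Play"),
        ("overcast", 78, "Play"), ("overcast", 65, "Play"), ("overcast", 75, "Play"),
        ("rain", 80, "Don't Play"), ("rain", 70, "Don't Play"), ("rain", 80, "Play"),
        ("rain", 80, "Play"), ("rain", 96, "Play"),
    ]

    def entropy(rows):
        """Class entropy of a set of rows."""
        labels = [r[-1] for r in rows]
        return -sum((labels.count(c) / len(labels)) * log2(labels.count(c) / len(labels))
                    for c in set(labels))

    def gain_and_ratio(rows, key):
        """Information gain and gain ratio for the split induced by `key`."""
        parts = {}
        for r in rows:
            parts.setdefault(key(r), []).append(r)
        remainder = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
        split_info = -sum((len(p) / len(rows)) * log2(len(p) / len(rows))
                          for p in parts.values())
        g = entropy(rows) - remainder
        return g, g / split_info

    print("Outlook:       gain=%.4f ratio=%.4f" % gain_and_ratio(data, lambda r: r[0]))
    print("Humidity<=75:  gain=%.4f ratio=%.4f" % gain_and_ratio(data, lambda r: r[1] <= 75))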