Exam 2 Review

CS63 Artificial Intelligence

Introduction

For each topic there is a list of terms. You should be able to define these terms, explain their relevance to AI, and provide a concrete example.

You do not need to memorize any formulas. However, given a formula, such as the perceptron learning rule or the update rule for Q-learning, you should understand how to apply it to a specific problem.

The exam will focus more on algorithms/methods that you explored in labs. For each ML approach, what are its strengths and weaknesses? For what types of problems would it be most appropriate?

Machine Learning

Terminology

Machine learning
- Supervised learning
- Reinforcement learning
- Unsupervised learning
Training set
Testing set
Overtraining (overfitting)
Generalization
Learning rate
Error

Questions

Why would we want to give an AI system the opportunity to learn?
Why not just program in the ability we want?

Sequential Decision Problems

Terminology

Markov Decision Process (MDP)
Transition model
Value
Policy
Reward
Discount
Value Iteration

Bellman Equation

The value of a state s is the immediate reward for that state plus the discounted expected value of the next state s', assuming the agent chooses the optimal action.

    V(s) = reward + discount * max[ SUM( P(s'|s,a) * V(s') ) ]
                                a    s'
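
A minimal sketch of value iteration, which repeatedly applies the Bellman equation above to every state until the values stop changing. The three-state MDP here (states A, B, C, the actions "stay" and "go", and all probabilities and rewards) is invented for illustration and does not come from the labs.

```python
# Hypothetical MDP for illustration only.
REWARD = {"A": 0.0, "B": 0.0, "C": 1.0}
DISCOUNT = 0.9

# TRANSITIONS[s][a] lists (probability, next_state) pairs, i.e. P(s'|s,a).
TRANSITIONS = {
    "A": {"stay": [(1.0, "A")], "go": [(0.8, "B"), (0.2, "A")]},
    "B": {"stay": [(1.0, "B")], "go": [(0.8, "C"), (0.2, "A")]},
    "C": {"stay": [(1.0, "C")]},
}


def bellman_backup(values, state):
    """V(s) = reward + discount * max over a of SUM over s' of P(s'|s,a) * V(s')."""
    best = max(
        sum(prob * values[nxt] for prob, nxt in outcomes)
        for outcomes in TRANSITIONS[state].values()
    )
    return REWARD[state] + DISCOUNT * best


def value_iteration(tolerance=1e-6):
    """Sweep over all states until no value changes by more than tolerance."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        new_values = {s: bellman_backup(values, s) for s in TRANSITIONS}
        change = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if change < tolerance:
            return values


values = value_iteration()
print(values)
```

In this toy MDP, state C loops back to itself and earns a reward of 1 on every step, so its value approaches 1 / (1 - 0.9) = 10. States closer to C end up with higher values than states farther away.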

Reinforcement Learning

Terminology

Exploration vs Exploitation
Temporal difference learning
Q-learning, Q-table, Q-value
Approximate Q-learning
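
One common way to balance exploration and exploitation is epsilon-greedy action selection. The sketch below assumes that strategy; the function name and the epsilon parameter are illustrative choices, not necessarily what the labs used. The example Q-values are the row for state 0,1 from the Q-table later in this review.

```python
import random


def epsilon_greedy(q_values, epsilon):
    """Pick an action given a dict mapping each action to its Q-value.

    With probability epsilon choose a random action (explore);
    otherwise choose the action with the highest Q-value (exploit).
    """
    if random.random() < epsilon:
        return random.choice(list(q_values))
    return max(q_values, key=q_values.get)


row = {"n": 0.76, "e": 0.81, "s": 0.65, "w": 0.72}
print(epsilon_greedy(row, epsilon=0.0))  # always exploits
print(epsilon_greedy(row, epsilon=1.0))  # always explores
```

A typical design choice is to start with a large epsilon and shrink it over training, so the agent explores early and exploits what it has learned later.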

Q-learning update rule

  
    Q(s,a) += learningRate * (reward + discount * max Q(s', a') - Q(s,a))
                                                   a'
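
A small sketch of applying the update rule above once. The nested-dict Q-table layout, the state and action names, and the numbers are assumptions made for the example.

```python
def q_update(q_table, state, action, reward, next_state, learning_rate, discount):
    """Apply one Q-learning update in place and return the new Q(s,a)."""
    best_next = max(q_table[next_state].values())          # max over a' of Q(s', a')
    td_error = reward + discount * best_next - q_table[state][action]
    q_table[state][action] += learning_rate * td_error
    return q_table[state][action]


# Hypothetical two-state Q-table.
q_table = {
    "s0": {"left": 0.0, "right": 0.5},
    "s1": {"left": 0.2, "right": 1.0},
}

# The agent took "right" in s0, received reward 0, and landed in s1.
new_q = q_update(q_table, "s0", "right", reward=0.0, next_state="s1",
                 learning_rate=0.5, discount=0.9)
print(new_q)
```

Worked by hand: the target is 0 + 0.9 * 1.0 = 0.9, the old estimate is 0.5, so the error is 0.4 and the estimate moves half of that distance, to about 0.7.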

Questions

How are MDPs and RL related?
Explain in your own words how the Q-learning update rule works to modify the Q-values based on the current action taken and the reward received.
How did we apply Q-learning to Pac-Man? What were the states and actions? How effective was standard Q-learning at solving Pac-Man?
Why did we use Approximate Q-learning? How did we implement it for Pac-Man? How effective was it?
Is RL guaranteed to find the optimal policy?
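
For reviewing approximate Q-learning, the sketch below assumes the common linear form, where Q(s,a) is a weighted sum of features and each weight is adjusted in proportion to its feature value. The feature names and numbers are hypothetical, and the lab implementation may have differed.

```python
def approx_q_value(weights, features):
    """Q(s,a) as a weighted sum of the feature values for (s,a)."""
    return sum(weights[name] * value for name, value in features.items())


def approx_q_update(weights, features, reward, best_next_q, learning_rate, discount):
    """Shift each weight by learning_rate * difference * feature value."""
    difference = reward + discount * best_next_q - approx_q_value(weights, features)
    for name, value in features.items():
        weights[name] += learning_rate * difference * value
    return weights


# Hypothetical features describing one (state, action) pair.
weights = {"bias": 0.0, "closest_food": 0.0, "ghost_nearby": 0.0}
features = {"bias": 1.0, "closest_food": 0.5, "ghost_nearby": 0.0}

approx_q_update(weights, features, reward=10.0, best_next_q=0.0,
                learning_rate=0.1, discount=0.9)
print(weights)
```

Because the weights are shared across all states, one update changes the estimated Q-value of every state with similar features. This is how the approach generalizes to states the agent has never visited.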

Consider the following grid-based environment, where the rewards of being in each location are shown.

  -------------
2 | 0 | 0 | +1|
  -------------
1 | 0 | 0 | -1|
  -------------
0 | 0 | 0 | 0 |
  -------------
    0   1   2

We will represent the state in column,row format. Suppose that the actions the agent can take are to go north, east, south, or west. If it tries to go in a direction that leads it off the boundary of the grid, then it remains in its current state and receives the reward for that state on that action. After 500 steps of training, suppose that the Q-table contains the following values.

       actions
state  n     e     s     w
  0,0  0.73  0.69  0.65  0.65 
  0,1  0.76  0.81  0.65  0.72 
  0,2  0.00  0.90  0.17  0.00 
  1,0  0.81  0.00  0.35  0.48 
  1,1  0.90 -0.97  0.62  0.67
  1,2  0.82  1.00  0.79  0.68 
  2,0  0.00  0.00  0.00  0.21 
  2,1  0.00  0.00  0.00  0.00 
  2,2  0.00  0.00  0.00  0.00

Using the grid below, draw an arrow in each location to show the agent's current policy based on the Q-table.

  -------------
2 |   |   |   |
  -------------
1 |   |   |   |
  -------------
0 |   |   |   |
  -------------
    0   1   2

Is this policy optimal?
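
One way to check a hand-drawn policy is to read the greedy action for each state directly off the Q-table. The sketch below copies the table above into a dict; the function name is an illustrative choice. It reports every action tied for the highest value, since a state whose Q-values are all equal has no preferred action yet.

```python
ACTIONS = ("n", "e", "s", "w")

# Q-table from above, keyed by (column, row).
Q_TABLE = {
    (0, 0): (0.73, 0.69, 0.65, 0.65),
    (0, 1): (0.76, 0.81, 0.65, 0.72),
    (0, 2): (0.00, 0.90, 0.17, 0.00),
    (1, 0): (0.81, 0.00, 0.35, 0.48),
    (1, 1): (0.90, -0.97, 0.62, 0.67),
    (1, 2): (0.82, 1.00, 0.79, 0.68),
    (2, 0): (0.00, 0.00, 0.00, 0.21),
    (2, 1): (0.00, 0.00, 0.00, 0.00),
    (2, 2): (0.00, 0.00, 0.00, 0.00),
}


def greedy_actions(q_row):
    """Return every action whose Q-value equals the row maximum."""
    best = max(q_row)
    return [action for action, q in zip(ACTIONS, q_row) if q == best]


policy = {state: greedy_actions(row) for state, row in Q_TABLE.items()}
for state, actions in sorted(policy.items()):
    print(state, actions)
```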

Artificial Neural Networks

Terminology

Unit
Weight
Layer
Feed-forward
Activation function
- Step
- Sigmoid
- ReLU
Net input
Activation value
Bias
Linearly separable
Hidden layer representations
Back-propagation learning rule: Derived by doing gradient descent on error
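
The unit-level terms above (net input, bias, activation function, activation value) fit together as in the sketch below. The input values and weights are made up for the example.

```python
import math


def net_input(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def step(net):
    """Returns 1 when the net input is greater than 0, otherwise 0."""
    return 1 if net > 0 else 0


def sigmoid(net):
    """Squashes the net input smoothly into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def relu(net):
    """Passes positive net input through unchanged and clips the rest to 0."""
    return max(0.0, net)


# Hypothetical unit with three inputs.
net = net_input([1, 0, 1], [0.5, -0.3, 0.2], bias=-0.4)
for activation in (step, sigmoid, relu):
    print(activation.__name__, activation(net))
```

The same net input gives a different activation value under each function. Step is not differentiable at 0 and has zero slope elsewhere, which is why gradient-based learning such as back-propagation uses functions like sigmoid or ReLU instead.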

Perceptron learning rule:

  w += learningRate * (target - output) * input
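
A minimal sketch of training with the perceptron learning rule above. It assumes the bias is updated like a weight on a constant input of 1, and it uses a three-input OR function as the training problem; both are choices made for this example.

```python
from itertools import product


def step(net):
    return 1 if net > 0 else 0


def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(examples, learning_rate=1.0, max_epochs=100):
    """Repeat the update rule over the examples until an epoch has no mistakes."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                for i, x in enumerate(inputs):
                    weights[i] += learning_rate * error * x
                bias += learning_rate * error  # assumed: bias sees a constant input of 1
        if mistakes == 0:
            break
    return weights, bias


# Three-input OR: the output is on when at least one input is on.
examples = [(bits, int(any(bits))) for bits in product((0, 1), repeat=3)]
weights, bias = train_perceptron(examples)
print(weights, bias)
```

Because OR is linearly separable, the rule settles on weights that classify all eight input patterns correctly. On a function that is not linearly separable, the loop would keep making mistakes until max_epochs ran out.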

Questions

Consider a two-layer neural network with 3 inputs and 1 output that uses the step activation function (returns 1 when the netInput is greater than 0 and otherwise returns 0). Such a model can only solve problems from the class of linearly separable functions.
For the following problems explain whether the function is linearly separable. You may want to use 3D pictures of cubes to visualize whether the functions are linearly separable. If a function is separable, determine a set of weights that solve the problem (you can do this by hand, you don't need to use the perceptron learning rule).
- The output turns on whenever more than one of the inputs is on.
- The output turns on whenever both inputs two and three are on.
- The output turns on when exactly one input is on.

In what ways are artificial neural networks similar to and different from biological neural networks?

Is backprop guaranteed to converge on a solution?

Deep Learning

Terminology

Vanishing gradient problem
Loss
Optimizer
Convolution
- Feature/Kernel
- Shared weights
- Stride
Keras layer types: Convolution, Pool, Flatten, Dense, Dropout
MNIST data set
Deep Q-learning applied to Atari Games
- Q network
- Experience replay
- Iterative update
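
To make the convolution terms concrete, here is a plain-Python sketch of sliding one kernel over a 2D input with a given stride. It does not use Keras, the image and kernel values are invented, and, as is usual in deep learning, the kernel is applied without flipping.

```python
def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image; every position reuses the same (shared) weights."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for top in range(0, len(image) - k_rows + 1, stride):
        row = []
        for left in range(0, len(image[0]) - k_cols + 1, stride):
            total = sum(
                kernel[i][j] * image[top + i][left + j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 input and 2x2 kernel.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
]
kernel = [
    [1, 0],
    [0, 1],
]

print(convolve2d(image, kernel, stride=1))  # 3x3 feature map
print(convolve2d(image, kernel, stride=2))  # 2x2 feature map: [[2, 1], [3, 3]]
```

The kernel has only four weights no matter how large the input is, which is the point of shared weights. A larger stride skips positions and produces a smaller feature map.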

Questions

Explain each of the following issues with deep learning systems and give one example of each:
- Lack of interpretability
- Bias
- Spurious statistical correlations
- Vulnerable to adversarial attacks
What is self-supervised learning? What issues does it address with supervised learning?
What is embodiment? How does it relate to issues with convolutional networks?