# Bayesian Inference and Graphical Models: Bayesian networks

Consider the following probabilistic narrative about an individual's health outcome.

(i) A person becomes a smoker with probability 18%.
(ii) They exercise regularly with probability 40% if they are a non-smoker or with probability 25% if they are a smoker.
(iii) Independently of the above, with probability 15% they have a gene which predisposes them to lung cancer.
(iv) Their conditional probability of contracting lung cancer is a function of the indicator random variables of the events described in (i), (ii), and (iii).

We can visualize this story with a diagram in which each of the four indicator random variables is a node, and arrows are drawn to indicate dependencies as specified in the story.
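The story can also be read as a sampling recipe: draw each node after the nodes it depends on. The following is a minimal sketch of that idea, not a definitive implementation. The names `sample_person` and `placeholder_p_cancer` are made up for illustration, and the numbers inside `placeholder_p_cancer` are assumptions standing in for the conditional probability in (iv), not the values from the story.

```julia
using Random

# Hypothetical stand-in for the conditional probability in (iv); these numbers
# are illustrative assumptions, not the values from the story.
placeholder_p_cancer(smokes, exercises, gene) =
    (0.02 + 0.10 * smokes + 0.10 * gene) * (exercises ? 0.8 : 1.0)

# Sample one person by visiting each node after the nodes it depends on.
function sample_person(rng, p_cancer)
    smokes    = rand(rng) < 0.18                          # (i)
    exercises = rand(rng) < (smokes ? 0.25 : 0.40)        # (ii) depends on smoking
    gene      = rand(rng) < 0.15                          # (iii) independent of the above
    cancer    = rand(rng) < p_cancer(smokes, exercises, gene)  # (iv)
    return (smokes = smokes, exercises = exercises, gene = gene, cancer = cancer)
end

rng = MersenneTwister(1)
people = [sample_person(rng, placeholder_p_cancer) for _ in 1:100_000]

# Fraction of sampled people for whom the given indicator is true.
frac(field) = count(p -> getfield(p, field), people) / length(people)

frac_smokes = frac(:smokes)
frac_exercises = frac(:exercises)
frac_gene = frac(:gene)
```

With this many samples, `frac_smokes` and `frac_gene` should land near 0.18 and 0.15, and `frac_exercises` near 0.82 × 0.40 + 0.18 × 0.25 = 0.373.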

Exercise
Is this the only such diagram consistent with the specified probability measure on the four random variables?

Solution. No, nothing about smoking and exercising requires that we sample the smoking indicator first and then the exercising indicator from its conditional distribution given smoking. We could have done it the other way around.
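To see that the other ordering describes the same probability measure, we can compute the marginal distribution of exercising and the conditional distribution of smoking given exercising, and check that the two factorizations agree. This is a small sketch using the probabilities from (i) and (ii); the function names `joint` and `joint_reversed` are chosen here for illustration.

```julia
# Forward factorization from the story: P(S) and P(E | S).
p_s = 0.18
p_e_given_s = Dict(false => 0.40, true => 0.25)

# Joint probability P(S = s, E = e), sampling smoking first.
joint(s, e) = (s ? p_s : 1 - p_s) * (e ? p_e_given_s[s] : 1 - p_e_given_s[s])

# Reverse direction: marginal P(E = 1) and conditional P(S = 1 | E = e) via Bayes' rule.
p_e = joint(true, true) + joint(false, true)
p_s_given_e = Dict(e => joint(true, e) / (joint(true, e) + joint(false, e))
                   for e in (false, true))

# Joint probability P(S = s, E = e), sampling exercising first.
joint_reversed(s, e) = (e ? p_e : 1 - p_e) * (s ? p_s_given_e[e] : 1 - p_s_given_e[e])
```

Here `p_e` works out to 0.373, and `joint(s, e)` matches `joint_reversed(s, e)` for all four combinations, so a diagram with the arrow pointing from "exercises" to "smokes" is equally consistent with the story.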

The diagram tells us that having the gene is independent of smoking and exercising (since those nodes have no common ancestors in the diagram). If we included another descendant of the "smokes" node, like "develops premature wrinkles", then that would communicate that premature wrinkles and lung cancer, while not independent, are conditionally independent given the smoking random variable.

## Gaussian mixture models

Consider a distribution on $\mathbb{R}^2$ whose density function can be written as a linear combination of multivariate Gaussian densities:

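using Plots, Distributions

# Mixture density: weights 0.55 and 0.45 on two bivariate Gaussian components
f(x, y) = 0.55 * pdf(MvNormal([2.2, -0.4], [0.4 0.2; 0.2 0.4]), [x, y]) +
          0.45 * pdf(MvNormal([0.1, -4.3], [1.5 -0.1; -0.1 0.5]), [x, y])

# Evaluate f on a grid and show it as a heatmap and as a surface, side by side
p1 = heatmap(-6:0.05:6, -6:0.05:6, f)
p2 = surface(-6:0.05:6, -6:0.05:6, f)
plot(p1, p2, size = (650, 300))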

Such a distribution is called a Gaussian mixture model (GMM). We can sample from a GMM in two steps: first simulate a random variable that selects one of the Gaussian components, choosing each component with probability equal to its coefficient in the density, and then draw from a multivariate normal distribution with that component's mean and covariance.
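The two-step sampling procedure can be sketched as follows, using the weights, means, and covariances from the density plotted above. The function name `sample_gmm` is chosen here for illustration, and the Gaussian draw uses a Cholesky factor from the standard library rather than the Distributions package, so treat it as one possible approach rather than the only one.

```julia
using LinearAlgebra, Random

# Parameters of the two-component mixture plotted above.
weights = [0.55, 0.45]
means = [[2.2, -0.4], [0.1, -4.3]]
covs = [[0.4 0.2; 0.2 0.4], [1.5 -0.1; -0.1 0.5]]

# Draw one point: pick a component, then sample from that component's Gaussian.
function sample_gmm(rng, weights, means, covs)
    u = rand(rng)
    k = findfirst(c -> u <= c, cumsum(weights))
    k = something(k, length(weights))      # guard against round-off in cumsum
    L = cholesky(Symmetric(covs[k])).L     # covs[k] == L * L'
    return k, means[k] + L * randn(rng, length(means[k]))
end

rng = MersenneTwister(1)
samples = [sample_gmm(rng, weights, means, covs) for _ in 1:50_000]

# Sanity checks: share of points from component 1 and their average position.
frac1 = count(s -> first(s) == 1, samples) / length(samples)
component1 = [pt for (k, pt) in samples if k == 1]
mean1 = sum(component1) / length(component1)
```

The share `frac1` should be close to 0.55 and `mean1` close to `[2.2, -0.4]`, which is a quick check that the sampler matches the mixture.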

Exercise
Explain how you might estimate the means, covariances, and mixture weights based on the observations shown. Feel free to use your own visual intuition as part of the algorithm.

Solution. We identify the two clusters visually and associate each point with one cluster or the other. Then we estimate the means and covariances as the sample means and covariances of the two clusters, and we estimate the mixture weights as the proportions of points belonging to each cluster.
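Once each point has been assigned to a cluster by eye, the estimates are ordinary sample statistics. The sketch below uses a tiny made-up data set (the `points` and `labels` values are hypothetical, not the observations from the figure), and the function name `estimate_gmm` is chosen for illustration.

```julia
using Statistics

# Hypothetical observations (one row per point) with hand-picked cluster labels.
points = [2.0 -0.5; 2.4 -0.3; 2.2 -0.4; 0.0 -4.0; 0.2 -4.6]
labels = [1, 1, 1, 2, 2]

# For each cluster: proportion of points, sample mean, and sample covariance.
function estimate_gmm(points, labels)
    map(sort(unique(labels))) do k
        cluster = points[labels .== k, :]
        (weight = size(cluster, 1) / size(points, 1),
         center = vec(mean(cluster, dims = 1)),
         covariance = cov(cluster))
    end
end

estimates = estimate_gmm(points, labels)
```

Each entry of `estimates` holds the estimated weight, mean, and covariance for one cluster; for this toy data the weights are 0.6 and 0.4.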

In the next section (on Expectation-Maximization), we'll talk about how to do this in a way that doesn't require a human to hand-pick a cluster for each point.

## Hidden Markov Models

The second example of a Bayesian network we'll look at is the Hidden Markov Model (HMM). An HMM consists of a Markov chain $Z_1, Z_2, \ldots$ together with a collection of random variables $X_1, X_2, \ldots$ with the property that the conditional distribution of $X_j$ given all of the other random variables depends only on $Z_j$. Represented as a Bayes net, the hidden Markov model looks like this:

Example
Simulate a hidden Markov model and plot the vector of $Z$'s and the vector of $X$'s on the same graph.

Solution.

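using Plots, OffsetArrays

# Transition matrix indexed by state (0 or 1): P[i, j] is the probability of moving from i to j
P = OffsetArray([0.2 0.8
                 1/3 2/3], 0:1, 0:1)

n = 100

# Simulate n steps of the chain, starting from a uniformly random state
function markov_chain(P, n)
    Z = [rand(0:1)]
    for i in 1:n-1
        current_state = Z[end]
        # Move to state 0 with probability P[current_state, 0], otherwise to state 1
        push!(Z, rand() < P[current_state, 0] ? 0 : 1)
    end
    Z
end

Z = markov_chain(P, n)
# Each observation is the hidden state plus independent standard normal noise
X = Z + randn(n)

plot(Z, size = (500, 100), legend = false)
plot!(X)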

The kinds of questions we'll want to answer for hidden Markov models include:

1. Given observations for the $X$'s but not the $Z$'s, which model parameters (including the transition probabilities for the Markov chain and any parameters for the conditional distribution of $X_j$ given $Z_j$) maximize the likelihood of the observed data?

2. Given values for the parameters of the model and given observations for the $X$'s, what is the conditional distribution of the $Z$'s?

Exercise
Consider a hidden Markov model for which the transition matrix takes the form and for which the conditional distribution of $X_j$ given $Z_j$ is a normal distribution with mean $Z_j$ and variance $\sigma^2$.

Given the observed values shown, how many times would you guess the underlying Markov chain changed its state (from 0 to 1, or from 1 to 0)? Also, does it appear as though the variance of the observations around the hidden states is large or small?

Solution. It looks like the sequence of $Z$'s was most likely this path (which switches 8 times):

Furthermore, it appears that the variance is probably pretty small, since the differences between the $X$'s and $Z$'s are small.
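Both judgments in this solution can be turned into small computations once a path has been guessed. The sketch below uses made-up values (`z_guess` and `x_obs` are hypothetical, not the data from the figure) and assumes, as in the exercise, that each observation is normally distributed around its hidden state.

```julia
# Hypothetical guessed hidden path and observations (not the data from the figure).
z_guess = [0, 0, 1, 1, 1, 0, 0, 1]
x_obs = [0.1, -0.2, 1.1, 0.9, 1.2, 0.1, -0.1, 0.8]

# Number of state changes along the guessed path.
n_switches = count(!=(0), diff(z_guess))

# Variance estimate: average squared difference between each observation
# and its guessed hidden state (the mean is assumed to be the state itself).
residuals = x_obs .- z_guess
var_estimate = sum(abs2, residuals) / length(residuals)
```

For these toy values the path switches 3 times, and the small `var_estimate` reflects how closely the observations track the guessed states.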

In the next section we'll talk about a more principled method for inferring model parameters and the conditional distribution of the $Z$'s given the observed $X$'s.

We close this section with an example showing how to use Bayes nets to calculate likelihood values.

Example
Find the likelihood of the following data for the hidden Markov model described above, with , , and . Suppose $Z_1$ is uniformly distributed on $\{0, 1\}$.

Solution. The probability of observing is . The probability of observing and is . The probability of observing all three of the given values is .

The conditional probability of seeing an $X$ value close to 0.2 given is proportional to the value of the standard Gaussian density at , which is . Likewise, the likelihood gets a factor of for and a factor of for , given the values for and under consideration. All together, the likelihood is

More generally, we can compute the likelihood for any complete set of values in a Bayes net by traversing the diagram starting from a root node (a node with no incoming arrows) and including a factor for each conditional probability mass or density value encountered at each node.
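For a hidden Markov model, this traversal visits the hidden states in order and multiplies in one transition factor and one observation factor per step. The sketch below is one way to write it under stated assumptions: the transition matrix is the one from the simulation earlier in this section, the initial state is uniform on 0 and 1, the observation noise has variance 1, and the data `z` and `x` are hypothetical values chosen for illustration. The names `normal_pdf` and `hmm_likelihood` are also made up here.

```julia
# Density of a normal distribution with mean mu and variance sigma2, evaluated at x.
normal_pdf(x, mu, sigma2) = exp(-(x - mu)^2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

# Likelihood of hidden states z (values 0 or 1) and observations x.
# P[i + 1, j + 1] is the probability of moving from state i to state j,
# and init[i + 1] is the probability that the chain starts in state i.
function hmm_likelihood(z, x, P, init, sigma2)
    like = init[z[1] + 1] * normal_pdf(x[1], z[1], sigma2)   # root node and its observation
    for j in 2:length(z)
        like *= P[z[j - 1] + 1, z[j] + 1]                     # transition factor
        like *= normal_pdf(x[j], z[j], sigma2)                # observation factor
    end
    return like
end

P = [0.2 0.8; 1/3 2/3]   # transition matrix from the simulation above
init = [0.5, 0.5]        # uniform initial distribution
z = [0, 1, 1]            # hypothetical hidden states
x = [0.2, 1.3, 0.7]      # hypothetical observations
likelihood = hmm_likelihood(z, x, P, init, 1.0)
```

For these values the result equals 0.5 × 0.8 × (2/3) times the three Gaussian density values at 0.2, 0.3, and −0.3, one factor per node of the diagram.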