
Updating Model On Pymc3 With New Observed Data

I measured the diameters of 80 fruits last year, and after checking which distribution best fit the values, I created a PyMC3 model with Model() as diam_model: mu =

Solution 1:

Kernel Density Estimated Updated Priors

Using the other answer suggested as a duplicate, one could extract approximate versions of the priors using the code from this Jupyter notebook.

First round

I'll assume we have data from the first round of sampling, on which we can impose a mean of 57.0 and a standard deviation of 5.42.

import numpy as np
import pymc3 as pm
from sklearn.preprocessing import scale
from scipy import stats

# generate data forced to match the indicated distribution
Y0 = 57.0 + scale(np.random.normal(size=80))*5.42

with pm.Model() as m0:
    # let's place an informed, but broad prior on the mean
    mu = pm.Normal('mu', mu=50, sd=10)
    sigma = pm.Uniform('sigma', 0, 10)
    
    y = pm.Normal('y', mu=mu, sd=sigma, observed=Y0)
    
    trace0 = pm.sample(5000, tune=5000)

Extracting new priors from posterior

We can then use the results of this model to extract KDE posteriors on the parameters with the following code from the referenced notebook:

def from_posterior(param, samples, k=100):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, k)
    y = stats.gaussian_kde(samples)(x)
    
    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x, y)
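To see concretely what `from_posterior` does numerically, here is a sketch of the same KDE-and-extend step outside of PyMC3 (the synthetic `samples` array here is a stand-in for a trace of `mu`):

```python
import numpy as np
from scipy import stats

# stand-in for posterior samples of a parameter, e.g. trace0['mu']
samples = np.random.default_rng(0).normal(loc=57.0, scale=0.6, size=5000)

k = 100
smin, smax = samples.min(), samples.max()
width = smax - smin

# evaluate a Gaussian KDE on a grid over the sampled range
x = np.linspace(smin, smax, k)
y = stats.gaussian_kde(samples)(x)

# pad the grid 3 widths out on each side with zero density, so values
# never sampled remain possible (via interpolation) but very unlikely
x_ext = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
y_ext = np.concatenate([[0], y, [0]])

print(x_ext[0] < smin - width, x_ext[-1] > smax + width)  # prints: True True
print(y_ext[0] == 0.0 and y_ext[-1] == 0.0)               # prints: True
```

Passing `x_ext` and `y_ext` to `pm.Interpolated` then yields the piecewise-linear prior used below.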

Second round

Now, if we have more data we can run a new model using the KDE updated priors:

Y1 = np.random.normal(loc=57, scale=5.42, size=100)

with pm.Model() as m1:
    mu = from_posterior('mu', trace0['mu'])
    sigma = from_posterior('sigma', trace0['sigma'])
    
    y = pm.Normal('y', mu=mu, sd=sigma, observed=Y1)
    
    trace1 = pm.sample(5000, tune=5000)

And similarly, one could use this trace to extract updated posterior estimates for future rounds of incoming data.

Caveat Emptor

The above methodology yields approximations to the true updated priors and would be most useful in cases where conjugate priors are not possible. It should also be noted that I'm not sure to what extent such KDE-based approximations introduce errors, or how they propagate in a model when used repeatedly. It's a neat trick, but one should be cautious about putting it into production without further validation of its robustness.

In particular, I would be very concerned about situations where the posterior distributions have strong correlation structures. The code provided here generates a "prior" distribution using only the marginals of each latent variable. This seems fine for simple models like this, and admittedly the initial priors also lack correlations, so it may not be a huge issue here. Generally, however, summarizing to marginals involves discarding information about how variables are related, and in other contexts this could be rather significant. For example, the default parameterization of a Beta distribution always leads to correlated parameters in the posterior, and thus the above technique would be inappropriate. Instead, one would need to infer a multi-dimensional KDE that incorporates all the latent variables.


Conjugate Model

However, in your particular case the expected distribution is Gaussian and those distributions have established closed-form conjugate models, i.e., principled solutions rather than approximations. I strongly recommend working through Kevin Murphy's Conjugate Bayesian analysis of the Gaussian distribution.

Normal-Inverse Gamma Model

The Normal-Inverse Gamma model estimates both the mean and the variance of an observed normal random variable. The mean is modeled with a normal prior; the variance with an inverse gamma. This model uses four parameters for the prior:

mu_0  = prior mean
nu    = number of observations used to estimate the mean
alpha = half the number of obs used to estimate variance
beta  = half the sum of squared deviations
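The mapping from first-round data to these four hyperparameters can be sketched directly from summary statistics (using the count, mean, and standard deviation stated in the question; variable names here are my own):

```python
# first-round summary statistics, taken from the question
n0, ybar, s = 80, 57.0, 5.42

mu_0  = ybar            # prior mean
nu    = n0              # number of observations informing the mean
alpha = n0 / 2          # half the number of observations informing the variance
beta  = (n0 / 2) * s**2 # half the sum of squared deviations, i.e. n0*s^2/2
```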

Given your initial model, we could use the values

mu_0  = 57.0
nu    = 80
alpha = 40
beta  = alpha*5.42**2

You can then plot the log-likelihood of the prior as follows:

# points to compute likelihood at
mu_grid, sd_grid = np.meshgrid(np.linspace(47, 67, 101), 
                               np.linspace(4, 8, 101))

# normal ~ N(X | mu_0, sigma/sqrt(nu))
logN = stats.norm.logpdf(x=mu_grid, loc=mu_0, scale=sd_grid/np.sqrt(nu))

# inv-gamma ~ IG(sigma^2 | alpha, beta)
logIG = stats.invgamma.logpdf(x=sd_grid**2, a=alpha, scale=beta)

# full log-likelihood
logNIG = logN + logIG

# actually, we'll plot -log(-log(likelihood)) to get nicer contours
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 8))
plt.contourf(mu_grid, sd_grid, -np.log(-logNIG))
plt.xlabel(r"$\mu$")
plt.ylabel(r"$\sigma$")
plt.show()

[Image: contour plot of the Normal-Inverse Gamma prior density over (μ, σ)]

Updating parameters

Given new data, Y1, one updates the parameters as follows:

# precompute some helpful values
n = Y1.shape[0]
mu_y = Y1.mean()

# updated NIG parameters
mu_n = (nu*mu_0 + n*mu_y)/(nu + n)
nu_n = nu + n
alpha_n = alpha + n/2
beta_n = beta + 0.5*np.square(Y1 - mu_y).sum() + 0.5*(n*nu/nu_n)*(mu_y - mu_0)**2

For the sake of illustrating change in the model, let's generate some data from a slightly different distribution and then plot the resulting posterior log-likelihood:

np.random.seed(53211277)
Y1 = np.random.normal(loc=62, scale=7.0, size=20)

which yields

[Image: contour plot of the updated posterior density, shifted toward the new data]

Here, the 20 observations are not enough to completely move to the new location and scale I provided, but both parameters appear shifted in that direction.
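As a quick check of that shift, the updated parameters imply closed-form posterior point estimates: the marginal posterior mean of mu is mu_n, and the posterior mean of sigma^2 is beta_n/(alpha_n - 1), valid for alpha_n > 1. A sketch (using NumPy's Generator API rather than the legacy seed above, so the exact numbers differ from the plot):

```python
import numpy as np

# prior hyperparameters and a new batch drawn from a shifted distribution
mu_0, nu, alpha = 57.0, 80, 40
beta = alpha * 5.42 ** 2
Y1 = np.random.default_rng(53211277).normal(loc=62, scale=7.0, size=20)

# updated NIG parameters (same equations as above)
n, mu_y = Y1.shape[0], Y1.mean()
nu_n = nu + n
mu_n = (nu * mu_0 + n * mu_y) / nu_n
alpha_n = alpha + n / 2
beta_n = (beta + 0.5 * np.square(Y1 - mu_y).sum()
          + 0.5 * (n * nu / nu_n) * (mu_y - mu_0) ** 2)

post_mean_mu = mu_n                     # E[mu | data]
post_mean_var = beta_n / (alpha_n - 1)  # E[sigma^2 | data], needs alpha_n > 1
print(post_mean_mu, np.sqrt(post_mean_var))
```

Note that `post_mean_mu` is a precision-weighted average of the prior mean and the batch mean, which is exactly why 20 observations only partially pull the estimate toward the new location.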
