My data are incomplete and noisy: Information-reduction statistical methods for knowledge extraction can save your day: tools and opportunities for modelling
We review parameter inference for stochastic modelling in complex scenarios, such as poor parameter initialization and near-chaotic dynamics. We show how state-of-the-art methods for state-space models can fail while, in some situations, reducing data to summary statistics (information reduction) enables robust estimation. Wood's synthetic likelihood method is reviewed, and the lecture closes with an example of approximate Bayesian computation (ABC) methodology.
Accompanying code is available at https://github.com/umbertopicchini/pomp-ricker and https://github.com/umbertopicchini/abc_g-and-k
Readership lecture given at Lund University on 7 June 2016. The lecture is of a popular-science nature, hence mathematical detail is kept to a minimum. However, numerous links and references are offered for further reading.
1. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY
My data are incomplete and noisy!
Information-reduction statistical methods for knowledge
extraction can save your day: tools and opportunities for
modelling
Docent/Readership lecture
Lund, 7 June 2016
Umberto Picchini
Centre for Mathematical Sciences
Lund University
www.maths.lth.se/matstat/staff/umberto/
PREAMBLE: WHAT IS A DOCENT/READERSHIP
LECTURE?
It’s a lecture in a popular science context.
Target are research students within the entire Faculty of Science
at Lund University.
It should cover my own subject area, but not my own research
output.
My notation and level of exposition will be subject to the above.
I will discuss a few methods for parameter inference (aka
calibration, the inverse problem) in the presence of incomplete and
noisy data.
However, I will start by mentioning:
what I mean by incomplete and noisy observations;
why use stochastic modelling;
issues with state-of-the-art inference methods for dynamic
models.
I will focus on two methods based on information reduction
that use summary statistics of data: synthetic likelihoods and
approximate Bayesian computation.
These are powerful, general and flexible methods but also very
easy to introduce to a general audience of researchers.
The goal is to show how state-of-the-art exact methods for parameter
estimation can spectacularly fail in some scenarios, including:
near chaotic systems;
noisy (stochastic) dynamical systems;
badly initialized optimization or Bayesian MCMC
algorithms.
But first, what do I mean by “incomplete data”?
INCOMPLETE OBSERVATIONS
There can be many situations where we only have partial
information about the system or scenario of interest.
We oversimplify and call “incomplete” or “partially observed”
those experiments where not all the variables of interest are
observable.
This could mean that at least one of the following holds:
1. some variables are completely unobserved (measurements
for those variables are unavailable)
2. we have a system evolving continuously in time but we
only observe it at discrete time points;
3. the variables we observe are perturbed measurements of
the actual variables of interest.
Consider a noise perturbed “signal”
Typically values of our measurements do not exclusively
represent what we are really interested in measuring.
The unperturbed signal is unavailable because measurements
are corrupted with noise.
Therefore we have a system (think about a physical system)
having
an observable component Y
an unobservable/latent signal X
noise ε
And of course if we model time dynamics we could think of n
discrete-time measurements

Y_{t_i} = X_{t_i} + ε_{t_i},   i = 1, ..., n

Or, more generally,

Y_{t_i} = f(X_{t_i}, ε_{t_i})

for some arbitrary yet known function f(·).
AN EXAMPLE: CONCENTRATION OF A DRUG
Time course of a certain drug concentration in blood. This is
measured at discrete times.
[Figure: drug concentration measurements (asterisks) versus time in
minutes, over 0–120 minutes.]
We may postulate a deterministic model:

dCt/dt = −µ Ct,   i.e.   Ct = C0 e^{−µt},   µ > 0

with measurements Y_{t_i} = C_{t_i} + ε_{t_i}, where ε_{t_i} ∼ N(0, σ²_ε).

[Figure: the fitted exponential decay overlaid on the measured
concentrations.]
There’s some discrepancy with the
fit (residual error).
In previous slides you might have noticed the introduction of a
parameter µ > 0.
Even though an expert researcher might have a clue about
some reasonable value for µ, in reality this is at best nothing
more than a guess.
In fact, when we previously wrote that we can assume

Y_{t_i} = f(X_{t_i}, ε_{t_i}) = f(C_{t_i}, ε_{t_i})

most often what we have is

Y_{t_i} = f(X_{t_i}, θ, ε_{t_i}) = f(C_{t_i}, µ, ε_{t_i})

that is, a dependence on an unknown (vector) parameter θ.
In our example θ ≡ µ.
My main interest:
to study and develop principled methods that return an
estimate of θ and its uncertainty analysis.
In particular, I am interested in modelling stochastic dynamics.
In the previous example Ct was evolving in a deterministic
fashion, given a fixed starting value C0 and a fixed value for µ.
On the next slide I show an alternative, stochastic approach.
Add systemic (white) noise:

dC(t)/dt = −µ C(t) + “white noise”

And again Y_{t_i} = C_{t_i} + ε_{t_i}, with ε_{t_i} ∼ N(0, σ²_ε).

[Figure: a stochastic realization overlaid on the measured
concentrations.]
Dynamics are stochastic.
Residual error can’t be eliminated.
It’s always “there”.
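As an illustrative aside, a noisy-decay model of this kind can be simulated with an Euler-Maruyama scheme. The lecture's accompanying code is in R and MATLAB; the Python sketch below, with its function name and parameter values, is my own invention for illustration only.

```python
import numpy as np

def simulate_sde_decay(mu=0.05, sigma_x=2.0, sigma_eps=3.0,
                       C0=100.0, T=120.0, dt=0.1, obs_every=12.0, seed=0):
    """Euler-Maruyama for dC = -mu*C dt + sigma_x dW, with noisy
    observations Y_i = C(t_i) + eps_i, eps_i ~ N(0, sigma_eps^2)."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    t = dt * np.arange(n + 1)
    C = np.empty(n + 1)
    C[0] = C0
    for i in range(n):
        # one Euler-Maruyama step: drift term plus sqrt(dt)-scaled Gaussian noise
        C[i + 1] = C[i] - mu * C[i] * dt + sigma_x * np.sqrt(dt) * rng.standard_normal()
    # observe the latent path only at a sparse grid of times, with measurement error
    obs_idx = np.rint(np.arange(0.0, T + dt, obs_every) / dt).astype(int)
    y = C[obs_idx] + sigma_eps * rng.standard_normal(len(obs_idx))
    return t, C, t[obs_idx], y

t, C, tobs, y = simulate_sde_decay()
```

Because the dynamics themselves are stochastic, re-running the simulation with a new seed produces a different latent path even at the same parameter values, which is exactly the point made above.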
The problem with stochastic models is that:
1. realizations from the model might not really resemble the
observed data...
2. this can happen even when we simulate using a θ which is
close to its true value...
3. ...and even when we are simulating from the “true model”
(simulation studies).
Although this is understandable, it is also upsetting when our
methods are based on producing realizations meant to get close
to observations.
A NEARLY CHAOTIC MODEL
Two realizations from the Ricker model (discussed later), without
measurement noise.
Nt = r · Nt−1 · e^{−Nt−1 + et},   et ∼ N(0, σ²)

Small changes in a parameter cause major departures from data.

[Figure: one path generated with log r = 3.8 (black) and one
generated with log r = 3.799 (red) when σ = 0.]
Now, a modeller (say a statistician) does not necessarily
have the expertise nor the time to study the qualitative
behaviour of solutions of a series of candidate models.
most often the modeller wants to test a range of possible
models against available data.
Simulations from postulated models should produce
trajectories approximately resembling data.
Parameter estimation should be performed to identify
“best fitting” parameters.
Notice that for many years even simple stochastic models have been
terribly difficult to calibrate against data.
It is usually impossible to write the likelihood function p(y1:T|θ)
in closed form.
It is also impossible to write filtering densities for the states
p(x1:T|y1:T; θ).
Notation:
I am going to use lower case letters both for random
variables and for their realized values.
I use a1:T to denote a sequence (a1, a2, ..., aT) where T is the
time horizon.
I assume integer time indices to ease notation.
SSM: STATE–SPACE MODEL
Graphically, the latent signal forms a Markov chain
... → Xt−1 → Xt → Xt+1 → ..., and each observation Yt depends only
on the corresponding state Xt:

(Yt | Xt = xt) ∼ p(yt|xt)        (Observation density)
(Xt+1 | Xt = xt) ∼ p(xt+1|xt)    (Transition density)
X0 ∼ π(x0)                       (Initial distribution)
Example: Yt a subject’s measured glycemia at time t. Xt the
actual glycemia at t.
THE LIKELIHOOD FUNCTION 1/2
It turns out that, even for such a simple construct, it is difficult
to write the likelihood function.
In a SSM data are not independent, they are only conditionally
independent (a complication!):

p(y1:T|θ) = p(y1|θ) ∏_{t=2}^{T} p(yt|y1:t−1, θ) = ?

We don't have a closed-form expression for the product above
because we do not know how to calculate p(yt|y1:t−1, θ).
Let’s see why.
THE LIKELIHOOD FUNCTION 2/2
In a SSM the observed process is assumed to depend on the latent
Markov process {Xt}: we can write

p(y1:T|θ) = ∫ p(y1:T, x0:T|θ) dx0:T
          = ∫ p(y1:T|x0:T, θ) × p(x0:T|θ) dx0:T   [use conditional independence, then Markovianity]
          = ∫ { ∏_{t=1}^{T} p(yt|xt, θ) } × { p(x0|θ) ∏_{t=1}^{T} p(xt|xt−1, θ) } dx0:T

Problems:
- The expression above is a (T + 1)-dimensional integral.
- For most (nontrivial) models, the transition densities p(xt|xt−1) are
unknown.
However today we have quite a number of reliable Monte Carlo
solutions to the integration problem.
I am not going to introduce state-of-the-art methods for SSMs, but
these are essentially based on particle filter (or sequential Monte
Carlo) methods:
particle marginal methods and particle MCMC (Andrieu and
Roberts 2009; Andrieu et al. 2010) for Bayesian inference.
iterated filtering (Ionides et al. 2011, 2015) for maximum
likelihood inference.
As shown in Andrieu and Roberts 2009:
1. obtain an approximation of the likelihood ˆp(y1:T|θ) using particle
filters;
2. plug ˆp(y1:T|θ) into an MCMC algorithm for Bayesian inference on
θ;
3. then the MCMC returns samples exactly from π(θ|y1:T).
If you are interested in a quick review of particle methods for
parameter inference (not state inference), check my slides on
SlideShare: http://goo.gl/4aZxL1
As I said, in this presentation I focus on synthetic likelihoods
and approximate Bayesian computation.
I now consider a simple example where the celebrated particle
marginal methodology of Andrieu and Roberts1, which is
supposed to return exact Bayesian inference, does not work.
1
C. Andrieu and G. Roberts 2009, Annals of Statistics 37(2).
THE STOCHASTIC RICKER MODEL
yt ∼ Pois(φNt)
Nt = r · Nt−1 · e−Nt−1+et , et ∼ N(0, σ2)
It can be used to describe the evolution in time of a population
of size Nt.
r is the intrinsic growth rate of the population;
φ is a scale parameter
The process noise et can be interpreted as environmental
noise. We assume the et are i.i.d.
This is a hidden Markov model, as the dynamics of {Nt} are
Markovian and we assume measurements y1:T to be
conditionally independent given {Nt}.
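A forward simulator is all that the information-reduction methods below will require. The lecture's Ricker code uses the R package pomp; the following is an illustrative Python sketch of the same model, with function name and defaults of my own choosing (matching the ground-truth values above).

```python
import numpy as np

def simulate_ricker(log_r=3.8, phi=10.0, sigma=0.3, N0=7.0, T=50, seed=None):
    """Simulate the stochastic Ricker model:
       N_t = r * N_{t-1} * exp(-N_{t-1} + e_t),   e_t ~ N(0, sigma^2),
       y_t ~ Poisson(phi * N_t)."""
    rng = np.random.default_rng(seed)
    r = np.exp(log_r)
    N = np.empty(T + 1)
    N[0] = N0
    e = sigma * rng.standard_normal(T)      # i.i.d. environmental noise
    for t in range(1, T + 1):
        N[t] = r * N[t - 1] * np.exp(-N[t - 1] + e[t - 1])
    y = rng.poisson(phi * N[1:])            # Poisson observations of the latent state
    return N[1:], y

N, y = simulate_ricker(seed=1)
```

Setting sigma close to zero in this simulator reproduces the nearly chaotic regime where, as shown next, particle methods run into trouble.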
Here is a realization of length T = 50 from the Ricker model.
[Figure: time series of the observations y, of the latent state N and
of the noise e, over times t = 1, ..., 50.]
Ground-truth parameters are log r = 3.8, φ = 10 and σ = 0.3 (and
imposing N0 = 7 and e0 = 0). Same settings as in S. Wood 20102
2
S. Wood 2010, Nature 466, pp. 1102–1104.
SOFTWARE CODE
The simulation results in the next three slides can be
reproduced using the code at
https://github.com/umbertopicchini/pomp-ricker
RUNNING pomp ricker − pmcmc.R FOR EXACT BAYESIAN INFERENCE
Suppose we are only interested in estimating the parameter r from
data, while remaining quantities are fixed to their ground-truth
values.
Particle MCMC works well here. So exact inference is possible.
Here is an MCMC chain of length 2000 with likelihood estimated via
bootstrap filter (1000 particles used). We let r start at r = 12.2.
True value is r = 44.7, estimated posterior mean is 44.6 [38.5,51.7]
[Figure: MCMC trace of r over the 2000 iterations.]
So all good in the considered example.
We might reasonably imagine that if our model is “less
stochastic” (a smaller σ) it should be even easier to conduct
inference.
Recall that

Nt = r · Nt−1 · e^{−Nt−1 + et},   et ∼ N(0, σ²)
Modellers don't know a priori the values of the underlying
parameters.
Suppose we now use data generated with σ = 0.01 (instead of
σ = 0.3)
Here is what happens with the same conditions as before,
except for using data generated with σ = 0.01.
[Figure: PMCMC convergence diagnostics: traces of the log-likelihood,
the log-prior, the number of filtering failures (nfail) and r over
2000 PMCMC iterations.]
The chain (lower panel) is stuck at the wrong mode! r ≈ 18,
hence log r ≈ 2.9.
It gets stuck even if we use more particles.
As beautifully illustrated in Fasiolo et al.3
, the very interesting reason
why the estimation fails for nearly deterministic dynamics is the
following:
[Figure: the true log-likelihood of log r (black) and the particle
filter approximation (red), for σ = 0.3, 0.1, 0.05 and 0.01. Figure
from Fasiolo et al. 2016.]
3
M. Fasiolo, N. Pya and S. Wood 2016. Statistical Science 31(1),pp. 96-118
However the filter being unable to approximate the likelihood is only
a consequence of something more subtle: look at the exact
log-likelihood for non-stochastic dynamics, i.e. here σ = 0.
[Figure: the true log-likelihood when σ = 0 (black), with the
bifurcation diagram in grey; panels for the Ricker, Pennycuick,
Varley and Maynard Smith models. Figure from Fasiolo et al. 2016.]
The instability of some models for a small quantity of noise (σ)
produces major differences in simulated trajectories for small
perturbations in the parameters.
[Figure: one path generated with log r = 3.8 (black) and one
generated with log r = 3.799 (red) when σ = 0.]
A CHANGE OF PARADIGM
from S. Wood, Nature 2010:
“Naive methods of statistical inference try to make the model
reproduce the exact course of the observed data in a way that the real
system itself would not do if repeated.”
“What is important is to identify a set of statistics that is sensitive to
the scientifically important and repeatable features of the data, but
insensitive to replicate-specific details of phase.”
In other words, with complex, stochastic and/or chaotic models we
should try to match features of the data, not the path of the data
themselves.
SYNTHETIC LIKELIHOODS
y: observed data, from static or dynamic models
s(y): (vector of) summary statistics of data, e.g. mean,
autocorrelations, marginal quantiles etc.
assume

s(y) ∼ N(µθ, Σθ)

an assumption justifiable via a second-order Taylor expansion
(same as in Laplace approximations);
µθ and Σθ are unknown: estimate them via simulations.
For fixed θ we simulate Nr artificial datasets y∗_1, ..., y∗_Nr and
compute the corresponding (possibly vector-valued) summaries
s∗_1, ..., s∗_Nr;
compute

ˆµθ = (1/Nr) Σ_{i=1}^{Nr} s∗_i,    ˆΣθ = 1/(Nr − 1) Σ_{i=1}^{Nr} (s∗_i − ˆµθ)(s∗_i − ˆµθ)ᵀ;

compute the statistics s_obs for the observed data y;
evaluate a multivariate Gaussian likelihood at s_obs:

lik_s(θ) := exp(l_s(θ)) = N(s_obs; ˆµθ, ˆΣθ) ∝ |ˆΣθ|^{−1/2} exp{−(s_obs − ˆµθ)ᵀ ˆΣθ^{−1} (s_obs − ˆµθ)/2}

This likelihood can be maximized for a varying θ or be plugged
within an MCMC algorithm for Bayesian inference
ˆπ(θ|s_obs) ∝ lik_s(θ)π(θ).
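The recipe above fits in a few lines of code. The following is an illustrative Python sketch with a toy summary vector and a toy i.i.d. Gaussian simulator; the lecture's real analyses use the R packages pomp and synlik, and every name and value here is my own.

```python
import numpy as np

def summaries(y):
    """Illustrative summary vector: mean, variance, lag-1 autocovariance."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    return np.array([y.mean(), y.var(), np.mean(yc[1:] * yc[:-1])])

def synthetic_loglik(theta, simulate, s_obs, Nr=500, seed=None):
    """Wood-style synthetic log-likelihood: simulate Nr datasets at theta,
    fit a Gaussian to their summaries, evaluate it at the observed summary."""
    rng = np.random.default_rng(seed)
    S = np.stack([summaries(simulate(theta, rng)) for _ in range(Nr)])
    mu_hat = S.mean(axis=0)
    Sigma_hat = np.cov(S, rowvar=False)      # uses the Nr - 1 denominator
    diff = s_obs - mu_hat
    _, logdet = np.linalg.slogdet(Sigma_hat)
    quad = diff @ np.linalg.solve(Sigma_hat, diff)
    return -0.5 * (logdet + quad)            # up to an additive constant

# toy model: i.i.d. N(theta, 1) data with unknown mean theta
sim = lambda theta, rng: rng.normal(theta, 1.0, size=200)
s_obs = summaries(np.random.default_rng(42).normal(0.0, 1.0, size=200))
ll_true = synthetic_loglik(0.0, sim, s_obs, seed=0)
ll_far = synthetic_loglik(3.0, sim, s_obs, seed=0)
```

The returned value can be handed to any off-the-shelf optimizer (as with Nelder-Mead below) or used as the log-likelihood inside a Metropolis-Hastings step.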
For the Ricker model Wood (2010) uses 13 summaries,
including:
the sample mean of observations ¯y;
number of zeros in the dataset;
autocovariances up to lag 5;
estimates of β0 and β1 from fitting the regression

E(y_t^{0.3}) = β0 · y_{t−1}^{0.3} + β1 · y_{t−1}^{0.6};

and a few more summaries... (not important to mention here,
but you get the idea).
So we have s_obs = (ȳ, #zeros, autocovariances, β0, β1, ...).
RUNNING pomp ricker − synlik.R
We consider the dataset generated with the settings where
particle methods have failed (same starting values etc.), i.e.
σ = 0.01.
Here follow the synthetic likelihood estimates:

                   r       σ       φ
starting value   12.18    1       20
true value       44.7     0.01    10
SL estimate      45.25    0.23    10.16
We maximised liks(θ) using Nelder–Mead with Nr = 5000
simulations.
Standard errors could be found via profile likelihood (not
reported).
BAYESIAN SYNTHETIC LIKELIHOODS
Just plug lik_s(θ) into a Metropolis-Hastings MCMC procedure to
sample from ˆπ(θ|s_obs) ∝ lik_s(θ)π(θ).
Figure is from Price et al. (2016). They consider the “nicely
behaved” data obtained with σ = 0.3.
Table from Price et al. (2016), estimated log SL at θ = (3.8, 10, 0.3).
Entries are BSL/uBSL; a ‘-’ indicates that a result is not available
for uBSL as the value of n is too small.

  n    acc. rate (%)   ESS log r   ESS σe   ESS φ   sd of log SL
  20      2.4/1.8        11/8       13/10    18/13   large
  30      8.2/7.7        23/22      27/25    38/31   4.1
  40     12.5/12.6       29/28      34/32    44/42   2.5
  50     15.9/15.7       30/31      35/35    45/45   1.9
  80     21.8/21.4       29/29      34/32    47/47   1.2
 100     24.1/23.7       25/27      30/30    41/39   1.0
 250     29.6/29.5       13/14      15/17    20/23   0.6

[Figure: posterior estimates of log r, φ and σe for the Ricker model
using BSL with n = 30, 100 and 250. True parameter values:
log r = 3.8, φ = 10, σ = 0.3.]
40. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY
APPROXIMATE BAYESIAN COMPUTATION (ABC)
Another methodology, also using information-reduction via
summaries, is Approximate Bayesian Computation (ABC).
ABC has received enormous interest. A list of review papers is
provided in the references section.
You can also check my own intro to ABC.
LIKELIHOOD FREE REJECTION SAMPLING
Recall y is data.
We wish to sample from π(θ|y) ∝ p(y|θ)π(θ).
1. simulate from the prior θ∗ ∼ π(θ)
2. plug θ∗ in your model and simulate artificial data y∗ [this
is the same as writing y∗ ∼ p(y|θ∗)]
3. if y∗ = y store θ∗. Go to step 1 and repeat.
The above is a likelihood free algorithm: it does not require
knowledge of the expression of the likelihood p(y|θ).
Each accepted θ∗ is such that θ∗ ∼ π(θ|y) exactly.
ABC REJECTION (PRITCHARD ET AL.4
)
Same as before, but comparing s(y) with s(y∗) for “appropriate”
summaries s(·) and a small tolerance ε > 0.
1. simulate from the prior θ∗ ∼ π(θ)
2. simulate y∗ ∼ p(y|θ∗), compute s(y∗)
3. if ‖s(y∗) − s(y)‖ < ε store θ∗. Go to 1 and repeat.
Samples are from π_ε(θ|s(y)) with

π_ε(θ|s(y)) ∝ ∫_Y p(y∗|θ∗) π(θ∗) I_{A_{ε,y}}(y∗) dy∗

where A_{ε,y} = {y∗ ∈ Y : ‖s(y∗) − s(y)‖ < ε}.
4
Pritchard et al. 1999, Molecular Biology and Evolution, 16:1791-1798.
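The rejection scheme above is only a few lines of code. Here is an illustrative Python sketch on a toy Gaussian-mean model; the sampler, the toy model and every name here are my own, not part of the lecture's repositories.

```python
import numpy as np

def abc_rejection(simulate_summary, prior_sample, s_obs, eps,
                  n_accept=200, seed=None):
    """ABC rejection: draw theta* from the prior, simulate s(y*),
    and keep theta* whenever ||s(y*) - s(y)|| < eps."""
    rng = np.random.default_rng(seed)
    kept = []
    while len(kept) < n_accept:
        theta = prior_sample(rng)
        s_sim = simulate_summary(theta, rng)
        if np.linalg.norm(s_sim - s_obs) < eps:
            kept.append(theta)
    return np.array(kept)

# toy example: y_1, ..., y_100 ~ N(theta, 1), summary = sample mean,
# uniform prior on (-5, 5), "observed" summary fixed at 1.5
sim = lambda th, rng: np.array([rng.normal(th, 1.0, size=100).mean()])
prior = lambda rng: rng.uniform(-5.0, 5.0)
post = abc_rejection(sim, prior, s_obs=np.array([1.5]), eps=0.2, seed=0)
```

Shrinking eps makes the accepted sample approach the true posterior given s(y), at the price of a lower acceptance rate.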
EXAMPLE: g-AND-k DISTRIBUTIONS
Mixtures of Gaussians are often used to describe complex,
nonstandard distributions.
a mixture of two-Gaussians requires specifying 5
parameters.
apparently, it is sometimes challenging to estimate such
parameters due to unidentifiability (Marin and Robert5).
simulating (sampling) from mixtures is an optimization
problem (can be computer intensive).
5
Marin and Robert, Bayesian Core, Springer 2007.
g-and-k distributions only require 4 parameters, and it is fast to
simulate from them.
g-and-k distributions have no closed-form likelihood, but we can
simulate from them via their quantile function.
F^{−1}(x; A, B, g, k) = A + B [1 + 0.8 · (1 − exp(−g · r(x)))/(1 + exp(−g · r(x)))] · (1 + r²(x))^k · r(x)

with r(x) the xth quantile of a standard Gaussian N(0, 1).
[Figure: histogram of n = 10,000 samples generated with A = 3, B = 1,
g = 2, k = 0.5.]
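Simulation really is this cheap: draw z ~ N(0, 1) and push it through the quantile function above. The lecture's code is in MATLAB; this Python sketch is my own, with c = 0.8 written as an explicit argument since it is the conventionally fixed constant in the quantile function.

```python
import numpy as np

def gk_sample(n, A=3.0, B=1.0, g=2.0, k=0.5, c=0.8, seed=None):
    """Sample a g-and-k distribution by pushing standard-normal draws z
    through the quantile function
    Q(z) = A + B * (1 + c * (1 - e^{-g z}) / (1 + e^{-g z})) * (1 + z^2)^k * z."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    skewing = (1.0 - np.exp(-g * z)) / (1.0 + np.exp(-g * z))
    return A + B * (1.0 + c * skewing) * (1.0 + z ** 2) ** k * z

x = gk_sample(10000, seed=0)
```

A quick sanity check: with g = 0 and k = 0 the skewing and kurtosis terms vanish and the draws reduce to A + B·z, i.e. plain N(A, B²) samples.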
I wrote an MCMC algorithm using ABC with summaries the
20th, 40th, 60th, 80th empirical quantiles and the skewness
value
s(y) = (q20, q40, q60, q80, skew(y)).
Again, we simulate datasets y∗, compute s(y∗) and compare
them with s(y) at each MCMC iteration.
I let the threshold ε decrease progressively during the
algorithm.
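An ABC-MCMC step with a shrinking tolerance can be sketched as follows. This is an illustrative Python toy with a flat prior and a symmetric random-walk proposal, so the Metropolis ratio reduces to the distance indicator; a more careful implementation would also re-evaluate the current state when ε shrinks. All names, the schedule and the toy model are my own.

```python
import numpy as np

def abc_mcmc(sim_summary, s_obs, theta0, eps_schedule, n_iter=3000,
             prop_sd=0.5, prior_lo=-10.0, prior_hi=10.0, seed=None):
    """ABC-MCMC sketch: random-walk Metropolis where the likelihood is
    replaced by the indicator ||s(y*) - s(y)|| < eps, with the tolerance
    eps given by a schedule (a function of the iteration number)."""
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    chain = np.empty(n_iter)
    for it in range(n_iter):
        eps = eps_schedule(it)
        prop = theta + prop_sd * rng.standard_normal()
        if prior_lo < prop < prior_hi:                 # flat prior: ratio is 1
            s_star = sim_summary(prop, rng)
            if np.linalg.norm(s_star - s_obs) < eps:   # "hit": accept the move
                theta = prop
        chain[it] = theta
    return chain

# toy example: summary = mean of 50 draws from N(theta, 1), s_obs = 2
sim = lambda th, rng: np.array([rng.normal(th, 1.0, size=50).mean()])
sched = lambda it: 3.0 if it < 1000 else (2.0 if it < 2000 else 1.0)
chain = abc_mcmc(sim, np.array([2.0]), theta0=0.0, eps_schedule=sched, seed=0)
```

The staged schedule mirrors the idea above: a loose ε lets the chain find the right region despite a poor starting value, and later reductions tighten the approximate posterior.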
Red lines are true parameter values.
[Figure: ABC-MCMC traces of A, B, g and k over 50,000 iterations.]

It is evident that I reduced ε at iteration 10,000 and then at
iteration 20,000.
SUMMARY
We have discussed plug-and-play methods that only require the
ability to simulate from the model.
How to choose the summaries s(·) is a delicate and open problem.
However, there is much literature available.
Information-reduction methods are very general and in many
cases are the only possible option for inference.
ABC and SL can be used for a short pilot run, to obtain a better
starting value for θ to be used in more accurate methods.
For example Owen et al.6
use ABC to obtain a starting value for
particle MCMC.
6
Owen et al. 2015. Likelihood free inference for Markov processes: a
comparison. Statistical applications in genetics and molecular biology, 14(2),
pp.189-209.
Our suggestion, when dealing with a new model or a new
dataset, is not to settle on a single methodology, but instead
to try a battery of algorithms.
This requires method developers to share their software code to
ease reproducibility of methods.
This is still not common practice.
If you wish to review some of the presented concepts, slides for this
presentation are available on SlideShare at
http://goo.gl/gg0Cqq.
R code to reproduce the Ricker model results is at
https://github.com/umbertopicchini/pomp-ricker/
MATLAB code for ABC and g-and-k distributions is at
https://github.com/umbertopicchini/abc_g-and-k/
Feel free to write me or drop by7
for a chat.
7
umberto@maths.lth.se, office MH:321, Matematikcentrum
Thank You
REFERENCES ON SYNTHETIC LIKELIHOODS
1. Fasiolo, M., Pya, N. and Wood, S.N., 2016. A Comparison of
Inferential Methods for Highly Nonlinear State Space Models in
Ecology and Epidemiology. Statistical Science, 31(1), pp.96-118.
2. Meeds, E. and Welling, M., 2014. GPS-ABC: Gaussian process
surrogate approximate Bayesian computation. arXiv:1401.2838.
3. Price, L.F., Drovandi, C.C., Lee, A. and Nott, D.J., 2016. Bayesian
synthetic likelihood. http://eprints.qut.edu.au/92795/
4. Wood, S.N., 2010. Statistical inference for noisy nonlinear
ecological dynamic systems. Nature, 466(7310), pp.1102-1104.
(4) is where SL was first introduced. Our talk is largely based on (1).
(2) merges SL with ABC. (3) studies the performance of SL in a
(pseudo-marginal) Bayesian setting.
SOFTWARE FOR SYNTHETIC LIKELIHOODS
R package pomp: http://kingaa.github.io/pomp
R package synlik:
https://mfasiolo.github.io/synlik_release/
ABC SOFTWARE
EasyABC, R package. Research article.
abc, R package. Research article
abctools, R package. Research article. Focusses on tuning.
examples with implemented model simulators (useful to
incorporate in your programs).
REVIEWS ON ABC
Fairly extensive but accessible reviews:
1. Sisson and Fan 2010
2. (with applications in ecology) Beaumont 2010
3. Marin et al. 2010
Simpler introductions:
1. Sunnåker et al. 2013
2. (with applications in ecology) Hartig et al. 2013
Review specific for dynamical models:
1. Jasra 2015
NON-REVIEWS ON ABC, SPECIFIC FOR DYNAMICAL
MODELS
1. SMC for Parameter estimation and model comparison:
Toni et al. 2009
2. Markov models: White et al. 2015
3. SMC: Sisson et al. 2007
4. SMC: Dean et al. 2014
5. SMC: Jasra et al. 2010
6. MCMC: Picchini 2013
MORE SPECIALIZED ABC RESOURCES
selection of summary statistics: Fearnhead and Prangle
2012.
review on summary statistics selection: Blum et al. 2013
expectation-propagation ABC: Barthelme and Chopin 2012
Gaussian Processes ABC: Meeds and Welling 2014
ABC model choice: Pudlo et al 2015
BLOG POSTS AND SLIDES ON ABC
1. Christian P. Robert often blogs about ABC (and beyond:
it’s a fantastic blog!)
2. an intro to ABC by Darren J. Wilkinson
3. Two posts by Rasmus Bååth here and here
4. my own intro to ABC
5. Tons of slides at SlideShare.
VERIFY SUMMARIES DISTRIBUTION
The crucial assumption used in synthetic likelihoods is that
s ∼ N(µθ, Σθ).
We could check that (see the appendix in Wood 2010 for details):

(s∗ − ˆµθ)ᵀ ˆΣθ^{−1} (s∗ − ˆµθ) ∼ χ²_d,   with d = dim(s).

That is, for the Nr simulated summaries, plot the ordered values of
(s∗_j − ˆµθ)ᵀ ˆΣθ^{−1} (s∗_j − ˆµθ) versus the quantiles of χ²_d,
j = 1, ..., Nr.
For graphical purposes this can be done on the log scale.
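Computing these ordered distances is straightforward. The Python sketch below is my own illustration; note that, by a trace identity, the distances always average exactly (Nr − 1)d/Nr when the sample mean and covariance are used, regardless of Gaussianity, so it is the shape of the ordered values against χ²_d quantiles that the plot actually checks.

```python
import numpy as np

def ordered_mahalanobis(S):
    """Given an (Nr x d) matrix of simulated summaries, return the sorted
    squared Mahalanobis distances (s_j - mu_hat)' Sigma_hat^{-1} (s_j - mu_hat),
    ready to be plotted against chi^2_d quantiles (on the log scale if wished)."""
    mu_hat = S.mean(axis=0)
    Sigma_hat = np.cov(S, rowvar=False)        # Nr - 1 denominator
    D = S - mu_hat
    # row-wise quadratic forms D_j' Sigma^{-1} D_j
    m = np.einsum('ij,ij->i', D @ np.linalg.inv(Sigma_hat), D)
    return np.sort(m)

# sanity check on exactly-Gaussian "summaries": distances should look chi^2_d
S = np.random.default_rng(0).normal(size=(2000, 3))
m = ordered_mahalanobis(S)
```

A straight line when plotting m against the corresponding χ²_d quantiles supports the Gaussianity assumption; strong curvature suggests transforming or replacing some summaries.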
We could separately check that each dimension of the simulated
summaries is approximately normal (basically, produce d separate
qq-plots), and do the same for the observed summaries.
Next slide produces the three types of quantile plots for the
Ricker model when log(r) = 3.8, φ = 10, σ = 0.3.
Recall for the Ricker model we have d = 13 summaries.
[Figure: the three types of quantile plots for the Ricker summaries.
Upper left: ordered distances on the log scale versus log χ²_d
quantiles. Upper right: N(0, 1) quantiles separately for the 13
simulated summaries. Lower left: N(0, 1) quantiles for the observed
summaries.]
JUSTIFICATION OF GAUSSIANITY
Assuming Gaussianity for the summaries s(·) can be justified by a
standard Taylor expansion.
Say that fθ(s) is the true (unknown) joint density of s.
Expand log fθ(s) around its mode µθ (the first-order term vanishes
at the mode):

log fθ(s) ≈ log fθ(µθ) + (1/2)(s − µθ)ᵀ [∂² log fθ / ∂s ∂sᵀ] (s − µθ)

hence

fθ(s) ≈ const × exp{ −(1/2)(s − µθ)ᵀ [−∂² log fθ / ∂s ∂sᵀ] (s − µθ) }

that is,

s ∼ N( µθ, [−∂² log fθ / ∂s ∂sᵀ]^{−1} ),   approximately when s ≈ µθ.
ASYMPTOTIC PROPERTIES FOR SYNTHETIC LIKELIHOODS
As the number of simulated statistics Nr → ∞:
- the maximizer ˆθ of lik_s(θ) is a consistent estimator;
- ˆθ is an unbiased estimator;
- ˆθ might not in general be Gaussian. It will be Gaussian if Σθ
depends weakly on θ or when d = dim(s) is large.