INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY
My data are incomplete and noisy!
Information-reduction statistical methods for knowledge
extraction can save your day: tools and opportunities for
modelling
Docent/Readership lecture
Lund, 7 June 2016
Umberto Picchini
Centre for Mathematical Sciences
Lund University
www.maths.lth.se/matstat/staff/umberto/
1 / 54
PREAMBLE: WHAT IS A DOCENT/READERSHIP LECTURE?
It’s a lecture in a popular science context.
The target audience is research students across the entire Faculty of Science
at Lund University.
It should cover my own subject area, but not my own research
output.
My notation and level of exposition will be subject to the above.
I will discuss a few methods for parameter inference (aka
calibration, the inverse problem) in the presence of incomplete and
noisy data.
However, I will start by clarifying:
what I mean by incomplete and noisy observations;
why we use stochastic modelling;
issues with state-of-the-art inference methods for dynamic
models.
I will focus on two methods based on information reduction
that use summary statistics of data: synthetic likelihoods and
approximate Bayesian computation.
These are powerful, general and flexible methods but also very
easy to introduce to a general audience of researchers.
The goal is to show how state-of-the-art exact methods for parameter
estimation can fail spectacularly in some scenarios, including:
nearly chaotic systems;
noisy (stochastic) dynamical systems;
badly initialized optimization or Bayesian MCMC
algorithms.
But first, what do I mean by “incomplete data”?
INCOMPLETE OBSERVATIONS
There can be many situations where we only have partial
information about the system or scenario of interest.
We oversimplify and call “incomplete” or “partially observed”
those experiments where not all the variables of interest are
observable.
This could mean that at least one of the following holds:
1. some variables are completely unobserved (measurements
for those variables are unavailable);
2. we have a system evolving continuously in time but we
only observe it at discrete time points;
3. the variables we observe are perturbed measurements of
the actual variables of interest.
Consider a noise-perturbed “signal”.
Typically, the values of our measurements do not exclusively
represent what we are really interested in measuring.
The unperturbed signal is unavailable because measurements
are corrupted with noise.
Therefore we have a system (think about a physical system)
having
an observable component Y
an unobservable/latent signal X
noise ε
And of course, if we model time dynamics, we can think of n
discrete-time measurements

Y_{ti} = X_{ti} + ε_{ti},   i = 1, ..., n

or, more generally,

Y_{ti} = f(X_{ti}, ε_{ti})

for some arbitrary yet known function f(·).
AN EXAMPLE: CONCENTRATION OF A DRUG
Time course of a certain drug concentration in blood. This is
measured at discrete times.
Figure: drug concentration (C12) measurements against time in minutes.
We may postulate a deterministic model:

dCt/dt = −µ Ct   ⟹   Ct = C0 e^(−µt),   µ > 0

with measurements Y_{ti} = C_{ti} + ε_{ti}, where ε_{ti} ∼ N(0, σ²_ε).
Figure: the fitted deterministic model overlaid on the concentration data.
There’s some discrepancy with the
fit (residual error).
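As a concrete sketch, the deterministic model with additive measurement noise can be simulated in a few lines of Python; the values of C0, µ, σ_ε and the sampling grid below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not from the slides): C0 = 100, mu = 0.03 per minute,
# measurement-noise s.d. sigma_eps = 5.
C0, mu, sigma_eps = 100.0, 0.03, 5.0
t = np.arange(0.0, 121.0, 12.0)                   # sampling times t_i in minutes

C = C0 * np.exp(-mu * t)                          # exact solution C_t = C0 e^(-mu t)
y = C + rng.normal(0.0, sigma_eps, size=t.shape)  # Y_{t_i} = C_{t_i} + eps_{t_i}
```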
In previous slides you might have noticed the introduction of a
parameter µ > 0.
Even though an expert researcher might have a clue about
some reasonable value for µ, in reality this is at best nothing
more than a guess.
In fact, when we previously wrote that we can assume

Y_{ti} = f(X_{ti}, ε_{ti}) = f(C_{ti}, ε_{ti}),

most often what we have is

Y_{ti} = f(X_{ti}, θ, ε_{ti}) = f(C_{ti}, µ, ε_{ti}),

that is, a dependence on an unknown (vector) parameter θ.
In our example θ ≡ µ.
My main interest:
to study and develop principled methods that return an
estimate of θ and its uncertainty analysis.
In particular, I am interested in modelling stochastic dynamics.
In the previous example Ct was evolving in a deterministic
fashion, given a fixed starting value C0 and a fixed value for µ.
On the next slide I show an alternative, stochastic approach.
Add systemic (white) noise:

dC(t)/dt = −µ C(t) + “white noise”

and again Y_{ti} = C_{ti} + ε_{ti}, with ε_{ti} ∼ N(0, σ²_ε).
Figure: a stochastic realization overlaid on the concentration data.
Dynamics are stochastic.
Residual error can’t be eliminated.
It’s always “there”.
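A minimal way to realize the “white noise” version is to read it as the SDE dC(t) = −µ C(t) dt + σ dW(t) and discretize by Euler–Maruyama; all numerical values below are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values (not from the slides): mu = 0.03, diffusion sigma = 2,
# measurement-noise s.d. sigma_eps = 5, horizon of 120 minutes.
mu, sigma, sigma_eps = 0.03, 2.0, 5.0
C0, dt, n_steps = 100.0, 0.1, 1200

C = np.empty(n_steps + 1)
C[0] = C0
for k in range(n_steps):
    # Euler-Maruyama step: dC = -mu C dt + sigma dW
    C[k + 1] = C[k] - mu * C[k] * dt + sigma * np.sqrt(dt) * rng.normal()

obs_idx = np.arange(0, n_steps + 1, 120)             # observe every 12 minutes
y = C[obs_idx] + rng.normal(0.0, sigma_eps, size=obs_idx.shape)
```

Unlike the deterministic model, re-running this with the same µ gives a different path every time, which is exactly the residual error that "can't be eliminated".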
The problem with stochastic models is that:
1. realizations from the model might not really resemble the
observed data...
2. this can happen even when we simulate using a θ which is
close to its true value...
3. ...and even when we are simulating from the “true model”
(simulation studies).
Although this is understandable, it is also upsetting when our
methods are based on producing realizations meant to get close
to observations.
A NEARLY CHAOTIC MODEL
Two realizations from the Ricker model (discussed later), without
measurement noise.
Nt = r · Nt−1 · e^(−Nt−1 + et),   et ∼ N(0, σ²)
Small changes in a parameter cause major departures from data.
Figure: One path generated with log r = 3.8 (black) and one generated with
log r = 3.799 (red) when σ = 0.
Now, a modeller (say, a statistician) does not necessarily
have the expertise or the time to study the qualitative
behaviour of solutions of a series of candidate models.
most often the modeller wants to test a range of possible
models against available data.
Simulations from postulated models should produce
trajectories approximately resembling data.
Parameter estimation should be performed to identify
“best fitting” parameters.
Notice that for many years even simple stochastic models have been
terribly difficult to calibrate against data.
It is usually impossible to write the likelihood function p(y1:T|θ)
in closed form.
It is also impossible to write filtering densities for the states
p(x1:T|y1:T; θ).
Notation:
I am going to use lower case letters both for random
variables and for their realized values.
I use a1:T to denote a sequence (a1, a2, ..., aT) where T is the
time horizon.
I assume integer time indices to ease notation.
SSM: STATE–SPACE MODEL
Graphically:

... ⟶ X_{t−1} ⟶ X_t ⟶ X_{t+1} ⟶ ...   (Markov chain)
         ↓         ↓        ↓
       Y_{t−1}    Y_t    Y_{t+1}         (Observations)
(Yt|Xt = xt) ∼ p(yt|xt) (Observation density)
(Xt+1|Xt = xt) ∼ p(xt+1|xt) (Transition density)
X0 ∼ π(x0) (Initial distribution)
Example: Yt a subject’s measured glycemia at time t. Xt the
actual glycemia at t.
THE LIKELIHOOD FUNCTION 1/2
It turns out that, even for such a simple construct, it is difficult
to write the likelihood function.
In an SSM the data are not independent, only conditionally
independent: a complication!
p(y1:T|θ) = p(y1|θ) ∏_{t=2}^{T} p(yt|y1:t−1, θ) = ?

We don’t have a closed-form expression for the product above
because we do not know how to calculate p(yt|y1:t−1, θ).
Let’s see why.
THE LIKELIHOOD FUNCTION 2/2
In an SSM the observed process is assumed to depend on the latent
Markov process {Xt}: we can write
p(y1:T|θ) = ∫ p(y1:T, x0:T|θ) dx0:T = ∫ p(y1:T|x0:T, θ) · p(x0:T|θ) dx0:T
          = ∫ ∏_{t=1}^{T} p(yt|xt, θ) · p(x0|θ) ∏_{t=1}^{T} p(xt|xt−1, θ) dx0:T

(conditional independence factorizes the first term, Markovianity the second).
Problems:
the expression above is a (T + 1)-dimensional integral;
for most (nontrivial) models, the transition densities p(xt|xt−1) are
unknown.
However today we have quite a number of reliable Monte Carlo
solutions to the integration problem.
I am not going to introduce state-of-the-art methods for SSMs, but these
are essentially based on particle filter (sequential Monte Carlo)
methods:
particle marginal methods and particle MCMC (Andrieu and
Roberts 2009; Andrieu et al. 2010) for Bayesian inference.
iterated filtering (Ionides et al. 2011, 2015) for maximum
likelihood inference.
As shown in Andrieu and Roberts 2009:
1. obtain an approximation of the likelihood p̂(y1:T|θ) using particle
filters;
2. plug p̂(y1:T|θ) into an MCMC algorithm for Bayesian inference on θ;
3. then the MCMC returns samples exactly from π(θ|y1:T).
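A minimal sketch of step 1, the bootstrap particle filter, on a toy linear-Gaussian state-space model (an illustrative model, not one from this talk): the running sum accumulates estimates of log p(yt|y1:t−1, θ).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy SSM (illustrative): X_{t+1} = 0.9 X_t + N(0, 1), Y_t | X_t ~ N(X_t, 0.5^2).
def loglik_bootstrap(y, n_particles=500):
    """Estimate log p(y_{1:T} | theta) with a bootstrap particle filter."""
    x = rng.normal(0.0, 1.0, n_particles)       # particles from the initial distribution
    ll = 0.0
    for yt in y:
        x = 0.9 * x + rng.normal(0.0, 1.0, n_particles)   # propagate p(x_t | x_{t-1})
        # log of the observation density N(yt; x, 0.5^2), per particle
        logw = -0.5 * ((yt - x) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())              # adds an estimate of log p(y_t | y_{1:t-1})
        x = rng.choice(x, size=n_particles, p=w / w.sum())  # multinomial resampling
    return ll

# simulate a short dataset from the same model, then estimate its log-likelihood
x_true, ys = 0.0, []
for _ in range(30):
    x_true = 0.9 * x_true + rng.normal()
    ys.append(x_true + 0.5 * rng.normal())
ll_hat = loglik_bootstrap(np.array(ys))
```

Plugging such an estimate into a Metropolis–Hastings acceptance ratio gives the particle marginal sampler described above.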
If you are interested in a quick review of particle methods for
parameter inference (not state inference), check my slides on
SlideShare: http://goo.gl/4aZxL1
As I said, in this presentation I focus on synthetic likelihoods
and approximate Bayesian computation.
I now consider a simple example where the celebrated particle
marginal methodology of Andrieu and Roberts¹, which is
supposed to return exact Bayesian inference, does not work.
¹ C. Andrieu and G. Roberts 2009, Annals of Statistics 37(2).
THE STOCHASTIC RICKER MODEL
yt ∼ Pois(φ Nt)
Nt = r · Nt−1 · e^(−Nt−1 + et),   et ∼ N(0, σ²)
It can be used to describe the evolution in time of a population
of size Nt.
r is the intrinsic growth rate of the population;
φ is a scale parameter
The process noise et can be interpreted as environmental
noise. We assume the et are i.i.d.
This is a hidden Markov model, as the dynamics of {Nt} are
Markovian and we assume measurements y1:T to be
conditionally independent given {Nt}.
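A direct simulator for this model is short; the settings below (log r = 3.8, φ = 10, σ = 0.3, N0 = 7) are the ground-truth values used later in the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

logr, phi, sigma, T = 3.8, 10.0, 0.3, 50   # ground-truth settings used in the talk
r = np.exp(logr)

N = np.empty(T + 1)
N[0] = 7.0                                 # N0 = 7
y = np.empty(T, dtype=int)
for t in range(1, T + 1):
    e = sigma * rng.normal()                        # e_t ~ N(0, sigma^2)
    N[t] = r * N[t - 1] * np.exp(-N[t - 1] + e)     # N_t = r N_{t-1} e^(-N_{t-1} + e_t)
    y[t - 1] = rng.poisson(phi * N[t])              # y_t ~ Pois(phi N_t)
```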
Here is a realization of length T = 50 from the Ricker model.
Figure: simulated observations y, latent states N, and process noise e against time.
Ground-truth parameters are log r = 3.8, φ = 10 and σ = 0.3 (and
imposing N0 = 7 and e0 = 0). Same settings as in S. Wood 2010².
² S. Wood 2010, Nature 466, pp. 1102–1104.
SOFTWARE CODE
The simulation results in the next three slides can be
reproduced using the code at
https://github.com/umbertopicchini/pomp-ricker
RUNNING pomp ricker − pmcmc.R FOR EXACT BAYESIAN INFERENCE
Suppose we are only interested in estimating the parameter r from
data, while the remaining quantities are fixed to their ground-truth
values.
Particle MCMC works well here. So exact inference is possible.
Here is an MCMC chain of length 2000 with likelihood estimated via
bootstrap filter (1000 particles used). We let r start at r = 12.2.
True value is r = 44.7; the estimated posterior mean is 44.6 [38.5, 51.7].
Figure: trace of the MCMC chain for r over 2000 iterations.
So all good in the considered example.
We might reasonably imagine that if our model is “less
stochastic” (a smaller σ) it should be even easier to conduct
inference.
Recall that
Nt = r · Nt−1 · e^(−Nt−1 + et),   et ∼ N(0, σ²)
Modellers don’t know a priori the values of the underlying
parameters.
Suppose we now use data generated with σ = 0.01 (instead of
σ = 0.3)
Here is what happens with the same conditions as before,
except for using data generated with σ = 0.01.
Figure: PMCMC convergence diagnostics: log-likelihood, log-prior, number of
filtering failures (nfail), and the trace of r against PMCMC iteration.
The chain (lower panel) is stuck at the wrong mode! r ≈ 18,
hence log r ≈ 2.9.
It gets stuck even if we use more particles.
As beautifully illustrated in Fasiolo et al.³, the very interesting reason
why the estimation fails for nearly deterministic dynamics is the
following:
Figure: Black: the true likelihood for log r. Red: the particle filter
approximation, for σ = 0.3, 0.1, 0.05 and 0.01. Figure from Fasiolo et al. 2016.
³ M. Fasiolo, N. Pya and S. Wood 2016. Statistical Science 31(1), pp. 96–118.
However, the filter being unable to approximate the likelihood is only a
consequence of something more subtle: look at the exact log-likelihood
for non-stochastic dynamics, i.e. here σ = 0.
Figure: Black: the true loglikelihood when σ = 0. Grey: bifurcation diagram.
Figure from Fasiolo et al. 2016.
Go here if you are unfamiliar with bifurcations.
The instability of some models under a small amount of noise (σ)
produces major differences in simulated trajectories for small
perturbations in the parameters.
Figure: One path generated with log r = 3.8 (black) and one generated with
log r = 3.799 (red) when σ = 0.
A CHANGE OF PARADIGM
from S. Wood, Nature 2010:
“Naive methods of statistical inference try to make the model
reproduce the exact course of the observed data in a way that the real
system itself would not do if repeated.”
“What is important is to identify a set of statistics that is sensitive to
the scientifically important and repeatable features of the data, but
insensitive to replicate-specific details of phase.”
In other words, with complex, stochastic and/or chaotic models
we should try to match features of the data, not the path of the
data itself.
SYNTHETIC LIKELIHOODS
y: observed data, from static or dynamic models
s(y): (vector of) summary statistics of data, e.g. mean,
autocorrelations, marginal quantiles etc.
assume
s(y) ∼ N(µθ, Σθ)
an assumption justifiable via a second-order Taylor
expansion (same as in Laplace approximations).
µθ and Σθ unknown: estimate them via simulations.
Figure: Schematic representation of the synthetic likelihoods procedure.
For fixed θ we simulate Nr artificial datasets y∗_1, ..., y∗_Nr and
compute the corresponding (possibly vector-valued) summaries
s∗_1, ..., s∗_Nr.
Compute

µ̂θ = (1/Nr) ∑_{i=1}^{Nr} s∗_i ,    Σ̂θ = (1/(Nr − 1)) ∑_{i=1}^{Nr} (s∗_i − µ̂θ)(s∗_i − µ̂θ)′

Compute the statistics sobs for the observed data y, and evaluate a
multivariate Gaussian likelihood at sobs:

liks(θ) := exp(ls(θ)) = N(sobs; µ̂θ, Σ̂θ) ∝ |Σ̂θ|^(−1/2) exp( −(sobs − µ̂θ)′ Σ̂θ⁻¹ (sobs − µ̂θ) / 2 )

This likelihood can be maximized over θ, or plugged into an MCMC
algorithm for Bayesian inference on

π̂(θ|sobs) ∝ liks(θ) π(θ).
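A compact sketch of the whole procedure; the simulator and the three summaries below (mean, variance and lag-1 autocovariance of iid Poisson counts) are toy stand-ins chosen for brevity, not Wood's Ricker setup.

```python
import numpy as np

rng = np.random.default_rng(4)

def summaries(y):
    # toy summary vector: mean, variance and lag-1 autocovariance
    yc = y - y.mean()
    return np.array([y.mean(), y.var(), (yc[1:] * yc[:-1]).mean()])

def synthetic_loglik(s_obs, simulate, theta, Nr=500):
    """l_s(theta) = log N(s_obs; mu_hat, Sigma_hat), moments from Nr simulations."""
    S = np.array([summaries(simulate(theta)) for _ in range(Nr)])   # Nr x d
    mu_hat = S.mean(axis=0)
    Sigma_hat = np.cov(S, rowvar=False)          # divides by Nr - 1, as in the text
    diff = s_obs - mu_hat
    _, logdet = np.linalg.slogdet(Sigma_hat)
    return -0.5 * (logdet + diff @ np.linalg.solve(Sigma_hat, diff)
                   + len(diff) * np.log(2.0 * np.pi))

simulate = lambda theta: rng.poisson(theta, size=50).astype(float)  # toy simulator
s_obs = summaries(simulate(10.0))
ll_near = synthetic_loglik(s_obs, simulate, 10.0)
ll_far = synthetic_loglik(s_obs, simulate, 3.0)
```

The same `synthetic_loglik` value can be handed to an optimizer (as in the Nelder–Mead run below) or to a Metropolis–Hastings sampler.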
For the Ricker model Wood (2010) uses 13 summaries,
including:
the sample mean of the observations ȳ;
number of zeros in the dataset;
autocovariances up to lag 5;
estimates for β0 and β1 from fitting

E(y_t^0.3) = β0 y_{t−1}^0.3 + β1 y_{t−1}^0.6
and a few more summaries... (not important to mention here,
but you get the idea).
So we have sobs = (ȳ, #zeros, autocovariances, β0, β1, ...).
RUNNING pomp ricker − synlik.R
We consider the dataset generated with the settings where
particle methods failed (same starting values etc.), i.e.
σ = 0.01.
Here follow synthetic likelihood estimates:
                 r      σ     φ
starting value   12.18  1     20
true value       44.7   0.01  10
SL estimate      45.25  0.23  10.16
We maximised liks(θ) using Nelder–Mead with Nr = 5000
simulations.
Standard errors could be found via profile likelihood (not
reported).
BAYESIAN SYNTHETIC LIKELIHOODS
Just plug liks(θ) into a Metropolis–Hastings MCMC procedure to
sample from π̂(θ|sobs) ∝ liks(θ)π(θ).
Figure is from Price et al. (2016). They consider the “nicely
behaved” data obtained with σ = 0.3.
Figure: posterior estimates for log r, φ and σe of the Ricker model using
Bayesian synthetic likelihood with various values of n (the number of
simulated datasets). True parameter values: log r = 3.8, φ = 10, σ = 0.3.
From Price et al. (2016).
APPROXIMATE BAYESIAN COMPUTATION (ABC)
Another methodology, also using information-reduction via
summaries, is Approximate Bayesian Computation (ABC).
ABC has received enormous interest. A list of review papers is
provided in the references section.
You can also check my own intro to ABC.
LIKELIHOOD FREE REJECTION SAMPLING
Recall y is data.
We wish to sample from π(θ|y) ∝ p(y|θ)π(θ).
1. simulate from the prior θ∗ ∼ π(θ)
2. plug θ∗ in your model and simulate artificial data y∗ [this
is the same as writing y∗ ∼ p(y|θ∗)]
3. if y∗ = y store θ∗. Go to step 1 and repeat.
The above is a likelihood free algorithm: it does not require
knowledge of the expression of the likelihood p(y|θ).
Each accepted θ∗ is such that θ∗ ∼ π(θ|y) exactly.
ABC REJECTION (PRITCHARD ET AL.⁴)
Same as before, but comparing s(y) with s(y∗) for “appropriate”
summaries s(·) and a small tolerance ε > 0.
1. simulate from the prior θ∗ ∼ π(θ)
2. simulate a y∗ ∼ p(y|θ∗), compute s(y∗)
3. if ‖s(y∗) − s(y)‖ < ε store θ∗. Go to 1 and repeat.
Samples are from πε(θ|s(y)) with

πε(θ|s(y)) ∝ ∫_Y p(y∗|θ∗) π(θ∗) I_{Aε,y}(y∗) dy∗ ,

Aε,y = { y∗ ∈ Y : ‖s(y∗) − s(y)‖ < ε }.
⁴ Pritchard et al. 1999, Molecular Biology and Evolution, 16:1791–1798.
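A minimal sketch of ABC rejection on a toy model (data iid N(θ, 1), summary s(y) = ȳ, uniform prior on [−10, 10], ε = 0.1 — all illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)

def abc_rejection(s_obs, eps=0.1, n_accept=200, n_obs=100):
    """Keep theta* ~ prior whenever |mean(y*) - s_obs| < eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform(-10.0, 10.0)              # 1. theta* from a U(-10, 10) prior
        y_star = rng.normal(theta, 1.0, n_obs)        # 2. y* ~ p(y | theta*)
        if abs(y_star.mean() - s_obs) < eps:          # 3. summaries close enough?
            accepted.append(theta)
    return np.array(accepted)

y = rng.normal(3.0, 1.0, 100)        # "observed" data, true theta = 3
post = abc_rejection(y.mean())       # approximate posterior draws
```

Shrinking ε tightens the approximation to π(θ|s(y)) at the price of a lower acceptance rate.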
EXAMPLE: g-AND-k DISTRIBUTIONS
Mixtures of Gaussians are often used to describe complex,
nonstandard distributions.
a mixture of two Gaussians requires specifying 5
parameters;
apparently, it is sometimes challenging to estimate such
parameters due to unidentifiability (Marin and Robert⁵);
simulating (sampling) from mixtures is an optimization
problem (can be computer intensive).
⁵ Marin and Robert, Bayesian Core, Springer 2007.
g-and-k distributions only require 4 parameters and it’s fast to
simulate from them.
g-and-k distributions have no closed-form likelihood, but we can
simulate their quantiles.
F⁻¹(x; A, B, g, k) = A + B [ 1 + 0.8 · (1 − exp(−g · r(x))) / (1 + exp(−g · r(x))) ] (1 + r²(x))^k r(x)

with r(x) ∼ N(0, 1) the xth quantile from a standard Gaussian.
Figure: histogram of n = 10,000 samples generated with A = 3, B = 1, g = 2, k = 0.5.
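Sampling is indeed immediate: draw r(x) ∼ N(0, 1) and push it through the quantile function above. A short sketch with the same parameter values:

```python
import numpy as np

rng = np.random.default_rng(6)

def sample_gk(A, B, g, k, n):
    """Push standard-normal draws through the g-and-k quantile function."""
    z = rng.normal(size=n)   # r(x) ~ N(0, 1)
    return A + B * (1 + 0.8 * (1 - np.exp(-g * z)) / (1 + np.exp(-g * z))) * (1 + z ** 2) ** k * z

y = sample_gk(3.0, 1.0, 2.0, 0.5, 10_000)    # A = 3, B = 1, g = 2, k = 0.5
```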
I wrote an MCMC algorithm using ABC, with summaries the
20th, 40th, 60th and 80th empirical quantiles and the skewness:

s(y) = (q20, q40, q60, q80, skew(y)).
Again, we simulate datasets y∗, compute s(y∗) and compare
them with s(y) at each MCMC iteration.
I let the threshold ε decrease progressively during the
algorithm.
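The scheme can be sketched on a toy model (iid N(θ, 1) data with s(y) = ȳ, flat prior, random-walk proposals — illustrative choices, not the g-and-k run above), halving the tolerance ε at fixed intervals:

```python
import numpy as np

rng = np.random.default_rng(7)

def abc_mcmc(s_obs, n_iter=3000, n_obs=100):
    theta, eps, chain = 0.0, 1.0, []
    # initialize at a theta whose simulated summary already matches within eps
    while abs(rng.normal(theta, 1.0, n_obs).mean() - s_obs) >= eps:
        theta = rng.uniform(-10.0, 10.0)
    for i in range(n_iter):
        if i > 0 and i % 1000 == 0:
            eps *= 0.5                                  # progressively shrink eps
        prop = theta + 0.5 * rng.normal()               # random-walk proposal
        s_star = rng.normal(prop, 1.0, n_obs).mean()    # simulate y*, summarize
        if abs(s_star - s_obs) < eps:                   # flat prior: accept iff match
            theta = prop
        chain.append(theta)
    return np.array(chain)

y = rng.normal(3.0, 1.0, 100)
chain = abc_mcmc(y.mean())
```

Each halving of ε visibly tightens the chain around the region compatible with the observed summaries, as in the traces shown next.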
Red lines are true parameter values.
Figure: MCMC traces for A, B, g and k against iteration number (×10⁴).
It is evident that I reduced ε at iteration 10,000 and again at
iteration 20,000.
SUMMARY
We have discussed plug-and-play methods that only require the
ability to simulate from the model.
How to choose summaries s(·) is a delicate and open problem.
However, there is much literature available.
Information-reduction methods are very general and in many
cases are the only possible option for inference.
ABC and SL can be used for a short pilot run, to obtain a better
starting value for θ to be used in more accurate methods.
For example, Owen et al.⁶ use ABC to obtain a starting value for
particle MCMC.
⁶ Owen et al. 2015. Likelihood free inference for Markov processes: a
comparison. Statistical Applications in Genetics and Molecular Biology, 14(2),
pp. 189–209.
Our suggestion, when dealing with a new model or a new
dataset, is not to settle on a single methodology, but instead
to try a battery of algorithms.
This requires method developers to share their software code to
ease reproducibility of methods.
This is still not common practice.
If you wish to review some of the presented concepts, slides for this
presentation are available on SlideShare at
http://goo.gl/gg0Cqq.
R code to reproduce the Ricker model results is at
https://github.com/umbertopicchini/pomp-ricker/
MATLAB code for ABC and g-and-k distributions is at
https://github.com/umbertopicchini/abc_g-and-k/
Feel free to write me or drop by⁷ for a chat.
⁷ umberto@maths.lth.se, office MH:321, Matematikcentrum
Thank You
REFERENCES ON SYNTHETIC LIKELIHOODS
1. Fasiolo, M., Pya, N. and Wood, S.N., 2016. A Comparison of
Inferential Methods for Highly Nonlinear State Space Models in
Ecology and Epidemiology. Statistical Science, 31(1), pp.96-118.
2. Meeds, E. and Welling, M., 2014. GPS-ABC: Gaussian process
surrogate approximate Bayesian computation. arXiv:1401.2838.
3. Price, L.F., Drovandi, C.C., Lee, A. and Nott, D.J., 2016. Bayesian
synthetic likelihood. http://eprints.qut.edu.au/92795/
4. Wood, S.N., 2010. Statistical inference for noisy nonlinear
ecological dynamic systems. Nature, 466(7310), pp.1102-1104.
(4) is where SL was first introduced. Our talk is largely based on (1).
(2) merges SL with ABC. (3) studies the performance of SL in a
(pseudo-marginal) Bayesian setting.
SOFTWARE FOR SYNTHETIC LIKELIHOODS
R package pomp: http://kingaa.github.io/pomp
R package synlik:
https://mfasiolo.github.io/synlik_release/
ABC SOFTWARE
EasyABC, R package. Research article.
abc, R package. Research article
abctools, R package. Research article. Focusses on tuning.
Lists with more options here and here.
examples with implemented model simulators (useful to
incorporate in your programs).
REVIEWS ON ABC
Fairly extensive but accessible reviews:
1. Sisson and Fan 2010
2. (with applications in ecology) Beaumont 2010
3. Marin et al. 2010
Simpler introductions:
1. Sunnåker et al. 2013
2. (with applications in ecology) Hartig et al. 2013
Review specific for dynamical models:
1. Jasra 2015
NON-REVIEWS ON ABC, SPECIFIC FOR DYNAMICAL MODELS
1. SMC for Parameter estimation and model comparison:
Toni et al. 2009
2. Markov models: White et al. 2015
3. SMC: Sisson et al. 2007
4. SMC: Dean et al. 2014
5. SMC: Jasra et al. 2010
6. MCMC: Picchini 2013
MORE SPECIALISED ABC RESOURCES
selection of summary statistics: Fearnhead and Prangle
2012.
review on summary statistics selection: Blum et al. 2013
expectation-propagation ABC: Barthelme and Chopin 2012
Gaussian Processes ABC: Meeds and Welling 2014
ABC model choice: Pudlo et al 2015
BLOG POSTS AND SLIDES ON ABC
1. Christian P. Robert often blogs about ABC (and beyond:
it’s a fantastic blog!)
2. an intro to ABC by Darren J. Wilkinson
3. Two posts by Rasmus Bååth here and here
4. my own intro to ABC
5. Tons of slides at SlideShare.
APPENDIX
VERIFY SUMMARIES DISTRIBUTION
The crucial assumption used in synthetic likelihoods is that
s ∼ N(µθ, Σθ).
We could check that (appendix in Wood 2010 for details):
(s∗ − µ̂θ)′ Σ̂θ⁻¹ (s∗ − µ̂θ) ∼ χ²_d,   with d = dim(s).

That is, for Nr simulated summaries, plot the ordered
values of (s∗_j − µ̂θ)′ Σ̂θ⁻¹ (s∗_j − µ̂θ) versus the quantiles of
χ²_d, j = 1, ..., Nr.
For graphical purposes this can be done on log scale.
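A sketch of this diagnostic on toy Gaussian summaries (so the χ² assumption holds by construction); the χ²_d quantiles are obtained by Monte Carlo here to keep the sketch dependency-free:

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy "summaries": Nr = 1000 draws of a d = 3 dimensional correlated Gaussian.
d, Nr = 3, 1000
L = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
S = rng.normal(size=(Nr, d)) @ L                     # Nr x d matrix of summaries

mu_hat = S.mean(axis=0)
Sigma_hat = np.cov(S, rowvar=False)
# Mahalanobis distances: ~ chi^2_d if the summaries really are Gaussian
D = np.einsum('ij,ij->i', (S - mu_hat) @ np.linalg.inv(Sigma_hat), S - mu_hat)

# compare ordered distances with chi^2_d quantiles (Monte Carlo, avoiding scipy)
chi2_q = np.quantile(rng.chisquare(d, 200_000), (np.arange(Nr) + 0.5) / Nr)
qq = np.column_stack([np.sort(D), chi2_q])           # plot (optionally on log scale)
```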
we could separately check that each dimension of the
simulated summaries is approximately normal (basically
produce d separate qqplots).
same as the above but for the observed summaries.
Next slide produces the three types of quantile plots for the
Ricker model when log(r) = 3.8, φ = 10, σ = 0.3.
Recall for the Ricker model we have d = 13 summaries.
Figure: Upper left: log χ²_d quantiles versus log observed quantiles.
Upper right: N(0, 1) quantiles separately for the 13 marginal summaries.
Lower left: N(0, 1) quantiles for the observed summaries.
JUSTIFICATION OF GAUSSIANITY
Assuming Gaussianity for summaries s(·) can be justified from
a standard Taylor expansion.
Say that fθ(s) is the true (unknown) joint density of s.
Expand log fθ(s) around its mode µθ (the first-order term vanishes at the mode):

log fθ(s) ≈ log fθ(µθ) + (1/2) (s − µθ)′ [∂² log fθ / ∂s ∂s′] (s − µθ)

hence

fθ(s) ≈ const × exp{ −(1/2) (s − µθ)′ [−∂² log fθ / ∂s ∂s′] (s − µθ) }

that is, approximately for s ≈ µθ,

s ∼ N( µθ , [−∂² log fθ / ∂s ∂s′]⁻¹ ).
ASYMPTOTIC PROPERTIES FOR SYNTHETIC LIKELIHOODS
As the number of simulated statistics Nr → ∞
the maximizer θ̂ of liks(θ) is a consistent estimator;
θ̂ is an unbiased estimator;
θ̂ might not be Gaussian in general. It will be Gaussian if
Σθ depends weakly on θ or when d = dim(s) is large.

My data are incomplete and noisy: Information-reduction statistical methods for knowledge extraction can save your day: tools and opportunities for modelling

  • 5. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY INCOMPLETE OBSERVATIONS There are many situations where we only have partial information about the system or scenario of interest. We oversimplify and call “incomplete” or “partially observed” those experiments where not all the variables of interest are observable. This could mean that at least one of the following holds: 1. some variables are completely unobserved (measurements for those variables are unavailable); 2. we have a system evolving continuously in time but we only observe it at discrete time points; 3. the variables we observe are perturbed measurements of the actual variables of interest. 5 / 54
  • 6. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY Consider a noise-perturbed “signal”. Typically the values of our measurements do not exclusively represent what we are really interested in measuring. The unperturbed signal is unavailable because measurements are corrupted with noise. 6 / 54
  • 7. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY Therefore we have a system (think of a physical system) having: an observable component Y; an unobservable/latent signal X; noise ε. And of course, if we model time dynamics, we could think of n discrete-time measurements Yti = Xti + εti , i = 1, ..., n, or more generally Yti = f(Xti , εti ) for some arbitrary yet known function f(·). 7 / 54
  • 8. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY AN EXAMPLE: CONCENTRATION OF A DRUG Time course of a certain drug concentration in blood. This is measured at discrete times. [Figure: C12 concentration vs. time in minutes] 8 / 54
  • 9. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY We may postulate a deterministic model: dCt/dt = −µCt, i.e. Ct = C0e−µt , µ > 0, with measurements Yti = Cti + εti , where εti ∼ N(0, σ2ε). [Figure: fitted deterministic curve through the C12 concentration data] 9 / 54
  • 10. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY We may postulate a deterministic model: dCt/dt = −µCt, i.e. Ct = C0e−µt , µ > 0, with measurements Yti = Cti + εti , where εti ∼ N(0, σ2ε). [Figure: fitted deterministic curve through the C12 concentration data] There’s some discrepancy with the fit (residual error). 9 / 54
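As a toy illustration of the deterministic model on this slide, one can simulate noisy measurements Yti = C0 e−µti + εti and recover µ by least squares. All numerical values below (C0, µ, σε, the time grid) are made up for the sketch and are not the values behind the slide's figure.

```python
import numpy as np

# Simulate noisy measurements from the deterministic decay model
# (C0, mu_true and sigma_eps are illustrative values, not the slide's).
rng = np.random.default_rng(1)
C0, mu_true, sigma_eps = 100.0, 0.04, 2.0
t = np.linspace(0.0, 120.0, 10)                 # 10 measurement times
y = C0 * np.exp(-mu_true * t) + rng.normal(0.0, sigma_eps, size=t.size)

# Least-squares estimate of mu via a simple grid search (C0 assumed known).
grid = np.linspace(0.001, 0.2, 2000)
sse = [np.sum((y - C0 * np.exp(-m * t)) ** 2) for m in grid]
mu_hat = grid[int(np.argmin(sse))]
print(mu_hat)  # an estimate close to mu_true
```

With a deterministic state model the only randomness is the measurement noise, which is why plain least squares already works here; the stochastic models discussed next break this.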
  • 11. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY In previous slides you might have noticed the introduction of a parameter µ > 0. Even though an expert researcher might have a clue about some reasonable value for µ, in reality this is at best nothing more than a guess. In fact, while we previously wrote that we can assume Yti = f(Xti , εti ) = f(Cti , εti ), most often what we have is Yti = f(Xti , θ, εti ) = f(Cti , µ, εti ), that is, a dependence on an unknown (vector) parameter θ. In our example θ ≡ µ. 10 / 54
  • 12. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY My main interest: to study and develop principled methods that return an estimate of θ together with an analysis of its uncertainty. In particular, I am interested in modelling stochastic dynamics. In the previous example Ct was evolving in a deterministic fashion, given a fixed starting value C0 and a fixed value for µ. In the next slide I show an alternative, stochastic approach. 11 / 54
  • 13. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY Add systemic (white) noise: dC(t)/dt = −µC(t) + “white noise”. And again Yti = Cti + εti , with εti ∼ N(0, σ2ε). [Figure: stochastic realization through the C12 concentration data] 12 / 54
  • 14. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY Add systemic (white) noise: dC(t)/dt = −µC(t) + “white noise”. And again Yti = Cti + εti , with εti ∼ N(0, σ2ε). [Figure: stochastic realization through the C12 concentration data] Dynamics are stochastic. Residual error can’t be eliminated. It’s always “there”. 12 / 54
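The stochastic decay model on this slide can be simulated with a simple Euler–Maruyama scheme. The sketch below uses made-up parameter values; in particular the noise intensity `gamma` is an assumption of the sketch, not a value given in the slides.

```python
import numpy as np

# Euler-Maruyama simulation of dC = -mu*C dt + gamma dW, followed by
# noisy observations y = C + eps on a coarser grid (values illustrative).
rng = np.random.default_rng(2)
mu, gamma, C0, sigma_eps = 0.04, 0.5, 100.0, 2.0
dt, n_steps = 0.1, 1200                   # simulate on [0, 120]
C = np.empty(n_steps + 1)
C[0] = C0
for k in range(n_steps):
    dW = np.sqrt(dt) * rng.normal()       # Brownian increment
    C[k + 1] = C[k] - mu * C[k] * dt + gamma * dW
obs_idx = np.arange(0, n_steps + 1, 120)  # 11 observation times
y = C[obs_idx] + rng.normal(0.0, sigma_eps, size=obs_idx.size)
```

Two independent runs of this code give visibly different latent paths, which is exactly the point of the slide: the system noise cannot be averaged away by a better fit.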
  • 15. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY The problem with stochastic models is that: 1. realizations from the model might not really resemble the observed data... 2. this can happen even when we simulate using a θ which is close to its true value... 3. ...and even when we are simulating from the “true model” (simulation studies). Although this is understandable, it is also upsetting when our methods are based on producing realizations meant to get close to observations. 13 / 54
  • 16. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY A NEARLY CHAOTIC MODEL Two realizations from the Ricker model (discussed later), without measurement noise: Nt = r · Nt−1 · e−Nt−1+et , et ∼ N(0, σ2). Small changes in a parameter cause major departures from data. Figure: One path generated with log r = 3.8 (black) and one generated with log r = 3.799 (red) when σ = 0. 14 / 54
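The sensitivity shown in this figure is easy to reproduce: iterating the noise-free Ricker map from the same N0 with log r = 3.8 versus log r = 3.799 yields paths that separate after a handful of steps. An illustrative sketch (not the lecture's code):

```python
import numpy as np

def ricker_path(log_r, T=50, N0=7.0):
    """Iterate the deterministic (sigma = 0) Ricker map
    N_t = r * N_{t-1} * exp(-N_{t-1})."""
    N = np.empty(T)
    N[0] = N0
    for t in range(1, T):
        N[t] = np.exp(log_r) * N[t - 1] * np.exp(-N[t - 1])
    return N

a = ricker_path(3.8)
b = ricker_path(3.799)
# The parameter values differ by 0.001, yet the paths end up completely apart.
print(np.abs(a - b)[:5], np.max(np.abs(a - b)))
```

This is the near-chaotic regime: the likelihood surface inherits this sensitivity, which is what makes exact likelihood-based methods fragile here.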
  • 17. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY Now, a modeller (say a statistician): has neither the expertise nor the time to study the qualitative behaviour of solutions of a series of candidate models; most often wants to test a range of possible models against available data. Simulations from postulated models should produce trajectories approximately resembling data. Parameter estimation should be performed to identify “best fitting” parameters. 15 / 54
  • 18. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY Notice that for many years even simple stochastic models have been terribly difficult to calibrate against data: it is usually impossible to write the likelihood function p(y1:T|θ) in closed form; it is also impossible to write the filtering densities for the states p(x1:T|y1:T; θ). Notation: I am going to use lower-case letters both for random variables and for their realized values. I use a1:T to denote a sequence (a1, a2, ..., aT), where T is the time horizon. I assume integer time indices to ease notation. 16 / 54
  • 19. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY SSM: STATE–SPACE MODEL Graphically: [Diagram: a hidden Markov chain ... → Xt−1 → Xt → Xt+1 → ..., with each latent state emitting an observation, Xt → Yt] (Yt|Xt = xt) ∼ p(yt|xt) (Observation density); (Xt+1|Xt = xt) ∼ p(xt+1|xt) (Transition density); X0 ∼ π(x0) (Initial distribution). Example: Yt a subject’s measured glycemia at time t; Xt the actual glycemia at t. 17 / 54
  • 20. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY THE LIKELIHOOD FUNCTION 1/2 It turns out that, even for such a simple construct, it is difficult to write the likelihood function. In a SSM data are not independent, they are only conditionally independent → complication!: p(y1:T|θ) = p(y1|θ) ∏_{t=2}^{T} p(yt|y1:t−1, θ) = ? We don’t have a closed-form expression for the product above because we do not know how to calculate p(yt|y1:t−1, θ). Let’s see why. 18 / 54
  • 21. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY THE LIKELIHOOD FUNCTION 2/2 In a SSM the observed process is assumed to depend on the latent Markov process {Xt}: we can write p(y1:T|θ) = ∫ p(y1:T, x0:T|θ) dx0:T = ∫ p(y1:T|x0:T, θ) × p(x0:T|θ) dx0:T [use conditional independence for the first factor, Markovianity for the second] = ∫ {∏_{t=1}^{T} p(yt|xt, θ)} × {p(x0|θ) ∏_{t=1}^{T} p(xt|xt−1, θ)} dx0:T. Problems: the expression above is a (T + 1)-dimensional integral; for most (nontrivial) models, transition densities p(xt|xt−1) are unknown. 19 / 54
  • 22. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY However today we have quite a number of reliable Monte Carlo solutions to the integration problem. I am not going to introduce state-of-the-art methods for SSM, but these are essentially based on: particle filter (or sequential Monte Carlo) methods; particle marginal methods and particle MCMC (Andrieu and Roberts 2009; Andrieu et al. 2010) for Bayesian inference; iterated filtering (Ionides et al. 2011, 2015) for maximum likelihood inference. As shown in Andrieu and Roberts 2009: 1. obtain an approximation of the likelihood ˆp(y1:T|θ) using particle filters; 2. plug ˆp(y1:T|θ) into an MCMC algorithm for Bayesian inference on θ; 3. then the MCMC returns samples exactly from π(θ|y1:T). 20 / 54
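The three-step recipe above is the pseudo-marginal idea: run an ordinary Metropolis–Hastings chain, but replace the exact log-likelihood with an estimate whose exponential is unbiased (e.g. a particle filter estimate). A generic random-walk sketch, for illustration only (`loglik_hat` and `logprior` are user-supplied placeholders):

```python
import numpy as np

def pseudo_marginal_mh(loglik_hat, logprior, theta0, n_iter=2000, step=0.5, seed=0):
    """Generic 1-d pseudo-marginal Metropolis-Hastings with a Gaussian
    random-walk proposal. loglik_hat(theta) may be a noisy (particle)
    estimate of the log-likelihood; the estimate for the current state
    is stored and reused, which is what makes the chain target the
    exact posterior (Andrieu and Roberts 2009)."""
    rng = np.random.default_rng(seed)
    theta = theta0
    lp = loglik_hat(theta) + logprior(theta)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()
        lp_prop = loglik_hat(prop) + logprior(prop)
        # usual Metropolis-Hastings acceptance on the log scale
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain
```

With an exact log-likelihood this is plain Metropolis–Hastings; the remarkable part of Andrieu and Roberts' result is that substituting an unbiased likelihood estimate changes nothing about the target distribution, only about the mixing.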
  • 23. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY If you are interested in a quick review of particle methods for parameter inference (not state inference), check my slides on SlideShare http://goo.gl/4aZxL1 As I said, in this presentation I focus on synthetic likelihoods and approximate Bayesian computation. I now consider a simple example where the celebrated particle marginal methodology of Andrieu and Roberts1, which is supposed to return exact Bayesian inference, does not work. 1 C. Andrieu and G. Roberts 2009, Annals of Statistics 37(2). 21 / 54
  • 24. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY THE STOCHASTIC RICKER MODEL yt ∼ Pois(φNt), Nt = r · Nt−1 · e−Nt−1+et , et ∼ N(0, σ2). It can be used to describe the evolution in time of a population of size Nt: r is the intrinsic growth rate of the population; φ is a scale parameter; the process noise et can be interpreted as environmental noise. Assume the et are iid. This is a hidden Markov model, as the dynamics of {Nt} are Markovian and we assume measurements y1:T to be conditionally independent given {Nt}. 22 / 54
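Simulating from this hidden Markov model is straightforward, which is exactly what simulation-based methods exploit. A minimal sketch (not the pomp code used in the lecture; default values follow the slides' ground truth):

```python
import numpy as np

def simulate_ricker(log_r=3.8, sigma=0.3, phi=10.0, T=50, N0=7.0, seed=3):
    """Draw one latent trajectory (N_t) and observations y_t ~ Pois(phi * N_t)
    from the stochastic Ricker model."""
    rng = np.random.default_rng(seed)
    N = np.empty(T)
    y = np.empty(T, dtype=int)
    n = N0
    for t in range(T):
        e = rng.normal(0.0, sigma)              # environmental noise e_t
        n = np.exp(log_r) * n * np.exp(-n + e)  # Ricker transition
        N[t] = n
        y[t] = rng.poisson(phi * n)             # Poisson observation
    return N, y
```

Note that the transition density of N_t is not available in closed form through this parametrization, yet forward simulation costs one Gaussian and one Poisson draw per step.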
  • 25. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY Here is a realization of length T = 50 from the Ricker model. [Figure: time series of the observations y, the latent states N and the noises e over t = 0, ..., 50] Ground-truth parameters are log r = 3.8, φ = 10 and σ = 0.3 (and imposing N0 = 7 and e0 = 0). Same settings as in S. Wood 20102 2 S. Wood 2010, Nature 466, pp. 1102–1104. 23 / 54
  • 26. SOFTWARE CODE The simulation results in the next three slides can be reproduced using the code at https://github.com/umbertopicchini/pomp-ricker
  • 27. RUNNING pomp ricker − pmcmc.R FOR EXACT BAYESIAN INFERENCE Suppose we are only interested in estimating the parameter r from the data, while the remaining quantities are fixed at their ground-truth values. Particle MCMC works well here, so exact inference is possible. Here is an MCMC chain of length 2000, with the likelihood estimated via the bootstrap filter (1000 particles). We let r start at r = 12.2. The true value is r = 44.7; the estimated posterior mean is 44.6 [38.5, 51.7]. [Figure: MCMC trace of r over the 2000 iterations.]
  • 28. So all is good in the considered example. We might reasonably imagine that if our model is "less stochastic" (a smaller σ) it should be even easier to conduct inference. Recall that
  N_t = r · N_{t−1} · exp(−N_{t−1} + e_t),  e_t ~ N(0, σ²).
  Modellers do not know a priori the values of the underlying parameters. Suppose we now use data generated with σ = 0.01 (instead of σ = 0.3).
  • 29. Here is what happens under the same conditions as before, except that the data were generated with σ = 0.01. [Figure: PMCMC convergence diagnostics: traces of the log-likelihood, log-prior, number of filtering failures, and r.] The chain for r (lower panel) is stuck at the wrong mode! r ≈ 18, hence log r ≈ 2.9. It gets stuck even if we use more particles.
  • 30. As beautifully illustrated in Fasiolo et al.³, the very interesting reason why estimation fails for nearly deterministic dynamics is the following: [Figure: log-likelihood of log r for σ = 0.3, 0.1, 0.05 and 0.01. Black: the true likelihood; red: the particle filter approximation. From Fasiolo et al. 2016.] ³ M. Fasiolo, N. Pya and S. Wood 2016, Statistical Science 31(1), pp. 96–118.
  • 31. However, the filter being unable to approximate the likelihood is only a consequence of something more subtle: look at the exact log-likelihood for non-stochastic dynamics, i.e. σ = 0. [Figure: black: the true log-likelihood when σ = 0; grey: bifurcation diagram. From Fasiolo et al. 2016.] Go here if you are unfamiliar with bifurcations.
  • 32. The instability of some models under a small amount of noise (σ) produces major differences in simulated trajectories for small perturbations of the parameters. [Figure: one path generated with log r = 3.8 (black) and one generated with log r = 3.799 (red), when σ = 0.]
  • 33. A CHANGE OF PARADIGM From S. Wood, Nature 2010: "Naive methods of statistical inference try to make the model reproduce the exact course of the observed data in a way that the real system itself would not do if repeated." "What is important is to identify a set of statistics that is sensitive to the scientifically important and repeatable features of the data, but insensitive to replicate-specific details of phase." In other words, with complex, stochastic and/or chaotic models we should try to match features of the data, not the data paths themselves.
  • 34. SYNTHETIC LIKELIHOODS y: observed data, from static or dynamic models. s(y): (vector of) summary statistics of the data, e.g. mean, autocorrelations, marginal quantiles, etc. Assume s(y) ~ N(µ_θ, Σ_θ), an assumption justifiable via a second-order Taylor expansion (as in Laplace approximations). µ_θ and Σ_θ are unknown: estimate them via simulations.
  • 35. Figure: Schematic representation of the synthetic likelihoods procedure (from Wood 2010, Nature).
  • 36. For a fixed θ we simulate Nr artificial datasets y*_1, ..., y*_Nr and compute the corresponding (possibly vector-valued) summaries s*_1, ..., s*_Nr. Compute
  µ̂_θ = (1/Nr) Σ_{i=1}^{Nr} s*_i,   Σ̂_θ = (1/(Nr − 1)) Σ_{i=1}^{Nr} (s*_i − µ̂_θ)(s*_i − µ̂_θ)ᵀ.
  Compute the statistics s_obs for the observed data y, then evaluate a multivariate Gaussian likelihood at s_obs:
  lik_s(θ) := exp(l_s(θ)) = N(s_obs; µ̂_θ, Σ̂_θ) ∝ |Σ̂_θ|^{−1/2} exp(−(s_obs − µ̂_θ)ᵀ Σ̂_θ^{−1} (s_obs − µ̂_θ)/2).
  This likelihood can be maximized over θ, or plugged into an MCMC algorithm for Bayesian inference: π̂(θ|s_obs) ∝ lik_s(θ)π(θ).
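The three steps above fit in a few lines of code. A minimal Python sketch follows (the function name and interface are mine; `simulate` and `summarize` are user-supplied model components, as in the synlik package):

```python
import numpy as np

def synthetic_loglik(theta, simulate, summarize, s_obs, Nr=500, rng=None):
    """Estimate the synthetic log-likelihood l_s(theta): simulate Nr
    datasets at theta, compute their summaries, fit a Gaussian
    N(mu_hat, Sigma_hat), and evaluate its log-density at s_obs."""
    rng = np.random.default_rng() if rng is None else rng
    S = np.array([summarize(simulate(theta, rng)) for _ in range(Nr)])
    mu = S.mean(axis=0)
    Sigma = np.cov(S, rowvar=False)          # 1/(Nr - 1) divisor, as above
    diff = s_obs - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (logdet + quad)            # up to an additive constant
```

This estimate is noisy (it changes with the simulation seed), which is why Wood maximizes it with a derivative-free optimizer such as Nelder–Mead.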
  • 37. For the Ricker model, Wood (2010) uses 13 summaries, including: the sample mean of the observations ȳ; the number of zeros in the dataset; autocovariances up to lag 5; the estimates of β₀ and β₁ from fitting E(y_t^{0.3}) = β₀ y_{t−1}^{0.3} + β₁ y_{t−1}^{0.6}; and a few more summaries (not important to list here, but you get the idea). So we have s_obs = (ȳ, #zeros, autocovariances, β₀, β₁, ...).
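As a concrete illustration, here is a cut-down summary function computing a subset of those 13 statistics (mean, zero count, autocovariances up to lag 5, and the two regression coefficients); it is my sketch, not Wood's exact specification:

```python
import numpy as np

def ricker_summaries(y, max_lag=5):
    """A subset of Wood's (2010) Ricker summaries: sample mean, number
    of zeros, autocovariances up to lag max_lag, and least-squares
    coefficients of E[y_t^0.3] = b0*y_{t-1}^0.3 + b1*y_{t-1}^0.6."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    n_zeros = np.sum(y == 0)
    yc = y - ybar
    acov = [np.mean(yc[:len(y) - k] * yc[k:]) for k in range(1, max_lag + 1)]
    X = np.column_stack([y[:-1] ** 0.3, y[:-1] ** 0.6])
    b, *_ = np.linalg.lstsq(X, y[1:] ** 0.3, rcond=None)
    return np.concatenate([[ybar, n_zeros], acov, b])
```

The returned vector plays the role of s(y) (or of the s*_i when applied to simulated data).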
  • 38. RUNNING pomp ricker − synlik.R We consider the dataset generated with the settings under which particle methods failed (same starting values etc.), i.e. σ = 0.01. Here follow the synthetic likelihood estimates:

                     r       σ      φ
  starting value     12.18   1      20
  true value         44.7    0.01   10
  SL estimate        45.25   0.23   10.16

  We maximised lik_s(θ) using Nelder–Mead, with Nr = 5000 simulations. Standard errors could be obtained via profile likelihood (not reported).
  • 39. BAYESIAN SYNTHETIC LIKELIHOODS Just plug lik_s(θ) into a Metropolis–Hastings MCMC procedure to sample from π̂(θ|s_obs) ∝ lik_s(θ)π(θ). The figure is from Price et al. (2016), who consider the "nicely behaved" data obtained with σ = 0.3. [Figure: posterior estimates of log r, φ and σ_e for several values of Nr (n = 30, 100, 250). True parameter values: log r = 3.8, φ = 10, σ = 0.3.]
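The "just plug it in" step amounts to a random-walk Metropolis loop where the log-posterior is estimated afresh at each proposal. A minimal sketch, with my own names and a simplified scheme (a full BSL implementation, as in Price et al., also handles the re-estimation at the current point):

```python
import numpy as np

def bsl_mcmc(logpost_sl, theta0, n_iter=2000, step=0.1, rng=None):
    """Random-walk Metropolis targeting pi_hat(theta | s_obs) proportional
    to lik_s(theta) * pi(theta). logpost_sl(theta, rng) returns a fresh
    noisy estimate of the synthetic log-posterior."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = logpost_sl(theta, rng)
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = logpost_sl(prop, rng)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain
```

Because lik_s(θ) is estimated rather than exact, this is a pseudo-marginal-style sampler: the noise in the estimate (governed by Nr) affects mixing, which is exactly what Price et al. study.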
  • 40. APPROXIMATE BAYESIAN COMPUTATION (ABC) Another methodology, also using information reduction via summaries, is approximate Bayesian computation (ABC). ABC has received enormous interest. A list of review papers is provided in the references section. You can also check my own intro to ABC.
  • 41. LIKELIHOOD-FREE REJECTION SAMPLING Recall y is the data. We wish to sample from π(θ|y) ∝ p(y|θ)π(θ). 1. Simulate θ* ~ π(θ) from the prior. 2. Plug θ* into your model and simulate an artificial dataset y* [this is the same as writing y* ~ p(y|θ*)]. 3. If y* = y, store θ*. Go to step 1 and repeat. The above is a likelihood-free algorithm: it does not require knowledge of the expression of the likelihood p(y|θ). Each accepted θ* is such that θ* ~ π(θ|y) exactly.
  • 42. ABC REJECTION (PRITCHARD ET AL.⁴) Same as before, but comparing s(y) with s(y*) for "appropriate" summaries s(·) and a small tolerance ε > 0. 1. Simulate θ* ~ π(θ) from the prior. 2. Simulate y* ~ p(y|θ*) and compute s(y*). 3. If ‖s(y*) − s(y)‖ < ε, store θ*. Go to 1 and repeat. Samples are from π_ε(θ|s(y)), with
  π_ε(θ|s(y)) ∝ ∫_Y p(y*|θ)π(θ) I_{A_{ε,y}}(y*) dy*,  A_{ε,y} = {y* ∈ Y : ‖s(y*) − s(y)‖ < ε}.
  ⁴ Pritchard et al. 1999, Molecular Biology and Evolution, 16:1791–1798.
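The three steps of ABC rejection translate almost line for line into code. A minimal sketch (names and the choice of Euclidean norm are mine):

```python
import numpy as np

def abc_rejection(prior_sample, simulate, summarize, s_obs,
                  eps, n_draws=10000, rng=None):
    """ABC rejection: draw theta* from the prior, simulate y*, and keep
    theta* whenever ||s(y*) - s(y)|| < eps (Euclidean norm here)."""
    rng = np.random.default_rng() if rng is None else rng
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                 # step 1: theta* ~ prior
        s_sim = summarize(simulate(theta, rng))   # step 2: y* ~ p(y|theta*)
        if np.linalg.norm(s_sim - s_obs) < eps:   # step 3: distance check
            accepted.append(theta)
    return np.array(accepted)
```

Smaller ε gives samples closer to π(θ|s(y)) but a lower acceptance rate; that trade-off is the central tuning problem of ABC.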
  • 43. EXAMPLE: g-AND-k DISTRIBUTIONS Mixtures of Gaussians are often used to describe complex, nonstandard distributions. A mixture of two Gaussians requires specifying 5 parameters. Apparently, it is sometimes challenging to estimate such parameters due to unidentifiability (Marin and Robert⁵). Simulating (sampling) from mixtures is an optimization problem (can be computer-intensive). ⁵ Marin and Robert, Bayesian Core, Springer 2007.
  • 44. g-and-k distributions require only 4 parameters, and it is fast to simulate from them. They have no closed-form likelihood, but we can simulate from them via their quantile function:
  F⁻¹(x; A, B, g, k) = A + B [1 + 0.8 · (1 − exp(−g · r(x)))/(1 + exp(−g · r(x)))] (1 + r²(x))^k r(x),
  with r(x) the xth quantile of a standard Gaussian N(0, 1). [Figure: histogram of n = 10,000 samples generated with A = 3, B = 1, g = 2, k = 0.5.]
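Because F⁻¹ is explicit, sampling is just the inverse-transform method: draw a standard normal z (so that r(x) = z for a uniform x) and push it through the quantile function. A minimal sketch, with the function name mine:

```python
import numpy as np

def gk_sample(A, B, g, k, n, rng=None, c=0.8):
    """Draw n samples from a g-and-k distribution by pushing standard
    normal draws z through the quantile function
    Q(z) = A + B*(1 + c*(1-exp(-g z))/(1+exp(-g z)))*(1+z^2)^k * z."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)
    return (A + B * (1 + c * (1 - np.exp(-g * z)) / (1 + np.exp(-g * z)))
            * (1 + z ** 2) ** k * z)
```

Note that Q(0) = A, so A is the median; g controls skewness and k the tail heaviness, which is why the slide's histogram (g = 2, k = 0.5) is strongly right-skewed.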
  • 45. I wrote an MCMC algorithm using ABC, with summaries the 20th, 40th, 60th and 80th empirical quantiles and the skewness: s(y) = (q20, q40, q60, q80, skew(y)). Again, at each MCMC iteration we simulate a dataset y*, compute s(y*) and compare it with s(y). I let the threshold ε decrease progressively during the algorithm.
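An ABC-MCMC loop of this kind replaces the likelihood ratio with a summary-distance check, tightening ε along the run. This is a hedged sketch of the general idea, not a transcription of the talk's MATLAB code (names, proposal, and schedule are mine):

```python
import numpy as np

def abc_mcmc(log_prior, simulate, summarize, s_obs, theta0,
             eps_schedule, step=0.1, rng=None):
    """ABC-MCMC: a random-walk proposal is accepted only if its simulated
    summaries satisfy ||s* - s_obs|| < eps AND it passes the prior
    Metropolis check; eps is tightened per `eps_schedule` (one value
    per iteration)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    chain = np.empty((len(eps_schedule), theta.size))
    for i, eps in enumerate(eps_schedule):
        prop = theta + step * rng.standard_normal(theta.size)
        s_sim = summarize(simulate(prop, rng))
        if (np.linalg.norm(s_sim - s_obs) < eps and
                np.log(rng.uniform()) < log_prior(prop) - log_prior(theta)):
            theta = prop
        chain[i] = theta
    return chain
```

Decreasing ε too abruptly can leave the chain unable to find any acceptable proposal, which is why a gradual schedule (as in the trace plots on the next slide) is used.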
  • 46. Red lines are the true parameter values. [Figure: trace plots of the ABC-MCMC chains for A, B, g and k.] It is evident that I reduced ε at iteration 10,000 and then again at iteration 20,000.
  • 47. SUMMARY We have discussed plug-and-play methods that only require the ability to simulate from the model. How to choose the summaries s(·) is a delicate and open problem; however, much literature is available. Information-reduction methods are very general and in many cases the only possible option for inference. ABC and SL can also be used in a short pilot run, to obtain a better starting value for θ for more accurate methods: for example, Owen et al.⁶ use ABC to obtain a starting value for particle MCMC. ⁶ Owen et al. 2015. Likelihood free inference for Markov processes: a comparison. Statistical Applications in Genetics and Molecular Biology, 14(2), pp. 189–209.
  • 48. Our suggestion, when dealing with a new model or a new dataset, is not to settle on a single methodology, but instead to try a battery of algorithms. This requires method developers to share their software code, to ease the reproducibility of methods. This is still not common practice.
  • 49. If you wish to review some of the presented concepts, the slides for this presentation are available on SlideShare at http://goo.gl/gg0Cqq. R code to reproduce the Ricker model results is at https://github.com/umbertopicchini/pomp-ricker/ MATLAB code for ABC and g-and-k distributions is at https://github.com/umbertopicchini/abc_g-and-k/ Feel free to write me or drop by⁷ for a chat. ⁷ umberto@maths.lth.se, office MH:321, Matematikcentrum
  • 50. Thank You
  • 51. REFERENCES ON SYNTHETIC LIKELIHOODS 1. Fasiolo, M., Pya, N. and Wood, S.N., 2016. A comparison of inferential methods for highly nonlinear state space models in ecology and epidemiology. Statistical Science, 31(1), pp. 96–118. 2. Meeds, E. and Welling, M., 2014. GPS-ABC: Gaussian process surrogate approximate Bayesian computation. arXiv:1401.2838. 3. Price, L.F., Drovandi, C.C., Lee, A. and Nott, D.J., 2016. Bayesian synthetic likelihood. http://eprints.qut.edu.au/92795/ 4. Wood, S.N., 2010. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310), pp. 1102–1104. SL was first introduced in (4); this talk is largely based on (1); (2) merges SL with ABC; (3) studies the performance of SL in a (pseudo-marginal) Bayesian setting.
  • 52. SOFTWARE FOR SYNTHETIC LIKELIHOODS R package pomp: http://kingaa.github.io/pomp R package synlik: https://mfasiolo.github.io/synlik_release/
  • 53. ABC SOFTWARE EasyABC, R package (research article). abc, R package (research article). abctools, R package (research article; focusses on tuning). Lists with more options here and here. Examples with implemented model simulators (useful to incorporate in your programs).
  • 54. REVIEWS ON ABC Fairly extensive but accessible reviews: 1. Sisson and Fan 2010; 2. Beaumont 2010 (with applications in ecology); 3. Marin et al. 2010. Simpler introductions: 1. Sunnåker et al. 2013; 2. Hartig et al. 2013 (with applications in ecology). Review specific to dynamical models: 1. Jasra 2015.
  • 55. NON-REVIEWS ON ABC, SPECIFIC TO DYNAMICAL MODELS 1. SMC for parameter estimation and model comparison: Toni et al. 2009; 2. Markov models: White et al. 2015; 3. SMC: Sisson et al. 2007; 4. SMC: Dean et al. 2014; 5. SMC: Jasra et al. 2010; 6. MCMC: Picchini 2013.
  • 56. MORE SPECIALIZED ABC RESOURCES Selection of summary statistics: Fearnhead and Prangle 2012. Review on summary-statistics selection: Blum et al. 2013. Expectation-propagation ABC: Barthelme and Chopin 2012. Gaussian-process ABC: Meeds and Welling 2014. ABC model choice: Pudlo et al. 2015.
  • 57. BLOG POSTS AND SLIDES ON ABC 1. Christian P. Robert often blogs about ABC (and beyond: it's a fantastic blog!) 2. An intro to ABC by Darren J. Wilkinson 3. Two posts by Rasmus Bååth here and here 4. My own intro to ABC 5. Tons of slides on SlideShare.
  • 59. VERIFY THE DISTRIBUTION OF THE SUMMARIES The crucial assumption in synthetic likelihoods is that s ~ N(µ_θ, Σ_θ). We can check this (see the appendix of Wood 2010 for details):
  (s* − µ̂_θ)ᵀ Σ̂_θ^{−1} (s* − µ̂_θ) ~ χ²_d,  with d = dim(s).
  That is, for Nr simulated summaries, plot the ordered values of (s*_j − µ̂_θ)ᵀ Σ̂_θ^{−1} (s*_j − µ̂_θ) against the quantiles of χ²_d, j = 1, ..., Nr. For graphical purposes this can be done on the log scale.
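The quantities being plotted are squared Mahalanobis distances of each simulated summary vector from the fitted Gaussian. A minimal sketch of the computation (the function name is mine; plotting against χ²_d quantiles is then a one-liner with any plotting library):

```python
import numpy as np

def mahalanobis_check(S):
    """Given an (Nr, d) array of simulated summaries, compute
    D_j = (s*_j - mu_hat)' Sigma_hat^{-1} (s*_j - mu_hat) for each row.
    Under the Gaussian assumption D_j ~ chi^2_d, so the sorted D_j can
    be plotted against chi^2_d quantiles (their mean should be near d)."""
    S = np.asarray(S, dtype=float)
    mu = S.mean(axis=0)
    Sigma = np.cov(S, rowvar=False)
    diff = S - mu
    D = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)
    return np.sort(D)
```

A strongly curved quantile plot, or a mean of the D_j far from d, signals that the Gaussian assumption on s is doubtful.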
  • 60. We could separately check that each dimension of the simulated summaries is approximately normal (basically produce d separate Q–Q plots), and do the same for the observed summaries. The next slide shows the three types of quantile plots for the Ricker model when log r = 3.8, φ = 10, σ = 0.3. Recall that for the Ricker model we have d = 13 summaries.
  • 61. [Figure: upper left: log χ²_d quantile plot; upper right: N(0, 1) Q–Q plots, separately for the 13 simulated summaries; lower left: N(0, 1) Q–Q plot for the observed summaries.]
  • 62. JUSTIFICATION OF GAUSSIANITY Assuming Gaussianity for the summaries s(·) can be justified via a standard Taylor expansion. Say that f_θ(s) is the true (unknown) joint density of s. Expand log f_θ(s) around its mode µ_θ:
  log f_θ(s) ≈ log f_θ(µ_θ) + (1/2)(s − µ_θ)ᵀ [∂² log f_θ / ∂s ∂sᵀ] (s − µ_θ),
  hence
  f_θ(s) ≈ const × exp(−(1/2)(s − µ_θ)ᵀ [−∂² log f_θ / ∂s ∂sᵀ] (s − µ_θ)),
  i.e. s ~ N(µ_θ, [−∂² log f_θ / ∂s ∂sᵀ]⁻¹), approximately, when s ≈ µ_θ.
  • 63. ASYMPTOTIC PROPERTIES OF SYNTHETIC LIKELIHOODS As the number of simulated statistics Nr → ∞, the maximizer θ̂ of lik_s(θ) is a consistent estimator. θ̂ is an unbiased estimator. θ̂ need not in general be Gaussian: it will be Gaussian if Σ_θ depends only weakly on θ, or when d = dim(s) is large.