A Bayes Application and made up examples

Keywords: Bayes Theorem, Bayesian application, Bayes interpretation.

Yiwen Wu
2022-08-22

This post is my own practice and preparation of using/applying Bayesian regression. Material and example codes are from John Myles White.

The first step is to create a set of data that fulfills the following requirements: Gaussian distributed with equal variance and a meang which is a linear function of the predictors.

    b_length    b_mass
1 -2.6564554  5.712696
2 -2.4404669  9.119517
3 -1.7813084 17.362433
4 -1.7631631  9.323737
5 -1.3888607 18.636169
6 -0.6399949 15.015008

Now, before analyzing, convert to the list format. The list will have three elements, body mass, body length, and a number that indicates the sample size.

Write JAGS model. Create the model with JAGS code, there are differences between R and JAGS code: JAGS use dnorm for distribution’s precision (1/variance). Tilde (~) signifies that object on the left is random variable distributed according to the distribution on the right. Alpha and Beta are intercepts of linear relationships.

The initial parameter values for the MCMC. The parameters are chosen whose posterior distributions will be reported and runs the model in JAGS.

Three MCMC chanins with 12000 iterations each, and discards the first 2000 values, because these first 2000 values highly depend on initial values.

n.thin means only every 10th iteration is saved and the rest discarded.

Model output. The 2.5% and 97.5% quantile range are the 95% credible interval for each parameter. For the 95% credible interval, there is a 95% probability that the true parameter lies within this range.

The last two columns are convergence diagnostics. n.eff is a number smaller than or equal to the number of samples saved from the chains (3*(12000-2000)/10). The higher autocorrelation in the saved samples indicate a smaller effective sample size. Rhat indicates how well the three Markov chains are mixed. Ideal Rhat is close to 1. Large Rhat values mean that chains are not mixed properly and posterior estimates cannot be trusted.

fit_lm1
Inference for Bugs model at "/var/folders/xj/c74jv4d928z1rmtv4_vchj740000gn/T//RtmpKVxs4V/model158f583cb405.txt", fit using jags,
 3 chains, each with 12000 iterations (first 2000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
      mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
alpha  29.079   1.038 27.063 28.408 29.101 29.761 31.086 1.002  1800
beta    9.736   0.845  8.044  9.199  9.744 10.294 11.342 1.001  3000
sigma   5.586   0.812  4.275  5.015  5.485  6.074  7.371 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

The traceplot is used to visualize how well the chains have mixed. In this case, chains are nicely mixed

Other ways to demonstrate posterior distributions of parameters graphically.

Visualize in MCMC format

If we change parameters and plot the mcmc visuals, we want to see the differences when altering.

Create a Catogorical variable (sex) and adding to the jags model

$b_mass
 [1] 19.309795 21.757625  8.176761 18.101959  6.869465 30.544474
 [7] 26.567582 16.759391 28.439164 15.914424 32.568771 21.647253
[13] 29.606252 28.577468 35.155978 38.940458 23.314439 29.641170
[19] 33.889166 27.456285 22.908149 27.540421 26.500509 34.466663
[25] 37.523150 35.058236 34.596614 30.236815 41.302396 30.297417
[31] 34.570252 35.918447 33.942750 35.541939 39.130491 39.304575
[37] 39.992016 34.129303 34.527288 32.001647 38.439367 39.583496
[43] 57.643543 36.390552 44.395866 36.972887 38.228475 48.889866
[49] 45.201041 50.997405

$b_length
 [1] -2.65645542 -2.44046693 -2.41420765 -1.78130843 -1.76316309
 [6] -1.71700868 -1.38886070 -1.36828104 -0.85090759 -0.81139318
[11] -0.78445901 -0.72670483 -0.63999488 -0.60892638 -0.56469817
[16] -0.43144620 -0.43046913 -0.36105730 -0.30663859 -0.28425292
[21] -0.27878877 -0.25726938 -0.17191736 -0.13332134 -0.10612452
[26] -0.09465904 -0.06271410  0.03612261  0.20599860  0.36312841
[31]  0.40426832  0.43281803  0.45545012  0.46009735  0.50495512
[36]  0.63286260  0.63595040  0.65564788  0.70483734  0.75816324
[41]  1.03510352  1.21467470  1.30486965  1.32011335  1.37095845
[46]  1.44410126  1.51152200  1.89519346  2.01842371  2.28664539

$sex
 [1] 1 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1 0 0 1 0 0 0 1 0
[34] 0 1 1 1 0 0 1 0 0 1 0 0 0 1 1 0 1

$N
[1] 50

Test of increasing the range of data (by increasing the standard deviation 10 times)

lm2_jags <- function(){
  # Likelihood:
  for (i in 1:N){
    b_mass[i] ~ dnorm(mu[i], tau) # tau is precision (1 / variance)
    mu[i] <- alpha[1] + sex[i] * alpha[2] + 
      (beta[1] + beta[2] * sex[i]) * b_length[i]
  }
  # Priors:
  for (i in 1:2){
    alpha[i] ~ dnorm(0, 0.01) 
    beta[i] ~ dnorm(0, 0.01)  
  }
  sigma ~ dunif(0, 1000) 
  tau <- 1 / (sigma * sigma)
}

# ---------
init_values <- function(){
  list(alpha = rnorm(2), beta = rnorm(2), sigma = runif(1))
}

params <- c("alpha", "beta", "sigma")

fit_lm2 <- jags(data = jagsdata_s2, 
                inits = init_values, 
                parameters.to.save = params, 
                model.file = lm2_jags, n.chains = 3, 
                n.iter = 12000, n.burnin = 2000, n.thin = 10, DIC = F)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 50
   Unobserved stochastic nodes: 5
   Total graph size: 268

Initializing model
#--------
lm2_mcmc <- as.mcmc(fit_lm2)
plot(lm2_mcmc)

Test of JAGS model with wrong mean but low variance.

$b_mass
 [1] 42.27916 44.07903 20.24780 38.44588 15.68528 50.69550 45.73416
 [8] 23.60080 45.99189 19.97139 49.92215 25.28078 46.52624 31.62210
[15] 51.85007 55.23480 25.46678 31.44646 35.42236 43.30904 24.30209
[22] 43.31223 42.01626 49.86663 52.84152 35.53153 34.91018 45.12845
[29] 40.27240 28.48178 32.54891 49.61999 31.66550 33.24145 52.61563
[36] 52.40599 53.08417 30.85106 31.00310 44.72716 33.26385 33.51012
[43] 68.72893 29.78999 37.54107 29.75238 48.69391 58.20429 35.10892
[50] 59.13747

$b_length
 [1] -2.65645542 -2.44046693 -2.41420765 -1.78130843 -1.76316309
 [6] -1.71700868 -1.38886070 -1.36828104 -0.85090759 -0.81139318
[11] -0.78445901 -0.72670483 -0.63999488 -0.60892638 -0.56469817
[16] -0.43144620 -0.43046913 -0.36105730 -0.30663859 -0.28425292
[21] -0.27878877 -0.25726938 -0.17191736 -0.13332134 -0.10612452
[26] -0.09465904 -0.06271410  0.03612261  0.20599860  0.36312841
[31]  0.40426832  0.43281803  0.45545012  0.46009735  0.50495512
[36]  0.63286260  0.63595040  0.65564788  0.70483734  0.75816324
[41]  1.03510352  1.21467470  1.30486965  1.32011335  1.37095845
[46]  1.44410126  1.51152200  1.89519346  2.01842371  2.28664539

$sex
 [1] 1 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1 0 0 1 0 0 0 1 0
[34] 0 1 1 1 0 0 1 0 0 1 0 0 0 1 1 0 1

$N
[1] 50
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 50
   Unobserved stochastic nodes: 5
   Total graph size: 268

Initializing model

Test of JAGS model with wrong mean but large variance.

The density of sigma shifted significanly.

$b_mass
 [1]  53.8990908  59.4426007  29.5231515  20.7303588  -6.3103379
 [6]  80.9496405  50.8925931  25.3696005  43.5739562  -3.9151883
[11]  62.1620864  20.9379797  42.8711028  50.2890263  68.2855351
[16]  83.0771246  15.9433063  44.4534275  63.2445684  21.0932663
[21]   7.0862415  20.6774554  12.8319806  51.4662785  65.9056104
[26]  59.5508392  55.8052067  25.0642743  77.2420405  15.1463068
[31]  34.6591869  41.1748751  29.2184963  37.0053126  54.9988444
[36]  51.9041366  55.2456198  21.1423585  20.9187584  11.5051759
[41]  25.6171744  23.2571170 122.7667536   2.5476609  40.2861977
[46]  -0.1201204  19.2851945  60.6983335  15.1761402  59.1010162

$b_length
 [1] -2.65645542 -2.44046693 -2.41420765 -1.78130843 -1.76316309
 [6] -1.71700868 -1.38886070 -1.36828104 -0.85090759 -0.81139318
[11] -0.78445901 -0.72670483 -0.63999488 -0.60892638 -0.56469817
[16] -0.43144620 -0.43046913 -0.36105730 -0.30663859 -0.28425292
[21] -0.27878877 -0.25726938 -0.17191736 -0.13332134 -0.10612452
[26] -0.09465904 -0.06271410  0.03612261  0.20599860  0.36312841
[31]  0.40426832  0.43281803  0.45545012  0.46009735  0.50495512
[36]  0.63286260  0.63595040  0.65564788  0.70483734  0.75816324
[41]  1.03510352  1.21467470  1.30486965  1.32011335  1.37095845
[46]  1.44410126  1.51152200  1.89519346  2.01842371  2.28664539

$sex
 [1] 1 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1 0 0 1 0 0 0 1 0
[34] 0 1 1 1 0 0 1 0 0 1 0 0 0 1 1 0 1

$N
[1] 50
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 50
   Unobserved stochastic nodes: 5
   Total graph size: 268

Initializing model