This site reads like an incomplete book or essay. It serves as a way to organize my thoughts and collect fragments of information. The project originally started around the EcoPostView R package, but has since expanded into several loosely connected chapters that explore broader ideas.
On the left, you will find these chapters in the drop-down menu. They begin with an exploration of the EcoPostView package, move through topics on reasoning, logic, and belief, and end with a chapter on statistics (focused on analytical and empirical Bayes). The order is mixed: where a logical structure would start with abstract foundations and move toward application, here it begins with practice. Many readers will find it easier to start with the R functions and skip the more abstract chapters. That is completely fine — the site is meant to support that kind of entry point.
The reversed order reflects something about how we come to understand the world. We often begin with practice, then build frameworks around our experiences. But if we trace our tools and methods back far enough, we find ourselves dealing with metaphysical questions — about language, logic, and what we believe about reality. Reasoning, in my opinion, is not about discovering truth directly, but about trying to contain or express it through propositions. The point is not to assume these propositions are true, but to see what follows if we treat them as if they were.
Ideally, one might begin with reasoning and logic, then move through statistical ideas, and finally arrive at practical application. Going in reverse — from code and data to abstract beliefs — can even be confronting. These themes can feel more existential than scientific, but the boundary between those categories is itself unclear to me (Bárdos and Tuboly 2025; Heidegger 1929; Laudan 1983; Popper 1968).
This site is still a work in progress — an evolving draft rather than a polished presentation.
Ecological data is scattered throughout the literature in varying formats. It is often case-specific and not representative of the full range of ecological conditions encountered in the real world. As a result, a large amount of noise is introduced, limiting the generalizability of results. Moreover, there is no coherent framework to re-use, generalize, and integrate large amounts of ecological data into understandable and predictive models. This lack of structure hampers ecological understanding and leads to a loss of valuable information.
As ecologists, we are not only interested in isolated, case-specific explanations or post-hoc rationalizations. We aim to build cumulative knowledge and apply it across diverse contexts. Working with logic, metadata, and stochastic processes can help solidify our understanding of ecological relationships. These tools enable us to make probability-based statements and generate predictions. This R package is designed to assist with exactly that. Its goal is to facilitate the integration of multiple sources of information, combining their strengths to generate broader insights.
The core function of the package utilizes outputs from multiple fitted Linear and Generalized Linear Models ((G)LMs), leveraging Bayesian methods to generalize their effects into a single, meta-analytic (G)LM. This synthesized model can then be visualized (see Fig. 1 below) and used to make predictions on new data. The ultimate aim is to provide a robust, generalized understanding of ecological relationships by drawing from all available sources of information.
Figure 1: Expected evenness of aquatic invertebrates as a function of conductivity and fine sediment fraction within a river reach. The left panel shows the relationship on the response scale, while the right panel presents it on the log scale for improved visualization.
This package offers tools for ecological statistical modeling, with a strong emphasis on (subjective) Bayesian methods and visualisation. While a basic understanding of R is sufficient to use the core functions, a more in-depth theoretical background is provided in chapter 3 for those interested in exploring the underlying concepts. Please note that this is an actively developed R package, and improvements or updates may be introduced over time.
To start using this R package, both JAGS and devtools must be installed. JAGS can be installed from https://sourceforge.net/projects/mcmc-jags/ and devtools can be installed in R via CRAN. The most recent version of EcoPostView can then be installed from GitHub. Of course, any problems, questions, or possible improvements can be directed to me.
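A minimal installation sketch (the GitHub path below is a placeholder, since the repository address is not stated here):

install.packages("devtools")                   #devtools from CRAN
library(devtools)                              #loads usethis as a dependency
devtools::install_github("<user>/EcoPostView") #placeholder path: replace with the actual repository
library(EcoPostView)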
## Loading required package: usethis
The meta function is a random-effect model with the structure \[\{\beta_{i}, ..., \beta_{n}\} = \beta_{\text{pooled}} + u_i\] Single or multiple random effects can be placed as a vector or matrix using the argument ‘random’; the structure then becomes \[\{\beta_{i}, ..., \beta_{n}\} = \beta_{\text{pooled}} + u_i + r_i\] Similarly, single or multiple moderators can be placed as a vector or matrix using the argument ‘moderator’; the structure then becomes \[\{\beta_{i}, ..., \beta_{n}\} = \beta_{\text{pooled}} + u_i + m_i\] for a single moderator.
The function can also adjust for the relation between the \(se\) and the model parameters using the squared standard error \(se^2\), often referred to as the Precision-Effect Estimate with Standard Errors, or PEESE for short (method=1, Stanley and Doucouliagos, 2014). The structure is then \[\{\beta_{i}, ..., \beta_{n}\} = \beta_{\text{pooled}} + u_i + \alpha_{i} \cdot se^2\] or, using the inverse of the sample size \(1/n\) (method=2, the option performed below), \[\{\beta_{i}, ..., \beta_{n}\} = \beta_{\text{pooled}} + u_i + \alpha_{i} \cdot \left(\frac{1}{n}\right)\] Of course, if bias is considered negligible, no adjustment can be applied (method=0). I would still like to include a fourth option utilizing Robust Bayesian Model Averaging (RoBMA; Maier et al. 2023), but this sometimes adjusts extremely when \(se^2\) is included, so I have left this option open for now.
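As a minimal sketch of these arguments (the data frame dat and its columns est, se, study, and year are illustrative, not from the package examples):

#Hypothetical data: 'est' and 'se' extracted from fitted (G)LMs,
#'study' used as a random effect and 'year' as a moderator
fit_re  <- meta(estimate = dat$est, stderr = dat$se,
                random = dat$study)    #adds r_i to the structure
fit_mod <- meta(estimate = dat$est, stderr = dat$se,
                moderator = dat$year,  #adds m_i to the structure
                method = 1)            #PEESE adjustment via se^2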
Meta-analysis is often performed using standardized effect sizes (SES). I do not endorse this practice, as I believe it offers limited benefits for field ecology, applied ecology, and the generalization of real-world ecological relations (Baguley, 2009; Tukey, 1969). Nevertheless, I will provide a brief introduction to meta-analysis and demonstrate how bias correction methods perform. To illustrate this, I will compare the results with those obtained using my preferred metafor package in R.
## Loading required package: Matrix
## Loading required package: metadat
## Loading required package: numDeriv
##
## Loading the 'metafor' package (version 4.4-0). For an
## introduction to the package please type: help(metafor)
data("example1")
#Run metafor
standard_metafor <- metafor::rma(yi=example1$est, sei=example1$se)
#Run EcoPostView
standard_meta <- meta(estimate=example1$est, stderr=example1$se)
## Call:
## meta(estimate = example1$est, stderr = example1$se)
##
## Summary:
## parameter predictor link group map mu se
## 1 b1 none-specified identity none-specified -0.2225 -0.2283 0.0208
## ll ul I2 n
## 1 -0.2624 -0.1943 0.3573 83
##
## Random-Effects Model (k = 83; tau^2 estimator: REML)
##
## tau^2 (estimated amount of total heterogeneity): 0.0182 (SE = 0.0049)
## tau (square root of estimated tau^2 value): 0.1350
## I^2 (total heterogeneity / total variability): 65.66%
## H^2 (total variability / sampling variability): 2.91
##
## Test for Heterogeneity:
## Q(df = 82) = 250.8709, p-val < .0001
##
## Model Results:
##
## estimate se zval pval ci.lb ci.ub
## -0.2283 0.0199 -11.4886 <.0001 -0.2673 -0.1894 ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Name parameter predictor link group map mu se
## 1 Main b1 none-specified identity none-specified -0.2225 -0.2283 0.0208
## ll ul I2 n
## 1 -0.2624 -0.1943 0.3573 83
Both the metafor and EcoPostView packages yield similar means (-0.22) and standard errors. This suggests that, in many cases, the default (unspecified) priors are effectively uninformative.
We can further assess the model's performance by examining the bias through the residuals.
par(mfrow=c(1,2))
plot(1/example1$se, resid(standard_metafor),
xlab = "1/se", ylab="Residuals", main="metafor")
abline(a=0, b=0, col="red", lty=2)
plot(1/example1$se, standard_meta$Residuals,
xlab = "1/se", ylab="Posterior mean residuals", main="EcoPostView")
abline(a=0, b=0, col="red", lty=2)
Both functions reveal a clear diagonal pattern, which is nearly identical in both cases. However, the posterior means are pulled closer to the overall mean (a phenomenon known as shrinkage) for estimates with weaker (larger) standard errors. This bias can also be assessed using the rescheck function from EcoPostView.
#Use the residual check function within EcoPostView
res_bias <- rescheck(standard_meta)
print(res_bias$bias_se)
This bias is clearly a result of excluding or selectively retaining ‘significant’ results, often through practices such as manually dropping ‘non-significant’ variables or using stepwise model selection methods like forward/backward AIC or BIC (Gelman & Loken, 2013). These practices can lead to a substantial overestimation of the parameter (or ‘effect size’).
This bias can be corrected using Method 1 presented by Stanley and Doucouliagos (2014).
#Run EcoPostView (increased the chain thinning interval and number of iterations to improve mixing)
adjusted_meta <- meta(estimate=example1$est, stderr=example1$se, method = 1,
n_thin = 5,
n_iter = 30000)
## Call:
## meta(estimate = example1$est, stderr = example1$se, method = 1,
## n_thin = 5, n_iter = 30000)
##
## Summary:
## parameter predictor link group map mu se
## 1 b1 none-specified identity none-specified -0.0742 -0.072 0.0262
## ll ul I2 n
## 1 -0.1157 -0.0306 0.1982 83
## Name parameter predictor link group map mu
## 1 Main b1 none-specified identity none-specified -0.07420 -0.07200
## 2 Adjust b1 none-specified identity none-specified -10.61454 -10.55618
## se ll ul I2 n
## 1 0.026200 -0.11570 -0.030600 0.1982 83
## 2 1.526953 -13.09256 -8.106732 NA NA
In the method outlined above, the bias has been adjusted (not removed), resulting in a much lower pooled estimate. This adjustment allows for a clearer assessment of the relationship between the estimates and their standard errors by examining the residuals. While this is commonly done using funnel plots, it can also be done by directly checking the residuals.
However, the adjustment should only be applied when clear patterns of bias are present, as it can lead to over-corrections even when no bias is evident. That said, it is highly effective when a bias is present in the data. In the future, I aim to incorporate additional methods to better assess the strength of any bias.
The bias displayed in this example is extreme, and in such cases, it may be beneficial to further explore the nature of the bias to determine whether a ‘publication gap’ exists. This gap can be highlighted by examining the z-distribution derived from the p-values. Normally, the p-value is derived from the z-value, but when a clear gap is visible (as observed in van Zwet & Cator, 2021), we should be able to model the absolute z-value as a mixture of two half-normal distributions - one truncated at 0 and the other at 1.96. Since the likelihood of such a mixture is challenging to estimate, I employ an Approximate Bayesian Computation (ABC) algorithm based on rejection sampling (ABC-rejection). This method is further described in Csilléry et al. (2010) and Hartig et al. (2011), and the model is formally presented in the theoretical section.
#From the dataset calculate the p-value from the effect-sizes and standard errors
pvalues <- ptoz(estimate=example1$est, stderr=example1$se)
#run the ABC-rejection model
result_abc <- abctoz(p=pvalues$data$p, nsim = 250000)
#Extract the information from the results based on a selected threshold
extract_abc <- extrabc(result_abc, xpos = 4, dist_threshold = 0.052)
## Statistic Mean SE ll ul
## 1 c 0.7763 0.0844 0.6158 0.9009
## 2 mu(z) 1.4225 0.4695 0.5540 2.0557
## 3 sd(z) 0.8934 0.1826 0.6442 1.2286
#Plot the histogram of the z-values with the simulated density lines of the posterior
plot(extract_abc$hist)
Based on the distribution of z-values, we can clearly observe that values where |z| > 1.96 are published more frequently than those with lower absolute values. If no publication gap existed, this would be reason to believe the data could serve as decent evidence against a null model in the frequentist framework. However, the sharp boundary at |z| > 1.96 suggests a selection bias. Furthermore, the z-values can be reasonably modeled, and the model appears to fit the data well, with an R-squared of 0.92. From a heuristic perspective, an R-squared above 0.6–0.7 can be considered acceptable. Additionally, the density curves align well with the histogram, further supporting the model’s fit.
The proportion of observations explained by the censored component of the model is 0.76 (76%). This does not imply that 76% of the data is censored, but rather that the model’s censored component captures a substantial portion of the observed pattern. The goal here is to determine whether the model provides a good fit to the data under the assumption of selective reporting. If the residuals suggest severe bias, this model fit offers additional information of a selection process at play.
To ensure the robustness of the results, one should verify that the model fit remains adequate and that the number of accepted simulations is sufficient (typically > 100).
Inclusion of a moderator is possible as well. The estimated parameter of the relation between the moderator and the effect-sizes is given in the summary output.
#Run EcoPostView with moderator
standard_meta <- meta(estimate=example1$est, stderr=example1$se, moderator=example1$mod)
## Call:
## meta(estimate = example1$est, stderr = example1$se, moderator = example1$mod)
##
## Summary:
## parameter predictor link group map mu se
## 1 b1 none-specified identity none-specified -0.1758 -0.1774 0.0219
## ll ul I2 n
## 1 -0.2132 -0.1415 0.29 83
## Name parameter predictor link group map
## 1 Main b1 none-specified identity none-specified -0.1758000
## 2 Moderator b1 none-specified identity none-specified 0.9045858
## mu se ll ul I2 n
## 1 -0.1774000 0.0219000 -0.2132000 -0.141500 0.29 83
## 2 0.8991034 0.2154674 0.5362936 1.240691 NA NA
The information required for a meta-analytic approach can often be extracted from figures, tables, datasets, or combinations of these sources. However, such information is rarely used in a consistent or standardized way. Put simply, multiple datasets are needed on which (G)LMs can be fitted (see Kaijser et al., …). This process often reveals that ecological data is noisy, potentially biased, and exhibits considerable heterogeneity across studies.
These challenges pose difficulties for drawing causal inferences or controlling statistical error. Both error control and causal conclusions require controlled environments, well-designed experiments, and the identification or modeling of confounding variables. While it may not be possible to impose such controls retrospectively (i.e., a posteriori), the existing information is still highly valuable.
This data can be used to make probabilistic statements, generate predictions, and estimate the a priori power required to design future studies that do focus on error control or causal inference. The purpose of this package is to enable such posterior analysis—allowing users to generalize ecological effects from the literature, visualize emerging patterns, and make informed predictions.
To use this package effectively, data is required, and (G)LMs need to be fitted to that data. This (meta-)data can be obtained from figures (e.g., using tools like WebPlotDigitizer), tables (e.g., by converting PDFs to Excel files), datasets, or combinations of these sources (see Kaijser et al., …).
The so-called “effects” we refer to are, more precisely, model parameters, commonly known as the intercept and slope - typically denoted as (b0 or β0) and (b1 or β1). These parameters define the equation:
response variable = b0 + b1 · predictor variable
This package is built on the underlying philosophy that if we accept a reported parameter (e.g., b1) to represent an “effect” of the predictor, then such an effect should ideally be generalizable. For instance, the relationship between chlorophyll-a and total phosphorus is widely considered generalizable across aquatic systems.
From here on, the term model parameter will be used instead of “effect.” By collecting estimates of b0 and b1 from various studies, we can build a pooled model that predicts responses for one or more new values \(x_i\). This approach allows us to understand the magnitude of the relationship, assess its variability, and make informed predictions.
In the context of Generalized Linear Models (GLMs), the response variable is linked to the linear predictor through a link function, commonly denoted as g(…). For example, when using the identity link function, no transformation is applied. In this case, the expected value of y - written as E(y) or E(y|x) - is directly related to the linear component, just as in a standard LM. \[g(E(y_{i} \mid x_{i})) = \beta_0 + \beta_1 \cdot x_{i}\] However, in a GLM with a log- or logit-link, it is easier to talk about log-linear relations \[log(E(y_{i} \mid x_{i})) = \beta_0 + \beta_1 \cdot x_{i}\] or logit-linear relations \[logit(E(y_{i} \mid x_{i})) = \beta_0 + \beta_1 \cdot x_{i}\] In a GLM, the slope is not a “true” slope in the geometric sense, since the relationship between the response variable (y) and predictor variable (x) is no longer a straight line. However, the model is still considered linear because the parameters enter the linear predictor linearly. As such, the terms coefficient or regression coefficient typically refer to these model parameters, denoted as \(\beta\).
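For instance, with illustrative pooled values \(\beta_0 = 3\) and \(\beta_1 = -0.26\) (numbers chosen for demonstration only, not taken from any package output), a log-link prediction is obtained by inverting the link:

#Illustrative pooled parameters (demonstration values only)
b0 <- 3
b1 <- -0.26
x <- 500 #e.g., conductivity
exp(b0 + b1 * log(x)) #E(y|x) under a log link with log(x) as predictor, ~4.0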
In practice, I often prefer to work with elasticity or semi-elasticity coefficients (Wooldridge, 2001), which can offer interpretable measures in log-linear or logit-linear models. That said, their use is context-dependent and may not always be appropriate. The elasticity coefficient quantifies the percentage change in \(y\) associated with a 1% change in \(x\). For example, an elasticity of 0.2 implies a 0.2% increase in \(y\) for every 1% increase in \(x\). Hence, for a log-linear model \(\log(E(y \mid x)) = \beta_0 + \beta_1 \cdot \log(x)\), the coefficient is \(\beta_1 = \frac{\Delta\log(y)}{\Delta\log(x)}\). For the semi-elasticity coefficient (i.e., logit-linear) this only holds partially, and values closer to 0 are better interpretable, because \(logit(E(y \mid x)) = \beta_0 + \beta_1 \cdot \log(x)\) and thus \(\beta_1 = \frac{\Delta\,logit(y)}{\Delta\log(x)}\). This expresses the change in the log-odds per 1% change in \(x\) (Cramer, 1991; Wooldridge, 2001). These coefficients allow comparison across models and predictors while maintaining interpretable units for prediction. Since \(x\) is log-transformed, its original units are preserved — unlike in standardized coefficients, where this interpretability is lost.
As an example, consider a decline in benthic invertebrate species richness from 100 to 30 as conductivity increases from 50 to 5000 \(\mu S \cdot cm^{-1}\). The elasticity is: \[\beta_{elasticity} = \frac{\log(100)-\log(30)}{\log(50)-\log(5000)} = -0.26\] This decrease is the same for a decline from 10 to 3 over the same range: \[\beta_{elasticity} = \frac{\log(10)-\log(3)}{\log(50)-\log(5000)} = -0.26\] Although the model's intercept \(\beta_0\) would differ, this does not affect the interpretation of the regression coefficient \(\beta_1\), nor its uncertainty or visualization.
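This arithmetic is easy to verify directly in R:

#Elasticity of richness along the conductivity gradient (values from the text)
(log(100) - log(30)) / (log(50) - log(5000)) #-0.26
(log(10) - log(3)) / (log(50) - log(5000))   #-0.26, same elasticity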
I use the ‘unofficial’ expressions b0 and b1 because the R package refers to them this way. The models above would be more formally expressed as: \[g(E(y_{i} \mid x_{ij})) = \sum_{j=0}^{J} \beta_j \cdot x_{ij}\] where \(x_{ij}\) refers to the \(j\)-th predictor variable (e.g., salinity is \(j=1\) and light is \(j=2\)) and \(i\) is the \(i\)-th observation. This expression will later be utilized in the explanation of the visualization.
At this stage, we assume that multiple (G)LMs have been fitted. From these models, the parameter estimates and their standard errors have been extracted and compiled into a dataset. For each estimate, it is useful to record relevant metadata, including the source (e.g., DOI), the type of predictor variable (e.g., conductivity), the group of the response type (e.g., benthic invertebrates), the link function, and whether the model parameter is the intercept b0 or a regression coefficient b1. When models include multiple predictor variables, all corresponding regression coefficients are denoted as b1, distinguishing them from the intercept b0. The example below in R demonstrates the expected structure of this data frame.
## # A tibble: 6 × 9
## doi link group predictor parameter est se n mean
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 10.1127/1863-9135… log Inve… Salinity b1 0.0697 0.202 10 870.
## 2 10.1127/1863-9135… log Inve… Oxygen b1 0.319 0.196 10 8.15
## 3 10.1127/1863-9135… log Inve… Sediment b1 0.258 0.164 10 0.385
## 4 10.1127/1863-9135… logit Inve… Salinity b1 -0.950 0.723 10 870.
## 5 10.1127/1863-9135… logit Inve… Oxygen b1 -0.136 0.353 10 8.15
## 6 10.1127/1863-9135… logit Inve… Sediment b1 0.266 0.309 10 0.385
In the example above, the est column contains the estimated model parameters, while the se column holds the standard errors of those estimates. The group column can represent an organism group, a specific species, or a taxon (or any other category you wish to use for grouping). The predictor column denotes the specific predictor variable, and the parameter column indicates whether the estimate corresponds to the intercept (b0) or a regression coefficient (b1). The link column specifies the link function used in the model. Additionally, it is recommended to include the sample size (n) in your dataset to adjust for ‘small-sample effects’ if needed (Peters et al., 2006; Moreno et al., 2009).
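As a hedged sketch of how one such row could be assembled from a fitted GLM (the data frame dat and its columns are illustrative):

#Fit a (G)LM and extract the estimate and standard error for one predictor
fit <- glm(richness ~ log(salinity), family = poisson(link = "log"), data = dat)
coefs <- summary(fit)$coefficients
row <- data.frame(doi = "10.xxxx/xxxx", #source of the estimates
                  link = "log",
                  group = "Invertebrates",
                  predictor = "Salinity",
                  parameter = "b1",
                  est = coefs["log(salinity)", "Estimate"],
                  se = coefs["log(salinity)", "Std. Error"],
                  n = nrow(dat))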
The meta function can be applied to the example data with the following arguments.
mod1 <- meta(estimate=example2$est, #Model estimate
stderr=example2$se, #Standard error of the model estimate
parameter=example2$parameter, #Model parameter (b0 or b1)
predictor=example2$predictor, #Predictor variable (independent variable)
link_function=example2$link, #Link function
grouping=example2$group, #Group
Nsamp=example2$n, #Sample size (optional, used by adjustment method 2: Peters 1/n)
method=2) #Adjustment method (0=none, 1=Egger's (1/se), 2=Peters (1/n))
## Call:
## meta(estimate = example2$est, stderr = example2$se, parameter = example2$parameter,
## predictor = example2$predictor, link_function = example2$link,
## grouping = example2$group, method = 2, Nsamp = example2$n)
##
## Summary:
## parameter predictor link group map mu se ll
## 1 b1 Oxygen log Fish 0.3727 0.3762 0.1251 0.1828
## 2 b1 Oxygen log Invertebrates 0.1720 0.1711 0.0883 0.0268
## 3 b1 Oxygen logit Fish -0.0658 -0.1000 0.4492 -0.8385
## 4 b1 Oxygen logit Invertebrates -0.0315 -0.0287 0.1504 -0.2726
## 5 b1 Salinity log Fish -0.0394 -0.0313 0.0594 -0.1261
## 6 b1 Salinity log Invertebrates -0.1099 -0.1067 0.0251 -0.1481
## 7 b1 Salinity logit Fish -0.2856 -0.2711 0.4157 -0.9231
## 8 b1 Salinity logit Invertebrates -0.2935 -0.2989 0.0976 -0.4600
## 9 b1 Sediment log Fish 0.0504 0.0346 0.1608 -0.1916
## 10 b1 Sediment log Invertebrates -0.0559 -0.0557 0.0155 -0.0819
## 11 b1 Sediment logit Fish -0.4003 -0.3670 0.7676 -1.4529
## 12 b1 Sediment logit Invertebrates -0.1956 -0.1930 0.0848 -0.3305
## ul I2 n
## 1 0.5816 0.5928 27
## 2 0.3148 0.5387 59
## 3 0.6234 0.2475 14
## 4 0.2154 0.8264 38
## 5 0.0670 0.4911 29
## 6 -0.0658 0.4828 83
## 7 0.4252 0.5367 13
## 8 -0.1428 0.3780 52
## 9 0.2901 0.6648 9
## 10 -0.0310 0.8742 46
## 11 0.7020 0.6373 6
## 12 -0.0517 0.0137 26
The meta function may return a warning indicating that the MCMC chains are not mixing properly. This issue can arise for various reasons. To diagnose the source of the warning, examine the raw JAGS model output (mod1$model$JAGS_model). Specifically, look for cases where a parameter of interest, such as mu[.], has a large Rhat value or a small effective sample size.
Most of these issues can be addressed by thinning the chains, increasing the number of chains, or specifying more informed or stronger priors. If the problem is not related to the mu parameter, you may choose to disregard the warning. Ultimately, the decision on how to address these issues lies with the user.
To prevent the warning from being raised, you can lower the threshold for the Eff_warn parameter (e.g., Eff_warn = 500).
A key advantage of the Bayesian approach is the ability to incorporate prior information, thereby explicitly shifting the posterior estimates toward more plausible values for the pooled model parameter. To define a single prior for each relation and parameter, a specific structure is required. By default, model parameters are assumed to follow a normal (Gaussian) distribution with a mean (\(\mu\), prior_mu) of 0 and a standard deviation (\(\sigma\), prior_mu_se) of 0.5. At present, the prior distribution for model parameters is limited to the normal distribution. The prior for the residual standard deviation (\(\sigma\)) is defined as a uniform distribution, with the upper bound (prior_sigma_max) set to 5 by default.
As discussed above, I often prefer to work heuristically with elasticity or semi-elasticity coefficients. However, this is not required, and the choice of prior should reflect your modeling preferences and domain knowledge. In fact, failing to think carefully about the priors — even when only limited prior information is available — means the analysis is not truly Bayesian in nature.
Users can specify their own prior values for both the mean and standard deviation. To obtain a structured overview of the required prior inputs, set get_prior_only = TRUE. This will return a data frame containing a level column, as well as columns for the prior mean (\(\mu\)) and standard deviation (\(se\)). These values can then be tailored to the specific context of the analysis using available prior information. Details on how to incorporate this prior data frame into your model are provided later.
only_priors <- meta(estimate=example2$est,
stderr=example2$se,
parameter=example2$parameter,
predictor=example2$predictor,
link_function=example2$link,
grouping=example2$group,
Nsamp=example2$n,
method=2,
get_prior_only=TRUE) #Only show the structure of the priors
print(only_priors)
## Levels Prior_mu Prior_se
## 1 b0_NA_log_Fish 0 10
## 2 b0_NA_log_Invertebrates 0 10
## 3 b0_NA_logit_Fish 0 10
## 4 b0_NA_logit_Invertebrates 0 10
## 5 b1_Oxygen_log_Fish 0 10
## 6 b1_Oxygen_log_Invertebrates 0 10
## 7 b1_Oxygen_logit_Fish 0 10
## 8 b1_Oxygen_logit_Invertebrates 0 10
## 9 b1_Salinity_log_Fish 0 10
## 10 b1_Salinity_log_Invertebrates 0 10
## 11 b1_Salinity_logit_Fish 0 10
## 12 b1_Salinity_logit_Invertebrates 0 10
## 13 b1_Sediment_log_Fish 0 10
## 14 b1_Sediment_log_Invertebrates 0 10
## 15 b1_Sediment_logit_Fish 0 10
## 16 b1_Sediment_logit_Invertebrates 0 10
An important advantage of the Bayesian framework is the ability to incorporate multiple prior distributions (\(k\)), enabling Bayesian Model Averaging (BMA; Hoeting et al., 1999; Hinne et al., 2020). This approach allows one to represent multiple plausible scenarios that could have explained the observed data, and to average over these competing models based on their relative credibility.
To implement BMA, a dataset similar in structure to the single-prior setup is required, but extended to include multiple prior specifications. Each prior distribution typically includes a mean (\(\mu\)) and standard error (\(se\)), just as before.
In many cases, prior weights are assigned to reflect how strongly each prior contributes to the model. These weights range between 0 and 1 and ideally sum to 1 (or 100%). For simplicity, especially when no strong preference among priors exists, equal weighting can be used (e.g., with three priors, each receives a weight of 1/3). Alternatively, when the weights themselves are uncertain, they can be treated as random variables and modeled using a Dirichlet distribution: \(weight \sim Dir(\alpha_i)\) where \(\alpha_i = 1\) for each prior (\(i\)), yielding a uniform Dirichlet distribution.
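As a minimal base-R sketch of the uniform Dirichlet case (normalized Gamma(1, 1) draws are one standard way to sample from a Dirichlet; packages such as gtools also provide rdirichlet):

#Sample uniform Dirichlet weights for three priors via normalized Gamma draws
set.seed(1)
k <- 3
g <- rgamma(k, shape = 1)
w <- g / sum(g) #weights are between 0 and 1 and sum to 1
w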
In the example below, I illustrate this approach using priors with varying values of \(\mu\) and \(se\). For intercept parameters, a broader prior such as \(N(\mu = 0, se = 10)\) is often reasonable, reflecting higher uncertainty.
## # A tibble: 705 × 9
## doi link group predictor parameter est se n mean
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 10.1127/1863-913… log Inve… Salinity b1 0.0697 0.202 10 8.70e+2
## 2 10.1127/1863-913… log Inve… Oxygen b1 0.319 0.196 10 8.15e+0
## 3 10.1127/1863-913… log Inve… Sediment b1 0.258 0.164 10 3.85e-1
## 4 10.1127/1863-913… logit Inve… Salinity b1 -0.950 0.723 10 8.70e+2
## 5 10.1127/1863-913… logit Inve… Oxygen b1 -0.136 0.353 10 8.15e+0
## 6 10.1127/1863-913… logit Inve… Sediment b1 0.266 0.309 10 3.85e-1
## 7 10.2307/1468008 log Inve… Sediment b1 -0.254 0.109 94 9.47e-2
## 8 10.1007/s10530-0… logit Inve… Salinity b1 0.825 0.204 27 2.87e+2
## 9 10.1007/s10530-0… log Fish Salinity b1 -0.151 0.143 25 2.86e+2
## 10 10.1007/s10661-0… log Inve… Oxygen b1 -0.370 0.391 10 2.71e+0
## # ℹ 695 more rows
mod2 <- meta(estimate=example2$est,
stderr=example2$se,
parameter=example2$parameter,
predictor=example2$predictor,
link_function=example2$link,
grouping=example2$group,
prior_mu=example3[c(2,4,6)], #prior for the mean
prior_mu_se=example3[c(3,5,7)], #prior for the standard error of the mean
Nsamp=example2$n,
method=2,
n_thin=10, #thinning the chains
n_chain=4) #changing the number of chains from 2 to 4
## Call:
## meta(estimate = example2$est, stderr = example2$se, parameter = example2$parameter,
## predictor = example2$predictor, link_function = example2$link,
## grouping = example2$group, method = 2, Nsamp = example2$n,
## prior_mu = example3[c(2, 4, 6)], prior_mu_se = example3[c(3,
## 5, 7)], n_chain = 4, n_thin = 10)
##
## Summary:
## parameter predictor link group map mu se ll
## 1 b1 Oxygen log Fish 0.3711 0.3566 0.1005 0.2130
## 2 b1 Oxygen log Invertebrates 0.1792 0.1972 0.0808 0.0720
## 3 b1 Oxygen logit Fish 0.2221 0.1490 0.2485 -0.2718
## 4 b1 Oxygen logit Invertebrates 0.1172 0.0747 0.1433 -0.1594
## 5 b1 Salinity log Fish -0.0642 -0.0555 0.0584 -0.1536
## 6 b1 Salinity log Invertebrates -0.1078 -0.1099 0.0251 -0.1497
## 7 b1 Salinity logit Fish -0.2942 -0.2763 0.2034 -0.5949
## 8 b1 Salinity logit Invertebrates -0.2754 -0.2929 0.0830 -0.4313
## 9 b1 Sediment log Fish -0.0138 -0.0575 0.1562 -0.3098
## 10 b1 Sediment log Invertebrates -0.0561 -0.0573 0.0155 -0.0826
## 11 b1 Sediment logit Fish -0.3266 -0.3309 0.1871 -0.6244
## 12 b1 Sediment logit Invertebrates -0.2187 -0.2121 0.0769 -0.3380
## ul I2 n
## 1 0.5323 0.5998 27
## 2 0.3339 0.5371 59
## 3 0.5201 0.2445 14
## 4 0.3129 0.8235 38
## 5 0.0353 0.4564 29
## 6 -0.0673 0.4878 83
## 7 0.0316 0.5444 13
## 8 -0.1561 0.3862 52
## 9 0.1989 0.6627 9
## 10 -0.0325 0.8733 46
## 11 -0.0772 0.6358 6
## 12 -0.0866 0.0135 26
## Name parameter predictor link group map mu
## 1 Main b1 Oxygen log Fish 3.711000e-01 3.566000e-01
## 2 Main b1 Oxygen log Invertebrates 1.792000e-01 1.972000e-01
## 3 Main b1 Oxygen logit Fish 2.221000e-01 1.490000e-01
## 4 Main b1 Oxygen logit Invertebrates 1.172000e-01 7.470000e-02
## 5 Main b1 Salinity log Fish -6.420000e-02 -5.550000e-02
## 6 Main b1 Salinity log Invertebrates -1.078000e-01 -1.099000e-01
## 7 Main b1 Salinity logit Fish -2.942000e-01 -2.763000e-01
## 8 Main b1 Salinity logit Invertebrates -2.754000e-01 -2.929000e-01
## 9 Main b1 Sediment log Fish -1.380000e-02 -5.750000e-02
## 10 Main b1 Sediment log Invertebrates -5.610000e-02 -5.730000e-02
## 11 Main b1 Sediment logit Fish -3.266000e-01 -3.309000e-01
## 12 Main b1 Sediment logit Invertebrates -2.187000e-01 -2.121000e-01
## 13 Adjust b0 NA log Fish -5.109015e-05 -4.932263e-05
## 14 Adjust b1 Oxygen log Fish -4.560827e-05 -4.203272e-05
## 15 Adjust b1 Salinity log Fish -3.384888e-05 -3.941297e-05
## 16 Adjust b1 Sediment log Fish -3.568222e-05 -1.828781e-05
## 17 Adjust b0 NA logit Fish 1.282226e-04 9.482775e-05
## 18 Adjust b1 Oxygen logit Fish -2.655013e-04 -3.037635e-04
## 19 Adjust b1 Salinity logit Fish 7.125142e-05 5.976862e-05
## 20 Adjust b1 Sediment logit Fish 2.993586e-05 1.036368e-05
## 21 Adjust b0 NA log Invertebrates -8.004157e-09 -4.928497e-09
## 22 Adjust b1 Oxygen log Invertebrates 1.363751e-06 6.973553e-07
## 23 Adjust b1 Salinity log Invertebrates -1.714578e-09 -1.901168e-09
## 24 Adjust b1 Sediment log Invertebrates -2.988813e-08 -2.213373e-08
## 25 Adjust b0 NA logit Invertebrates -1.376791e-05 -5.749704e-06
## 26 Adjust b1 Oxygen logit Invertebrates 2.034396e-05 2.258677e-05
## 27 Adjust b1 Salinity logit Invertebrates -1.466133e-05 -1.411271e-05
## 28 Adjust b1 Sediment logit Invertebrates -2.356307e-07 -1.094706e-06
## se ll ul I2 n
## 1 1.005000e-01 2.130000e-01 5.323000e-01 0.5998 27
## 2 8.080000e-02 7.200000e-02 3.339000e-01 0.5371 59
## 3 2.485000e-01 -2.718000e-01 5.201000e-01 0.2445 14
## 4 1.433000e-01 -1.594000e-01 3.129000e-01 0.8235 38
## 5 5.840000e-02 -1.536000e-01 3.530000e-02 0.4564 29
## 6 2.510000e-02 -1.497000e-01 -6.730000e-02 0.4878 83
## 7 2.034000e-01 -5.949000e-01 3.160000e-02 0.5444 13
## 8 8.300000e-02 -4.313000e-01 -1.561000e-01 0.3862 52
## 9 1.562000e-01 -3.098000e-01 1.989000e-01 0.6627 9
## 10 1.550000e-02 -8.260000e-02 -3.250000e-02 0.8733 46
## 11 1.871000e-01 -6.244000e-01 -7.720000e-02 0.6358 6
## 12 7.690000e-02 -3.380000e-01 -8.660000e-02 0.0135 26
## 13 4.611614e-05 -1.209478e-04 2.801339e-05 NA NA
## 14 2.642763e-05 -8.129904e-05 -9.666053e-07 NA NA
## 15 3.818205e-05 -9.827232e-05 2.463036e-05 NA NA
## 16 9.609997e-05 -1.626278e-04 1.194527e-04 NA NA
## 17 1.551216e-03 -2.432399e-03 2.574546e-03 NA NA
## 18 3.216326e-04 -8.037165e-04 2.551030e-04 NA NA
## 19 1.701535e-04 -2.381799e-04 3.038913e-04 NA NA
## 20 2.519442e-04 -2.436701e-04 2.420605e-04 NA NA
## 21 3.475176e-08 -6.166694e-08 5.065854e-08 NA NA
## 22 6.672871e-06 -9.614258e-06 1.221858e-05 NA NA
## 23 3.236347e-09 -6.857288e-09 3.419455e-09 NA NA
## 24 1.522298e-07 -2.698153e-07 2.293962e-07 NA NA
## 25 5.113420e-05 -8.771998e-05 7.646278e-05 NA NA
## 26 4.427064e-05 -4.560722e-05 9.724465e-05 NA NA
## 27 2.767021e-05 -5.685341e-05 3.332325e-05 NA NA
## 28 5.327189e-06 -1.006590e-05 7.113999e-06 NA NA
The results of the meta-analysis are summarized in a table that includes the Maximum A Posteriori (MAP) estimates, the posterior mean (\(\mu\)), standard error (\(se\)), and the High Density Interval (HDI), which by default is set to 90%. Additionally, the heterogeneity among studies is quantified using the \(I^2\) statistic.
The prior for the between-study variance (\(\tau^2\)) is, by default, specified as a uniform distribution ranging from 0 to 5. This choice has the benefit of producing wider intervals, which can be conservative—particularly useful when dealing with smaller sample sizes. However, this conservatism can also be a drawback in cases where more precision is desired.
To offer users flexibility, the argument prior_var_fam can be set to "exp" to use an exponential distribution instead of the default "unif" (uniform distribution). When using "unif", the variance prior is specified as \(Unif(0, \text{prior_study_var})\). When "exp" is selected, the prior variance becomes \(Exponential(\frac{1}{\text{prior_study_var}})\).
In general, for complex meta-analyses with many predictors and responses (e.g., Kaijser et al. …), the uniform prior is recommended, provided that convergence is achieved. In contrast, for more focused analyses with fewer predictors and responses - and particularly with small datasets (e.g., \(n < 10\)) — an exponential prior (e.g., with a mean of 1000) may be more appropriate.
These recommendations are heuristic — they are grounded in practical experience and prior applications, but should not be treated as universally optimal. Users are strongly encouraged to conduct sensitivity analyses to assess how prior assumptions influence the results.
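A sketch of such a specification, using the argument names described above (other settings as in the earlier examples; this is an assumption about usage, not a confirmed recipe):

#Exponential prior on the between-study variance with a mean of 1000
mod_exp <- meta(estimate = example2$est,
                stderr = example2$se,
                parameter = example2$parameter,
                predictor = example2$predictor,
                link_function = example2$link,
                grouping = example2$group,
                prior_var_fam = "exp",
                prior_study_var = 1000)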
After analyzing the meta-data, it is essential to check for bias, which can arise from multiple sources. This check should ideally be part of a sensitivity analysis, employing various methods such as: display of the z-distribution, Egger's test, Peters' test, and/or funnel plots. Bias is nearly always present to some extent, but its magnitude may vary depending on the dataset.
A straightforward first step is to visually assess the relationship between the residuals and the inverse of the standard error (\(1/se\)). If sample sizes are available, one can also assess the relationship with \(1/n\).
A clear diagonal pattern between \(\beta\) and \(1/se\) can indicate the selection of larger effects with broader intervals, p-hacking, HARKing, data dredging, noise in the data, etc. A relation with \(1/n\) often occurs when small sample sizes and noise result in so-called ‘small-study effects’.
The residuals can be assessed across the total dataset, per group or per predictor.
Below are the residuals per group in relation to \(1/se\).
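A minimal sketch of how such a per-group display could be produced, assuming the fitted object exposes posterior mean residuals as mod1$Residuals (analogous to standard_meta$Residuals above):

#Plot posterior mean residuals against 1/se, split per group
par(mfrow = c(1, 2))
for (g in unique(example2$group)) {
  sel <- example2$group == g
  plot(1/example2$se[sel], mod1$Residuals[sel],
       xlab = "1/se", ylab = "Posterior mean residuals", main = g)
  abline(a = 0, b = 0, col = "red", lty = 2)
}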
And the residuals per predictor in relation to \(1/se\).
The dotted red line should approximately overlay the solid blue line, which represents the expected relationship with an intercept and slope of 0. However, small sample sizes can substantially influence the slope of the red line, potentially leading to misleading inferences. When clear bias is detected, it is advisable to either apply a bias correction method or specify stronger priors to mitigate the consequences.
Sensitivity checks play an important role in assessing the robustness of model results. For simpler models with highly informative data, these checks may not always be necessary. However, one cannot assume that identical results would be obtained in a subsequent study under different conditions. Consequently, drawing strong conclusions based solely on whether an interval includes zero is arbitrary and often misleading.
A more informative approach involves directly inspecting the posterior distribution, along with the estimate, its uncertainty, and a visualization of the variance in patterns revealed by the data. This is particularly important in ecology, where data tend to be noisy and often exploratory in nature—meaning the posterior distribution can vary substantially between studies. Still, sensitivity checks can serve as a useful reality check.
In this framework, a sensitivity check evaluates the difference between a fully specified model with informed priors (mod1) and a baseline model with vague or weak priors (mod0). This comparison is made by computing the posterior odds ratio: \(Log(P(Mod1|Data, Info)/P(Mod0|Data, Info))\), assuming that all other model hyperparameters are held constant except for the priors.
An alternative approach is to assess the extent to which prior information in mod1 versus mod0 contributes to a shift in the posterior away from zero: \(Log(P(Mod1>0|Data, Info)/P(Mod0>0|Data, Info))\) This can be transformed into a probability between 0 and 1, where 0.5 indicates no net influence of the prior on the posterior, 0 indicates complete negative influence, and 1 complete positive influence. However, I am not the biggest fan of this way of assessing sensitivity, as it again treats the posterior as a form of dichotomous hypothesis testing.
In the earlier example, mod2 was treated as mod1. A corresponding weakly informed mod0 model can be created by setting all prior means (\(\mu\)) to 0 and their standard errors (\(se\)) to 100.
#Create a model with minimal prior information
mod0 <- meta(estimate=example2$est,
stderr=example2$se,
parameter=example2$parameter,
predictor=example2$predictor,
link_function=example2$link,
grouping=example2$group,
prior_mu=0, #prior for the mean
prior_mu_se=100, #prior for the standard error of the mean
Nsamp=example2$n,
method=2,
n_thin=10,
n_chain=4)
## Call:
## meta(estimate = example2$est, stderr = example2$se, parameter = example2$parameter,
## predictor = example2$predictor, link_function = example2$link,
## grouping = example2$group, method = 2, Nsamp = example2$n,
## prior_mu = 0, prior_mu_se = 100, n_chain = 4, n_thin = 10)
##
## Summary:
## parameter predictor link group map mu se ll
## 1 b1 Oxygen log Fish 0.3602 0.3803 0.1255 0.1907
## 2 b1 Oxygen log Invertebrates 0.1575 0.1727 0.0881 0.0325
## 3 b1 Oxygen logit Fish -0.0239 -0.0972 0.4486 -0.8626
## 4 b1 Oxygen logit Invertebrates -0.0278 -0.0265 0.1536 -0.2851
## 5 b1 Salinity log Fish -0.0401 -0.0341 0.0609 -0.1335
## 6 b1 Salinity log Invertebrates -0.1111 -0.1074 0.0251 -0.1521
## 7 b1 Salinity logit Fish -0.2809 -0.2718 0.4212 -0.9491
## 8 b1 Salinity logit Invertebrates -0.2838 -0.2981 0.0966 -0.4492
## 9 b1 Sediment log Fish 0.0192 0.0315 0.1661 -0.1762
## 10 b1 Sediment log Invertebrates -0.0559 -0.0554 0.0153 -0.0818
## 11 b1 Sediment logit Fish -0.3984 -0.3541 0.8782 -1.4656
## 12 b1 Sediment logit Invertebrates -0.1972 -0.1928 0.0837 -0.3322
## ul I2 n
## 1 0.5901 0.5906 27
## 2 0.3237 0.5399 59
## 3 0.5844 0.2399 14
## 4 0.2126 0.8247 38
## 5 0.0616 0.4729 29
## 6 -0.0700 0.4888 83
## 7 0.3818 0.5344 13
## 8 -0.1364 0.3758 52
## 9 0.3066 0.6662 9
## 10 -0.0308 0.8744 46
## 11 0.8176 0.6382 6
## 12 -0.0579 0.0132 26
#Perform the sensitivity check
sens_check <- senscheck(mod2, mod0)
#Plot the posterior odds
print(sens_check$posterior_odds)
The vertical black line in the plot represents the threshold where there is no difference between the models, i.e., where: \(0 = Log(P(Mod1|Data, Info)/P(Mod0|Data, Info))\) This corresponds to equal support for both mod1 and mod0. Notably, the results show that for the log-linear fish–oxygen relationship and the logit-linear salinity and sediment relationships, the inclusion of prior information in mod1 shifts the posterior distributions toward more negative values.
To quantify the strength and direction of this shift, the inverse logit of the Maximum A Posteriori (MAP) value can be taken: \(logit^{-1}(MAP)\) This transformation expresses the shift as a probability, where values smaller than 0.5 indicate a negative shift, values larger than 0.5 a positive shift, and 0.5 no shift.
#Select only predictor and link function
inv_df <- sens_check$table[c(3:4)]
#Calculate the probability
inv_df$prob <- plogis(sens_check$table$mu[sens_check$table$group=="Fish"])
#Print the table
print(inv_df)
## predictor link prob
## 1 NA log 0.5019072
## 2 NA log 0.4683105
## 3 NA logit 0.4879110
## 4 NA logit 0.4796958
## 5 Oxygen log 0.5503834
## 6 Oxygen log 0.4543336
## 7 Oxygen logit 0.5088733
## 8 Oxygen logit 0.4042189
## 9 Salinity log 0.5019072
## 10 Salinity log 0.4683105
## 11 Salinity logit 0.4879110
## 12 Salinity logit 0.4796958
## 13 Sediment log 0.5503834
## 14 Sediment log 0.4543336
## 15 Sediment logit 0.5088733
## 16 Sediment logit 0.4042189
The table shows that for sediment in a logit-linear model related to fish, the posterior probability shifts from 0.38 in mod0 to 0.50 in mod1. This corresponds to a 12% increase, i.e., 0.50 − 0.38 = 0.12. This indicates that prior information contributes additional support for a negative relationship between fine sediment and lotic fish species — a relation that is not fully supported by the data alone.
This is not a limitation but rather reflects what occurs in practice: prior information, often drawn from empirical studies or domain expertise, tends to be more directional. This highlights the value of incorporating prior information from the literature when building and refining Bayesian models.
Admittedly, this is the most demanding phase of the workflow: gathering and extracting data, fitting multiple (G)LMs, defining and implementing priors, assessing potential biases, and optimizing the model to ensure stable and interpretable results. Once complete, presenting the results or making predictions from the fitted models is considerably more straightforward.
Additionally, overlaying the probability density distributions of both models gives another way to assess the influence of the priors: it shows the posterior density of both mod1 (M1) and mod0 (M0).
To visualize posterior results, a common approach is to display point estimates along with credible intervals. However, this method imposes sharp boundaries on a continuous distribution of uncertainty, which may not fully reflect the nature of a posterior probability distribution.
An alternative—and often more informative—approach is to plot the Posterior Density Distribution (PDD). This plot combines the point estimate, interval range, and the full shape of the posterior, offering a richer picture of the uncertainty and possible parameter values.
The PDD represents the distribution of the pooled parameter estimate conditional on the meta-data and prior information: \(f(\beta_{\text{pooled}} \mid \text{Meta-data}, \text{Info})\). This is conceptually the inverse of the likelihood, which tells us how likely the observed data are given a set of parameter values: \(f(\text{Meta-data} \mid \{\beta_{i}, ..., \beta_{n}\})\). In practice, this visualization can be generated using the pdplot() function, which overlays the posterior density curve with interval and point estimates, allowing for intuitive interpretation of the central tendency and uncertainty of the pooled estimate.
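A minimal call might look as follows (assuming pdplot() accepts the fitted meta object directly; the exact signature is not shown here):

#Posterior density distributions of the pooled estimates
fig_pdd <- pdplot(mod2)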
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_vline()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_vline()`).
The returned object contains the figures generated for both the log and logit link functions, together with a summary belonging to the figures.
## map mu se ll ul
## 1 0.3711 0.3566 0.1005 0.2130 0.5323
## 2 0.2221 0.1490 0.2485 -0.2718 0.5201
## 3 -0.0642 -0.0555 0.0584 -0.1536 0.0353
## 4 -0.2942 -0.2763 0.2034 -0.5949 0.0316
## 5 -0.0138 -0.0575 0.1562 -0.3098 0.1989
## 6 -0.3266 -0.3309 0.1871 -0.6244 -0.0772
## 7 0.1792 0.1972 0.0808 0.0720 0.3339
## 8 0.1172 0.0747 0.1433 -0.1594 0.3129
## 9 -0.1078 -0.1099 0.0251 -0.1497 -0.0673
## 10 -0.2754 -0.2929 0.0830 -0.4313 -0.1561
## 11 -0.0561 -0.0573 0.0155 -0.0826 -0.0325
## 12 -0.2187 -0.2121 0.0769 -0.3380 -0.0866
For larger datasets with multiple groups and predictor variables, it may be useful to adjust the order of the Posterior Density Distributions (PDDs) for better clarity and comparison. You can control the ordering of predictors and groups by using the arguments order_predictor and order_group. These arguments expect a character vector containing the names of the predictors or groups in the desired order.
By adjusting these arguments, you can organize the PDDs in a way that facilitates easier interpretation, especially when working with complex models involving numerous predictors and groupings.
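A hedged sketch of such a reordering (assuming the same pdplot() call as above; the names must match those in the model summary):

#Reorder the PDDs by predictor and group
fig_pdd2 <- pdplot(mod2,
                   order_predictor = c("Salinity", "Oxygen", "Sediment"),
                   order_group = c("Invertebrates", "Fish"))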
A classical way to display the results of a meta-analysis is through a forest plot. These plots show the effect sizes of individual studies along with the pooled effect size. To ensure transparency, the prior distribution should also be included in the forest plot.
It is also possible to perform a formal model comparison using the Bayes Factor, which quantifies the evidence for one model over another. In the example below, regression coefficients for both fish and invertebrates are analyzed. The focus here is on displaying the logit-linear model relating invertebrate evenness to salinity.
We are interested in whether the observed parameters (effect sizes) across studies are more likely under model M1 (e.g., a model assuming a relationship) than under model M0 (a null or weak-effect model). To reflect no prior preference between models, we set fixed_prior = TRUE, which assigns equal prior probabilities to each model. This approach also generalizes to comparisons involving more than two models.
Note that, even when multiple priors are included, only the Bayes Factor comparing prior 1 to prior 2 is displayed in the figure.
The Bayes Factor comparing models 1 and 2 is given by:
\[ BayesFactor_{12}=\frac{P(Data|M1)}{P(Data|M2)}\cdot\frac{P(M1)}{P(M2)}=\frac{P(Data|M1)}{P(Data|M2)}\cdot\frac{0.5}{0.5}=\frac{P(Data|M1)}{P(Data|M2)} \]
In the example below, we focus solely on the model parameter for the logit-linear relationship between salinity and invertebrate evenness. The specific prior model M1 assumes a negative relationship, while M2 assumes no relationship or only an ecologically negligible one. Note that formal model comparison is not always necessary. It is entirely reasonable to interpret model parameters or effect sizes in terms of their ecological relevance. Not all statistical modeling needs to serve as a hypothesis test.
#Select only salinity from the dataset for invertebrates, logit-link and parameter b1
salinity_df <- example2[example2$predictor %in% "Salinity" & example2$group %in% "Invertebrates" & example2$link %in% "logit" & example2$parameter %in% "b1",]
#Get only the priors
priors <- meta(estimate=salinity_df$est,
stderr=salinity_df$se,
parameter=salinity_df$parameter,
predictor=salinity_df$predictor,
link_function=salinity_df$link,
grouping=salinity_df$group,
get_prior_only = T)
#Display the priors (default prior are broad)
print(priors)
## Levels Prior_mu Prior_se
## 1 b1_Salinity_logit_Invertebrates 0 10
#Generate the alternative model first, M1 (H1), because I want to know how much more probable the data
#is under H1 than under H0, as P(Data|M1)/P(Data|M0)
priors$Prior_mu <- -0.3
priors$Prior_se <- 0.15
#Generate the null model second M2 or (H0)
priors$Prior_mu2 <- 0
priors$Prior_se2 <- 0.025
print(priors)
## Levels Prior_mu Prior_se Prior_mu2 Prior_se2
## 1 b1_Salinity_logit_Invertebrates -0.3 0.15 0 0.025
mod3 <- meta(estimate=salinity_df$est,
stderr=salinity_df$se,
parameter=salinity_df$parameter,
predictor=salinity_df$predictor,
link_function=salinity_df$link,
grouping=salinity_df$group,
prior_mu=priors[c(2,4)],
prior_mu_se=priors[c(3,5)],
fixed_prior = T, #Set the prior weights (odds) so each hypothesis gets 50% equal probability.
Nsamp=salinity_df$n,
method=2,
n_thin=10,
n_chain=4)
## Warning in meta(estimate = salinity_df$est, stderr = salinity_df$se, parameter
## = salinity_df$parameter, : The Rhat or/and effective sample size for some >1.01
## or/and <1000. Chains might not be mixing.
## Call:
## meta(estimate = salinity_df$est, stderr = salinity_df$se, parameter = salinity_df$parameter,
## predictor = salinity_df$predictor, link_function = salinity_df$link,
## grouping = salinity_df$group, method = 2, Nsamp = salinity_df$n,
## prior_mu = priors[c(2, 4)], prior_mu_se = priors[c(3, 5)],
## fixed_prior = T, n_chain = 4, n_thin = 10)
##
## Summary:
## parameter predictor link group map mu se ll
## 1 b1 Salinity logit Invertebrates -0.3047 -0.2938 0.0848 -0.4363
## ul I2 n
## 1 -0.1694 0.2827 52
#Generate a figure; if more groups are included, more plots are generated in the order of mod3$summary
fig <- forestplot(mod3, xlimit = c(-2.5, 1.5))
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_errorbarh()`).
#Display the most aesthetically appealing figure without adjusting the x-axis
print(fig$figures[[1]])
Hypothetical Outcome Plots (HOPs) are a valuable tool for visualizing how the expected value of a response variable might change in response to variations in a predictor variable, while keeping all other variables constant (Kale et al., 2019).
In a HOP, each line represents the marginal change in the expected value of the response variable as the predictor variable changes. This allows for a clear understanding of how the relationship between the predictor and response behaves, without the influence of other variables.
\[ g(E(y_{i} \mid x_{ij})) = \beta_{pooled,j=0,m} + \beta_{pooled,j=1,m} \cdot x_{i,j=1} + \sum_{j=2}^{J} \beta_{pooled,j} \cdot \hat{x}_{j} \]
The hypothetical prediction is generated using the posterior estimates, denoted as \(f(\beta_{pooled}|Data, Info)\). This parameter reflects the influence of a one-unit change in the predictor on the response variable.
To display this change, a set of sequential values for the predictor \(x_{i,j=1}\) can be generated where \(x_{i,j=1}=\{i, ..., n\}\) represents realistic values for the observed gradient of the predictor. The function \(f(\beta_{pooled,j}|Data, Info)\) provides the most plausible values for the pooled regression coefficients \(\beta_{pooled,j}\). By drawing a sample \(m\) from this distribution, \(\beta_{pooled,j,m}\sim f(\beta_{pooled,j}|Data, Info)\) and repeating this process multiple times, a range of hypothetical outcomes can be generated.
For this process, all other parameters are held constant at their estimated values, \(\sum_{j=1}^{j} \beta_{pooled,j}\), while the predictor \(x_j\) is varied. Therefore, the HOP lines represent simulations of possible marginal changes in the response from the posterior distribution.
A slight difference from classical HOPs is that, in the case of meta-analysis, \(\hat{x}=1\) is assumed. This approach still illustrates the marginal change in \(y_i\) given a change in \(x_{i,j}\), but only under the condition where all other predictors are set to their estimated values \(\hat{x}_j=1\).
\[ g(E(y_{i} \mid x_{ij})) = \beta_{pooled,j=0,m} + \beta_{pooled,j=1,m} \cdot x_{i,j=1} + \sum_{j=2}^{J} \beta_{pooled,j} \cdot 1 \]
Note that the function operates under the assumption of log-transformed variables. This implies that the xlim argument, which defines the limits of the x-axis in the plot, is given in log-transformed units. For example, for the fraction of fine sediment, log-scale limits of −4.6 and 0 correspond to exp(−4.6) ≈ 0.01 and exp(0) = 1 on the original scale, covering the realistic range of the fine sediment fraction.
The y-axis, on the other hand, is presented on the response scale, meaning that the values are not transformed, and they represent the actual predicted responses (e.g., taxonomic richness in this case).
In the future, I plan to implement an argument that allows users to set the xlim on the response scale directly, without needing to manually convert to log-transformed values.
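Until then, desired response-scale limits can simply be log-transformed before being passed to xlim:

#Response-scale limits 0.01 and 1 expressed on the log scale
log(c(0.01, 1)) #-4.61 0.00, matching xlim = c(-4.6, 0) below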
Below is an example of the outcomes showing the response of invertebrate taxonomic richness along the gradient of fine sediment.
log_sal1 <- hop(mod2, #Object from the meta function
                group="Invertebrates", #Select group out of Invertebrates and Fish
                predictor = "Sediment", #Select predictor out of Salinity, Oxygen and Sediment
                xlab= "Fine sediment (%)", #Give x-axis a name
                link_function = "log", #Which link function out of log and logit
                ylab="Invertebrate taxonomic richness", #Give y-axis a name
                xlim=c(-4.6, 0), #Set limits x-axis (log-transformed units)
                ylim=c(0, 45), #Set limits y-axis
                hop_lwd = 0.3) #Set width of the hop lines
#Display the HOPs
print(log_sal1)
It is also possible to scale the x-axis to display the exponentiated values using the argument exp_axis. This option will transform the values on the x-axis back from their log-transformed scale to their original scale, providing a more intuitive representation of the predictor variable.
Additionally, you may notice that the intercept can appear unusually high. This is because the intercept represents the average intercept across all studies, with all other variables held constant at a value of 1. If you wish to adjust the position of the intercept, you can use the shift_b0 argument. This allows you to shift the intercept value for a more accurate representation based on the specific context of your analysis.
log_sal2 <- hop(mod2,
group="Invertebrates",
predictor = "Sediment",
xlab="Fine sediment (%)",
link_function = "log",
ylab="Invertebrate taxonomic richness",
xlim=c(-4.6, 0),
ylim=c(0, 20),
hop_lwd = 0.3,
exp_axis = T, #Exponentiate the x-axis labels
round_x_axis = 2, #Round the x-axis labels to 2 decimals
shift_b0 = -1) #Shift the intercept by -1
#Display the HOPs
print(log_sal2)
When multiple predictors are included in the dataset, it is possible to visualize the effect of a change in one predictor while holding the other predictors constant. This can be achieved by creating Partial Dependency Plots (PDPs), which show how the predicted response variable changes as a particular predictor varies, keeping other predictors fixed.
The Partial Dependency Plot is a powerful tool for understanding the marginal effect of each predictor on the response variable. In the context of a meta-analysis, it helps illustrate how the relationship between a specific predictor and the response is shaped by the collective data from multiple studies, while controlling for the influence of other variables.
log_sal3 <- hop(mod2,
group="Invertebrates",
predictor = c("Sediment", "Oxygen"), #Select both Sediment and Oxygen
xlab= "Fine sediment fraction", #Give x-axis a name
ylab= expression(Oxygen ~ mg ~ L^-1), #Give y-axis a name
gradient_title = "MAP Invertebrate \ntaxonomic richness",#Give the y-axis gradient a name
pdp_resolution = 100, #Set resolution of the grid
link_function = "log",
exp_axis = T,
round_x_axis = 2,
round_y_axis = 0,
xlim=c(-4.61, -0.92),
ylim=c(1.61, 2.77))
#Display the PDP
print(log_sal3)
This part is still under construction.
I consider that epistemics, with a focus on reasoning and logic, should play a central role in everyday life and science (but see Labukt 2021 and Harman 1986). In practice, reasoning - especially formal logic - is often not a major concern for many scientists. To be fair, some of the most significant scientific breakthroughs were not the result of formal reasoning (e.g., the discovery of penicillin; see Feyerabend, 1975). Nevertheless, structured reasoning and logic guide our thinking, shape clear arguments, and offer clarity to our thoughts. Perhaps the most immediate benefit of reasoning is not its guarantee of ‘truth’, but the aesthetics and coherence it brings to our argumentation.
Despite the previous normative position and values, many conclusions drawn and claims made in contemporary society are neither reasonable nor logically valid and sound. This becomes especially evident when one closely examines the structure of arguments in scientific papers and popular articles. Perhaps an immediate consequence of understanding reasoning and logic is the impossibility (or near impossibility) for empirical sciences to develop valid and sound arguments. Logic, in its formal sense, concerns itself with the structure of arguments: for example, if B follows from A, and C follows from B, then C follows from A. Logic does not concern itself with the ‘truth’ (or meaning) of A, B, or C - those could just as well be X, Y, Z or \(\alpha, \beta, \gamma\). What matters is that the form of reasoning obeys certain principles.
It is often argued that reasoning, to some extent, depends on the principles of logic - though some (e.g., Harman, 1986) challenge this view. In everyday life, strict logical principles may not always be applicable or followed. Still, this does not diminish their potential instrumental value. To me, applying logical principles can improve our ability to (1.) decide what to believe about the world, (2.) clearly communicate our thoughts to others, and (3.) better understand the reasoning of others. This makes no attempt to “justify” why one should use logic (as I posited in the first paragraph), although it might lighten the burden of the cognitive effort it demands in shaping a picture of the world.
As someone with a background in ecology, I increasingly find that confusion in reasoning does not stem from the world itself but from the way we use (or fail to use) logic, reason and language - specifically, from issues in syntax and semantics. Hence, I see the choice of words (semantics) and the way we lay connections between them (syntactic structure) as more central to scientific practice - including ecology - than is commonly acknowledged. Yet it is often not the case that our language and reasoning are used to delve into what might be believed to be true about the world. Often they are used with a focus on status (power) and with an underlying political agenda (Foucault, 1981). The issue is not that this is the case, but that it is hidden, and it neither strengthens nor serves the argument well.
Throughout this work, I will use formal expressions - though not strictly within the syntax of formal predicate logic and epistemic logic. These expressions help expose the structure of beliefs and arguments, making their internal connections explicit. They are concise and avoid the ambiguities of natural language, aiming for a more neutral and formal presentation of meaning. There is less ambiguity about the form of the argument proposed, which is useful for highlighting confusion. While this is not intended to be a full introduction to logic - the focus is on reasoning - it will address key concepts such as validity, soundness, cogency, ambiguity, vagueness and the formalization of arguments. For a more comprehensive treatment, see Barwise et al. (2002), Lee (2017), and Smith (2021).
Because I focus on what can be believed (or what it makes sense to believe), I inevitably engage with other fundamental epistemological concepts—such as truth, belief, justification, normativity, “knowledge,” and “science.” I do not claim to have fixed definitions of these terms, nor do I take a rigid stance on them. Still, my writing tends to lean toward a mix of pragmatism, instrumentalism, constructivism, and minimalism. This may be because I am inclined to resist, or at least postpone, full-blown skepticism for as long as possible. What I present is therefore closer to applied meta-epistemology than to formal reasoning, logic, or epistemics per se (see Fumerton 1995). At times, however, my work may read as pessimistic, cynical, or even critical - perhaps in some places it drifts closer to skepticism than I intended. That said, cynicism is undeniably present. While I believe humanity has enormous potential for empathy and radical acceptance, which could be cultivated through reasoning, I have not yet found convincing evidence of this in practice. For this reason, I also turn to questions of sociology and identity.
Thus - as for now - the main (working) questions can be summarized as: (1.) When can we reasonably believe a claim to be an authentic search for truth in the world? (2.) And, perhaps more drastically, how should we behave in light of this information?
As mentioned, some formal and syntactical logic will be used, mainly in the form of predicate and epistemic logic and meta-language. It will also serve as a refresher and guide readers into the specific style used for syllogisms and for propositional, predicate, and epistemic logic.
A simple syllogistic argument consisting of two premises.
\[ \text{Major premise 1)} \ all \ insects \ have \ six \ legs, \\ \text{Minor premise 2)} \ x \ is \ an \ insect, \\ \text{Conclusion)} \therefore \ x \ has \ six \ legs \\ \]
The argument above is called Modus Ponens, which follows a specific logical structure. In this form, the conclusion is entailed by the premises — that is, the truth of the premises guarantees the truth of the conclusion. Entailment means that in all cases where the premises are true, the conclusion cannot be false. If an argument has this feature, it is considered logically valid.
This argument can be further condensed by abstracting away the content and focusing only on its structure. In this way, the validity of the argument depends solely on the form of the entailment, not on the specific subject matter.
\[ \text{Major premise 1)} \ all \ P \ are \ Q, \\ \text{Minor premise 2)} x \ is \ P, \\ Conclusion) \therefore \ x \ is \ Q \]
In an argument like the one above, all premises are declarative statements, also known as propositions — sentences that are either True or False, but not both. Each premise typically consists of a subject (or antecedent), a copula (such as is, are, will be, etc.), and a predicate (or consequent). These components express a logical relationship between concepts.
An argument is said to be valid if, assuming all the premises are true, the conclusion cannot be false. In other words, the conclusion is logically entailed by the premises. An argument is sound if it is both valid and all of its premises are factually true. Therefore, soundness implies both logical structure and factual accuracy.
For example: P1: All unicorns fly when eating carrots, P2: my unicorn eats a carrot, C: therefore my unicorn will fly. This is a valid argument because, if the premises were true, the conclusion would necessarily follow. However, it is not sound, because unicorns do not exist (as far as I know). Now consider the following argument: P1: most grass is green, P2: the sky is blue, C: you are reading this text. Even if all three propositions happen to be true, this argument is not valid because the conclusion is not entailed by the premises - there is no logical connection between them. An argument where the conclusion does not follow from the premises is considered invalid, regardless of whether its statements are true.
In propositional logic, the argument as given before can be formally expressed as below. In this formal expression \(\rightarrow\) indicates an implication (“implies”) and the three dots \(\therefore\) indicate “therefore”. Similarly, \(\vee\) indicates “or” (disjunction), \(\wedge\) indicates “and” (conjunction), and \(\neg\) is the negation symbol; thus, for example, \(\neg P\) means not-P. Using these basic operators, many more valid argument forms can be constructed - such as disjunctive syllogism, hypothetical syllogism, modus tollens, and more. These forms serve as the foundation for formal reasoning in logic.
The meta-language addresses how we speak about the object language: it discusses how the object language should be, and is, used. It can contain more operators, as long as it is defined what they imply. For example, \(\equiv\) and \(\leftrightarrow\) indicate equivalence, which can be understood as \(A \leftrightarrow B\) or \((A \rightarrow B) \wedge (B \rightarrow A)\). Here \(A \equiv B\) means A and B have equivalent logical status, but are not the same. \(=\) indicates \(is\), or an outcome as identity; i.e., \(A = B\) indicates that A and B are the same. The symbol \(/\) indicates over, or divided by.
In propositional logic some valid structure, such as Modus Ponens is then expressed as follows: \[P \rightarrow Q, P \therefore Q\]
Modus Tollens \[P \rightarrow Q, \neg Q \therefore \neg P\]
Hypothetical Syllogism \[P \rightarrow Q, Q \rightarrow Z, P \therefore Z \]
Disjunctive Syllogism \[Either \ P \vee Q, \neg P \therefore Q\]
Semantically equivalent structures
\[ \text{Modus Ponens:} \ for \ all \ x \ that \ are \ P \ it \ implies \ they \ are \ Q, x \ is \ P \ therefore \ x \ is \ Q \\ \text{Modus Tollens:} \ for \ all \ x \ that \ are \ P \ it \ implies \ they \ are \ Q, x \ is \ \neg Q \ therefore \ x \ is \ \neg P \\ \text{Hypothetical Syllogism:} \ for \ all \ x \ that \ are \ P \ it \ implies \ they \ are \ Q, for \ all \ x \ that \ are \ Q \ it \ implies \ they \ are \ Z, \ x \ is \ P \ therefore \ x \ is \ Z\\ \text{Disjunctive Syllogism:} \ either \ x \ is \ P \ or \ x \ is \ Q, x \ is\ \neg P \ therefore \ x \ is \ Q \]
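The validity of these forms can also be checked mechanically by enumerating truth values. The small R illustration below (my own, unrelated to the package) confirms that modus ponens and modus tollens admit no case with true premises and a false conclusion, while affirming the consequent does:
#Truth-table check of argument forms
tt <- expand.grid(P = c(TRUE, FALSE), Q = c(TRUE, FALSE)) #All truth-value combinations
implies <- function(a, b) !a | b #Material implication
mp <- implies(tt$P, tt$Q) & tt$P #Premises of modus ponens
all(tt$Q[mp]) #TRUE: conclusion Q is never false when the premises hold
mt <- implies(tt$P, tt$Q) & !tt$Q #Premises of modus tollens
all(!tt$P[mt]) #TRUE: conclusion not-P is never false when the premises hold
ac <- implies(tt$P, tt$Q) & tt$Q #Premises of affirming the consequent
all(tt$P[ac]) #FALSE: a counterexample exists, so the form is invalid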
An argument is valid - in the semantic interpretation of validity - in case the conclusion \(p_n\) is true in all interpretations in which the premises \(p_1, ..., p_{n-1}\) are true. Simplistically, if all premises are true the conclusion cannot be false. Such a valid argument can be notated as (see also Haack, 1978),
\[p_1, …, p_{n-1} \models p_n\]
In predicate logic, modus ponens is expressed as \(\forall x(P(x)\rightarrow Q(x)), P(x) \vdash Q(x)\), where ‘\(\forall\)’ indicates ‘for all’ and ‘\(\vdash\)’ indicates ‘therefore’. Each proposition, referred to as \(p_i\), consists of P and Q, called properties, and x is an element from a set \(x\in X\). The benefit of predicate logic (or approximating languages such as deontic logic or a meta-language) is that it makes it possible to quantify over different elements from a set \(x \in X\).
In predicate logic, the structures can be notated similarly. However, the advantage of predicate logic is the possibility to quantify over variables. The universal quantifier (\(\forall\)) means “for all,” and the existential quantifier (\(\exists\)) means “some” or “at least one.” To quantify over all variables \(x\) with a predicate \(P(..)\), we write \(\forall x P(x)\), which reads as “for all \(x\) that have the property \(P\).” Similarly, for \(\exists\), the formula \(\exists x P(x)\) reads as “there exists at least one \(x\) that has the property \(P\).”
\[ \text{Modus Ponens:} \ \forall x(P(x)\rightarrow Q(x)), P(x) \vdash Q(x)\\ \text{Modus Tollens:} \ \forall x(P(x)\rightarrow Q(x)), \neg Q(x) \vdash \neg P(x)\\ \text{Hypothetical Syllogism:}\ \forall x(P(x)\rightarrow Q(x)) \wedge \forall x(Q(x) \rightarrow Z(x)), P(x) \vdash Z(x)\\ \text{Disjunctive Syllogism:} \ P(x) \vee Q(x), \neg P(x) \vdash Q(x) \]
Epistemic logic is a modal extension of predicate logic, which is used to talk about an agent’s epistemic attitudes toward first-order (predicate-logic) statements. For example, the proposition \(\forall x (P(x) \rightarrow Q(x))\) can be stored in \(p\) as \(p := \forall x (P(x) \rightarrow Q(x))\). Then it is possible to speak about the proposition \(p\).
The basic modal operators are indicated by a capital letter, i.e., \(K =\) Knows, \(B =\) Believes, \(O =\) Ought (should). This capital is always accompanied by a subscript, e.g., \(K_a\), where “\(a\)” indicates a single moral agent \(a\). A capital subscript “\(A\)” indicates all moral agents of a community \(A\).
\(K_a(p)\): Indicates the modal operator for “moral agent \(a\) knows \(p\).”
\(B_a(p)\): Indicates the modal operator for “moral agent \(a\) believes \(p\).”
\(O_a(p)\): Indicates the modal operator for “moral agent \(a\) ought to \(p\).”
\(J_a(p)\): Indicates the modal operator for “moral agent \(a\) is justified to assert \(p\).”
There exist some rules (axioms) of logic that are self-evident. There are no clear rules of reasoning, as far as I am aware. Of course, there are ‘laws’ in logic that can be derived from within the object language. There are some (normative?) rules of rationality that make intuitive sense once one is aware of them. Yet, it remains unclear whether we actually use them. I will lay them out here for further support. This does not mean they are not normative, but they will be useful down the line.
Law of identity Informal: The law of identity states that any object is equal to itself. \[ \text{Formal:}\ A=A \ or \ A(x)=A(x) \]
Law of excluded middle Informal: A proposition is either true or false (there is no middle ground) \[ \text{Formal:}\ A(x)\vee \neg A(x) \]
Law of non-contradiction Informal: One cannot rationally hold logically inconsistent beliefs. \[ \text{Formal:}\ \neg (A(x)\wedge \neg A(x)) \] It could further be expressed informally as: For agent (\(a\)) to believe both \(p\) and \(\neg p\) is not to be a rational agent about \(p\). \[ B_a(p) \wedge B_a(\neg p) \rightarrow \neg R_a(p) \\ or \\ B_a(p) \rightarrow \neg B_a(\neg p) \] Furthermore, the following would also hold for a rational agent. If the agent believes the antecedent (\(p\)) is followed by the consequent (\(q\)), and \(p\) is the case, the agent has to believe \(q\) follows.
Informal: If agent \(a\) believes the premises of a valid argument, one must believe the conclusion. \[ (B_a(p \rightarrow q) \wedge B_a(p) \rightarrow B_a(q)) \rightarrow R_a(p) \] Both the law of non-contradiction and the norm of logical consequence are invoked, albeit in different forms, as a basis of ‘correct’ reasoning (not logic) by Harman (1986). This has been adapted by Labukt (2021). I am unaware whether this is actually taken seriously, but they all make intuitive sense. The law of identity could also be added, as it is self-evident that something that is different cannot be itself. Otherwise, the law of excluded middle largely supports the law of non-contradiction. Hence, if we cannot hold the belief that a proposition is, say, 80% true and 20% false, but only 100% or 0%, then it follows that we can neither believe both A and B to be true. This would seem to clash with the idea that beliefs come in degrees. Yet, there is a difference between believing something yes/no and expressing and modeling the belief.
Informal: The probability of \(A\) can only take 1 or 0, \(P(A(x)) \in \{ 0, 1 \}\) (which is, strictly speaking, not a probability), and the probability of \(B\) can only take 0 or 1, \(P(B(x)) \in \{ 0, 1 \}\), which implies either A or B. \[ \text{Formal:}\ (P(A(x))=1 \ \vee P(A(x))=0) \ \wedge (P(B(x))=1-P(A(x)) ) \rightarrow A(x)\vee \ B(x) \]
Deduction
Deductive reasoning moves from general principles to specific conclusions. It begins with premises that are assumed to be universally true and applies them to individual cases. If the premises are true and the reasoning is valid, the conclusion cannot be false. For example: P1: All humans are mortal, P2: I am a human, C: Therefore I am mortal. \[ \forall x (H(x) \rightarrow M(x)), H(x) \models M(x) \]
All the arguments displayed before were deductive, moving from axiomatic assumptions to individual elements or objects.
Induction
In contrast, inductive reasoning moves from specific observations to broader generalizations. It attempts to infer a universal rule or principle from limited data, and thus does not guarantee the truth of the conclusion — even if all observed instances support it. Induction tries to generalize from the instances (x) we observe to all \(\forall\).
As such: I observe twenty white swans; therefore, all swans are white.
\[ (Observe(x) \rightarrow WhiteSwan(x)) \models \forall(Observe(x) \rightarrow WhiteSwan(x)) \]
This reasoning is not valid. The observation of one swan with any other colour would negate the whole argument. This issue is known as the problem of induction. Hence, how can I justify my conclusion to be universally true based on a limited set of instances? This means inductive arguments might not be valid or sound.
Abduction
Abduction might be better seen in a Bayesian light and looks more like affirming the consequent. Suppose we state either \(WS1(a)\): 50% of swans are white, or \(WS2(a)\): 90% of swans are white. If we take 20 observations and see that all are white, we might therefore infer \(WS2\): that 90% of swans are white (and thus 10% are not).
\[ WS1(a) \vee WS2(a), (Observe(x) \rightarrow WhiteSwan(x)) \models WS2(a) \]
Of course, this too is invalid, because we might actually have sampled a wrong location where only white swans occurred. However, it can be derived more quantitatively. This makes abduction appealing in practice, although it is often not applied due to the subjectively perceived way in which WS1 or WS2 are derived.
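As a rough quantitative sketch of this abductive step (my own illustration, assuming equal prior belief in WS1 and WS2 and independent observations):
#Abduction as Bayes' rule: two hypotheses about the proportion of white swans
p_white <- c(WS1 = 0.5, WS2 = 0.9)
likelihood <- p_white^20 #Probability of observing 20 white swans under each hypothesis
prior <- c(WS1 = 0.5, WS2 = 0.5) #Equal prior belief
posterior <- likelihood * prior / sum(likelihood * prior)
round(posterior, 5) #WS2 is overwhelmingly more plausible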
Experience of observation and being in the natural world
Before addressing foundational concepts, our perception of the world itself has to be addressed. We often speak about the natural world. But when we do so, we do this subjectively. This subjectivity can be highlighted by reflecting on it via phenomenology (Zahavi 2025).
The world is experienced from a first-person, subjective, situated perspective. The experience is that of texture (touch), sound (hearing), smell, taste, colour (seeing), and thought (Hume 1739; Lusthaus 2002). While thought is not conventionally counted among the senses, when I investigate it, it often arises, and consciousness experiences our thoughts just as it experiences the other five senses. Just like the other senses, thoughts come and go; they are temporal. What consciousness is might be a question best left open (Lenharo 2023).
However, to experience involves a subject-object duality. The experience is believed to come from an object, yet the only connection to an object is the experience itself. To suggest that the experience of the object is the object is to presuppose the ontological commitment that all experiences are exactly what they are: “reality”. However, our experiences are mediated by language, culture, biology, prior beliefs, ideas and cognition (for example, think about the fear of the pain a needle injection can bring, which for some is severe while others do not understand why). This is what phenomenology highlights: not the things themselves, but things as they appear to the eyes of the subject - the reader's understanding of this text, or pure disagreement with what is written.
We can observe species and colours. Yet, these species are only socially defined by a taxonomic system. Colours are experienced, but the experience cannot be shared. Therefore we define an intersubjective social system with wavelengths, providing a mentally reproducible definition of colour. If I see the green grass, I identify this organism as a species, Poa annua. This appoints properties to objects, based on a partially reductionist view of species traits or characteristics coded in its DNA, arising from molecules, consisting of atoms, built from subatomic particles, down to an underlying model of quantum fields. We dualize “the green grass” from what is not “the green grass”. The meaning of it all is intentional, heuristic and pragmatic, from the creation of the atomic bomb to species identification for conservation. Our subjective experience (the I, ego or identity) is always busy with something, and is thereby itself a search for duality: trying to search not for A=A, but A=not B. We try to generate context, or information, via duality. It makes no sense to say that all=all. Hence, to learn something is to minimally exclude the belief in one of two propositions.
Objectivity might then be characterized in two ways: unconditional being or non-being. Unconditional being could be characterized as a presence without conceptualization, without duality, yet still experiential. It does not negate experience, but describes a mode of experience without the subject-object split. It experiences objects without conceptualising them for heuristic-practical use (experiencing the properties of a spoon without knowing it is a spoon).
There is no “I see grass”. There is only seeing, or better yet: Being. This is not subjectivity stripped of concepts, but being prior to the need for a subjective heuristic-pragmatic spoon. Experience before duality, before identity, before categories. It asks a question: where does the “grassness” start? There would be no grass without water, carbon dioxide, light, etc. There would be no grassness without me assigning grassness to the object. What reason, beyond my reason, do I have to separate the grassness from the water, carbon dioxide and light? What objective reason do I have to separate my experience from the “greenness”? Without me there would be no grass, there would only be objects (but where would they start and end to be objects?). We acknowledge the properties. It is these questions that drive inquiry, or ‘science’. However, objectivity is often viewed, or misleadingly portrayed, as a domain with no first- or third-person standpoint. No perspective. There is no experience, no subject, no object. Non-being, at least not in a way that can be spoken of. This is the absolute outside of consciousness. No conditionality, no experience, no relation. Perhaps an extreme form of realism that equally requires extreme ontological commitments. This ontological commitment is often transcendental (e.g., Platonism).
However, “objectivity” in the third person cannot be a view from nowhere; that is impossible, since any view or perspective presupposes a stance, and a stance presupposes being. Unconditional objectivity is the silence and emptiness at the edge of thought and language. Belief, truth, knowledge, purpose and meaning collapse into presence, presence collapses into silence, and silence gives way to unknowability. The only constant of “nothingness”. This would be a place without sense, a place we cannot speak, believe or think of. Hence, to go beyond ourselves means to project our beliefs into nothingness, but this would be to generate ontological commitment to something that cannot be sensed and cannot be spoken of (Anat and Anat 2021; Wittgenstein 1921).
Ecology often claims objectivity, particularly through the use of models, data, and typologies that aim to describe the “natural world as it is”. The outputs of the models are seen as a transcendental object from a world inaccessible to humans. The models are defined as a “simplification of reality”, but this definition is a misuse of language. We can see this as follows: assume a t-test that compares the difference between two sample means (against a null model). Do these two sample means exist as physical objects? Hence, if we were both in a room, the person with the average height does not exist.
Many ecological concepts (ecosystems, niches, typologies) only make sense within a conceptual (anthropogenic) framework that includes subjective, interpretive, and heuristic elements. It is in itself already subjective to speak about them. There is no non-conceptual access to “nature itself” that could validate or replace them. Even the most empirical model is grounded in prior theoretical and ontological commitments, data selection, and interpretive frameworks.
By trying to adopt a purely third-person stance, ecology would strip itself of its own explanatory tools. It would strip itself of its access to the concept of ecosystem or nature. In fact, the act of modeling, measuring, and even observing always involves an epistemic perspective - a standpoint, a language, a purpose. It is subjective, but it should be transparent in its epistemic and ontological commitments and thereby in the meaning, intent and use of words. Pretending otherwise risks collapsing into either naïve realism or philosophical incoherence. The path forward is not to reject objectivity, but to redefine it as an intersubjective, rationally constrained, but perspectival process - not a meta-physical mirror of nature. However, this means being open to letting go of some theories and concepts (i.e., strict typologies or the idea of balance) to find a better fit under particular conditions.
In general, beliefs that are held in the mind at this moment are termed explicit beliefs. Beliefs that are currently not in the mind are termed implicit beliefs. For example, we know that grass will not often be purple or red. If asked about the colour of grass, we might explicitly answer green; yet we do not actively think about grass not having a white colour - that belief remains implicit.
Furthermore, that I remember my past in the form of mental representations or perceptions (figures or movies) is not regarded as the same as believing that I read that ‘most grass is green’. While both might be classified as beliefs, the former are described as episodic beliefs and the latter as semantic beliefs. Both are beliefs, but they manifest in different ways. That most beliefs are communicated via semantics is a consequence of not being able to directly share particular beliefs.
It might also be possible to regard belief about a proposition as applying to paintings, figures and other forms of expression. For example, propaganda during WWII can be seen as propositions meant to generate beliefs about the Allies or Axis. This seems more acceptable in informal logic than in formal logic (Puppo 2019).
Throughout this text I speak about belief in propositions; this, however, does not mean that belief is only possible in and about propositions.
Some philosophical theories — such as the dispositional account of belief (Silva 2024) — argue that beliefs are characterized by a tendency to act: to assert the proposition, to deliberate with it, or to behave in ways consistent with it. Yet this view seems to rely on a prior assumption: that the proposition being acted upon is taken to represent something about the world. If a belief had no representational content — no relevance to one’s understanding of reality (including social, cultural, or religious contexts) — it is unclear why it would influence behavior at all. Thus, even dispositionalist theories seem to rest upon a representational core.
A belief, then, can be understood as an attitude toward a proposition (or other abstract representation as discussed before). It is a mental stance regarding how that proposition relates to the world. In this way, belief can be best described as a mental model or picture of reality. Whether this model reflects the world as it is or as it ought to be remains an open question. Often, belief is treated dichotomously: we either accept a proposition as true or we do not. For instance, if I believe the proposition “the grass is green”, then I regard it as a true statement about the world.
Belief does not have to represent ‘knowledge’. Assume I have a thermometer in my aquarium that always indicates the correct temperature. I can fool the aquarium thermometer by placing it in a small plastic box with warmer water inside the aquarium, or directly next to the heating element. The thermometer then does not indicate the temperature of the aquarium, and reading it would surely induce a false belief about the aquarium. Yet we gave it the sticker ‘aquarium temperature’. Had we given it the name hygrometer, we would expect it to measure air humidity. Indeed, we might say the instrument itself is not incorrect; the mistake lies with whoever placed it in the wrong position. Hence, it is our job to place the thermometer properly and to read the information it presents to us correctly.
The thermometer (or instrument) transfers information, not beliefs. The information that is transferred is the information of the fact that the temperature is x degrees Celsius. In the same sense, assume we have a frog, and the frog has the instinct to try to jump at and eat every small shade that moves by. Assume that small shades are created by the leaves of the trees and the frog tries to jump at and eat them. We would not suggest the frog is broken. The information that was transferred was insufficient for the frog to distinguish the dark spot of a shadow from a fly (adapted from Bernecker and Dretske 2009).
Does this mean the frog is broken or not functioning properly? I think not; the frog simply jumps and tries to eat. It does not see these spots as more than they are unless it believes they are flies. The only difference is that the shadows cannot be eaten. There is nothing wrong as long as there is no expectation (or goal): all shadows are flies. Similarly, we are not wrong in suggesting the medium in the thermometer stopped at the symbol of 40 degrees Celsius, as long as there is no expectation (or goal): the symbol represents the temperature in the aquarium, which is 25 degrees Celsius.
The only thing that the liquid in the thermometer (for us) and the black dots (for the frog) have in common is their relation to the state-of-affairs in the world. The information in itself is not ‘wrong’ or ‘right’, ‘true’ or ‘false’. Information, interpreted in the context of information theory, is transferred through cables or wireless signals (Stone 2015). Assume that a person calls or texts me: the voice I hear or the text I get is not the person actually speaking or writing. Even if I were to hear the person speaking, the movement of the air reaches my ears, where the movement of the … transfers it into signals which the brain interprets. It could be said that through our senses we cannot truly experience the world. The flow that might reflect the state-of-affairs to me is then suggested to be information.
This information also does not have to be believed to be, or to stand in, some relation to the world. Assume I cannot speak Mandarin. The person in front of me can write Mandarin perfectly and gives me a text. In a corresponding book I search for the proper text for a response and write this back (akin to the Chinese Room argument). In this sense, this might be similar to what Large Language Models (LLMs) do. Furthermore, I would not know whether I do the same under the assumption that the universe is purely deterministic (Lamme (2011) presents ‘the frog’ example as well). Yet, the argument shows that we do not have to hold beliefs about the relation of the information to the world at all.
Assume I am unaware that the thermometer in my aquarium is broken; however, it does indicate the correct temperature. After reading the temperature I would say I ‘know’ the correct temperature. Everybody unaware that the thermometer is broken would assume I am justified in suggesting the temperature reading is correct.
Of course, some approaches - such as Bayesian epistemology - interpret belief in terms of degrees of credence, assigning probabilities to propositions. This works well for events that can be framed in terms of empirical frequency or proportions - such as the likelihood of drawing a black marble from a jar. However, this probabilistic interpretation becomes less coherent when applied to abstract entities. For instance, I prefer to express the information I have about the slope of a linear regression in probabilities. Yet, the slope is a theoretical construct derived from data - it does not ‘exist’ in the same way a marble does. Thus, beliefs about such constructs are not directly about the world, but about what these models represent of the data. This raises questions about what these beliefs actually represent and how they relate to the world.
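For illustration, such a probability statement about a slope could be expressed from posterior draws as below; the draws here are stand-ins, where in practice they would come from a fitted Bayesian regression.
#Expressing information about a regression slope as probabilities (stand-in draws)
set.seed(7)
slope_draws <- rnorm(4000, mean = 0.4, sd = 0.25)
mean(slope_draws > 0) #Posterior probability that the slope is positive
quantile(slope_draws, c(0.05, 0.95)) #90% credible interval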
Contemporary debates on causality, such as those involving Judea Pearl's framework (Pearl 2009), often result in finger-pointing between philosophers, statisticians, and scientists (Pearl and Mackenzie 2019). These disputes frequently bypass a fundamental question: what exactly are ‘cause’ and ‘effect’ in the physical world? Seemingly because asking it prevents progress (Pearl and Mackenzie 2019). Yet, without clear definitions, critiques across disciplines often lack any substance. If we are arguing about something that has no content and no definition, it is impossible to discuss it. The same account applies to ‘knowledge’. Consider: if I claim to ‘know’ that I wrote this text, then I am expressing that I remember writing it - i.e., I possess information tied to that act of writing. To distinguish ‘knowledge’ from ‘held information’ (i.e., slope=y/x) leads to greater confusion rather than clarity.
The classical definition of knowledge as Justified True Belief (JTB) - though long challenged by Gettier cases - complicates the matter further. A more straightforward solution might be to discard JTB and the ideas defining knowledge (Lemos (2021) does not consider this to be beneficial) and instead treat ‘knowledge’ as a historical term. For example, consider this Gettier-like scenario: the thermometer of the previous example is in the aquarium, the aquarium is 25 degrees Celsius (Truth), I read the thermometer indicating 25 degrees (Justification), and I believe it is 25 degrees Celsius (Belief), but the thermometer is broken. Clearly I would be wrong at any other temperature, but since the water temperature is so constant, this is not the case. Can we then consider this ‘knowledge’? For the information provided, it matters how we acquired a piece of information and what source supports it. Importantly, information need not be true or justified in a strict sense. While we may question whether information is reliable enough to inform a model of the world, there is no objective requirement for such reliability - only a normative one tied to specific goals.
If we insist that knowledge must be justified, then we are assuming that a sentence must be either a ‘true reflection’ or an ‘adequate representation’ of reality. Hence, the sentence has to stand in relation to something for it to have sense. In such a framework, justification is the relational bridge between information and a particular purpose. But this leads directly to the problem of infinite regress: we must then define what we mean by ‘true’ and ‘adequate’, and ensure that these definitions are not themselves subject to the same ambiguities and vagueness. Otherwise, they may be applied inappropriately to cases or conditions to which they do not belong.
Take, for instance, the cliché ‘correlation is not causation’. A correlation alone does not attach causation to an object (case or condition). Thus, ‘cause’ and ‘effect’ are part of a linguistic framework - a language game, in Wittgensteinian terms. We must define the conditions under which these terms are justified. Pearl's (2009) work, then, can be viewed as an attempt to formalize the syntactic structure under which the semantics ‘cause’ and ‘effect’ are descriptors - but descriptors of what exactly? Pearl (2009) refers to the ontological structure of reality: “because causal relationships are ontological, describing objective physical constraints in our world”. This would mean ‘cause’ and ‘effect’ are physical. However, I could perfectly well argue that the properties P of object x interact with the properties Q of object y, as in \(P(x) \wedge Q(y) \rightarrow R(x,y)\) - for example, precipitating \(Ca^{2+}\) salt as \(CaCO_3\) from water by adding \(Na_2CO_3\) via \[ Ca^{2+}(aq) + CO_{3}^{2-}(aq) \rightarrow CaCO_3(s) \] The implication sign is then nothing more than a way to describe the ‘cause’, and the effect would be \(R(x, y)\). This means ‘cause’ and ‘effect’ are linguistic games. We could just as well suggest that cause and effect are meta-physical, transcendental entities that are there but cannot be observed by us (like Platonism).
Therefore we need to clarify what ‘cause’ and ‘effect’ are:
Descriptive (1.): Cause and effect are a language to describe that observation \(y\) is often or always followed by \(x\),
Physical (2.): Cause and effect are physical cases/objects
Metaphysical/Transcendental (3.): Cause and effect exist beyond empirical observations.
Each position carries its own philosophical commitments and pitfalls. My point is not to resolve the debate (I will end it here, as I am not here to discuss causality), but to highlight what the ambiguity and vagueness in the use of the terms ‘cause’ and ‘effect’ signify. Perhaps, as with ‘knowledge’, we could also abandon these terms in most cases, as they lead to mis-communication about what is meant and what expectations arise. The consequence of this ambiguity and vagueness is the use of statistical tools (p-values, Bayes factors and intervals) to imply causes via ‘significant effects’ - not the tools themselves, but the assumptions smuggled in through their interpretation are the issue.
As a consequence, to believe something about the world (given a proposition) is to make an ontological (and epistemic) commitment. We express our beliefs through symbols - words, numbers, figures - and thereby assume a relation between these symbols and the world. One possible philosophical position is to adopt a form of extreme realism (e.g., Platonism), where symbols such as the correlation coefficient reflect imperfect versions of ideal forms accessible only through reason. This view implies that relationships between variables like \(x\) and \(y\) are metaphysical in nature, and our human faculties only approximate them. The language we use reveals our ontological stance. In this context, I highlight two prevalent orientations:
Nihilism: An indifference to philosophical coherence or ontological reflection, often guided only by hedonistic utility (adapted from Gertz, 2019).
Dogmatism: An appeal to authority or fixed ideology - whether realist, anti-realist, pragmatist, etc. - used to justify belief in a claim (Sienkiewicz, 2019).
I intentionally exclude Skepticism from this classification. I contend that no one is a true skeptic in practice, though skepticism remains a valuable methodological tool for examining beliefs, ontologies, and openness to alternate viewpoints. As such, skepticism—though not adequately defined here—serves as a recurring theme throughout this text.
Consequently, I adopt a more traditional, dichotomous view of belief: we either hold a proposition to be true or we do not. This view aligns most closely with representationalism, which I take as foundational. The core of this position can be summarized formally:
\[ B_a(p_i) := \{ \text{At least part of what it is for a moral agent} \ a \ \text{to believe proposition} \ p_i \\ \text{ is for} \ a \ \text{ to take } p_i \text{ as an adequate representation of the world } \omega \} \]
In contrast, dispositionalism claims that to believe \(p_i\) is to be disposed to assert it, act on it, or treat it as a premise in deliberation (Silva 2024). Yet this still presumes that the agent regards \(p_i\) as an adequate representation of \(\omega\). One would not be disposed to act on a proposition unless one first represents it as true or significant in relation to the world. Therefore, I argue that even dispositionalist accounts depend on - and cannot escape - an underlying representationalist framework.
Language and the logic of semantics are strongly intertwined, and a rational perspective is necessary. Terminology such as validity and soundness has rather strict definitions in logic (Barwise et al. 2002; Haack 1978; Smith 2021) but is not properly used in ecology.
Both validity and soundness are said to be about an argument. An argument consists of propositions, which are declarative statements that can be either true or false (truth-apt). For example, ‘the grass is green’ is a proposition, whereas statements such as ‘watch out for the dog’ are not truth-apt.
In the form of a single-world epistemic logic (EL; Van Ditmarsch, Van Der Hoek, and Kooi 2008), a model (Mod) maps a sentence (S; a text or set of statements) in a language (L) to a set of propositions (P), in which each proposition (p) can be assigned a truth value (V).
\[ Model = \langle S, L, P, V \rangle \\ p := \{ p \in P \mid V(p) \in \{T, F\} \} \]
An argument is said to be valid if, when all propositions are true, the conclusion cannot be false. The truth value of a proposition is assessed via a valuation function (\(V\)), \(V(p)=T\). If all propositions are true, their truth is then entailed in the conclusion:
\[p_1, \dots, p_{n-1} \models p_n\]
Soundness is a stronger principle and refers to an argument that is valid and whose propositions are all factually true. Based on different propositions (\(p\)) and the connections between them, a “belief system” is generated (Harman 1986; Usó-Doménech and Nescolarde-Selva 2016).
Since each proposition in the empirical sciences is expressed in a language (\(L\)), the beliefs about the world are shaped by it. The concept of language in this context can be understood rather informally (see also Puppo 2019) as the totality of abstractions and symbols that convey information. The language and chosen words convey the beliefs, intent, status, and moral, prudential and political position of the author(s). Since language is not the world, but a way we communicate our ideas, beliefs and experience of it, it is used to convey meaning. Hence, the meaning of a language lies not solely in its axioms, definitions and form, but also in its use (Wittgenstein 1953). This brings a clearly challenging issue with it: if “anyone could give the words whatever interpretation he wished, under the pretext that in this context it means thus and so, no matter what it may mean elsewhere” (Lazurca 2025), then what are we claiming when we understand and interpret a text? The semantic validity of our text depends on the meaning, and thereby the truth, of its propositions. If a proposition in a set of propositions is assigned an incorrect meaning, the validity of the claim does not follow: \(p_1, \dots, p_{n-1} \models p_n\). It is certainly the case that the meaning of language depends on its use, but this cannot be fully disentangled from its formalism - or else the truth of each proposition will not be entailed in the conclusion: pesticides kill pests under sufficient concentration, there are pests on these plants, I apply sufficient pesticides, therefore they will be killed.
Based on Wittgenstein (1921), it is possible to broadly categorize propositions according to their structure. I take only the pragmatic parts, from which I try to construct a possible definition of whether a proposition has empirical sense.
I will take a proposition - following Wittgenstein (1921), Anat and Anat (2021), and Bronzo (2022) - to have empirical sense (EL) if it is logically coherent, truth-apt (can be assigned a truth value), and refers (\(R_w\)) directly or indirectly to objects (\(w\)) in the empirical world (\(W\)).
\[ \exists w \in W: \exists p (V(p) \in \{T, F\} \wedge R_w(p, w) \rightarrow EL(p)) \]
Furthermore, both senseless and nonsense propositions are defined. A proposition is senseless when it is a tautology or true by identity alone (e.g., a = a, or 0 = 0), i.e., true by virtue of logical form alone. While such propositions are valid and foundational to formal systems, they are empty from an empirical perspective. Propositions such as “we are correct if either the result is significant or non-significant” or “we are correct if either the results are plausible or non-plausible” are syntactically well-formed but tautological and do not, by themselves, contribute empirical content. Hence, while logically valid, they are senseless.
A proposition is defined as nonsense when it is grammatically correct but meaningless, because it attempts to say what lies beyond the limits of language and sense, like “red is a colour” (meaningful would be to say “my shoe is red”). According to Wittgenstein's Tractatus, a proposition that appoints no property to an object is nonsense. Another way to look at it is that saying “red is a colour” is to speak about how to use the language. It resembles the use of a meta-language. We could say “specific stones have a colour and the colours are red, grey and white”. Nonsense, in such a view, seems to describe the mixing of object language and meta-language.
There is something peculiar about this, because it speaks about the world. We need to define how to interpret the language. We cannot refer to every word as being defined differently and a priori. If we assume our use of language and even logic is normative (source), then we have to define a foundation upon which the language is used. As in probability, a sample space is here the combination in which words apply. As such, the meaning of a word is its use in the language. But we need to use a language in relation to the world, or else it becomes detached. The issue is how to define whether propositions and words relate to the world, \(R_w(p, w)\), and whether the proposition is truly empirical and exists, \(\exists w \in W\) - because it would be truly nonsensical if \(\neg \exists w \in W\).
Karl Popper did not at first focus on characterizing what ‘science’ is (Laudan 1983; Popper 1968). His main interest was the separation of the metaphysical from the physical. According to the interpretation of his view, \(p\) is empirical if it can be refuted (falsified) via empirical means (testing). But the issue is that the sentence “if there is a significant effect, it is not observed by chance alone” cannot be formally and empirically falsified; it is still believed. The problem is that, in the first instance, we can keep believing the proposition. And what does it mean to empirically test something? The relation between chlorophyll-a and TP may be strong, but it does not empirically address the mechanism that causes algae to take up TP and assimilate it into the cell mass. What needs to be directly shown is the assimilation of \(PO_4\) into the cell mass. Furthermore, if we use a model parameter to assess its incompatibility with a null model, \(P(Data \geq 0 \mid \theta_0)\), or the plausibility of the alternative, \(P(Data \mid \theta_1)/P(Data \mid \theta_0)\), we do not address the mechanism, just a summary of the data. But how is this summary of the data empirical?
In logic there is work on the principle of excluded middle: either A or B; a bit of A or a bit of B does not work. Rectified from Popper, I would formulate the following principle of refutation, which does make an ontological (there exists at least one) and normative (reasoning is needed) assumption. It is built on the law of excluded middle, the problem of induction and modus tollens.
Law of excluded middle. We either let our language and argument perform a fallacy of affirming the consequent (\(\alpha\)) or a valid deductive inference (\(\beta\))
\[Either \ \alpha \ or\ \beta\]
Affirming the consequent. For example, if it rains (\(A\)) the ground is wet (\(B\)). The ground is wet (\(B\)), therefore it rained - which is invalid. There are numerous reasons why the ground is wet. \[\alpha = \{ (A \rightarrow B), (B \not \models A) \}\]
Modus tollens. For example, if it rains (\(A\)) the ground is wet (\(B\)). The ground is not wet (\(\neg B\)), therefore it did not rain (\(\neg A\)) - which is valid.
\[\beta = \{\forall(A \rightarrow B), (\neg B \models \neg A) \}\]
Since affirming the consequent is fallacious and not valid only \(\beta\) remains.
Problem of induction: after observing only a few swans (objects \(x\)) that are white (\(Q\)), it is concluded that all swans are white. Since it is assumed at least one case can show the invalidity of this argument, to search for \(\neg B\) is the most efficient.
\[ \neg B := \{ (P(x)\rightarrow Q(x)) \models \forall(P(x) \rightarrow Q(x)), \\ \exists x(P(x) \rightarrow \neg Q(x)) \not \models \forall(P(x) \rightarrow Q(x)) \} \]
It is unreasonable to assume that all swans are white, because what valid rule exists to extend this to all swans?
\[ B := \{ (P(x) \rightarrow Q(x)) \models \forall(P(x) \rightarrow Q(x)) \} \]
In short, since \(\alpha\) is invalid, only \(\beta\) remains. Thus the only logical way out of the issue is to accept the problem of induction and to search for inconsistencies. However, this creates a conflict with Quine (…), who suggested translating the world into predicate logic, where all variables that are existentially quantified reveal the ontological commitment. This means that at least an ontological commitment is made to \(\neg B\), in the sense that we commit to the belief that a case might exist where the rule does not hold.
However, this still does not address what we can define as epistemically senseful. To assume there exists an animal that is a dog, \(\exists x(dog(x))\), would be my ontological commitment to the existence of dogs, in the same way as assuming there exists an animal that is a unicorn. In this sense, the belief in the existence of ‘the probable’ of an event would be classified as ontological - unless we disconnect the probable from the natural world. That is, if we treat it as epistemic or instrumental, then it is a meta-language about information, not about the world itself.
The issue is that no one follows \(\beta\), because a strict mechanistic falsification in argumentative form is never performed. In most cases we invalidly perform \(\alpha\) as if it were probable. Take restoration management: if nutrients increase, algae blooms occur; there is an algae bloom, therefore nutrients increased. What is applied is affirming the consequent, \(A \rightarrow B, B \not \models A\). The success of the principle then rests on the cases in which affirming the consequent turned out to be correct, \(A \rightarrow B, B \models A\). In probable terms it is: plausibly, if nutrients increase, algae blooms occur; there is an algae bloom, therefore nutrients plausibly increased. This is almost an application of Bayes' rule, with the exception of the prior: \(P(A \mid B) = P(B \mid A) \cdot P(A)/P(B)\). In this regard we might also fall back on a form of frequentism, where we accumulate the proportion of successes, \(A \rightarrow B, B \models A\), and failures, \(A \rightarrow B, B \not \models A\). Thus none of this is really in line with Popper, as is often suggested when ‘refuting null hypotheses’ (…). Popper focused on mechanistic theories (e.g., evolutionary theory …), not on hypotheses per se. Moreover, the Quine-Duhem thesis suggests that even if a hypothesis seems unreasonable, an ‘auxiliary hypothesis’ is often invoked when needed to save a theory. In ecology this often happens quite unconvincingly by suggesting the result is “context dependent”.
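The probabilistic reading of affirming the consequent can be made explicit with a toy simulation (my own example): when several causes can produce a bloom, the frequency with which ‘bloom, therefore nutrients’ is correct is just \(P(A \mid B)\).
#Toy simulation: affirming the consequent as a conditional probability
set.seed(42)
n <- 10000
nutrients <- rbinom(n, 1, 0.3) #A: nutrient increase
other_cause <- rbinom(n, 1, 0.2) #Another route to a bloom (e.g., hydrology)
bloom <- as.integer(nutrients == 1 | other_cause == 1) #B: algae bloom
mean(nutrients[bloom == 1]) #Estimate of P(A | B), here about 0.68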
Language and the beliefs shaped by it play a role in sharing information. The language and chosen words convey the beliefs, intent, status, knowledge, and moral and political position of the author, and are therefore used to convey meaning. Broadly, there could be three levels of objective use of language.
(1.) Objective language is used for statements about objects - stones, fish, rivers - while subjective language refers to feelings and emotions such as anger, happiness, goodness or despair.
(2.) Objective language is used when one speaks of objects other than what is in the mind of a moral agent. For example, ‘I saw a fish swimming’ or ‘your dad was angry’ are objective, while subjective language would be ‘I think your dad is angry’ or ‘I think I feel sad’.
(3.) Objective language is used in so far as it is not coloured by emotions. For example, ‘temperature is negatively correlated to invertebrate species number’ is objective (see also the lexicon used by Markowitz and Hancock (2016)), while a subjective rendering of the same information would be ‘worldwide anthropogenically altered temperature increase causes extreme negative effects on biodiversity’.
Most statements in ecology are largely objective in sense 2. but subjective in sense 3. Also, while objectivity in sense 1. is often implied, utterances contain subjectivity connected to sense 3. Hence, ‘good’ ecological quality and ‘natural’ or ‘balanced’ ecosystems are in themselves judgments of ‘naturalness’ and ‘goodness’ (Shrader-Frechette and Mccoy 1994). Yet superficial reasoning does not make ends meet (Maier 2012); it needs a full account of why this moral (normative) judgment is required.
Additionally, sense 2. provides difficulties in that we end up in peril when talking about our data and models. Assume we sample the number of species in relation to an independent variable (e.g., temperature); then the set of numbers is subjective in senses 2 and 3 if we speak about the outside world. A correlation coefficient is not the world, and the data behind it are not the world. There are no data and correlation coefficients floating around in the universe. At least, that is the belief to which I ontologically commit myself - unless it can be shown (and fully proven) that the correlation coefficient is part of the world independent of the mind of the moral agent. The phenomenological question of interest is how these beliefs structure the experience of the world, as these views are forced into our writing.
However, the problem is not our being and experience - the subjectivity itself - but the unexamined assumptions and wishes that underlie our intentions and communication. We still seem to search for an objective world as unconditional on our subjective experience of it. But the whole implication of communication and language is a subjective being. What should be clearly communicated is what we mean, believe and justify, and which assumptions underlie this. The move towards a simplified language, close to topic-neutral, disregards the emotive, social and hierarchical nature that is related to value-laden normative statements.
The issue is that the object language - the set of numbers and models (abstractions) describing objects/things - is not the meta-language. The meta-language describes what one “ought” to believe, or how to act, based on the interpretation of the object language. For example, the title “Global meta-analysis shows action is needed to halt genetic diversity loss” asserts that we need to act based on what a meta-analysis shows. This overlooks that a statistic alone is not sufficient to generate such claims. We end up in the classical “is-ought” dilemma: we cannot make an “ought” out of an “is”. In an analogy: if coconuts fall from trees and kill people, we need to remove all coconut trees. This presumes the normative position that the death of people is a bad thing. Yet this badness does not exist in a purely descriptive universe; it is a normative claim.
In a short epistemic argument: each identity of a subject arises from holding or asserting some normative beliefs formed via our interaction with the world (we form an identity, as Sartre would say).
P1: propositions are used to conceptualize (navigate) the world by generating a belief about the relation between a proposition and the world.
\[ B_a(p) \leftrightarrow B_a(R_w(p, w)) \] P2: A proposition is true when it is believed the proposition is an adequate reflection of the world under the proper interpretation of the language.
\[ V(p)=T :=B_a(R_w(p, w)) \]
Language (abstracts, thoughts, ideas, paintings and figures) and belief are inseparable: our understanding of the world is always mediated through propositions that structure our beliefs. Truth- and false-hood do not pertain to the world itself - the world simply is as it is - but only to our beliefs about it.
P3: Normative propositions are not epistemically factual.
\[ B_a(O_a(p, w)) \rightarrow \neg F_a(p) \] What “ought” (should) to be, is a statement relative to what “is”. What “ought” to be cannot generate an “is” statement. For example, we ought to restore a habitat is a normative statement.
C: For any concluding normative belief in a proposition, the belief cannot be epistemically factual and the argument cannot be sound.
\[ p_1, \dots, p_{n-1} \models p_n \quad \text{and} \quad B_a(O_a(p_n, w)) \rightarrow \neg F_a(p_n) \]
This in itself is neither surprising nor counter-intuitive and aligns with Hume (1739). However, this seems largely to be the case in the writing and arguments in ecology, where facts are mixed with norms. At least, many articles advocate environmentalism (what ought to be) rather than stating what is. Hence, normative reasoning is belief- and identity-based and not solely epistemically factual. This demands that we are open about normative claims being contextual and identity-relative, and that they are not masked as empirical facts.
In the previous paragraph it was shown that the valuation of the truth or falsehood of a proposition is performed by a valuation function. However, the valuation of propositions as true or false is not something performed independently of being. If the validity (and thereby soundness) of an argument were derived by valuating a proposition independent of being, then no discussion or argument would be necessary; the truth of a proposition would be solely defined by the factuality of the world (Lynch 2001). I will not shun realism, but there lies a peculiar difficulty in aligning facts, truth, belief and justification. While one could agree our abstractions are not reality, this does not imply we need to forsake realism (Armstrong 1978). Yet, this also does not mean we should search for an experience of the external world beyond our own being.
The question is also not “does truth exist”; it is the question “what do we mean by truth”. Hence, the answer that we cannot obtain “the truth” already presupposes it exists (Heidegger 2016). Thus, the question is: when, or by what criteria, do we state that a proposition is true? I hereby simply take a minimalistic and more Tarskian semantic definition of truth. Hence, it is true to utter “the grass is green” when the grass is green in the world. At first glance this conforms to a concordance theory of truth, yet it is more nuanced.
As noted above, language and belief are inseparable. Truth and falsehood do not pertain to the world itself - the world simply is as it is - but to our beliefs about how a proposition relates to the world. These beliefs are articulated in a language, and it is through this articulation that propositions acquire truth values. Without social interpretation and valuation, notions of truth and falsehood would lack operational significance; it would be like a map with no reader. Thus, truth is not an intrinsic property of the world, but a feature of our social, cognitive and linguistic engagement with it.
The world (w) as we experience it cannot exist independent of being. Our belief in a fact is a meta-physical commitment that forever remains a finite mental state. To state ‘the grass is green is a fact’ is to hold the inner belief that the grass is green independent of an observer. But the grass being green without anyone observing is epistemically nonsensical, because there is no-one to assess the truth-value of the facts of “grassness” and “greenness”. Grassness (Poaceae) is the creation of a taxonomic system (a model of characteristics) that aligns the expressions of a social system to create coherence within a language. A counter-argument could be that “water boils at 100°C”, but “boiling” is a phase change from liquid to gas under standard conditions, and “100°C” is a point on a human-devised temperature scale. “Sea level” is the average surface of the ocean used as a reference for pressure. Without humans or our measurement conventions, these concepts do not exist as meaningful distinctions in the universe - they are descriptive tools we impose on our experiences. Even if we argue that ‘the grass is green’ and ‘water boils at 100°C’, we make a meta-physical commitment - a belief in a proposition - about its ontological status. It is not the ontological status (factual existence) of p as w that matters; it is the necessity to convince the other that p factually exists as w. However, a pure ontological commitment would not see the necessity to convince the other, because the fact simply is. Factuality disregards the mental state of individuals convincing a subject that p exists as w.
When reflecting on soundness: soundness asks for the facts, as if words have sense, while validity asks for coherence, although coherence does not need sense. Therefore, a foundation needs to lie outside the system that is described, while the system itself should be logically coherent. Hence, soundness asks for the factuality of the information contained in \(p\) relative to \(w\), i.e., \(R_w(p, w)\). Validity conceptualizes how \(p_1, \dots, p_{n-1} \models p_n\) entail each other. A valuation function on empirical truth needs to contain a function \(R_w\). Therefore, when an empirically derived proposition is believed, it is valued against the world.
Our experience of the world and the fact itself are, in this sense, unmistakably intertwined and influenced by our beliefs and social structure. The question whether facts exist independently, as in unconditional being, only makes sense in contrast to the idea of conditional being. Facts are themselves described as independent of our experience. This unconditionality is a rather third-person perspective (objective), a view from nowhere, that we seem to aim to formalize.
Thus, truth is in this sense only a concordance of semantics with the facts believed as they are and the social structure the language resides in. Hence, salvation from suffering after death might in the Middle Ages indeed have been valued to be ‘true’, as after death one is laid to rest. In the same way, ‘significance’ is currently found meaningful, indicative of an error-rate or (although mistakenly) causality. The statements are valued to be ‘true’ within a particular social community. Similarly, beliefs about patterns in the world, based on acceptance by others, background information and models, can be considered a decent approximation, e.g., \(\log(\text{Chlorophyll-a}) = -1 + 0.8 \cdot \log(\text{Total Phosphorus})\). Even though the model might underfit and be too simplistic, we accept the statement that there is a ‘causal’ relation between Total Phosphorus and Chlorophyll-a.
The strength of the social construct that forms the belief in valuing the “truth” of a proposition can be completely social and power-based (Foucault 1981). Hence, if we can control and change the meaning of words, abstractions and figures, we can control what truth consists of. Of course, this is extremely challenging from the perspective of realism: not only is truth not fixed, it is directed.
A proposition is then true if and only if (iff) it is believed (or valued) to be concordant with objects in the world, and all agents in the community accept the meaning of the words within the coherence of a system, often under epistemic norms or institutional authority.
Hence, it would be true to say ‘the correlation between x and y in the data is …’ iff the correlation between x and y in the data is …. It would be false to move from ‘the correlation between x and y in the data is …’ to ‘the process that generated the data is causal’. We would need a premise that links the correlation between x and y to the concept of causality (and, if skeptical, ask how causality is defined; skipped by many, e.g., Pearl 2009). This indicates that it should be specified a priori what it means for something or a relation to be causal, how this applies to the data, and how the data relates to the world. Hence, it would require more meta-physical assumptions to rely on a realist-concordance theory of truth.
As explained before, deduction on empirical data is not possible, so no valid or sound argument can be generated. The universal quantifier ‘\(\forall\)’ is (largely) unsupported, and therefore we have \(\exists x(P(x) \rightarrow Q(x))\), where ‘\(\exists\)’ reads ‘some’. This indicates that propositions are merely acceptable. In most cases \(\exists\) is silently dropped, although it would be wiser to state it explicitly. This means that \(p_i: (P(x)\rightarrow Q(x))\), or \(P \rightarrow Q\), is referred to as a conjecture. A conjecture is a proposition which is accepted as true, \(V(p_i)=T\) (True), or false, \(V(p_i)=F\) (False), via a valuation function.
Empirical science develops theories built upon conjectures, where a conjecture has to be supported by either party in an argument. If a conjecture cannot be accepted by one of the parties, discussion is meaningless and the source of the rejection should be resolved (Popper 1968).
Arguments are either weakly or strongly cogent (weak or strong arguments), as conjectures cannot be guaranteed in the empirical sciences. Strong arguments are arguments where most conjectures are acceptable (e.g., the temperature is 0 degrees when the thermometer works and indicates 0 degrees).
Similarly, an argument is indicated as \(p_1, \dots, p_{n-1} \models p_n\), and the conjectures should entail the truth of the derived conclusion. When this is the case, such an argument is valid when all parties agree on the truth \(V(p_i)=T\) of the conjectures involved.
The problem is that the empirical sciences have no axiomatic foundation from which the proof of an argument can be logically deduced. In the sense that I believe the conclusion (\(p_3\)) of the following argument: \(p_1\): all humans die; \(p_2\): I am human; therefore I die (\(p_1, p_2 \models p_3\)). The conclusion forms a conjecture in my belief system and that of others. However, this does not ensure the soundness of the argument, where soundness refers to the factuality of all propositions. Not having known all humans that ever lived or are going to live, the argument is not sound. Moreover, it is questionable whether empirical reasoning deals with valid conclusions; one can better speak about the cogency of the argument.
The agreement among parties - rather informally - forms a belief system that would be a function of acceptable (cogent) conjectures (Usó-Doménech and Nescolarde-Selva 2016). The conjectures are tied together for each individual moral agent (Cartwright et al. 1996; van Fraassen 1980), closely representing the nodes and edges in a network.
The moral agent then entertains the relations between propositions and arguments. Counter-intuitively, the moral agent also appoints a property to an element x. Hence, a colour-blind person would not see the specific colours a person who is not colour-blind would see, and therefore would not appoint the same colour property. Or, a person who does not know the function of a wrench will not appoint the property ‘wrench’ to it. If a property is not appointed to an element, then the cogency (or validity) of an argument does not follow, because without a property on x the element x remains inert and undefined. Therefore, \(V(p_i)=T\) or \(V(p_i)=F\) remains undecided, because \((x \rightarrow Q(x)), x \models Q(x)\) does not follow. Hence, each element is appointed a property, each proposition is valued, and the whole is woven into a cogent or non-cogent (strong or weak) structure formulated as a belief system.
As highlighted earlier, the agreement among the beliefs of individuals and groups forms what we might call a belief system — a structure dependent on a set of acceptable propositions called conjectures. The way our beliefs are organized directly influences how we reason.
Figure 2: Example structure of a belief system
A ‘string’ of beliefs can be traced from a foundational or “hinge” proposition (labeled ‘A’ in Fig. 2), which supports subsequent beliefs. If this foundational belief turns out to be false, all dependent beliefs — B, C, D, etc. — are undermined. This reflects the logical principle that if any proposition in a chain of implications \(p_1, \dots, p_{n-1}\) is false, then the conclusion \(p_n\) in \(p_1, \dots, p_{n-1}\models p_n\) does not follow.
For example, if \(A(x) \rightarrow B(x)\) and the valuation \(V(A(x))=F\), then \(B(x)\) loses its support (it is undermined, though not thereby proven false). This view reflects what Harman (1986) refers to as the foundational theory of belief revision: a belief in a proposition \(p_n\) needs to be supported by another proposition \(p_{n-1}\).
It turns out that people do not follow this system most of the time and instead seek to minimally restructure their belief system (Harman, 1986). This is termed the ‘maxim of minimal mutilation’ (Ferrari and Carrara 2025). For example, suppose we have \(F(x) \rightarrow H(x)\) and \(V(F(x))=F\). This undermines the support for \(H(x)\), threatening our belief in \(H(x)\). To preserve our belief in \(H(x)\) we might adopt a new proposition \(G(x) \rightarrow H(x)\), provided it does not conflict with existing beliefs. This is the basis of coherence theory in belief revision (Harman, 1986): people tend to make minimal changes that preserve the overall structure and coherence of their belief system. We can draw parallels to the system van Fraassen (1980) sketches.
Consider a practical example. Suppose I believe I am bad at cooking because people dislike my food: \(DislikeFood(x) \rightarrow BadCooking(x)\). However, I later realize that some people dislike vegetarian food specifically; vegetarians, in contrast, enjoy my cooking. The original belief no longer holds, because \(\neg DislikeFood(x)\) removes the support for \(BadCooking(x)\). Therefore, I need to revise all beliefs that follow from \(BadCooking\). According to the foundational theory of belief revision, I need to drop the belief that I am bad at cooking. According to the coherence theory of belief revision, I could rationalize that vegetarians “have no taste,” preserving my original belief. This mirrors the shift from \(F(x) \rightarrow H(x)\) to \(G(x) \rightarrow H(x)\).
A hybrid view can integrate both the foundational and coherence models. Beliefs can be seen as clusters (e.g., D, E, F, H, I, J), connected via strings of core beliefs (A, B, D). Different sets of strings and clusters can be connected; for example, the left cluster connects to the right cluster (A2, …, J2) via bridging beliefs.
Some beliefs (propositions) are accepted tentatively — neither fully embraced nor rejected. However, in practice, people often fully accept propositions because suspending judgment creates cognitive dissonance. Dewey (1997) notes that suspending belief is uncomfortable; full acceptance gives us peace of mind and reduces cognitive load (Harman, 1986).
Yet, full acceptance of a belief implies the need for justification. We must ask: Why do we accept this proposition? And how do we justify it?
I will here focus my skepticism on skepticism towards philosophy (or meta-physics), although I am convinced that most things we believe are meta-physical. Skepticism proposes a system in which properties, propositions, or arguments are ‘attacked’. Considering the structure of a valid semantic argument \(p_1, \dots, p_{n-1} \models p_n\), an attack tries to invoke disagreement so that the grounds on which the properties within a proposition, and the propositions built on them, cannot be considered true or false. Since a valid argument is a believable argument, being unable to valuate the truth or falsehood of a proposition makes it impossible to assess its validity \(p_1, \dots, p_{n-1} \models p_n\) or invalidity \(p_1, \dots, p_{n-1} \not\models p_n\). Since neither truth nor falsehood can be assessed, one can neither judge the conclusion to follow. Being unable to close an argument leads to suspense of judgement on \(p_n\). What remains unclear is whether the skeptic aims at the suspense of judgment on the factuality of the propositions, the validity of the argument, or both. I would assume both are a target, because eventually we want to talk about the factual universe, not about the validity of an argument.
The dogmatist - as opposed to the skeptic - always claims to have ‘knowledge’ or ‘truth’ but is never able to show it upon attack. Nobody is really a full skeptic and nobody a full dogmatist; the absurd skeptic is unable to accept the suspense of judgement and is always in pursuit of refinement. The absurd skeptic searches for a different kind of meaning by forever regressing propositional knowledge, perhaps to the annoyance of peers and the willing ignorance of others due to the painful suspense of judgment (Dewey, 1997). The absurd skeptic is willing to challenge their beliefs by entertaining ideas beyond dogma. It is a revolt against both stagnation (the status quo) and dogmatism.
I will place the following sections under the first line of attack. These sections highlight different methods (three modes) of causing doubt in a believed conjecture. Hence, if the conjecture cannot be believed to hold, then it logically follows that \(p_1, \dots, p_{n-1} \not\models p_n\).
The problem is the attachment of identity to knowledge (the belief that a proposition is true, forming a conjecture). Hence, if one of the conjectures no longer holds, any belief that follows from it falls to \(V(p_i)=F\). This attacks the identity of the moral agent, often leading to conflict. As such, the logical counter-argument cannot be accepted, which in my experience often invokes an appeal to common practice or authority. I believe this to be the case because we tie social hierarchy to identity. But to admit that most of the arguments we pose are invalid means to give up on social hierarchy.
The generation of an initial proposition \(p_1\) and counter-proposition \(p_2\) generates a source of disagreement. For example, species x (\(P(x)\)) occurs more in freshwater (\(Q(x)\)) than in brackish water (\(\neg Q(x)\)). The latter can be utilized in the mode of disagreement, either based on a dogmatic disagreement between ‘experts’ over what brackish is, or on the basis that there is no demarcation line, as salt in surface water is a concentration and not a dichotomous, human-defined boundary. This falls to the mode of relativity: since experts decide whether x is \(Q(x)\) or \(\neg Q(x)\), the ground of it is relative. Either the expert provides full proof without contradiction, or the expert cannot use the belief system without falling back on an appeal to authority.
The grounds for \(V(p_1)=T\) are not known, so \(p_1, \dots, p_{n-1} \not\models p_n\). What is questioned is the valuation function \(V(\dots)\) assigning a ‘truth’ value to a proposition, and on what grounds the ecologist decides \(V(p_i)=T\) or \(V(p_i)=F\). The state of such a proposition is called a ‘final variable’ by De Groot (1998), describing the acceptance of \(V(p_i)=T\) without requiring any more normative justification; in similar fashion Wittgenstein referred to it as a ‘hinge proposition’. Hence, a source of disagreement is almost sure to appear when searching for the foundation or support of a ‘basic belief’ (in foundationalism).
Equipollence
I consider the mode of disagreement a way to start the ‘attack’; I do not specifically consider it a mode. The principle of equipollence is, for example, a way to start or end an ‘attack’. Perhaps equipollence is to be situated on similar grounds as the mode of hypothesis, which will be addressed later. A skeptical ‘attack’ often starts by proposing an equipollent argument: an argument with the same structure and weight, but an absurd conclusion. Often these can be used to highlight an informal fallacy, such as an appeal to authority, an appeal to common practice, or a reduction to the absurd. If an equally weighted equipollent argument has the same persuasive power and cannot be accepted on the same grounds, suspense of judgement should reasonably follow on the original argument.
Equipollent arguments are, however, not easily related to only the formal structure of a logically valid argument. The valid argument \(C(x)\rightarrow W(x), \neg W(x) \models \neg C(x)\) has the same form as \(U(x)\rightarrow G(x), \neg G(x) \models \neg U(x)\); both are valid. In plain language the former could be: a cactus grows (C) only if it gets water (W); there is no water; therefore it will not grow (modus tollens). This is as valid as the argument: my unicorn flies (U) only when eating grass (G); my unicorn did not eat grass; therefore it cannot fly. An equipollent argument thus attacks not only the validity of the argument, but mainly the context of the predicates or hidden premises. If I give the cactus salt-brine water, the argument suggests it will grow, which is absurd; the argument is incomplete. As the argument is not complete, it does not guarantee the conclusion under all conditions. It is then reasonable to suggest it is internally valid iff these propositions would make up the whole context the moral agent refers to, but this is also absurd, because that is not how the world works. Hence, the skeptic also attacks the soundness of the argument by pointing to the non-factual status of the propositions. Just as equipollent arguments highlight the incompleteness of premises, the problem of induction demonstrates that a finite set of observations cannot determine universal truths. This reinforces the skeptic’s claim that absolute certainty is unattainable.
The mode of hypothesis asserts that if observed worldwide biodiversity decline (B) is caused by humans (H), it is equally evident that no worldwide biodiversity decline (\(\neg B\)) occurs if no humans are present (\(\neg H\)): \(p_1: \forall x (B(x) \rightarrow H(x))\) or \(p_2: \forall x (\neg B(x)\rightarrow \neg H(x))\). If one has not proposed the counter-proposition (hypothesis) and did not consider observations on \(p_1\), how can one then, unbiased (with a straight face and conscience), suggest \(p_2\) is not equally likely?
Any proposition (\(p_1\)) accepted as a conjecture in an argument where a counter-proposition (\(p_2\)) has not been addressed can only be tautological. Since \(p_2\) might be a possibility, once \(p_2\) has been asserted, the commitment to believe \(p_1\) might be wrong. Since this commitment might potentially be wrong, it does not guarantee the conclusion, and suspense of judgment should reasonably follow.
If we can define support for \(p_1\) via \(p_2\) up to \(p_n\), where \(p_n\) in turn supports \(p_1\), we have a circular argument. For example, \(p_1: (D0(x)\rightarrow F(x))\), \(p_2: (F(x) \rightarrow C(x))\) and \(p_3: (C(x) \rightarrow D0(x))\). Here \(p_1\): when it is 0 degrees (\(D0(x)\)) water (x) freezes (\(F(x)\)); \(p_2\): when the water is frozen it crystallizes (\(C(x)\)); and \(p_3\): when the water is crystallized it is 0 degrees. Thus, when it is 0 degrees water freezes, and when the water is frozen it is 0 degrees \((D0(x)\leftrightarrow F(x) \leftrightarrow C(x))\). We cannot verify beyond the propositions. The same argument is used to defend a dogmatic boundary for ‘significance’, as highlighted by Wasserstein and Lazar (2016):
Q: Why do so many colleges and grad schools teach p = .05? A: Because that’s still what the scientific community and journal editors use.
Q: Why do so many people still use p = .05? A: Because that’s what they were taught in college or grad school.
The same issue applies to other forms of boundaries. Hence, a \(BF_{01} < 0.05\) could be raised to the same dogmatic status.
The claim to any proposition \(p_1\) requires support by \(p_2\), and \(p_2\) by \(p_3\), \(\dots\), \(p_{\infty}\). Assume the following example: \(p_1\): grass is truly green; \(p_2\): green light consists of particles with a particular wavelength; \(p_3\): the particles have this wavelength because \(\dots\). When we regress to infinity we cannot support \(p_1\), because \(p_{\infty}\) is unknown. Therefore we cannot suggest that the grass is truly green, because we do not know what greenness truly consists of: \(p_{\infty}, \dots, p_2 \not\models p_1\). Since the soundness of \(p_1\) cannot be supported by a known foundation, suspense of judgement should reasonably follow. This does not mean infinite regress is a ‘bad’ thing; it highlights the weak basis arguments are founded on. Another example: \(p_1\): a \(BF_{10}>20\) is strong support; \(p_2\): \(>20\) indicates \(>95\%\); \(p_3\): we use \(>95\%\) because \(\dots\). Even a pragmatic reason for 95% would need a supporting proposition. For example, if 95% is used because 99% is too unstable in the iterations of the Markov Chain Monte Carlo, we can invoke the mode of hypothesis (why is 96% not used?) or the mode of relativity (what does 95% represent of reality?).
This highlights that any belief in a proposition \(p_n\), under some condition \(c \in C\), in an argument \(p_1, \dots, p_{n-1} \models p_n\), can be suspended based on any other premise that is not certain/conclusive, and therefore \(p_n\) does not follow.
More formally, for any belief (claim) from an agent (\(a \in A\)) there exists at least some proposition of disagreement over the claim under particular conditions, so that the disagreement can be exploited in the mode of hypothesis (MH), the mode of circularity (MC) or the mode of infinite regress (MI). \[ \forall a\, \forall p_n\, \exists c\, \big(B_a(p_n) \rightarrow Disagreement(B_a(p_n), c) \rightarrow \{MH \vee MC \vee MI\}\big) \\ MH: \Pr(p_n)/\Pr(\neg p_n) \neq \{0, \infty\} \rightarrow \neg J_a(B_a(p_n)) = \text{Undecided} \\ MC: Valid(p_1, \dots, p_{n-1} \models p_n) \wedge p_n \in \{ p_1, \dots, p_{n-1} \} \rightarrow \neg J_a(B_a(p_n)) = \text{Circular} \\ MI: \forall p_m\, \exists p_n\, (B_a(p_m \models p_n) \wedge p_m < p_n ) \rightarrow \neg J_a(B_a(p_m)) = \text{Non-foundational} \]
This formal expression can be visualized more clearly in the diagram in Fig. 3 below.
Figure 3: Structure of the skeptical network that can be applied to suspend judgment on propositional claims.
The previous chapters addressed ‘objective language’, the relation between ‘information, belief and facts’, ‘truth’ (still under construction) and ‘normativity’. They indicate which concepts are normative, concerning what should be, and which are descriptive, concerning what is.
The theory and background are often not provided in cookbooks, which makes it impossible to interpret and criticize our results. This section introduces some theory. However, if you are already familiar with it or find it too technical, feel free to skip it. Since this R-package primarily uses Bayesian inference, an introduction to Bayes theorem and the methods used may be of interest, though it is not strictly needed for using the R-package itself.
All statistics focuses on estimating the parameter of interest \(\theta\), which in most (G)LMs is denoted as \(\mu\) or \(\beta\). For consistency I will use the parameter of interest \(\mu\).
The population parameter \(\mu\) is fixed but unknown. To investigate plausible values for \(\mu\), we collect measurements or samples. These observed data points, denoted as \(x = \{x_1, \dots, x_n\}\), are realizations of an underlying random variable \(X = \{X_1, \dots, X_n\}\), where each \(X_i \in \mathbb{R}\). We assume that these observations are independently and identically distributed (i.i.d.) from a common distribution.
\[X\stackrel{\text{iid}}{\sim} N(\mu, \sigma^2)\]
Since we do not have \(X\) but only a set of realizations, we need an estimator; here this is the sample mean \(\bar{x}\). Hence, the sample mean would be
\[\bar{x}=\frac{\sum_{i=1}^n(x_i)}{n}\]
If \(x\) is indeed i.i.d., then \(\bar{x}\) serves as an unbiased estimator for \(\mu\). The sample mean therefore has certain properties.
\[\mathbb{E}[\bar{x}]=\mu \quad \text{and} \quad Var(\bar{x})=\frac{\sigma^2}{n}\]
The weak law of large numbers then states that the probability of a deviation of at least \(\epsilon\) from the population parameter decreases as the sample size \(n\) increases, eventually converging to zero.
\[\lim_{n\to\infty} P\left(|\bar{X}_n-\mu|\geq \epsilon\right)=0\] According to the central limit theorem, \[Z_n=\frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}\] converges in distribution to the standard normal. The probability of observing \(Z\) under a long run of repetitions is
\[P(-1.96 \leq Z \leq 1.96)\approx 0.95\]
Similarly, the probability that the interval around \(\bar{x}\) covers \(\mu\) in a long run of repeated experiments, at the 95% level, is
\[1 - c = P\left(\bar{X}_n - 1.96 \cdot \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X}_n + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95\]
A visual explanation of this concept is provided in the following Shiny app: https://snwikaij.shinyapps.io/shiny/.
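To make the long-run reading concrete, the sketch below simulates repeated experiments in R and checks how often the interval \(\bar{x} \pm 1.96\cdot\sigma/\sqrt{n}\) covers \(\mu\); the values of \(\mu\), \(\sigma\), \(n\) and the number of repetitions are arbitrary illustration choices, not defaults from the package.

```r
# Minimal sketch: long-run coverage of the 95% interval for the mean.
# mu, sigma, n and reps are illustrative values, not package defaults.
set.seed(1)
mu <- 0.5; sigma <- 5; n <- 30; reps <- 10000

covered <- replicate(reps, {
  x    <- rnorm(n, mean = mu, sd = sigma)
  half <- 1.96 * sigma / sqrt(n)            # half-width with known sigma
  mean(x) - half <= mu && mu <= mean(x) + half
})

mean(covered)  # proportion of intervals covering mu; close to 0.95
```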
Furthermore, it is clear that statistics by itself does nothing with causality; the focus is on error-control. Causality starts by satisfying the theoretical conditions needed to arrive at beliefs in these concepts. Hence, error-control and causality start a priori (Fisher 1949; Pearl 2009; Mayo 2018). Such a focus and framework is extremely useful if objectivity over repetitions is the goal. While classical statistics focuses on fixed but unknown parameters, error-control and the objectivity of information, Bayesian methods extend this perspective by introducing prior information and viewing parameters as random variables. This shift opens the door to more flexible and informative inference, as explained in the next section.
Informally, Bayes theorem is notated as \(\text{Posterior probability} = \frac{\text{Likelihood} \cdot \text{Prior}}{\text{Evidence}}\). More formally, it is often notated with A and B, where P indicates probability and ‘|’ means ‘given’ or ‘conditional on’: \(P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\). Other expressions, such as \(P(\theta|Data, Info) = \frac{P(Data|\theta) \cdot P(\theta|Info)}{P(Data)}\), highlight that the posterior describes the information conditional on the prior information that is put in.
The derivation of Bayes theorem relies on the axioms of probability theory.
Premise 1)
\[ P(A | B) = \frac{P(A \cap B)}{P(B)} \] similarly
\[ P(B |A) = \frac{P(B \cap A)}{P(A)} \] Premise 2)
Also, the joint probability, expressed as a set-theoretic relationship on \(z\), indicates that the elements of both sets are the same.
\[ z = \{x : x \in A \cap B\} = \{x : x \in B \cap A\} \] thus
\[ P(A \cap B) = P(B \cap A) \] Premise 3)
In accordance with the previous
\[ P(A| B) \cdot P(B) = P(A \cap B) \] and
\[ P(B | A) \cdot P(A) = P(B \cap A) \] Conclusion)
Therefore
\[ P(A | B) \cdot P(B) = P(B | A) \cdot P(A) \] \[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]
The previous expression can help us answer simple questions. Assume there is a likelihood of 0.7, \(P(Species|<Threshold)\), that a species is found below a certain threshold. Furthermore, we also know that the environment below the threshold occurs only 0.3, or 30%, of the time: \(P(<Threshold)\). How probable would it then be that we are below the threshold if we observe the species, \(P(<Threshold|Species)\)?
\[ P(Species|<Threshold) = 0.7\\ P(<Threshold) = 0.3\\ P(<Threshold|Species) = ?\\ \]
Expressing this in Bayes theorem would result in
\[ P(<Threshold|Species)=\frac{P(Species|<Threshold)\cdot P(<Threshold) }{P(Species)} \]
The only thing still required is \(P(Species)\), often called the ‘evidence’. This evidence is simply the total probability of observing the species, below and above the threshold. Here we assume the probability of observing the species above the threshold is the complement, \(1-P(Species|<Threshold)\), and likewise that the environment above the threshold occurs \(1-P(<Threshold)\) of the time.
\[ P(Species)=[P(Species|<Threshold)\cdot P(<Threshold)] + [(1-P(Species|<Threshold))\cdot(1-P(<Threshold))]\\ 0.42=[0.7\cdot0.3]+[0.3\cdot0.7] \]
Then it is simply a matter of filling in the blanks:
\[ P(<Threshold|Species)=\frac{P(Species|<Threshold)\cdot P(<Threshold) }{P(Species)}=\frac{0.7\cdot0.3}{0.42}=0.5 \]
The answer is not very satisfying, as the probability is simply a ‘coin toss’. This can be improved by introducing more species with the same indicative potential.
\[ P(<Threshold|2\ Species)=\frac{P(Species|<Threshold)^2\cdot P(<Threshold)}{P(Species|<Threshold)^2\cdot P(<Threshold)+(1-P(Species|<Threshold))^2\cdot(1-P(<Threshold))}=\frac{0.7^2\cdot0.3}{(0.7^2\cdot0.3)+(0.3^2\cdot0.7)}=0.7 \] We would need around five species to reach an indicative potential of >0.95 (it would be 0.97).
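The arithmetic above can be verified with a few lines of R; the sketch below wraps the calculation in a small helper written for this page (not a package function) that returns \(P(<Threshold)\) given \(k\) species with the same indicative potential.

```r
# Worked example from the text: P(<Threshold | k species observed).
p_obs   <- 0.7   # P(Species | <Threshold)
p_below <- 0.3   # P(<Threshold)

posterior <- function(k) {
  num <- p_obs^k * p_below                       # likelihood^k * prior
  num / (num + (1 - p_obs)^k * (1 - p_below))    # divide by the evidence
}

posterior(1)  # 0.5
posterior(2)  # 0.7
posterior(5)  # ~0.97
```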
The previous example works for simpler approximations, yet if we want to derive an interval for a particular parameter \(\theta\), we can approach this analytically using conjugate priors. A prior is conjugate to a likelihood if the resulting posterior is in the same family as the prior.
As introduced, estimation in statistics is about finding the value of \(\theta\), here assumed to be \(\mu\). Where the frequentist framework considers this fixed and unknown, the Bayesian framework considers it random and ‘approximately’ known. Of course, in the Bayesian framework samples \(x\) are also taken. If we already know something about \(\mu\), it is possible to use this information to exclude unreasonable values and restrict \(\mu\) to more acceptable values.
\[ P(\mu|Data) = \frac{P(Data|\mu) \cdot P(\mu)}{P(Data)} \]
For a simple mean and variance an analytical approach can be used to derive the posterior given the likelihood and prior via the following equations.
\[\mu_{posterior} =\frac{\frac{\mu_{prior} }{\sigma_{prior}^2} + \frac{\bar{x}_{data} }{\sigma_{data}^2}}{ \frac{1}{\sigma_{prior}^2} + \frac{1}{\sigma_{data}^2}} \\ \sigma_{posterior}=\sqrt{\frac{1}{\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{data}^2}}}\]
Derivation:
Premise 1)
Bayes rule can be simplified to \[P(\mu|Data) \propto P(Data|\mu) \cdot P(\mu) \\ N(\mu_{posterior}, \sigma_{posterior}^2) \propto N(\mu_{sample}, \sigma_{sample}^2)\cdot N(\mu_{prior}, \sigma_{prior}^2)\]
Premise 2)
The PDF of the normal distribution is \[f(x)=\frac{1}{\sigma\sqrt{2 \pi}}\cdot exp(-\frac{1}{2}(\frac{x-\mu}{\sigma})^2)\]
Premise 3)
\[Prior: P(\mu_{prior})=\frac{1}{\sigma_{prior}\sqrt{2 \pi}}\cdot exp(-\frac{1}{2}(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2) \\ Likelihood: P(Data|\mu_{sample})=\frac{1}{\sigma_{sample}\sqrt{2 \pi}}\cdot exp(-\frac{1}{2}(\frac{\mu_{sample}-\theta}{\sigma_{sample}})^2) \]
Premise 4)
Both \(\frac{1}{\sigma_{prior}\sqrt{2 \pi}}\) and \(\frac{1}{\sigma_{sample}\sqrt{2 \pi}}\) are constants with respect to \(\theta\) and can be left out of the equation.
Premise 5)
Since both exponentials have the same base we can add the exponents \[e^a \cdot e^b = e^{a+b}\] resulting in
\[exp(-\frac{1}{2}\cdot[(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2+(\frac{\mu_{sample}-\theta}{\sigma_{sample}})^2])\]
After which brackets can be moved \[exp(-\frac{1}{2}\cdot[\frac{(\theta-\mu_{prior})^2}{\sigma_{prior}^2}+\frac{(\mu_{sample}-\theta)^2}{\sigma_{sample}^2}])\]
Premise 6)
Expanding the bracketed terms
\[(a-b)^2=(a-b)\cdot(a-b)=a^2-ab-ab+b^2=a^2-2ab+b^2\] This means \[(\theta-\mu_{prior})^2=\theta^2-2\theta\mu_{prior}+\mu_{prior}^2\] and \[(\mu_{sample}-\theta)^2=\mu_{sample}^2-2\mu_{sample}\theta+\theta^2\] which can be substituted into premise 5 \[exp(-\frac{1}{2}\cdot[\frac{\theta^2-2\theta\mu_{prior}+\mu_{prior}^2}{\sigma_{prior}^2}+\frac{\mu_{sample}^2-2\mu_{sample}\theta+\theta^2}{\sigma_{sample}^2}])\]
Premise 7)
Separating each term by dividing by \(\sigma_{prior}^2\) and \(\sigma_{sample}^2\)
\[exp(-\frac{1}{2}\cdot[\frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2}+ \frac{\theta^2}{\sigma_{sample}^2}+\frac{-2\mu_{sample}\theta}{\sigma_{sample}^2}+\frac{\mu_{sample}^2}{\sigma_{sample}^2}])\]
Premise 8)
Group the terms by powers of \(\theta\)
\[\frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2}+ \frac{\theta^2}{\sigma_{sample}^2}+\frac{-2\mu_{sample}\theta}{\sigma_{sample}^2}+\frac{\mu_{sample}^2}{\sigma_{sample}^2}= \\ \theta^2(\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{sample}^2})-2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{sample}}{\sigma_{sample}^2}) +(\frac{\mu_{prior}^2}{\sigma_{prior}^2}+\frac{\mu_{sample}^2}{\sigma_{sample}^2}) \] Since the last group does not depend on \(\theta\) it is not in our focus \[ exp(-\frac{1}{2}\cdot[\theta^2(\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{sample}^2})-2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{sample}}{\sigma_{sample}^2}) +\text{terms not dependent on }\theta]) \]
Premise 9)
The goal is to derive \(P(\mu|Data)\) from \(P(\mu|Data) \propto P(Data|\mu) \cdot P(\mu)\). The general exponential form of the normal distribution is given in Premise 2, and premises 6, 7 and 8 lead to an exponent of the form \[-\frac{1}{2}\cdot \theta^2 (\frac{1}{\sigma^2})+\theta(\frac{\mu}{\sigma^2})+C= \\ -\frac{1}{2}\cdot\theta^2A+\theta B+C\] The exponent of a normal distribution always takes the form \(-\frac{1}{2}\cdot\theta^2A+\theta B+C\), meaning that \(A=\frac{1}{\sigma^2}\) and \(B=\frac{\mu}{\sigma^2}\). To obtain the standard deviation, \(A\) is rearranged to \(\sigma = \sqrt{\frac{1}{A}}\), and to obtain the mean, \(\mu=\frac{B}{A}=\frac{\frac{\mu}{\sigma^2}}{\frac{1}{\sigma^2}}\).
Conclusion)
In Premise 8 \[ exp(-\frac{1}{2}\cdot[\theta^2(\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{sample}^2})-2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{sample}}{\sigma_{sample}^2})+C]) \] In Premise 9 \[ \sigma = \sqrt{\frac{1}{A}},\ A=\frac{1}{\sigma^2}\\ \mu=\frac{B}{A}=\frac{\frac{\mu}{\sigma^2}}{\frac{1}{\sigma^2}} \] which implies that \[ \sigma_{posterior}=\sqrt{\frac{1}{\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{sample}^2}}}\\ \mu_{posterior}=\frac{\frac{\mu_{prior}}{\sigma_{prior}^2} + \frac{\mu_{sample}}{\sigma_{sample}^2}}{\frac{1}{\sigma_{prior}^2} + \frac{1}{\sigma_{sample}^2}} \] Another way to obtain the posterior, including the sample size, is via: \[\mu_{posterior}=\frac{\frac{\mu_{prior}}{\sigma_{prior}^2}+\mu_{sample}\cdot\frac{n}{\sigma_{sample}^2}} {\frac{1}{\sigma_{prior}^2}+\frac{n}{\sigma_{sample}^2}}\]
Derivation:
Premise 1)
\[ Prior: P(\mu_{prior})=\frac{1}{\sigma_{prior}\sqrt{2 \pi}}\cdot exp(-\frac{1}{2}(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2) \\ Likelihood: P(Data|\mu_{sample})=\prod_{i=1}^n \frac{1}{\sigma_{sample}\sqrt{2 \pi}}\cdot exp(-\frac{1}{2}(\frac{x_i-\theta}{\sigma_{sample}})^2) \]
Premise 2)
Both \(\frac{1}{\sigma_{prior}\sqrt{2 \pi}}\) and \(\frac{1}{\sigma_{sample}\sqrt{2 \pi}}\) are constants with respect to \(\theta\) and can be left out of the equation.
Premise 3)
The likelihood is the product over \(n>1\) random variables; since \(exp(a)\cdot exp(b) = exp(a+b)\), we have \(exp(a_1)\cdot \dots \cdot exp(a_n)=exp(\sum_{i=1}^n a_i)\).
\[ exp(\sum_{i=1}^n-\frac{1}{2}\cdot(\frac{x_i-\theta}{\sigma_{sample}})^2)=exp(-\frac{1}{2}\cdot\sum_{i=1}^n(\frac{x_i-\theta}{\sigma_{sample}})^2) \] Premise 4)
As in premise 6 of the previous derivation we expand all terms and ignore terms independent of \(\theta\).
\[ \sum_{i=1}^n(x_i-\theta)^2=\sum_{i=1}^n(x_i^2-2x_i\theta +\theta^2) =\sum_{i=1}^nx_i^2-2\theta\sum_{i=1}^nx_i+n\theta^2 \] Ignoring \(\sum_{i=1}^nx_i^2\), which does not depend on \(\theta\), leaves \(-2\theta\sum_{i=1}^nx_i+n\theta^2\).
Premise 5)
Substitute the expression back into the equation.
\[ exp(-\frac{1}{2}\cdot[\frac{-2\theta\sum_{i=1}^nx_i+n\theta^2}{\sigma^2_{sample}}]) \]
Premise 6)
The posterior can then be rewritten as \(P(\mu|Data) \propto P(Data|\mu) \cdot P(\mu)\)
\[ exp(-\frac{1}{2}\cdot[\frac{-2\theta\sum_{i=1}^nx_i+n\theta^2}{\sigma^2_{sample}}]) \cdot exp(-\frac{1}{2}\cdot(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2)=\\ exp(-\frac{1}{2}\cdot[\frac{-2\theta\sum_{i=1}^nx_i+n\theta^2}{\sigma^2_{sample}}+(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2]) \]
Premise 7)
Expanding the numerator of the prior term and substituting it back into the previous equation. \[ (\theta-\mu_{prior})^2=\theta^2-2\theta\mu_{prior}+\mu_{prior}^2 \\ \frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2} \\ exp(-\frac{1}{2}\cdot[\frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2}+ \frac{-2\theta\sum_{i=1}^nx_i+n\theta^2}{\sigma^2_{sample}}]) \]
Premise 8)
Expand the last term and divide by \(\sigma^2_{sample}\)
\[ exp(-\frac{1}{2}\cdot[\frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2}- \frac{2\theta\sum_{i=1}^nx_i}{\sigma^2_{sample}}+\frac{n\theta^2}{\sigma^2_{sample}}]) \]
Premise 9)
Group the terms by powers of \(\theta\)
\[ exp(-\frac{1}{2}\cdot[\theta^2(\frac{1}{\sigma_{prior}^2}+\frac{n}{\sigma_{sample}^2})-2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\sum_{i=1}^nx_i}{\sigma_{sample}^2}) +\text{terms not dependent on }\theta])\]
Since: \(\sum_{i=1}^nx_i=\mu_{sample}\cdot n\)
\[ exp(-\frac{1}{2}\cdot[\theta^2(\frac{1}{\sigma_{prior}^2}+\frac{n}{\sigma_{sample}^2})-2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{sample}\cdot n}{\sigma_{sample}^2}) +\text{terms not dependent on }\theta]) \]
Conclusion)
From steps 8 and 9 of the previous derivation we arrive at
\[ \sigma_{posterior}=\sqrt{\frac{1}{\frac{1}{\sigma_{prior}^2}+\frac{n}{\sigma_{sample}^2}}}\\ \mu_{posterior}=\frac{\frac{\mu_{prior}}{\sigma_{prior}^2} + \frac{\mu_{sample}\cdot n}{\sigma_{sample}^2}}{\frac{1}{\sigma_{prior}^2} + \frac{n}{\sigma_{sample}^2}} \]
As might be clear, this is computationally lighter than MCMC methods. For more than two parameters such an analytical approach becomes cumbersome, and if conjugacy is not satisfied no closed-form solution is available. In this regard, the Laplace approximation is also computationally easy. Yet, the equations clearly convey what happens in Bayes theorem.
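As a rough sketch, the two conclusions can be wrapped into a single R function; `posterior_normal` is a helper written for this page (not part of EcoPostView), and with \(n = 1\) it reduces to the first derivation.

```r
# Sketch of the analytic conjugate-normal update derived above,
# in the sample-size form; with n = 1 it matches the first derivation.
posterior_normal <- function(mu_prior, sd_prior, mu_sample, sd_sample, n = 1) {
  prec_prior  <- 1 / sd_prior^2        # prior precision
  prec_sample <- n / sd_sample^2       # data precision scales with n
  mu_post <- (mu_prior * prec_prior + mu_sample * prec_sample) /
             (prec_prior + prec_sample)
  sd_post <- sqrt(1 / (prec_prior + prec_sample))
  c(mu = mu_post, sd = sd_post)
}

# Illustrative numbers only
posterior_normal(mu_prior = 0, sd_prior = 1,
                 mu_sample = 0.8, sd_sample = 2, n = 25)
```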
Approximate Bayesian Computation with rejection sampling (ABC-rejection) is a computationally expensive method for approximating the posterior distribution. However, when the number of parameters is relatively small, the posterior can still be approximated quite well. ABC-rejection is especially useful when the likelihood function cannot be computed or approximated accurately.
One example is the use of ABC to explore potential bias in the EcoPostView package. In a simplified case, assuming both the prior and the data-generating model are normally distributed, the ABC-rejection algorithm begins by simulating a parameter from the prior distribution.
\[ \mu_{i}^*\sim N(\mu_{prior},\sigma_{prior}^2) \\ \sigma_{i}^{2*}\sim Exp(rate) \]
The asterisk (\(^*\)) denotes that these parameters are temporary; this will become important later. Next, a data-generating model is used to simulate data based on these temporary parameters. We assume the observed data is approximately normally distributed, though any model could be used. For each simulation, we generate \(n_{data}\) values.
\[ x_{i}\sim N(\mu^*, \sigma^{2*}) \]
Depending on the parameter of interest (e.g., \(\mu\), \(\sigma\), mode, or median), a summary statistic is computed from the simulated data. In this example, we focus on estimating \(\mu\).
\[ \bar{x}_{sim, i}=\frac{\sum_{j=1}^{n_{data}} x_j}{n_{data}} \]
Each simulated mean \(\bar{x}_{sim, i}\) (typically out of 100,000 simulations) is compared to the observed mean \(\bar{x}_{data}\) using the Euclidean distance.
\[ E_{i}=\sqrt{(\bar{x}_{sim, i} - \bar{x}_{data})^2} \]
A tolerance threshold is then selected to determine which simulated values are accepted. Simulations with \(E_i > tolerance\) are rejected, while those with \(E_i \leq tolerance\) are retained. While a tolerance of zero would yield the most accurate posterior, it would typically result in rejecting all simulations. On the other hand, setting the tolerance too high would allow in too many poor matches.
Each accepted simulation corresponds to an accepted pair of simulated parameters \(\mu_{i}^*, \sigma_{i}^{2*}\). Since all \(\mu_{i}^*\) were originally drawn from the prior, the subset of accepted values approximates the posterior distribution of \(\mu\).
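A minimal sketch of this algorithm in R is given below; the prior, the exponential rate, the tolerance and the stand-in 'observed' data are all illustrative choices, not the routine used inside EcoPostView.

```r
# Minimal ABC-rejection sketch following the steps above.
# Prior, rate, tolerance and data are illustrative, not package defaults.
set.seed(1)
x_obs <- rnorm(50, mean = 1, sd = 2)           # stand-in "observed" data
n_sim <- 100000

mu_star     <- rnorm(n_sim, mean = 0, sd = 5)  # mu*     ~ N(mu_prior, sd_prior)
sigma2_star <- rexp(n_sim, rate = 1)           # sigma2* ~ Exp(rate)

# Simulate a dataset per draw; keep its mean as the summary statistic
x_sim <- mapply(function(m, s2) mean(rnorm(length(x_obs), m, sqrt(s2))),
                mu_star, sigma2_star)

# Accept draws whose simulated mean lies within the tolerance
E         <- sqrt((x_sim - mean(x_obs))^2)
tolerance <- 0.1
mu_post   <- mu_star[E <= tolerance]

c(mean = mean(mu_post), sd = sd(mu_post))      # approximate posterior of mu
```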
Instead of \(P\), the function ‘\(f\)’ is used to highlight that the probability is a mapping function: a ‘rule’ that maps \(x\) to \(y\), so that \(y=f(x)\). \[ f(\beta \mid Data, Info) = \frac{f(Data \mid \beta) \cdot f(\beta \mid Info)} {\int f(Data \mid \beta) \cdot f(\beta \mid Info)\, d\beta} \] The integral in the denominator scales the posterior probability to one. This expression is sometimes simplified to \[f(\beta \mid Data, Info) \propto f(Data \mid \beta) \cdot f(\beta \mid Info)\] where the \(\propto\) symbol indicates ‘proportional to’. Therefore, the posterior is nothing more than a function that describes the probability \(y\) as a function of \(\beta\), conditional on \(Data\) and \(Info\) (\(y=f(\beta \mid Data, Info)\)). This cannot be solely conditional on the \(Data\), as the \(Data\) is not uncertain; our information/belief is uncertain about a non-existing object \(\beta\) (unless Platonism is true).
In the previous part a single prior model was used. Bayesian Model Averaging (BMA) has the advantage that it allows multiple (\(k\)) functions to be utilized as priors. I specifically chose \(f\) so that the multiple priors \(f_k\) in the equation below can be seen as nothing more than multiple functions (or models). In my opinion this makes it easier to see that we only optimize between multiple functions; it sounds weird to say we optimize between probabilities. Hence, multiple possible scenarios that could have been responsible for \(\beta\) can be introduced as below. \[ f(\beta \mid Data,Info) = \frac{f(Data \mid \beta) \cdot f_k(\beta \mid Info)}{\int \left( \sum_{k=1}^{m} f(Data \mid \beta) \cdot f_k(\beta \mid Info) \right) d\beta} \] Now it should be clear that each \(\beta\) contained within \(g(E(y \mid x_{ij})) = \sum_{j=1}^{v} \beta_j \cdot x_{ij}\) is restricted by the prior models, while in frequentism it is unrestricted, with ‘complete indifference’ towards the possible values of \(\beta\). All these methods can be used in a meta-analysis.
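To see that the denominator merely renormalizes a mixture over the prior models, here is a grid-based sketch with two candidate priors; the estimate (0.8), its standard error (0.4) and both priors are invented for illustration and are not EcoPostView defaults.

```r
# Sketch: posterior under two candidate prior models on a grid of beta.
# All numbers are invented; equal model weights are implied by the plain sum.
beta  <- seq(-3, 3, length.out = 2001)          # grid over beta
step  <- beta[2] - beta[1]                      # grid spacing
lik   <- dnorm(0.8, mean = beta, sd = 0.4)      # f(Data | beta)
prior <- cbind(dnorm(beta, 0, 1),               # f_1(beta | Info)
               dnorm(beta, 1, 0.5))             # f_2(beta | Info)

num  <- rowSums(lik * prior)                    # sum_k f(Data|b) * f_k(b)
post <- num / sum(num * step)                   # normalize to integrate to 1
```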
A standard meta-analysis uses a measure of location (mean) and scale (precision) to estimate a pooled value based on all parameters. For a fixed-effect meta-analysis the pooled parameter is derived via the following equation. \[\theta_{pooled} = \frac{\sum_{i=1}^{k}(\theta_i\cdot w_i)}{\sum_{i=1}^kw_i}\] \(\theta_i\) is the extracted effect-size for study \(i\). The weight \(w_i\) per study \(i\), for all \(k\) studies, is derived from the precision \(1/se_i^2\) via the equation below. \[w_i = \frac{1}{se_i^2}\] The standard error for the pooled effect-size can then be derived via the formula given below.
\[se(\theta_{pooled})=\frac{1}{\sqrt{\sum_{i=1}^{k}w_i}}\] For a random-effects meta-analysis the variance between studies is modeled separately. In the metafor package, REML (Restricted Maximum Likelihood) is used to estimate this between-study variance, but it is also possible to use the DerSimonian and Laird method, with \(Q=\sum_{i=1}^{k}w_i(\theta_i-\theta_{pooled})^2\). \[ \tau^2=max\left(0, \frac{Q-(k-1)}{\sum_{i=1}^{k}w_i-\frac{\sum_{i=1}^{k}w_i^2}{\sum_{i=1}^{k}w_i}}\right) \\ w^*_i=\frac{1}{\frac{1}{w_i}+\tau^2} \\ \theta_{pooled} = \frac{\sum_{i=1}^{k}(\theta_i\cdot w^*_i)}{\sum_{i=1}^{k}w^*_i} \\ se(\theta_{pooled})=\frac{1}{\sqrt{\sum_{i=1}^{k}w^*_i}} \] If we now go back to how we analytically derived the posterior, we can devise a function that analytically performs a fixed-effect meta-analysis with ease. I have placed this in a function called ‘abmeta’. In simple cases it approximates the results of metafor and the ‘meta’ function in EcoPostView relatively well. Of course, the variance component differs slightly from metafor and the ‘meta’ function due to the different method of estimation.
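The pooling equations translate almost line by line into R. The sketch below is not the package's ‘abmeta’ or ‘meta’ function, only a minimal illustration with made-up effect sizes and standard errors.

```r
# Sketch of fixed-effect pooling and the DerSimonian-Laird adjustment
# from the equations above; theta and se are made-up illustration values.
theta <- c(0.42, 0.61, 0.35, 0.55)   # per-study effect sizes
se    <- c(0.10, 0.15, 0.12, 0.20)   # per-study standard errors

w           <- 1 / se^2                        # fixed-effect weights
theta_fixed <- sum(theta * w) / sum(w)
se_fixed    <- 1 / sqrt(sum(w))

k    <- length(theta)
Q    <- sum(w * (theta - theta_fixed)^2)       # heterogeneity statistic
tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))

w_star       <- 1 / (se^2 + tau2)              # random-effects weights
theta_random <- sum(theta * w_star) / sum(w_star)
se_random    <- 1 / sqrt(sum(w_star))
```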
In a meta-analysis we do not talk about a single \(\beta\) but about a set of estimates \(\beta=\{\beta_{1}, \dots, \beta_{n}\}\), meaning the likelihood becomes \(f(Meta\text{-}data\mid\{\beta_{1}, \dots, \beta_{n}\})\). This flexibility allows these estimates to be either likelihood estimates (\(\hat{\beta}\)) or posterior estimates (\(\beta\)), and we end up with an expression that should capture the inference to an underlying pooled model parameter. \[ f(\beta_{pooled} \mid Meta\text{-}data,Info) = \frac{f(Meta\text{-}data \mid \{\beta_{1}, \dots, \beta_{n}\}) \cdot f_k(\beta_{pooled} \mid Info)}{\int \left( \sum_{k=1}^{m} f(Meta\text{-}data \mid \{\beta_{1}, \dots, \beta_{n}\}) \cdot f_k(\beta_{pooled} \mid Info) \right) d\beta} \] Assuming the pooled parameter \(\beta_{pooled}\) is derived via the equations laid out before, the variance of the pooled parameter can be analytically derived as given by Hoeting et al. (1999):
\[ SE(\beta_{pooled}) = \sqrt{\sum^m_{k=1}( w_{prior} \cdot (\beta_k^2+SE(\beta_k)^2))-\beta_{pooled}^2}\\ \]
Bayesian sequential updating refers to the practice of re-using the derived posterior of a previous model as the prior for the new model. For this, conditional independence between the datasets is assumed. The parameter of interest is \(\theta\); based on a dataset \(Data_1\) we derive the posterior. \[P(\theta|Data_1) = \frac{P(Data_1|\theta) \cdot P(\theta)}{P(Data_1)}\] The next would be \[P(\theta|Data_1, Data_2) = \frac{P(Data_2|\theta) \cdot P(\theta|Data_1)}{P(Data_2)}\] up to \[P(\theta|Data_1,\cdots,Data_n) = \frac{P(Data_n|\theta) \cdot P(\theta|Data_1,\cdots,Data_{n-1})}{P(Data_n)}\]
For example, we would like to know \(\mu\) for a population of interest. Our example population has \(\mu=0.5\) and \(\sigma=5\), and each study has an error of \(\alpha = 40\%\) when we assume \(\alpha=5\%\) (meaning that our heterogeneity is larger than expected). Our first prior starts with \(N(0, 5)\), after which the posterior of the previous study is sequentially re-used, as visually represented in Fig. 2a below, where more studies increase the precision of the estimated posterior.
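A rough sketch of this updating loop, re-using the `posterior_normal` helper sketched in the conjugate section, is given below; the population values mirror the example (\(\mu=0.5\), \(\sigma=5\)), while the number of studies and observations per study are arbitrary.

```r
# Sketch of sequential updating: the posterior of study i becomes the
# prior for study i + 1 (reuses posterior_normal defined earlier).
set.seed(1)
mu_true <- 0.5; sigma <- 5
n_per_study <- 30; n_studies <- 10   # arbitrary illustration values

mu_prior <- 0; sd_prior <- 5         # starting prior N(0, 5)
for (i in seq_len(n_studies)) {
  x   <- rnorm(n_per_study, mu_true, sigma)
  fit <- posterior_normal(mu_prior, sd_prior, mean(x), sd(x), n_per_study)
  mu_prior <- unname(fit["mu"])      # posterior becomes the next prior
  sd_prior <- unname(fit["sd"])
}
c(mu = mu_prior, sd = sd_prior)      # precision grows with each study
```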
If the focus lies on objectivity and error control over the different studies, and we assume i.i.d. data, then the curve between studies would follow that of Fig. 2b below.
Less formally, the Bayesian framework is more focused on the transfer of information and precision, while the frequentist framework is more interested in objectivity, consistency and error control among studies.
Figure 2: Sequential updating with credibility intervals (left panel) and a long run of means with confidence intervals (right panel).
I do not believe statistics reflects uncertainty about events; rather, it reflects the information in the data under a particular model, or the uncertainty about our belief in a parameter (\(\theta, \beta, \mu\), etc.). The latter concept is often vague and confusing because, if one assumes the parameter does not exist independently of the mind, then what exactly is uncertain - our belief? The claim to ‘objective probability’ is already compromised by the assumption that the parameter is objective. However, if the parameter does not exist outside the mind, the meaning of ‘objective’ in this context becomes questionable.
When people refer to objectivity, they often mean that the data itself is the most ‘objective’ part of the process. However, if some conditions are not met, such as (1) the data is not randomly sampled from a population of interest, (2) the model is not pre-selected in advance, (3) a sufficiently large sample size is not chosen based on the model, and (4) confounding variables are present, then even the data cannot be considered truly objective unless these limitations are explicitly acknowledged. Moreover, model selection procedures further contaminate the objectivity of the data, meaning that the estimated model parameters no longer fully reflect the objectivity of the data which is often implied in our conclusions (Gelman and Loken, 2013; Tong, 2019).
In Bayesian updating, the prior reflects the extent to which we are willing to sacrifice the objectivity of the likelihood by using information that cannot be formalized into the likelihood. This is captured by the relationship \(f(\theta \mid Data, Info) \propto f(Data \mid \theta) \cdot f(\theta \mid Info)\).
The posterior, therefore, is merely the weighted combination of the prior and likelihood. It represents the relationship (e.g., \(0.25\) as \(0.5 \cdot 0.5\)) between the prior and the likelihood. There is no invalidity in a logical argument such as: (Premise 1) All unicorns are orange. (Premise 2) I have a unicorn. (Conclusion) Therefore, my unicorn is orange.
While this argument may be unsound — because unicorns do not exist — the reasoning itself is not flawed. The issue lies with the premises, not the structure of the argument. Hence, uncertainty does not exist in the ‘real’ world; it resides solely in our minds. We cannot be ‘wrong’ or ‘correct’ about \(f(\beta \mid \text{Data, Info})\) because it does not exist as a tangible entity. Even if it did, its existence would have no impact on reality, because uncertainty is unrelated to the way reality operates. In the real world, events either occur or they do not. If my unicorn does not exist, I will never see it, and it was never orange in the first place.
We should also avoid treating models as definitive representations of reality. Models are tools that convey information and serve as pragmatic instruments. The model itself is not the result; the strength of the results relies on the argument, and on how well the premises within the argument are clarified and supported by the model.
Biletzki, Anat, and Anat Matar. 2021. “Ludwig Wittgenstein.” Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/entries/wittgenstein/.
Baguley, Thom. 2009. “Standardized or Simple Effect Size: What Should Be Reported?” British Journal of Psychology 100(3): 603–17. doi: 10.1348/000712608X377117.
Bárdos, Dániel, and Adam Tamas Tuboly. 2025. Science, Pseudoscience, and the Demarcation Problem. Cambridge: Cambridge University Press. doi: 10.1017/9781009429597.
Barnes, Jonathan. 2007. The Toils of Scepticism. Digitally print. version. Cambridge: Cambridge Univ. Press.
Barwise, Jon, John Etchemendy, Gerard Allwein, Dave Barker-Plummer, and Albert Liu. 2002. Language, Proof, and Logic. Stanford, Calif: CSLI Publications.
Bronzo, Silver. 2022. Wittgenstein on Sense and Grammar. Cambridge New York (N.Y.): Cambridge university press.
Cartwright, Nancy, Jordi Cat, Lola Fleck, and Thomas E. Uebel. 1996. Otto Neurath: Philosophy between Science and Politics. Cambridge: Cambridge University Press.
Cramer, J. S. 1991. The Logit Model: An Introduction for Economists. London: Edward Arnold.
Csilléry, Katalin, Michael G.B. Blum, Oscar E. Gaggiotti, and Olivier François. 2010. “Approximate Bayesian Computation (ABC) in Practice.” Trends in Ecology & Evolution 25(7): 410–18. doi: 10.1016/j.tree.2010.04.001.
de Groot, W.T. 1998. “Problem-In-Context: A Framework for the Analysis, Explanation and Solution of Environmental Problems.” In Environmental Management in Practice: Vol 1: Instruments for Environmental Management, Oxford: Routledge, 22–43.
Dewey, John. 1997. How We Think. Mineola, N.Y: Dover Publications.
Ferrari, Filippo, and Massimiliano Carrara. 2025. Logic and Science: An Exploration of Logical Anti-Exceptionalism. Cambridge: Cambridge University Press. doi: 10.1017/9781009233897.
Feyerabend, Paul. 1975. Against Method: Outline of an Anarchistic Theory of Knowledge. 4th ed. London and New York: Verso.
Fisher, R. A. 1949. The Design of Experiments. 5th ed. Oliver and Boyd.
Foucault, Michel. 1981. Power / Knowledge: Selected Interviews and Other Writings 1972 - 1977. ed. Colin Gordon. New York: Pantheon Books.
Fumerton, Richard A. 1995. Metaepistemology and Skepticism. Lanham, Md.: Rowman & Littlefield.
Gelman, Andrew, and Eric Loken. 2013. “The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is No ‘Fishing Expedition’ or ‘p-Hacking’ and the Research Hypothesis Was Posited Ahead of Time.” 348(3): 1–17. doi: 10.1007/978-3-658-12153-2_7.
Haack, Susan. 1978. Philosophy of Logics. 1st ed. Cambridge University Press. doi: 10.1017/CBO9780511812866.
Harman, Gilbert. 1986. Change in View: Principles of Reasoning. Cambridge, Massachusetts: The MIT Press (A Bradford Book).
Hartig, Florian, Justin M. Calabrese, Björn Reineking, Thorsten Wiegand, and Andreas Huth. 2011. “Statistical Inference for Stochastic Simulation Models - Theory and Application: Inference for Stochastic Simulation Models.” Ecology Letters 14(8): 816–27. doi: 10.1111/j.1461-0248.2011.01640.x.
Heidegger, Martin. 1929. “What Is Metaphysics.” https://www.stephenhicks.org/wp-content/uploads/2013/03/heideggerm-what-is-metaphysics.pdf.
Heidegger, Martin. 2016. Logic: The Question of Truth. First paperback edition. Bloomington and Indianapolis: Indiana University Press.
Hinne, Max, Quentin F. Gronau, Don Van Den Bergh, and Eric-Jan Wagenmakers. 2020. “A Conceptual Introduction to Bayesian Model Averaging.” Advances in Methods and Practices in Psychological Science 3(2): 200–215. doi: 10.1177/2515245919898657.
Hoeting, Jennifer A., David Madigan, Adrian E. Raftery, and Chris T. Volinsky. 1999. “Bayesian Model Averaging: A Tutorial.” Statistical Science 14(4): 382–417. doi: 10.1214/ss/1009212519.
Hume, David. 1739. A Treatise of Human Nature [2007 Dutch translation: Traktaat over de Menselijke Natuur]. Amsterdam: Uitgeverij Boom.
Kale, Alex, Francis Nguyen, Matthew Kay, and Jessica Hullman. 2019. “Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data.” IEEE Transactions on Visualization and Computer Graphics 25(1): 892–902. doi: 10.1109/TVCG.2018.2864909.
Labukt, Ivar. 2021. “Is Logic Distinctively Normative?” Erkenntnis 86(4): 1025–43. doi: 10.1007/s10670-019-00142-1.
Lamme, V. 2011. Vrije Wil Bestaat Niet [Free Will Does Not Exist]. Amsterdam: Prometheus.
Laudan, Larry. 1983. “The Demise of the Demarcation Problem.” In Physics, Philosophy and Psychoanalysis, Boston Studies in the Philosophy of Science, eds. R. S. Cohen and L. Laudan. Dordrecht: Springer Netherlands, 111–27. doi: 10.1007/978-94-009-7055-7_6.
Lazurca, Vladimir. 2025. “Scepticism about Meaning in the German Enlightenment.” International Journal for the Study of Skepticism 15(1): 1–31. doi: 10.1163/22105700-bja10095.
Lee, Siu-Fan. 2017. Logic: A Complete Introduction. Great Britain: Hodder & Stoughton.
Lynch, Michael P. 2001. The Nature of Truth: Classic and Contemporary Perspectives. Cambridge, Massachusetts: The MIT Press.
Maier, Maximilian, František Bartoš, and Eric-Jan Wagenmakers. 2023. “Robust Bayesian Meta-Analysis: Addressing Publication Bias with Model-Averaging.” Psychological Methods 28(1): 107–22. doi: 10.1037/met0000405.
Mayo, Deborah G. 2018. Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars. Cambridge: Cambridge University Press.
Markowitz, David M., and Jeffrey T. Hancock. 2016. “Linguistic Obfuscation in Fraudulent Science.” Journal of Language and Social Psychology 35(4): 435–45. doi: 10.1177/0261927X15614605.
Moreno, Santiago G., Alex J. Sutton, A. E. Ades, Tom D. Stanley, Keith R. Abrams, Jaime L. Peters, and Nicola J. Cooper. 2009. “Assessment of Regression-Based Methods to Adjust for Publication Bias through a Comprehensive Simulation Study.” BMC Medical Research Methodology 9(1): 2. doi: 10.1186/1471-2288-9-2.
Pearl, Judea. 2009. Causality. Cambridge: Cambridge University Press.
Pearl, Judea, and Dana Mackenzie. 2019. The Book of Why: The New Science of Cause and Effect. London: Penguin Books.
Peters, Jaime L., Alex J. Sutton, David R. Jones, Keith R. Abrams, and Lesley Rushton. 2006. “Comparison of Two Methods to Detect Publication Bias in Meta-Analysis.” JAMA 295(6): 676. doi: 10.1001/jama.295.6.676.
Popper, Karl Raimund. 1968. The Logic of Scientific Discovery. Basic Books.
Puppo, Federico. 2019. Informal Logic: A “Canadian” Approach to Argument. Windsor Studies in Argumentation.
Shrader-Frechette, K. S., and Earl D. McCoy. 1994. “How the Tail Wags the Dog: How Value Judgments Determine Ecological Science.” Environmental Values 3(2): 107–20. doi: 10.3197/096327194776679764.
Sienkiewicz, Stefan. 2019. Five Modes of Scepticism: Sextus Empiricus and the Agrippan Modes. Oxford: Oxford University Press.
Smith, Peter. 2021. An Introduction to Formal Logic. Second edition, reprinted with corrections. Monee, IL: Logic Matters.
Stanley, T. D., and Hristos Doucouliagos. 2014. “Meta-Regression Approximations to Reduce Publication Selection Bias.” Research Synthesis Methods 5(1): 60–78. doi: 10.1002/jrsm.1095.
Stone, James V. 2015. Information Theory: A Tutorial Introduction. First edition. Sheffield, UK: Sebtel Press.
Tong, Christopher. 2019. “Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science.” The American Statistician 73(sup1): 246–61. doi: 10.1080/00031305.2018.1518264.
Tukey, John W. 1969. “Analyzing Data: Sanctification or Detective Work?” American Psychologist 24(2): 83–91. doi: 10.1037/h0027108.
Usó-Doménech, J. L., and J. Nescolarde-Selva. 2016. “What Are Belief Systems?” Foundations of Science 21(1): 147–52. doi: 10.1007/s10699-015-9409-z.
Van Ditmarsch, Hans, Wiebe Van Der Hoek, and Barteld Kooi. 2008. Dynamic Epistemic Logic. Dordrecht: Springer Netherlands. doi: 10.1007/978-1-4020-5839-4.
Van Fraassen, Bas C. 1980. The Scientific Image. Oxford University Press.
Van Zwet, E. W., and E. A. Cator. 2021. “The Significance Filter, the Winner’s Curse and the Need to Shrink.” Statistica Neerlandica 75: 437–52. doi: 10.1111/stan.12241.
Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p -Values: Context, Process, and Purpose.” The American Statistician 70(2): 129–33. doi: 10.1080/00031305.2016.1154108.
Wittgenstein, Ludwig. 1921. Tractatus Logico-Philosophicus [English translation: C. K. Ogden]. Garden City, New York: Dover Publications.
Wooldridge, Jeffrey M. 2001. Econometric Analysis of Cross Section and Panel Data. Cambridge, Massachusetts: The MIT Press.