1 Overview

This site reads like an incomplete book or essay. It serves as a way to organize my thoughts and collect fragments of information. The project originally started around the EcoPostView R package, but has since expanded into several loosely connected chapters that explore broader ideas.

On the left, you will find these chapters in the drop-down menu. They begin with an exploration of the EcoPostView package, move through topics of reasoning, logic, and belief, and end with a chapter on statistics (focused on analytical and empirical Bayes). The order is mixed: where a logical structure would start with abstract foundations and move toward application, here it begins with practice. Many readers will find it easier to start with the R functions and skip the more abstract chapters. That is completely fine; the site is meant to support that kind of entry point.

The reversed order reflects something about how we come to understand the world. We often begin with practice, then build frameworks around our experiences. But if we trace our tools and methods back far enough, we find ourselves dealing with metaphysical questions — about language, logic, and what we believe about reality. Reasoning, in my opinion, is not about discovering truth directly, but about trying to contain or express it through propositions. The point is not to assume these propositions are true, but to see what follows if we treat them as if they were.

Ideally, one might begin with reasoning and logic, then move through statistical ideas, and finally arrive at practical application. Going in reverse, from code and data to abstract beliefs, might even be confronting. These themes can feel more existential than scientific, but the boundary between those categories is itself unclear to me (Bárdos and Tuboly 2025; Heidegger 1929; Laudan 1983; Popper 1968).

This site is still a work in progress: a kind of evolving notebook rather than a polished presentation.

2 EcoPostView

2.1 Introduction

Ecological data is scattered throughout the literature in varying formats. It is often case-specific and not representative of the full range of ecological conditions encountered in the real world. As a result, a large amount of noise is introduced, limiting the generalizability of results. Moreover, there is no coherent framework to re-use, generalize, and integrate large amounts of ecological data into understandable and predictive models. This lack of structure hampers ecological understanding and leads to a loss of valuable information.

As ecologists, we are not only interested in isolated, case-specific explanations or post-hoc rationalizations. We aim to build cumulative knowledge and apply it across diverse contexts. Working with logic, metadata, and stochastic processes can help solidify our understanding of ecological relationships. These tools enable us to make probability-based statements and generate predictions. This R package is designed to assist with exactly that. Its goal is to facilitate the integration of multiple sources of information, combining their strengths to generate broader insights.

The core function of the package utilizes outputs from multiple fitted Linear and Generalized Linear Models ((G)LMs), leveraging Bayesian methods to generalize their effects into a single, meta-analytic (G)LM. This synthesized model can then be visualized (see Fig. 1 below) and used to make predictions on new data. The ultimate aim is to provide a robust, generalized understanding of ecological relationships by drawing from all available sources of information.

Figure 1: Expected evenness of aquatic invertebrates as a function of conductivity and fine sediment fraction within a river reach. The left panel shows the relationship on the response scale, while the right panel presents it on the log scale for improved visualization.

This package offers tools for ecological statistical modeling, with a strong emphasis on Bayesian methods. While a basic understanding of R is sufficient to use the core functions, a more in-depth theoretical background is provided in chapter 2 for those interested in exploring the underlying concepts. Please note that this is an actively developed R package, and improvements or updates may be introduced over time.

2.2 Data and models

The information required for a meta-analytic approach can often be extracted from figures, tables, datasets, or combinations of these sources. However, such information is rarely used in a consistent or standardized way. Put simply, multiple datasets are needed on which (G)LMs can be fitted (see Kaijser et al., …). This process often reveals that ecological data is noisy, potentially biased, and exhibits considerable heterogeneity across studies.

These challenges pose difficulties for drawing causal inferences or controlling statistical error. Both error control and causal conclusions require controlled environments, well-designed experiments, and the identification or modeling of confounding variables. While it may not be possible to impose such controls retrospectively (i.e., a posteriori), the existing information is still highly valuable.

This data can be used to make probabilistic statements, generate predictions, and estimate the a priori power required to design future studies that do focus on error control or causal inference. The purpose of this package is to enable such posterior analysis—allowing users to generalize ecological effects from the literature, visualize emerging patterns, and make informed predictions.

2.3 (Generalized) Linear Models (G)LMs and extracting data

To use this package effectively, data is required, and (G)LMs need to be fitted to that data. This (meta-)data can be obtained from figures (e.g., using tools like WebPlotDigitizer), tables (e.g., by converting PDFs to Excel files), datasets, or combinations of these sources (see Kaijser et al., …).

The so-called “effects” we refer to are, more precisely, model parameters, commonly known as the intercept and slope - typically denoted as (b0 or β0) and (b1 or β1). These parameters define the equation:

response variable = b0 + b1 · predictor variable

This package is built on the underlying philosophy that if we accept a reported parameter (e.g., b1) to represent an “effect” of the predictor, then such an effect should ideally be generalizable. For instance, the relationship between chlorophyll-a and total phosphorus is widely considered generalizable across aquatic systems.

From here on, the term model parameter will be used instead of “effect.” By collecting estimates of (b0) and (b1) from various studies, we can build a pooled model that predicts responses for one or more new values (xi or \(x_i\)). This approach allows us to understand the magnitude of the relationship, assess its variability, and make informed predictions.

In the context of Generalized Linear Models (GLMs), the response variable is linked to the linear predictor through a link function, commonly denoted as g(…). For example, when using the identity link function, no transformation is applied. In this case, the expected value of y, written as E(y) or E(y|x), is directly related to the linear component, just as in a standard LM:

\[g(E(y_{i} \mid x_{i})) = \beta_0 + \beta_1 \cdot x_{i}\]

However, in a GLM with a log or logit link it is easier to talk about log-linear relations

\[\log(E(y_{i} \mid x_{i})) = \beta_0 + \beta_1 \cdot x_{i}\]

or logit-linear relations

\[\text{logit}(E(y_{i} \mid x_{i})) = \beta_0 + \beta_1 \cdot x_{i}\]

In a GLM, the slope is not a “true” slope in the geometric sense, since the relationship between the response variable (y) and predictor variable (x) is no longer a straight line. However, the model is still considered linear because the parameters enter the linear predictor linearly. As such, the terms coefficient or regression coefficient typically refer to these model parameters, denoted as (\(\beta\)).
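To make the role of the link function concrete, the following is a minimal base R sketch. It assumes simulated data (not taken from the package or any study) and a Poisson GLM with a log link. It shows that predictions are equally spaced on the link scale but not on the response scale, and it extracts the estimates and standard errors that a meta-analysis would need.

```r
# Simulated data (assumption: purely illustrative, true b0 = 0.5 and b1 = 0.8)
set.seed(1)
x <- runif(200, 0, 2)
y <- rpois(200, lambda = exp(0.5 + 0.8 * x))

# Poisson GLM with log link: log(E(y | x)) = b0 + b1 * x
fit <- glm(y ~ x, family = poisson(link = "log"))

# Estimates and standard errors of b0 and b1
coefs <- summary(fit)$coefficients[, c("Estimate", "Std. Error")]
print(coefs)

# Predictions for equally spaced x values
new_x <- data.frame(x = c(0, 1, 2))
eta <- predict(fit, newdata = new_x, type = "link")      # straight line on the link scale
mu  <- predict(fit, newdata = new_x, type = "response")  # curve on the response scale

print(rbind(link_scale = eta, response_scale = mu))
```

On the link scale the steps between successive predictions are equal, whereas on the response scale each step multiplies the expected value by exp(b1).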

In practice, I often prefer to work with elasticity or semi-elasticity coefficients (Wooldridge, 2001), which can offer interpretable measures in log-linear or logit-linear models. That said, their use is context-dependent and may not always be appropriate. The elasticity coefficient quantifies the percentage change in \(y\) associated with a 1% change in \(x\). For example, an elasticity of 0.2 implies an approximately 0.2% increase in \(y\) for every 1% increase in \(x\). This is the interpretation of the slope in a log-log model, \(\log(E(y \mid x)) = \beta_0 + \beta_1 \cdot \log(x)\), and thus \(\beta_1 = \frac{\Delta \log(y)}{\Delta \log(x)}\). For the semi-elasticity coefficient (i.e., in a logit-linear model) this interpretation holds only partially, and values closer to 0 are easier to interpret, because \(\text{logit}(E(y \mid x)) = \beta_0 + \beta_1 \cdot \log(x)\) and thus \(\beta_1 = \frac{\Delta \text{logit}(y)}{\Delta \log(x)}\). This expresses the change in the log-odds per unit change in \(\log(x)\); a 1% increase in \(x\) shifts the log-odds by roughly \(0.01 \cdot \beta_1\) (Cramer, 1991; Wooldridge, 2001). These coefficients allow comparison across models and predictors while maintaining interpretable units for prediction. Since \(x\) is log-transformed, its original units are preserved, unlike in standardized coefficients, where this interpretability is lost.

As an example, consider a decline in benthic invertebrate species richness from 100 to 30 as conductivity increases from 50 to 5000 \(\mu S \cdot cm^{-1}\). The elasticity is:

\[\beta_{elasticity} = \frac{\log(100)-\log(30)}{\log(50)-\log(5000)} = -0.26\]

The elasticity is the same for a decline from 10 to 3 over the same range:

\[\beta_{elasticity} = \frac{\log(10)-\log(3)}{\log(50)-\log(5000)} = -0.26\]

Although the model’s intercept \(\beta_0\) would differ, this does not affect the interpretation of the regression coefficient \(\beta_1\), nor its uncertainty or visualization.
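The arithmetic of this example can be checked with a few lines of base R. The helper `elasticity()` is a hypothetical name introduced only for this illustration; it is not a function of the EcoPostView package.

```r
# Hypothetical helper: elasticity between two points (x1, y1) and (x2, y2)
elasticity <- function(y1, y2, x1, x2) {
  (log(y1) - log(y2)) / (log(x1) - log(x2))
}

# Richness declines from 100 to 30 while conductivity rises from 50 to 5000
e1 <- elasticity(y1 = 100, y2 = 30, x1 = 50, x2 = 5000)

# Richness declines from 10 to 3 over the same conductivity range
e2 <- elasticity(y1 = 10, y2 = 3, x1 = 50, x2 = 5000)

round(c(e1, e2), 2)
# -0.26 -0.26

# Implied percentage change in y for a 1% increase in x
pct_change <- (1.01^e1 - 1) * 100
```

Because only ratios enter the calculation, the two declines give the same elasticity even though their absolute richness values differ by a factor of ten.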

I use the ‘unofficial expressions’ b0 and b1 because the R package refers to the parameters in this way. The models above would be more formally expressed as: \[g(E(y_{i} \mid x_{ij})) = \beta_0 + \sum_{j=1}^{J} \beta_j \cdot x_{ij}\] where \(x_{ij}\) refers to the \(j\)-th predictor variable (e.g., salinity is \(j=1\) and light is \(j=2\)), \(J\) is the number of predictor variables, and \(i\) indexes the \(i\)-th observation. This expression will later be used in the explanation of the visualization.

2.4 Data and R-package

To start using this R package, both JAGS and devtools must be installed. JAGS can be installed from https://sourceforge.net/projects/mcmc-jags/ and devtools can be installed in R via CRAN. The most recent version of EcoPostView can be installed from GitHub. Of course, any problems, questions or possible improvements can be directed to me.

#install.packages("devtools")
library(devtools)
## Loading required package: usethis
#install_github("snwikaij/EcoPostView")
library(EcoPostView)

At this stage, we assume that multiple (G)LMs have been fitted. From these models, the parameter estimates and their standard errors have been extracted and compiled into a dataset. For each estimate, it is useful to record relevant metadata including the source (e.g., DOI), the type of predictor variable (e.g., conductivity), the group of the response type (e.g., benthic invertebrates), the link function, and whether the model parameter is the intercept b0 or a regression coefficient b1. When models include multiple predictor variables, all corresponding regression coefficients are denoted as b1, distinguishing them from the intercept b0. The example below in R demonstrates the expected structure of this data frame.

data(example1)
head(example1)
## # A tibble: 6 × 9
##   doi                link  group predictor parameter     est    se     n    mean
##   <chr>              <chr> <chr> <chr>     <chr>       <dbl> <dbl> <dbl>   <dbl>
## 1 10.1127/1863-9135… log   Inve… Salinity  b1         0.0697 0.202    10 870.   
## 2 10.1127/1863-9135… log   Inve… Oxygen    b1         0.319  0.196    10   8.15 
## 3 10.1127/1863-9135… log   Inve… Sediment  b1         0.258  0.164    10   0.385
## 4 10.1127/1863-9135… logit Inve… Salinity  b1        -0.950  0.723    10 870.   
## 5 10.1127/1863-9135… logit Inve… Oxygen    b1        -0.136  0.353    10   8.15 
## 6 10.1127/1863-9135… logit Inve… Sediment  b1         0.266  0.309    10   0.385

In the example above, the est column contains the estimated model parameters, while the se column holds the standard error of those estimates. The group column can represent an organism group, specific species, or taxon (or any other category you wish to use for grouping). The predictor column denotes the specific predictor variable, and the parameter column indicates whether the estimate corresponds to the intercept (b0) or a regression coefficient (b1). The link column specifies the link function used in the model. Additionally, it is recommended to include the sample size (n) in your dataset to adjust for ‘small-sample effects’ if needed (Peters et al., 2006; Moreno et al., 2009).
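As an illustration of how one study could be turned into rows of such a data frame, here is a minimal base R sketch. Everything in it is an assumption made for the example: the data are simulated, the DOI is a placeholder, and using NA as the predictor of the intercept is inferred from level names such as b0_NA_log_Fish that the package prints for priors.

```r
# Simulated study (assumption): richness declining with conductivity
set.seed(2)
cond     <- exp(runif(10, log(50), log(5000)))
richness <- rpois(10, lambda = exp(5 - 0.26 * log(cond)))

# Log-link GLM with a log-transformed predictor, so b1 is an elasticity
fit <- glm(richness ~ log(cond), family = poisson(link = "log"))
tab <- summary(fit)$coefficients

# One row per model parameter, following the column names of example1
study_rows <- data.frame(
  doi       = "10.xxxx/placeholder",        # hypothetical source identifier
  link      = "log",
  group     = "Invertebrates",
  predictor = c(NA, "Conductivity"),        # intercept has no predictor
  parameter = c("b0", "b1"),
  est       = unname(tab[, "Estimate"]),
  se        = unname(tab[, "Std. Error"]),
  n         = length(richness)
)

print(study_rows)
```

Rows from several studies can then be stacked with rbind() before the columns are passed to the meta function.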

2.5 Basic model structures for meta-analysis

The meta function can include a random effect by setting the argument RE=TRUE (the default). The structure is then \[\{\beta_{1}, ..., \beta_{n}\}= \beta_{\text{pooled}} + u_i\] One or more additional random effects can be supplied as a vector or matrix through the argument ‘random’; for a single additional random effect the structure becomes \[\{\beta_{1}, ..., \beta_{n}\} = \beta_{\text{pooled}} + u_i +r_i\] The function can also adjust for the relation between the \(se\) and the model parameters using the squared standard error \(se^2\), often referred to as the Precision-Effect Estimate with Standard Errors, or PEESE for short (method=1; Stanley and Doucouliagos, 2014). The structure is then \[\{\beta_{1}, ..., \beta_{n}\} = \beta_{\text{pooled}} + u_i + \alpha_{i} \cdot se^2\] Alternatively, the inverse of the sample size \(1/n\) can be used (method=2; this option is applied below), with the structure \[\{\beta_{1}, ..., \beta_{n}\} = \beta_{\text{pooled}} + u_i + \alpha_{i} \cdot \left(\frac{1}{n}\right)\] If bias is considered negligible, no adjustment needs to be performed (method=0). I would still like to include a further option that uses Robust Bayesian Model Averaging (RoBMA; Maier et al. 2023). However, this sometimes adjusts extremely when including \(se^2\), and therefore I have left this option open for now.
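The idea behind the \(se^2\) adjustment can be sketched with its classical frequentist counterpart: a weighted least-squares regression of the estimates on \(se^2\), where the intercept plays the role of the adjusted pooled parameter. This is not the Bayesian model fitted by meta. It is a simplified illustration on simulated data, and the selection mechanism and all numbers are assumptions.

```r
# Simulated literature (assumption): true pooled parameter is -0.1
set.seed(3)
k   <- 300
se  <- runif(k, 0.05, 0.5)
est <- rnorm(k, mean = -0.1, sd = se)

# Selective reporting (assumption): significant negative results are always
# reported, all other results only with probability 0.2
keep <- (est / se < -1.96) | (runif(k) < 0.2)
est  <- est[keep]
se   <- se[keep]

# Unadjusted inverse-variance weighted mean
naive <- weighted.mean(est, w = 1 / se^2)

# PEESE-type regression: est_i = b_pooled + alpha * se_i^2, weights 1 / se_i^2
peese    <- lm(est ~ I(se^2), weights = 1 / se^2)
adjusted <- unname(coef(peese)[1])

print(c(naive = naive, adjusted = adjusted))
```

Replacing I(se^2) with the inverse sample size would give the analogous sketch for the \(1/n\) adjustment, provided a vector of sample sizes is available.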

2.6 The meta-function

The meta-function can be applied to the example data as follows.

mod1 <- meta(estimate=example1$est,         #Model estimate
             stderr=example1$se,            #Standard error of the model estimate
             parameter=example1$parameter,  #Model parameter (b0 or b1)
             predictor=example1$predictor,  #Predictor variable  (independent variable)
             link_function=example1$link,   #Link function
             grouping=example1$group,       #Group
             Nsamp=example1$n,              #Sample size (optional, for adjustment 2=Peters (1/n)),
             method=2)                      #Adjustment method (0=none, 1=Egger's (1/se), 2=Peters (1/n))
## Warning in meta(estimate = example1$est, stderr = example1$se, parameter =
## example1$parameter, : The Rhat or/and effective sample size for some >1.01
## or/and <1000. Chains might not be mixing.

The meta-function may return a warning indicating that the MCMC chains are not mixing properly. This issue can arise for various reasons. To diagnose the source of the warning, examine the raw JAGS model output (mod1$model$JAGS_model). Specifically, look for cases where a parameter of interest, such as mu[.], has a large Rhat value or a small effective sample size.

Most of these issues can be addressed by thinning the chains, increasing the number of chains, or specifying more informed or stronger priors. If the problem is not related to the mu parameter, you may choose to disregard the warning. Ultimately, the decision on how to address these issues lies with the user.

To prevent the warning from being raised, you can lower the threshold for the Eff_warn parameter (e.g., Eff_warn = 500).
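For readers unfamiliar with Rhat, the sketch below computes the classic Gelman-Rubin potential scale reduction factor in base R for two simulated sets of chains. The function `rhat()` is a hypothetical helper written for this illustration; the variant reported in the JAGS output may be computed differently (for example with split chains), so treat the numbers as indicative only.

```r
# Hypothetical helper: classic Gelman-Rubin Rhat
# chains: matrix with one column per chain and one row per iteration
rhat <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  var_plus <- (n - 1) / n * W + B / n
  sqrt(var_plus / W)
}

set.seed(4)
# Two chains sampling the same distribution: well mixed
good <- cbind(rnorm(1000, mean = 0), rnorm(1000, mean = 0))
# Two chains stuck in different regions: poorly mixed
bad  <- cbind(rnorm(1000, mean = 0), rnorm(1000, mean = 3))

rhat_good <- rhat(good)
rhat_bad  <- rhat(bad)
print(c(good = rhat_good, bad = rhat_bad))
```

Values close to 1 indicate that the chains agree with each other; values clearly above 1, as for the second set, indicate that the chains have not converged to the same distribution.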

2.7 Standard meta-analysis

Meta-analysis is often performed using standardized effect sizes (SES). I do not endorse this practice, as I believe it offers limited benefits for field ecology, applied ecology, and the generalization of real-world ecological relations (Baguley, 2009; Tukey, 1969). Nevertheless, I will provide a brief introduction to standard meta-analysis and demonstrate how bias correction methods perform. To illustrate this, I will compare the results with those obtained using metafor, my preferred R package for standard meta-analysis.

#First run and check standard output for metafor
library(metafor)
## Loading required package: Matrix
## Loading required package: metadat
## Loading required package: numDeriv
## 
## Loading the 'metafor' package (version 4.4-0). For an
## introduction to the package please type: help(metafor)
data("example3")

#Run metafor
standard_metafor <- metafor::rma(yi=example3$est, sei=example3$se)

#Run EcoPostView
standard_meta    <- meta(estimate=example3$est, stderr=example3$se)

#Results metafor
print(standard_metafor)
## 
## Random-Effects Model (k = 83; tau^2 estimator: REML)
## 
## tau^2 (estimated amount of total heterogeneity): 0.0182 (SE = 0.0049)
## tau (square root of estimated tau^2 value):      0.1350
## I^2 (total heterogeneity / total variability):   65.66%
## H^2 (total variability / sampling variability):  2.91
## 
## Test for Heterogeneity:
## Q(df = 82) = 250.8709, p-val < .0001
## 
## Model Results:
## 
## estimate      se      zval    pval    ci.lb    ci.ub      
##  -0.2283  0.0199  -11.4886  <.0001  -0.2673  -0.1894  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Results EcoPostView
print(standard_meta$Summary)
##   parameter      predictor     link          group     map      mu     se
## 1        b1 none-specified identity none-specified -0.2277 -0.2281 0.0207
##        ll      ul     I2  n
## 1 -0.2623 -0.1939 0.3561 83

Both the metafor and EcoPostView packages yield similar means (approximately -0.23) and standard errors (approximately 0.02). This suggests that, in many cases, the default priors are effectively uninformative. However, we can further assess the model's performance by examining the bias through the residuals.

par(mfrow=c(1,2))
plot(1/example3$se, resid(standard_metafor), 
     xlab = "1/se", ylab="Residuals", main="metafor")
abline(a=0, b=0, col="red", lty=2)

plot(1/example3$se, standard_meta$Residuals, 
     xlab = "1/se", ylab="Posterior mean residuals", main="EcoPostView")
abline(a=0, b=0, col="red", lty=2)

Both functions reveal a clear diagonal pattern, which is nearly identical in both cases. However, the posterior means are pulled closer to the overall mean (a phenomenon known as shrinkage) for estimates with larger standard errors. This bias can also be assessed using the rescheck function from EcoPostView.

#Use the residual check function within EcoPostView
res_bias <- rescheck(standard_meta)

print(res_bias$bias_se)
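
The shrinkage mentioned above can be illustrated with the standard normal-normal result, in which each posterior mean is a precision-weighted average of the study estimate and the pooled mean. The sketch below is an illustration under stated assumptions, not the code used inside meta: the pooled mean and tau are taken from the metafor output above, and the three study estimates and standard errors are hypothetical.

```r
mu  <- -0.2283   # pooled estimate from the metafor output
tau <-  0.1350   # between-study standard deviation from the metafor output

est <- c(-0.60, -0.60, -0.60)   # hypothetical study estimates
se  <- c( 0.05,  0.15,  0.40)   # hypothetical standard errors

# Weight given to the pooled mean; it grows with the standard error
w_pooled <- (1 / tau^2) / (1 / tau^2 + 1 / se^2)

# Shrunken (posterior mean) estimate for each study
shrunk <- w_pooled * mu + (1 - w_pooled) * est

print(data.frame(est, se, w_pooled = round(w_pooled, 2), shrunk = round(shrunk, 3)))
```

The same raw estimate is pulled further toward the pooled mean as its standard error increases, which is why imprecise studies show smaller posterior residuals.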

This bias is clearly a result of excluding or selectively retaining ‘significant’ results, often through practices such as manually dropping ‘non-significant’ variables or using stepwise model selection methods like forward/backward AIC or BIC (Gelman & Loken, 2013). These practices can lead to a substantial overestimation of the parameter (or ‘effect size’). This bias can be adjusted for using method 1 (PEESE; Stanley and Doucouliagos, 2014).

#Run EcoPostView (increased the chain thinning interval and number of iterations to improve mixing)
adjusted_meta    <- meta(estimate=example3$est, stderr=example3$se, method = 1, 
                         n_thin = 5, 
                         n_iter = 30000)

print(adjusted_meta$Summary)
##   parameter      predictor     link          group     map      mu    se
## 1        b1 none-specified identity none-specified -0.0676 -0.0721 0.026
##        ll      ul     I2  n
## 1 -0.1132 -0.0286 0.1982 83
res_bias2 <- rescheck(adjusted_meta)
print(res_bias2$bias_se)

In the method outlined above, the bias has been adjusted (not removed), resulting in a much lower pooled estimate. This adjustment allows for a clearer assessment of the relationship between the estimates and their standard errors by examining the residuals. While this is commonly done using funnel plots, it can also be done by directly checking the residuals.

However, the adjustment should only be applied when clear patterns of bias are present, as it can lead to over-corrections even when no bias is evident. That said, it is highly effective when a bias is present in the data. In the future, I aim to incorporate additional methods to better assess the strength of any bias.

The bias displayed in this example is extreme, and in such cases, it may be beneficial to further explore the nature of the bias to determine if a ‘publication gap’ exists. This gap can be highlighted by examining the z-distribution derived from the p-value. Normally, the p-value is derived from the z-value, but when a clear gap is visible (as observed in Zwet & Cator, 2021), we should be able to model the absolute z-value as a mixture of two half-normal distributions - one truncated at 0 and the other at 1.96. Since the likelihood of such a mixture is challenging to estimate, I employ an Approximate Bayesian Computation (ABC) algorithm based on rejection sampling (ABC-rejection). This method is further described in Csilléry et al. (2010) and Hartig et al. (2011), and the model is formally presented in the theoretical section.
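The link between estimates, z-values, and p-values that this approach relies on can be written out in base R. The sketch below uses hypothetical estimates and standard errors and assumes two-sided p-values from a standard normal reference distribution; it does not reproduce the internals of the package functions used next.

```r
est <- c(-0.45, -0.10, 0.30, -0.22)   # hypothetical estimates
se  <- c( 0.15,  0.12, 0.14,  0.11)   # hypothetical standard errors

z <- est / se                # z-value of each estimate
p <- 2 * pnorm(-abs(z))      # two-sided p-value
z_back <- qnorm(1 - p / 2)   # absolute z-value recovered from the p-value

sig <- abs(z) > 1.96         # 'significant' at the conventional 5% level

print(data.frame(est, se, z = round(z, 2), p = round(p, 4), sig))
```

A publication gap would show up as a shortage of absolute z-values just below 1.96 relative to the number just above it.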

#From the dataset calculate the p-value from the effect-sizes and standard errors
pvalues <- ptoz(estimate=example3$est, stderr=example3$se)

#run the ABC-rejection model
result_abc  <- abctoz(p=pvalues$data$p, nsim = 250000)

#Extract the information from the results based on a selected threshold
extract_abc <- extrabc(result_abc, dist_threshold = 0.052)
##   Statistic   Mean     SE     ll     ul
## 1         c 0.7783 0.1032 0.6025 0.9030
## 2     mu(z) 1.2380 0.4984 0.3935 1.9004
## 3     sd(z) 0.9531 0.1872 0.6713 1.2675
#Plot the histogram of the z-values with the simulated density lines of the posterior
plot(extract_abc$hist)

Based on the distribution of z-values, we can clearly observe that values where |z|>1.96 are more frequently published than those with lower absolute values. This would be reasonable if no publication gap existed; however, the sharp boundary at |z| > 1.96 suggests otherwise. The z-values can be reasonably modeled, and the model appears to fit the data well, with an R-squared typically above 0.6–0.7, which is considered acceptable. Additionally, the density curves align well with the histogram, further supporting the model’s fit.

The proportion of observations explained by the censored component of the model is approximately 0.78 (78%). This does not imply that 78% of the data is censored, but rather that the model’s censored component captures a substantial portion of the observed pattern. The goal here is to determine whether the model provides a good fit to the data under the assumption of selective reporting. If the residuals suggest severe bias, this model fit offers additional evidence of a selection process at play.

To ensure the robustness of the results, one should verify that the model fit remains adequate and that the number of accepted simulations is heuristically sufficient (typically > 100).

2.7.1 Setting priors

A key advantage of the Bayesian approach is the ability to incorporate prior information, thereby explicitly shifting the posterior estimates toward more plausible values for the pooled model parameter. To define a single prior for each relation and parameter, a specific structure is required. By default, model parameters are assumed to follow a normal (Gaussian) distribution with a mean (\(\mu\), prior_mu) of 0 and a standard deviation (\(\sigma\), prior_mu_se) of 0.5. At present, the prior distribution for model parameters is limited to the normal distribution. The prior for the residual standard deviation (\(\sigma\)) is defined as a uniform distribution, with the upper bound (prior_sigma_max) set to 5 by default.

As discussed in Section 2.3, I often prefer to work heuristically with elasticity or semi-elasticity coefficients. However, this is not required, and the choice of prior should reflect your modeling preferences and domain knowledge. In fact, failing to think carefully about the priors, even when only limited prior information is available, means the analysis is not truly Bayesian in nature.

Users can specify their own prior values for both the mean and standard deviation. To obtain a structured overview of the required prior inputs, set get_prior_only = TRUE. This will return a data frame containing a level column, as well as columns for the prior mean (\(\mu\)) and standard deviation (\(se\)). These values can then be tailored to the specific context of the analysis using available prior information. Details on how to incorporate this prior data frame into your model are provided later.

only_priors <- meta(estimate=example1$est,        
                    stderr=example1$se,            
                    parameter=example1$parameter,  
                    predictor=example1$predictor,  
                    link_function=example1$link,   
                    grouping=example1$group,       
                    Nsamp=example1$n,            
                    method=2,
                    get_prior_only=TRUE) #Only show the structure of the priors

print(only_priors)
##                             Levels Prior_mu Prior_se
## 1                   b0_NA_log_Fish        0       10
## 2          b0_NA_log_Invertebrates        0       10
## 3                 b0_NA_logit_Fish        0       10
## 4        b0_NA_logit_Invertebrates        0       10
## 5               b1_Oxygen_log_Fish        0       10
## 6      b1_Oxygen_log_Invertebrates        0       10
## 7             b1_Oxygen_logit_Fish        0       10
## 8    b1_Oxygen_logit_Invertebrates        0       10
## 9             b1_Salinity_log_Fish        0       10
## 10   b1_Salinity_log_Invertebrates        0       10
## 11          b1_Salinity_logit_Fish        0       10
## 12 b1_Salinity_logit_Invertebrates        0       10
## 13            b1_Sediment_log_Fish        0       10
## 14   b1_Sediment_log_Invertebrates        0       10
## 15          b1_Sediment_logit_Fish        0       10
## 16 b1_Sediment_logit_Invertebrates        0       10

2.7.2 Setting multiple priors for Bayesian Model Averaging

An important advantage of the Bayesian framework is the ability to incorporate multiple prior distributions (\(k\)), enabling Bayesian Model Averaging (BMA; Hoeting et al., 1999; Hinne et al., 2020). This approach allows one to represent multiple plausible scenarios that could have explained the observed data, and to average over these competing models based on their relative credibility.

To implement BMA, a dataset similar in structure to the single-prior setup is required, but extended to include multiple prior specifications. Each prior distribution typically includes a mean (\(\mu\)) and standard error (\(se\)), just as before.

In many cases, prior weights are assigned to reflect how strongly each prior contributes to the model. These weights range between 0 and 1 and ideally sum to 1 (or 100%). For simplicity, especially when no strong preference among priors exists, equal weighting can be used (e.g., with three priors, each receives a weight of 1/3). Alternatively, when the weights themselves are uncertain, they can be treated as random variables and modeled using a Dirichlet distribution: \(weight \sim Dir(\alpha_i)\) where \(\alpha_i = 1\) for each prior (\(i\)), yielding a uniform Dirichlet distribution.
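Drawing such weights is straightforward in base R, because a Dirichlet draw can be obtained by normalizing independent gamma draws. The helper `rdirichlet_sketch()` below is a hypothetical name used only for this illustration; how the package handles the weights internally is not shown here.

```r
# Hypothetical helper: n draws from a Dirichlet(alpha) distribution
rdirichlet_sketch <- function(n, alpha) {
  k <- length(alpha)
  # Each row holds k gamma draws with shapes alpha[1], ..., alpha[k]
  g <- matrix(rgamma(n * k, shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

set.seed(5)
# Three priors with alpha = 1 each: the uniform Dirichlet distribution
w <- rdirichlet_sketch(5000, alpha = c(1, 1, 1))

print(head(round(w, 3)))
print(round(colMeans(w), 2))   # on average each prior receives about 1/3
```

Each row is one plausible set of prior weights that sums to 1, and averaging over many rows recovers the equal weighting of 1/3 per prior.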

In the example below, I illustrate this approach using priors with varying values of \(\mu\) and \(se\). For intercept parameters, a broader prior such as \(N(\mu = 0, se = 10)\) is often reasonable, reflecting higher uncertainty.

data("example2")
print(example2)
## # A tibble: 16 × 7
##    Levels                            mu1   se1   mu2   se2   mu3   se3
##    <chr>                           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 b0_NA_log_Fish                      0  10     0    10     0   10   
##  2 b0_NA_log_Invertebrates             0  10     0    10     0   10   
##  3 b0_NA_logit_Fish                    0  10     0    10     0   10   
##  4 b0_NA_logit_Invertebrates           0  10     0    10     0   10   
##  5 b1_Oxygen_log_Fish                  0   0.5   0.3   0.5   0.3  0.15
##  6 b1_Oxygen_log_Invertebrates         0   0.5   0.3   0.5   0.3  0.15
##  7 b1_Oxygen_logit_Fish                0   0.5   0.3   0.5   0.3  0.15
##  8 b1_Oxygen_logit_Invertebrates       0   0.5   0.3   0.5   0.3  0.15
##  9 b1_Salinity_log_Fish                0   0.5  -0.3   0.5  -0.3  0.15
## 10 b1_Salinity_log_Invertebrates       0   0.5  -0.3   0.5  -0.3  0.15
## 11 b1_Salinity_logit_Fish              0   0.5  -0.3   0.5  -0.3  0.15
## 12 b1_Salinity_logit_Invertebrates     0   0.5  -0.3   0.5  -0.3  0.15
## 13 b1_Sediment_log_Fish                0   0.5  -0.3   0.5  -0.3  0.15
## 14 b1_Sediment_log_Invertebrates       0   0.5  -0.3   0.5  -0.3  0.15
## 15 b1_Sediment_logit_Fish              0   0.5  -0.3   0.5  -0.3  0.15
## 16 b1_Sediment_logit_Invertebrates     0   0.5  -0.3   0.5  -0.3  0.15
mod2 <- meta(estimate=example1$est,        
                    stderr=example1$se,            
                    parameter=example1$parameter,  
                    predictor=example1$predictor,  
                    link_function=example1$link,   
                    grouping=example1$group,
                    prior_mu=example2[c(2,4,6)],          #prior for the mean
                    prior_mu_se=example2[c(3,5,7)],       #prior for the standard error of the mean
                    Nsamp=example1$n,            
                    method=2,
                    n_thin=10,                            #thinning the chains
                    n_chain=4)                            #changing the number of chains from 2 to 4
## Warning in meta(estimate = example1$est, stderr = example1$se, parameter =
## example1$parameter, : The Rhat or/and effective sample size for some >1.01
## or/and <1000. Chains might not be mixing.
#Display the summarized results
mod2$Summary
##    parameter predictor  link         group     map      mu     se      ll
## 1         b1    Oxygen   log          Fish  0.3060  0.3402 0.1519  0.0941
## 2         b1    Oxygen   log Invertebrates  0.2537  0.2349 0.1139  0.0576
## 3         b1    Oxygen logit          Fish  0.2346  0.1559 0.2316 -0.2226
## 4         b1    Oxygen logit Invertebrates  0.1751  0.1220 0.1724 -0.1803
## 5         b1  Salinity   log          Fish -0.1537 -0.1220 0.1573 -0.3809
## 6         b1  Salinity   log Invertebrates -0.1564 -0.1472 0.0903 -0.2934
## 7         b1  Salinity logit          Fish -0.2948 -0.2813 0.1845 -0.5972
## 8         b1  Salinity logit Invertebrates -0.3222 -0.3316 0.1155 -0.5121
## 9         b1  Sediment   log          Fish -0.2880 -0.2255 0.2068 -0.5574
## 10        b1  Sediment   log Invertebrates -0.1708 -0.1569 0.1141 -0.3491
## 11        b1  Sediment logit          Fish -0.3056 -0.3102 0.2266 -0.6593
## 12        b1  Sediment logit Invertebrates -0.2397 -0.2462 0.1294 -0.4481
##         ul     I2  n
## 1   0.5699 0.8418 27
## 2   0.4243 0.8736 59
## 3   0.5149 0.9824 14
## 4   0.3865 0.9408 38
## 5   0.1238 0.9699 29
## 6  -0.0060 0.8120 83
## 7  -0.0168 0.8031 13
## 8  -0.1316 0.6598 52
## 9   0.0823 0.9836  9
## 10  0.0205 0.9775 46
## 11  0.0261 0.9136  6
## 12 -0.0391 0.6865 26

The results of the meta-analysis are summarized in a table that includes the Maximum A Posteriori (MAP) estimates, the posterior mean (\(\mu\)), the standard error (\(se\)), and the lower (ll) and upper (ul) limits of the Highest Density Interval (HDI), which by default is set to 90%. Additionally, the heterogeneity among studies is quantified using the \(I^2\) statistic.
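To make the MAP and HDI columns more tangible, the sketch below computes both from a vector of posterior draws. The draws are simulated and the helper names (`hdi_from_draws`, `map_from_draws`) are hypothetical; this is not how the package necessarily computes them, only a minimal illustration of the two summaries.

```r
# Simulated posterior draws (illustrative only, not package output)
set.seed(1)
draws <- rnorm(10000, mean = 0.3, sd = 0.15)

# Shortest interval that contains a fraction `prob` of the sorted draws
hdi_from_draws <- function(x, prob = 0.9) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(prob * n)                 # number of draws inside the interval
  widths <- x[k:n] - x[1:(n - k + 1)]    # width of every candidate interval
  i <- which.min(widths)                 # start of the narrowest one
  c(ll = x[i], ul = x[i + k - 1])
}

# MAP approximated as the mode of a kernel density estimate
map_from_draws <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

hdi90   <- hdi_from_draws(draws, prob = 0.9)
map_est <- map_from_draws(draws)
print(round(c(map = map_est, hdi90), 3))
```

For a symmetric posterior the HDI is close to the equal-tailed interval; the two diverge when the posterior is skewed.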

The prior for the between-study variance (\(\tau^2\)) is, by default, specified as a uniform distribution ranging from 0 to 5. This choice has the benefit of producing wider intervals, which can be conservative—particularly useful when dealing with smaller sample sizes. However, this conservatism can also be a drawback in cases where more precision is desired.

To offer users flexibility, the argument prior_var_fam can be set to "exp" to use an exponential distribution instead of the default "unif" (uniform distribution). When using "unif", the variance prior is specified as \(Unif(0, \text{prior_study_var})\). When "exp" is selected, the variance prior becomes \(Exponential(\frac{1}{\text{prior_study_var}})\).
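The difference between the two prior families can be explored by simulation. The sketch below draws from both, reading \(\frac{1}{\text{prior_study_var}}\) as the rate of the exponential distribution (an assumption) and using a value of 5 to match the default range mentioned above. The variable names are illustrative.

```r
set.seed(42)
prior_study_var <- 5      # illustrative value, matching the 0 to 5 range above
n_draws <- 1e5

# Draws of the between-study variance under each prior family
tau2_unif <- runif(n_draws, min = 0, max = prior_study_var)
tau2_exp  <- rexp(n_draws, rate = 1 / prior_study_var)

# Compare prior mean and upper tail
prior_summary <- rbind(
  unif = c(mean = mean(tau2_unif), q95 = unname(quantile(tau2_unif, 0.95))),
  exp  = c(mean = mean(tau2_exp),  q95 = unname(quantile(tau2_exp, 0.95)))
)
print(round(prior_summary, 2))
```

Under this reading the uniform prior is bounded at prior_study_var, whereas the exponential prior has its mean at prior_study_var, places most mass near zero, and keeps a long right tail.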

In general, for complex meta-analyses with many predictors and responses (e.g., Kaijser et al. …), the uniform prior is recommended, provided that convergence is achieved. In contrast, for more focused analyses with fewer predictors and responses, and particularly with small datasets (e.g., \(n < 10\)), an exponential prior (e.g., with a mean of 1000) may be more appropriate.

These recommendations are heuristic — they are grounded in practical experience and prior applications, but should not be treated as universally optimal. Users are strongly encouraged to conduct sensitivity analyses to assess how prior assumptions influence the results.

2.7.3 rescheck-function and bias

After analyzing the meta-data, it is essential to check for bias, which can arise from multiple sources. This check should ideally be part of a sensitivity analysis, employing various methods such as a display of the z-distribution, Egger’s test, Peters’ test and/or funnel plots. Bias is nearly always present to some extent, but its magnitude may vary depending on the dataset.

A straightforward first step is to visually assess the relationship between the residuals and the inverse of the standard error (\(1/se\)). If sample sizes are available, one can also assess the relationship with \(1/n\).

A clear diagonal pattern between \(\beta\) and \(1/se\) can indicate the selection of larger effects with broader intervals, p-hacking, HARKing, data dredging, noise in the data, etc. A relation with \(1/n\) often occurs when small sample sizes and noise result in so-called ‘small-study effects’.

The residuals can be assessed across the total dataset, per group or per predictor.

Below are the residuals per group in relation to \(1/se\).

res_mod2 <- rescheck(mod2)

print(res_mod2$bias_se_group)

And the residuals per predictor in relation to \(1/se\).

print(res_mod2$bias_se_predictor)

The dotted red line should approximately overlay the solid blue line, which represents the expected relationship with an intercept and slope of 0. However, small sample sizes can substantially influence the slope of the red line, potentially leading to misleading inferences. When clear bias is detected, it is advisable to either apply a bias correction method or specify stronger priors to mitigate the consequences.
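The logic behind this check can be sketched outside the package with simulated data. The example below is an assumption-laden illustration, not the internals of rescheck(): it computes residuals around an inverse-variance weighted mean and regresses them on \(1/se\). An intercept and slope near 0 correspond to the red line overlaying the blue line.

```r
set.seed(7)
n_studies <- 40
se  <- runif(n_studies, 0.05, 0.5)              # simulated standard errors
est <- rnorm(n_studies, mean = -0.2, sd = se)   # simulated, unbiased estimates

# Residuals around an inverse-variance weighted pooled mean
w      <- 1 / se^2
pooled <- sum(w * est) / sum(w)
res    <- est - pooled

# Regress the residuals on precision (1/se)
fit <- lm(res ~ I(1 / se))
print(round(coef(summary(fit)), 3))
```

Calling `plot(1 / se, res)` followed by `abline(h = 0)` and `abline(fit, lty = 3)` would give a rough base R counterpart of the figures above.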

2.8 senscheck-function and prior sensitivity

Sensitivity checks play an important role in assessing the robustness of model results. For simpler models with highly informative data, these checks may not always be necessary. However, one cannot assume that identical results would be obtained in a subsequent study under different conditions. Consequently, drawing strong conclusions based solely on whether an interval includes zero is arbitrary and often misleading.

A more informative approach involves directly inspecting the posterior distribution, along with the estimate, its uncertainty, and a visualization of the variance in patterns revealed by the data. This is particularly important in ecology, where data tend to be noisy and often exploratory in nature—meaning the posterior distribution can vary substantially between studies. Still, sensitivity checks can serve as a useful reality check.

In this framework, a sensitivity check evaluates the difference between a fully specified model with informed priors (mod1) and a baseline model with vague or weak priors (mod0). This comparison is made by computing the log posterior odds ratio, \(Log(P(Mod1|Data, Info)/P(Mod0|Data, Info))\), assuming that all model hyperparameters other than the priors are held constant.

An alternative approach is to assess the extent to which prior information in mod1 versus mod0 contributes to a shift in the posterior away from zero: \(Log(P(Mod1>0|Data, Info)/P(Mod0>0|Data, Info))\). This can be transformed into a probability between 0 and 1, where 0.5 indicates no net influence of the prior on the posterior, 0 indicates complete negative influence, and 1 complete positive influence. However, I am not the biggest fan of this way of assessing sensitivity, as it again treats the posterior as a form of dichotomous hypothesis test.
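As a rough illustration of this second approach, the sketch below uses simulated posterior draws for both models (the means and standard deviations are invented) and applies the inverse logit to the log ratio. How senscheck() computes this internally is not shown here, so treat it as one possible reading of the formula.

```r
set.seed(3)
post_mod1 <- rnorm(20000, mean = -0.25, sd = 0.13)   # simulated draws, informed priors
post_mod0 <- rnorm(20000, mean = -0.15, sd = 0.20)   # simulated draws, weak priors

p1 <- mean(post_mod1 > 0)    # P(Mod1 > 0 | Data, Info)
p0 <- mean(post_mod0 > 0)    # P(Mod0 > 0 | Data, Info)

log_ratio  <- log(p1 / p0)
shift_prob <- plogis(log_ratio)   # algebraically equal to p1 / (p1 + p0)
print(round(c(p1 = p1, p0 = p0, shift_prob = shift_prob), 3))
```

In this simulated case the value falls below 0.5, meaning the informed priors moved posterior mass toward negative values.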

In the earlier example, mod2 was treated as mod1. A corresponding weakly informed mod0 model can be created by setting all prior means (\(\mu\)) to 0 and their standard errors (\(se\)) to 100.

#Create a model with minimal prior information
mod0 <- meta(estimate=example1$est,        
                    stderr=example1$se,            
                    parameter=example1$parameter,  
                    predictor=example1$predictor,  
                    link_function=example1$link,   
                    grouping=example1$group,
                    prior_mu=0,                           #prior for the mean
                    prior_mu_se=100,                      #prior for the standard error of the mean
                    Nsamp=example1$n,            
                    method=2,
                    n_thin=10,
                    n_chain=4)

#Perform the sensitivity check
sens_check <- senscheck(mod2, mod0)

#Plot the posterior odds
print(sens_check$posterior_odds)

The vertical black line in the plot represents the threshold where there is no difference between the models, i.e., where \(0=Log(P(Mod1|Data, Info)/P(Mod0|Data, Info))\). This corresponds to equal support for both mod1 and mod0. Notably, the results show that for log-linear models, the fish–oxygen relationship, and for logit-linear models, the salinity–sediment relationship, the inclusion of prior information in mod1 shifts the posterior distributions toward more negative values.

To quantify the strength and direction of this shift, the inverse logit of the Maximum A Posteriori (MAP) value can be taken: \(logit^{-1}(MAP)\). This transformation expresses the shift as a probability, where a value smaller than 0.5 indicates a negative shift, a value larger than 0.5 a positive shift, and 0.5 no shift.

#Select only predictor and link function
inv_df <- sens_check$table[c(3:4)]

#Calculate the probability
inv_df$prob <-plogis(sens_check$table$mu[sens_check$table$group=="Fish"])

#Print the table
print(inv_df)
##    predictor  link      prob
## 1         NA   log 0.5009339
## 2         NA   log 0.5009827
## 3         NA logit 0.4599414
## 4         NA logit 0.4981049
## 5     Oxygen   log 0.5542536
## 6     Oxygen   log 0.4632618
## 7     Oxygen logit 0.5025395
## 8     Oxygen logit 0.3847435
## 9   Salinity   log 0.5009339
## 10  Salinity   log 0.5009827
## 11  Salinity logit 0.4599414
## 12  Salinity logit 0.4981049
## 13  Sediment   log 0.5542536
## 14  Sediment   log 0.4632618
## 15  Sediment logit 0.5025395
## 16  Sediment logit 0.3847435

The table shows that for sediment in a logit-linear model related to fish, the posterior probability shifts from 0.38 in mod0 to 0.50 in mod1. This corresponds to an increase of 12 percentage points, i.e., 0.50−0.38=0.12. This indicates that prior information contributes additional support for a negative relationship between fine sediment and lotic fish species, a relation that is not fully supported by the data alone.

This is not a limitation but rather reflects what occurs in practice: prior information, often drawn from empirical studies or domain expertise, tends to be more directional. This highlights the value of incorporating prior information from the literature when building and refining Bayesian models.

Admittedly, this is the most demanding phase of the workflow: gathering and extracting data, fitting multiple (G)LMs, defining and implementing priors, assessing potential biases, and optimizing the model to ensure stable and interpretable results. Once complete, presenting the results or making predictions from the fitted models is considerably more straightforward.

Additionally, overlaying the probability density distributions of both models gives another way to assess the influence of the priors. The overlay shows the posterior density of both mod1 (M1) and mod0 (M0).

#Print to overlay the posterior density for only the log model
print(sens_check$overlay$log)

2.9 pdplot-function

To visualize posterior results, a common approach is to display point estimates along with credible intervals. However, this method imposes sharp boundaries on a continuous distribution of uncertainty, which may not fully reflect the nature of a posterior probability distribution.

An alternative—and often more informative—approach is to plot the Posterior Density Distribution (PDD). This plot combines the point estimate, interval range, and the full shape of the posterior, offering a richer picture of the uncertainty and possible parameter values.

The PDD represents the distribution of the pooled parameter estimate conditional on the meta-data and prior information: \(f(\beta_{\text{pooled}} \mid \text{Meta-data}, \text{Info})\). This is conceptually the inverse of the likelihood, which tells us how likely the observed data are given a set of parameter values: \(f(\text{Meta-data} \mid \{\beta_{1}, ..., \beta_{n}\})\). In practice, this visualization can be generated using the pdplot() function, which overlays the posterior density curve with interval and point estimates, allowing for intuitive interpretation of the central tendency and uncertainty of the pooled estimate.

pdd <- pdplot(mod2, 
              label_size=4,      #setting the label size larger
              point_size=2)      #large point

The object contains the figures generated for both the log

#For the models with the log-link
print(pdd$posterior_density$log)             

and logit functions.

#For the models with the logit-link
print(pdd$posterior_density$logit)

And, a summary belonging to the figures.

#summary belonging to the figures
print(pdd$summary)
##        map      mu     se      ll      ul
## 1   0.3060  0.3402 0.1519  0.0941  0.5699
## 2   0.2346  0.1559 0.2316 -0.2226  0.5149
## 3  -0.1537 -0.1220 0.1573 -0.3809  0.1238
## 4  -0.2948 -0.2813 0.1845 -0.5972 -0.0168
## 5  -0.2880 -0.2255 0.2068 -0.5574  0.0823
## 6  -0.3056 -0.3102 0.2266 -0.6593  0.0261
## 7   0.2537  0.2349 0.1139  0.0576  0.4243
## 8   0.1751  0.1220 0.1724 -0.1803  0.3865
## 9  -0.1564 -0.1472 0.0903 -0.2934 -0.0060
## 10 -0.3222 -0.3316 0.1155 -0.5121 -0.1316
## 11 -0.1708 -0.1569 0.1141 -0.3491  0.0205
## 12 -0.2397 -0.2462 0.1294 -0.4481 -0.0391

For larger datasets with multiple groups and predictor variables, it may be useful to adjust the order of the Posterior Density Distributions (PDDs) for better clarity and comparison. You can control the ordering of predictors and groups by using the arguments order_predictor and order_group. These arguments expect a character vector containing the names of the predictors or groups in the desired order.

By adjusting these arguments, you can organize the PDDs in a way that facilitates easier interpretation, especially when working with complex models involving numerous predictors and groupings.

2.10 hop-function (Hypothetical Outcome Plots)

Hypothetical Outcome Plots (HOPs) are a valuable tool for visualizing how the expected value of a response variable might change in response to variations in a predictor variable, while keeping all other variables constant (Kale et al., 2019).

In a HOP, each line represents the marginal change in the expected value of the response variable as the predictor variable changes. This allows for a clear understanding of how the relationship between the predictor and response behaves, without the influence of other variables.

\[ g(E(y_{i} \mid x_{ij})) = \beta_{pooled,j=0,m} + \beta_{pooled,j=1,m} \cdot x_{i,j=1} + \sum_{j=2}^{J} \beta_{pooled,j} \cdot \hat{x}_{j} \]

The hypothetical prediction is generated using the posterior estimates, denoted as \(f(\beta_{pooled}|Data, Info)\). Each pooled coefficient reflects the influence of a one-unit change in the predictor on the response variable, and \(J\) is the number of predictors.

To display this change, a set of sequential values for the predictor \(x_{i,j=1}\) can be generated, where \(x_{i,j=1}=\{x_1, ..., x_n\}\) represents realistic values for the observed gradient of the predictor. The function \(f(\beta_{pooled,j}|Data, Info)\) provides the most plausible values for the pooled regression coefficients \(\beta_{pooled,j}\). By drawing a sample \(m\) from this distribution, \(\beta_{pooled,j,m}\sim f(\beta_{pooled,j}|Data, Info)\), and repeating this process multiple times, a range of hypothetical outcomes can be generated.

For this process, the contribution of all other predictors is held constant at their estimated values, \(\sum_{j=2}^{J} \beta_{pooled,j} \cdot \hat{x}_{j}\), while the predictor \(x_{j=1}\) is varied. Therefore, the HOP lines represent simulations of possible marginal changes in the response from the posterior distribution.

A slight difference from classical HOPs is that, in the case of meta-analysis, \(\hat{x}_j=1\) is assumed. This approach still illustrates the marginal change in \(y_i\) given a change in \(x_{i,j=1}\), but only under the condition where all other predictors are set to \(\hat{x}_j=1\).

\[ g(E(y_{i} \mid x_{ij})) = \beta_{pooled,j=0,m} + \beta_{pooled,j=1,m} \cdot x_{i,j=1} + \sum_{j=2}^{J} \beta_{pooled,j} \cdot 1 \]
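The construction of the lines can be sketched in a few lines of base R. This is a simplified illustration and not the hop() implementation: the intercept `b0` is a hypothetical fixed value (the formula above also samples it), the slope is drawn from a normal approximation to its posterior (an assumption), and the coefficient values are taken from the mod2 summary for invertebrates with the log link.

```r
set.seed(11)
n_lines <- 50
x_log   <- seq(-4.6, 0, length.out = 100)     # log of the fine sediment fraction

b0      <- 3.5                                 # hypothetical intercept on the log scale
b1_mu   <- -0.1569                             # Sediment, posterior mean (mod2$Summary)
b1_se   <- 0.1141                              # Sediment, posterior se (mod2$Summary)
b_other <- c(Oxygen = 0.2349, Salinity = -0.1472)  # other predictors, posterior means

# Normal approximation to the posterior of the slope (an assumption)
b1_draws <- rnorm(n_lines, mean = b1_mu, sd = b1_se)

# One column per hypothetical outcome; other predictors fixed at x_hat = 1,
# and exp() as the inverse of the log link
hops <- sapply(b1_draws, function(b1) exp(b0 + b1 * x_log + sum(b_other * 1)))
print(dim(hops))
```

Each column of `hops` is one line of the plot; `matplot(exp(x_log), hops, type = "l")` would draw them against the back-transformed predictor.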

Note that the function operates under the assumption of log-transformed variables. This implies that the xlim argument, which defines the limits of the x-axis in the plot, is given in log-transformed units. For example, for the fraction of fine sediment, the limits −4.6 and 0 correspond to fractions of exp(−4.6)=0.01 and exp(0)=1 on the original scale.
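The conversion between the original scale and the log scale takes one line in each direction. The variable names below are illustrative.

```r
frac_limits <- c(0.01, 1)        # fine sediment fraction on the original scale
xlim_log    <- log(frac_limits)  # limits in log-transformed units, as expected by xlim
print(round(xlim_log, 1))

# Back-transform to confirm the original limits
print(exp(xlim_log))
```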

The y-axis, on the other hand, is presented on the response scale, meaning that the values are not transformed, and they represent the actual predicted responses (e.g., taxonomic richness in this case).

In the future, I plan to implement an argument that allows users to set the xlim on the response scale directly, without needing to manually convert to log-transformed values.

Below is an example of the outcomes showing the response of invertebrate taxonomic richness along the gradient of fine sediment.

log_sal1 <- hop(mod2,                                  #Object from the meta function
    group="Invertebrates",                             #Select group out of Invertebrates and Fish
    predictor = "Sediment",                            #Select predictor out of Sediment, Salinity and Oxygen
    xlab= "Fine sediment (%)",                         #Give x-axis a name
    link_function = "log",                             #Which link function out of log and logit
    ylab="Invertebrate taxonomic richness",            #Give y-axis a name
    xlim=c(-4.6, 0),                                   #Set limits x-axis (log scale)
    ylim=c(0, 80),                                     #Set limits y-axis
    hop_lwd = 0.3)                                     #Set width of the hop lines

#Display the HOPs
print(log_sal1)

It is also possible to scale the x-axis to display the exponentiated values using the argument exp_axis. This option will transform the values on the x-axis back from their log-transformed scale to their original scale, providing a more intuitive representation of the predictor variable.

Additionally, you may notice that the intercept could appear unusually high. This might be due to the fact that the intercept represents the average intercept across all studies, with all other variables held constant at a value of 1. If you wish to adjust the position of the intercept, you can use the shift_b0 argument. This allows you to shift the intercept value for a more accurate representation based on the specific context of your analysis.

log_sal2 <- hop(mod2,                                   
    group="Invertebrates",                             
    predictor = "Sediment",                            
    xlab="Fine sediment (%)",   
    link_function = "log",                            
    ylab="Invertebrate taxonomic richness",            
    xlim=c(-4.6, 0),                                      
    ylim=c(0, 50),                                     
    hop_lwd = 0.3,                                    
    exp_axis = TRUE,                                  #Exponentiate the x-axis notations
    round_x_axis = 2,                                 #Round the x-axis notation to 2 decimals
    shift_b0 = -1)                                    #Shift the intercept by -1

#Display the HOPs
print(log_sal2)

2.11 hop function (and Partial Dependency Plots)

When multiple predictors are included in the dataset, it is possible to visualize the effect of a change in one predictor while holding the other predictors constant. This can be achieved by creating Partial Dependency Plots (PDPs), which show how the predicted response variable changes as a particular predictor varies, keeping other predictors fixed.

The Partial Dependency Plot is a powerful tool for understanding the marginal effect of each predictor on the response variable. In the context of a meta-analysis, it helps illustrate how the relationship between a specific predictor and the response is shaped by the collective data from multiple studies, while controlling for the influence of other variables.

log_sal3 <- hop(mod2,                                              
    group="Invertebrates",                            
    predictor = c("Sediment", "Oxygen"),                     #Select both Sediment and Oxygen
    xlab= "Fine sediment fraction",                          #Give x-axis a name
    ylab= expression(Oxygen ~ mg ~ L^-1),                    #Give y-axis a name
    gradient_title = "MAP Invertebrate \ntaxonomic richness",#Give the y-axis gradient a name
    pdp_resolution = 100,                                    #Set resolution of the grid
    link_function = "log",
    exp_axis = T, 
    round_x_axis = 2,
    round_y_axis = 0,
    xlim=c(-4.61, -0.92),
    ylim=c(1.61, 2.77)) 

#Display the PDP
print(log_sal3)
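The grid underlying such a plot can be sketched in base R. The example below is a simplified illustration rather than the hop() internals: the intercept `b0` is hypothetical, the slopes are the MAP values from the mod2 summary for invertebrates with the log link, and salinity is held at \(\hat{x}=1\).

```r
res_grid <- 100
sed_log  <- seq(-4.61, -0.92, length.out = res_grid)   # log fine sediment fraction
oxy_log  <- seq(1.61, 2.77, length.out = res_grid)     # log oxygen

b0    <- 3.5        # hypothetical intercept on the log scale
b_sed <- -0.1708    # MAP Sediment (mod2$Summary)
b_oxy <- 0.2537     # MAP Oxygen (mod2$Summary)
b_sal <- -0.1564    # MAP Salinity (mod2$Summary), held at x_hat = 1

# Predicted richness for every combination of sediment and oxygen
pdp <- outer(sed_log, oxy_log,
             function(s, o) exp(b0 + b_sed * s + b_oxy * o + b_sal * 1))
print(dim(pdp))
```

A call such as `image(exp(sed_log), exp(oxy_log), pdp)` would display the grid as a heat map on the original scales of both predictors.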

2.12 Prediction

This part is still under construction.

3 Reasoning, logic and beliefs

3.1 Introduction

I consider reasoning and logic (and epistemics) to be among the most important aspects of life. They should also play a central role in science. Yet, in practice, reasoning — especially formal logic — is often not a major concern for many scientists. To be fair, some of the most significant scientific breakthroughs were not the result of formal reasoning (e.g., the discovery of penicillin; see Feyerabend, 1975). Nevertheless, logic and structured reasoning guide our thinking, shape clear arguments, and offer clarity to our thoughts. Perhaps the most immediate benefit of logic is not its guarantee of ‘truth’, but the aesthetic and coherence it brings to our arguments.

Despite the previous values, many conclusions drawn and claims made in contemporary society are neither logically valid nor sound. This becomes especially evident when one closely examines the structure of arguments in scientific papers and popular articles. Perhaps an immediate consequence of understanding reasoning and logic is the impossibility (or near impossibility) for science to develop valid and sound arguments. Logic, in its formal sense, concerns itself with the structure of arguments: for example, if B follows from A, and C follows from B, then C follows from A. Logic does not concern itself with the ‘truth’ of A, B, or C — those could just as well be X, Y, Z or \(\alpha, \beta, \gamma\). What matters is that the form of reasoning obeys certain principles.

It is often argued that reasoning, to some extent, depends on the principles of logic, though some (e.g., Harman, 1986) challenge this view. In everyday life, strict logical principles may not always be applicable or followed. Still, this does not diminish their potential instrumental value. I propose that applying logical principles can improve our ability to (1) decide what to believe about the world, (2) clearly communicate our thoughts to others, and (3) better understand the reasoning of others. This makes no attempt to ‘justify’ why one should use logic, but it might lighten the burden of cognitive effort that shaping a picture of the world demands.

As someone with a background in ecology, I increasingly find that confusion in scientific reasoning does not stem from the world itself but from the way we use (or do not use) logic, reason and language; specifically, from issues in syntax and semantics. Hence, I see the choice of words (semantics) and the way we connect them (syntactical structure) as more central to scientific practice, including ecology, than is commonly acknowledged.

Throughout this work, I will use formal expressions, though not strictly within the syntax of formal predicate logic. Instead, I will make use of meta-language and well-formed formulas based on predicate logic that draw on the basic principles of deductive logic. These expressions help expose the structure of beliefs and arguments, making their internal connections explicit. They are concise and avoid the ambiguities of natural language, aiming for a more neutral and formal presentation of meaning. This leaves less ambiguity about the form of the proposed argument and helps to highlight confusion. While this is not intended to be a full introduction to logic, it will address key concepts such as validity, soundness, cogency, ambiguity, vagueness and the formalization of arguments. For a more comprehensive treatment, see Barwise et al. (2002), Lee (2017), and Smith (2021).

Following the path from reasoning and logic to application and ultimately to belief, one may recognize that many, if not most, of our beliefs are not justified under particular lines of reasoning. This raises deeper philosophical questions: (1) What are beliefs? (2) What do we mean by justified? (3) How do we justify our beliefs? (4) What should we believe? (5) Why do we try to justify them? And, perhaps more drastic, (6) how should we behave in light of this information?

3.2 Syllogism, propositional and predicate logic

Consider a simple syllogistic argument consisting of two premises and a conclusion.

\[ \text{Major premise 1)} \ all \ insects \ have \ six \ legs,\\ \text{Minor premise 2)} \ x \ is \ an \ insect,\\ \text{Conclusion) }\ therefore \ x \ has \ six \ legs \] The argument above is called “modus ponens”, which follows a specific logical structure. In this form, the conclusion is entailed by the premises — that is, the truth of the premises guarantees the truth of the conclusion. Entailment means that in all cases where the premises are true, the conclusion cannot be false. If an argument has this feature, it is considered logically valid.

This argument can be further condensed by abstracting away the content and focusing only on its structure. In this way, the validity of the argument depends solely on the form of the entailment, not on the specific subject matter.

Modus Ponens \[ \text{Major premise 1)} \ all \ P \ are \ Q,\\ \text{Minor premise 2)} \ x \ is \ P,\\ \text{Conclusion)} \ therefore \ x \ is \ Q \] In an argument like the one above, all premises are declarative statements, also known as propositions: sentences that are either True or False, but not both. Each premise typically consists of a subject (or antecedent), a copula (such as is, are, will be, etc.), and a predicate (or consequent). These components express a logical relationship between concepts.

An argument is said to be valid if, assuming all the premises are true, the conclusion cannot be false. In other words, the conclusion is logically entailed by the premises. An argument is sound if it is both valid and all of its premises are in fact true. Therefore, soundness requires both a valid logical structure and factual accuracy.

For example: P1: all unicorns fly when eating carrots, P2: my unicorn eats a carrot, C: therefore my unicorn will fly. This is a valid argument because if the premises were true, the conclusion would necessarily follow. However, it is not sound because unicorns do not exist (as far as I know), so the premises are not in fact true. Now consider the following argument: P1: most grass is green, P2: the sky is blue, C: you are reading this text. Even if all three propositions happen to be true, this argument is not valid because the conclusion is not entailed by the premises; there is no logical connection between them. An argument where the conclusion does not follow from the premises is considered invalid, regardless of whether its statements are true.

In propositional logic, the argument as given before can be formally expressed as below. In this formal expression \(\rightarrow\) indicates an implication (read as “implies”) and the three dots \(\therefore\) indicate “therefore”. Similarly, \(\vee\) means “or” (disjunction), \(\wedge\) means “and” (conjunction) and \(\neg\) is the negation symbol. Thus, for example, \(\neg P\) means not-P. Using these basic operators, many more valid argument forms can be constructed, such as disjunctive syllogism, hypothetical syllogism, modus tollens, and more. These forms serve as the foundation for formal reasoning in logic.

The meta-language addresses how we speak about the object language: it discusses how the object language should be and is used. It can contain more operators as long as it is defined what they imply. For example, \(\equiv\) and \(\leftrightarrow\) indicate equivalence, where \(A \leftrightarrow B\) can be understood as \((A \rightarrow B) \wedge (B \rightarrow A)\). Here \(A \equiv B\) means that A and B have equivalent logical status, but are not the same. \(=\) indicates \(is\), or an outcome as identity, i.e., \(A = B\) indicates that A and B are the same. The symbol \(/\) indicates over or divided by.

In propositional logic modus ponens is then expressed as follows: \[P \rightarrow Q, P \therefore Q\]

Modus Tollens \[P \rightarrow Q, \neg Q \therefore \neg P\]

Hypothetical Syllogism \[P \rightarrow Q, Q \rightarrow Z, P \therefore Z \]

Disjunctive Syllogism \[Either \ P \vee Q, \neg P \therefore Q\]
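The validity of these four forms can be checked mechanically with a truth table: an argument form is valid if no assignment of truth values makes all premises true and the conclusion false. The sketch below does this in base R, using hypothetical helper names (`implies`, `is_valid`), and adds affirming the consequent as an invalid form for contrast.

```r
# Material implication: "a implies b" is false only when a is TRUE and b is FALSE
implies <- function(a, b) !a | b

# All truth assignments for three propositional variables
tt <- expand.grid(P = c(TRUE, FALSE), Q = c(TRUE, FALSE), Z = c(TRUE, FALSE))

# Valid if no row has all premises TRUE while the conclusion is FALSE
is_valid <- function(premises, conclusion) !any(premises & !conclusion)

validity <- with(tt, c(
  modus_ponens           = is_valid(implies(P, Q) & P, Q),
  modus_tollens          = is_valid(implies(P, Q) & !Q, !P),
  hypothetical_syllogism = is_valid(implies(P, Q) & implies(Q, Z) & P, Z),
  disjunctive_syllogism  = is_valid((P | Q) & !P, Q),
  affirming_consequent   = is_valid(implies(P, Q) & Q, P)
))
print(validity)
```

The first four forms come out as valid and the last as invalid, because the row with P false and Q true satisfies both premises of affirming the consequent but not its conclusion.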

Semantically equivalent structures \[ \text{Modus Ponens: } for \ all \ x \ that \ are \ P \ it \ implies \ they \ are \ Q, \ x \ is \ P \ therefore \ x \ is \ Q\\ \text{Modus Tollens: } for \ all \ x \ that \ are \ P \ it \ implies \ they \ are \ Q, \ x \ is \ \neg Q \ therefore \ x \ is \ \neg P \\ \text{Hypothetical Syllogism: } for \ all \ x \ that \ are \ P \ it \ implies \ they \ are \ Q, \ for \ all \ x \ that \ are \ Q \ it \ implies \ they \ are \ Z, \ x \ is \ P \ therefore \ x \ is \ Z \\ \text{Disjunctive Syllogism: } either \ x \ is \ P \ or \ x \ is \ Q, \ x \ is \ \neg P \ therefore \ x \ is \ Q \]

An argument is valid, in the semantic interpretation of validity, in case \(𝓟_n\) is true in all interpretations in which \(𝓟_1, ..., 𝓟_{n-1}\) are true. Put simply, if all premises are true the conclusion cannot be false. Such a valid argument can be notated as \[𝓟_1, ..., 𝓟_{n-1} \models 𝓟_n\] (see also Haack, 1978)

In predicate logic modus ponens is expressed as \(\forall x(P(x)\rightarrow Q(x)), P(x) \models Q(x)\), where ‘\(\forall\)’ indicates ‘for all’ and ‘\(\models\)’ indicates entailment (read as ‘therefore’). Each proposition, referred to as \(𝓟_i\), consists of properties such as P and Q and an element x from a set, \(x\in X\). The benefit of predicate logic (or approximating languages such as deontic logic or a meta-language) is that it can quantify over the different elements of a set \(x \in X\).

Predicate-logic equivalent structures

\[ \text{Modus Ponens: } \forall x(P(x)\rightarrow Q(x)), P(x) \vdash Q(x) \\ \text{Modus Tollens: } \forall x(P(x)\rightarrow Q(x)), \neg Q(x) \vdash \neg P(x) \\ \text{Hypothetical Syllogism: } \forall x(P(x)\rightarrow Q(x)), \forall x(Q(x) \rightarrow Z(x)), P(x) \vdash Z(x) \\ \text{Disjunctive Syllogism: } P(x) \vee Q(x), \neg P(x) \vdash Q(x) \]

3.3 Reasons, Logic and rules

There exist some rules (axioms) of logic that are self-evident. There are no clear rules of reasoning as far as I am aware. Of course there are ‘laws’ in logic that can be derived from within the object language. There are also some (normative?) rules of rationality that make intuitive sense once one is aware of them. Yet, it remains unclear if we actually use them. I will lay them out here for further support. This does not mean they are not normative, but they will be useful down the line.

Law of identity Informal: The law of identity states that any object is equal to itself. \[ \text{Formal:}\ A=A \ or \ A(x)=A(x) \]

Law of excluded middle Informal: A proposition is either true or false (there is no middle ground) \[ \text{Formal:}\ A(x)\vee \neg A(x) \]

Law of non-contradiction Informal: One cannot rationally hold logically inconsistent beliefs. \[ \text{Formal:}\ \neg (A(x)\wedge \neg A(x)) \] It could further be expressed in a meta-language in the style of propositional logic as below. This would indicate that there exists no rational subject (\(s\)) that believes both \(𝓟\) and \(\neg𝓟\). \[ \text {Meta-language}: \neg\exists s \exists 𝓟(Rational(s) \wedge Believe(s, 𝓟) \wedge Believe(s, \neg 𝓟)) \]

Norm of logical consequence Informal: If one believes the premises of a valid argument, one must believe the conclusion. \[ \text {Meta-language}: \forall s \forall 𝓟_i \forall 𝓟_n ((Valid(s, 𝓟_i, ..., 𝓟_{n-1} \models 𝓟_n) \wedge Believe(s, 𝓟_i)) \rightarrow Believe(s, 𝓟_n)) \] Both the law of non-contradiction and the norm of logical consequence are invoked, albeit in different forms, as a basis of ‘correct’ reasoning (not logic) by Harman (1986). This has been adapted by Labukt (2021). I am unaware whether this is actually taken seriously, but they all make intuitive sense. The law of identity could also be added, as it is self-evident that something that is different cannot be itself. Likewise, the law of excluded middle largely supports the law of non-contradiction. Hence, if we cannot hold the belief that a proposition is 80% true and 20% false, but only that it is either 100% or 0% true, then it follows that we cannot believe both A and B to be true. This would seem to clash with the idea that beliefs come in degrees. Yet, there is a difference between believing something (yes/no) and expressing and modeling the belief.

Informal: If the probability of \(A\) can only take the values 1 and 0, \(P(A(x)) \in \{ 0, 1 \}\) (which is then not really a probability), and the probability of \(B\) can likewise only take 0 or 1, \(P(B(x)) \in \{ 0, 1 \}\), with \(P(B(x))=1-P(A(x))\), then this implies either A or B. \[ \text{Formal:}\ (P(A(x))=1 \ \vee P(A(x))=0) \ \wedge (P(B(x))=1-P(A(x)) ) \rightarrow A(x)\vee \ B(x) \]

3.4 Deduction, induction and abduction

Deduction

Deductive reasoning moves from general principles to specific conclusions. It begins with premises that are assumed to be universally true and applies them to individual cases. If the premises are true and the reasoning is valid, the conclusion cannot be false. For example: P1: All humans are mortal. P2: I am a human. C: Therefore I am mortal. Writing \(a\) for the individual in question (here, me): \[ \forall x (H(x) \rightarrow M(x)), H(a) \models M(a) \]
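What \(\models\) claims can be illustrated by searching for a counter-model. The Python sketch below is an illustration under stated assumptions, not a proof: it only inspects small finite domains, and the function name `entails_mortality` is hypothetical. It tries every way of assigning the predicates H and M to the elements of the domain and looks for a case where both premises hold but the conclusion fails.

```python
from itertools import product


def entails_mortality(domain_size=2):
    """Check  forall x (H(x) -> M(x)), H(a) |= M(a)  on a small finite domain."""
    domain = range(domain_size)
    for h_vals in product([False, True], repeat=domain_size):
        for m_vals in product([False, True], repeat=domain_size):
            # Premise 1: every human in this interpretation is mortal.
            premise1 = all((not h_vals[x]) or m_vals[x] for x in domain)
            for a in domain:
                premise2 = h_vals[a]  # Premise 2: the individual a is human.
                if premise1 and premise2 and not m_vals[a]:
                    return False  # counter-model found
    return True


print(entails_mortality(2))  # True: no counter-model exists
```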

All the arguments displayed before were deductive, moving from axiomatic assumptions to individual elements or objects.

Induction

In contrast, inductive reasoning moves from specific observations to broader generalizations. It attempts to infer a universal rule or principle from limited data, and thus does not guarantee the truth of the conclusion — even if all observed instances support it. Induction tries to generalize from the instances (x) we observe to all \(\forall\).

For example: I observe twenty white swans, therefore all swans are white.

\[ (Observe(a_1) \rightarrow WhiteSwan(a_1)), ..., (Observe(a_{20}) \rightarrow WhiteSwan(a_{20})) \models \forall x (Observe(x) \rightarrow WhiteSwan(x)) \] Here \(a_1, ..., a_{20}\) are the swans actually observed. This reasoning is not valid. The observation of a single swan of any other colour would refute the conclusion. This issue is known as the problem of induction: how can I justify my conclusion to be universally true based on a limited set of instances? This means inductive arguments are not guaranteed to be valid or sound.
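The invalidity can be made concrete with a counter-model: a possible world in which every observed swan is white while not all swans are. The Python sketch below uses a toy population; the function name `find_countermodel` and the population sizes are assumptions made for illustration.

```python
from itertools import product


def find_countermodel(n_swans=3, n_observed=2):
    """Return a colouring where all observed swans are white but not all swans are."""
    for colours in product(["white", "black"], repeat=n_swans):
        observed_all_white = all(c == "white" for c in colours[:n_observed])
        all_white = all(c == "white" for c in colours)
        if observed_all_white and not all_white:
            return colours
    return None  # no counter-model: the premises cover every swan


print(find_countermodel())      # ('white', 'white', 'black')
print(find_countermodel(2, 2))  # None: every swan was observed
```

Only when the observations exhaust the population does the counter-model disappear, which is exactly the condition empirical sampling cannot guarantee.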

Abduction

Abduction is perhaps better seen in a Bayesian light and looks more like affirming the consequent. Suppose we state \(WS1(a)\): 50% of the swans are white, or \(WS2(a)\): 90% of the swans are white. If we take 20 observations and see that all are white, we might infer \(WS2(a)\), that is, that 10% of the swans are black.

\[ WS1(a) \vee WS2(a), (Observe(x) \rightarrow WhiteSwan(x)) \models WS2(a) \] Of course, this is also invalid, because we might actually have sampled a location where only white swans occur. However, it can be derived more quantitatively. This makes abduction appealing in practice, although it is often not applied because the way WS1 and WS2 are derived is perceived as subjective.
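A quantitative version of the swan example can be sketched with Bayes' rule. The Python below is a minimal illustration under two assumptions that are not in the text: the 20 observations are independent draws, and WS1 and WS2 receive equal prior weight. The function name `posterior_ws2` is hypothetical.

```python
def posterior_ws2(n_white=20, p_white_ws1=0.5, p_white_ws2=0.9, prior_ws2=0.5):
    """Posterior probability of WS2 after observing n_white white swans in a row."""
    likelihood_ws1 = p_white_ws1 ** n_white  # P(data | WS1), independent draws assumed
    likelihood_ws2 = p_white_ws2 ** n_white  # P(data | WS2)
    evidence = (1 - prior_ws2) * likelihood_ws1 + prior_ws2 * likelihood_ws2
    return prior_ws2 * likelihood_ws2 / evidence


print(posterior_ws2())           # very close to 1: the data strongly favour WS2 over WS1
print(posterior_ws2(n_white=0))  # 0.5: without data the prior is returned
```

The result depends entirely on which hypotheses are admitted and on their priors, which is where the perceived subjectivity of WS1 and WS2 enters.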

3.5 Belief, information and facts

3.5.1 Senses and emotions

I believe that beliefs are formed through impressions and sensory experiences, shaped by our emotions. The way I perceive the world is influenced by what I experience through my senses, and these perceptions are further colored by past experiences and emotional states — both present and past. Beliefs are not static; they can also play a functional role in guiding future actions and decisions.

Moreover, beliefs may be formulated or sustained to produce desirable emotional states — such as comfort, hope, or confidence — or to achieve social benefits, such as identity reinforcement or enhanced status. This suggests that beliefs are not always purely reflective of objective reality but may instead be constructed or maintained for normative or instrumental reasons.

For example, cognitive biases often arise when new sensory experiences conflict with existing beliefs. In such cases, we may suppress or reinterpret this new information in order to preserve older, emotionally anchored beliefs. This behavior indicates that belief is not only descriptive but also normative and, at times, self-serving.

3.5.2 Representation and experience of beliefs

In general, beliefs that are held in the mind at this moment are termed explicit beliefs. Beliefs that are currently not in the mind are termed implicit beliefs. For example, we know that grass is rarely purple or red. Therefore, if we are asked about the colour of grass, we would probably answer green. Yet we do not actively think of grass as not being white.

Furthermore, remembering my past in the form of mental representations or perceptions (figures or movies) is not regarded as the same as believing that I read that ‘most grass is green’. While both might be classified as beliefs, the former are described as episodic beliefs and the latter as semantic beliefs. Both are beliefs but manifest in different ways. That most beliefs are communicated via semantics is a consequence of not being able to share particular beliefs directly.

It might also be possible to hold beliefs about propositions expressed through paintings, figures and other forms of expression. For example, propaganda during WWII can be seen as propositions meant to generate beliefs about the Allies or the Axis. This is seemingly more acceptable in informal logic than in formal logic (Puppo 2019).

Throughout this text I speak about belief in propositions. This, however, does not mean that belief is only possible in and about propositions.

3.5.3 Belief

Some philosophical theories — such as the dispositional account of belief (Silva 2024) — argue that beliefs are characterized by a tendency to act: to assert the proposition, to deliberate with it, or to behave in ways consistent with it. Yet this view seems to rely on a prior assumption: that the proposition being acted upon is taken to represent something about the world. If a belief had no representational content — no relevance to one’s understanding of reality (including social, cultural, or religious contexts) — it is unclear why it would influence behavior at all. Thus, even dispositionalist theories seem to rest upon a representational core.

A belief, then, can be understood as an attitude toward a proposition (or other abstract representation as discussed before). It is a mental stance regarding how that proposition relates to the world. In this way, belief can be best described as a mental model or picture of reality. Whether this model reflects the world as it is or as it ought to be remains an open question. Often, belief is treated dichotomously: we either accept a proposition as true or we do not. For instance, if I believe the proposition “the grass is green”, then I regard it as a true statement about the world.

3.5.4 Belief and information

Belief does not have to represent ‘knowledge’. Assume I have a thermometer in my aquarium that always indicates the correct temperature. I can fool the aquarium thermometer by placing it in a small plastic box with warmer water inside the aquarium, or directly next to the heating element. The thermometer then does not indicate the temperature of the aquarium, and this would surely produce a false belief about the aquarium. Yet we gave it the sticker ‘aquarium temperature’. We could equally have given it the name hygrometer, as if it measured air humidity. We would then not say that the instrument is incorrect, but that whoever labelled it or placed it in the wrong position made the mistake. Hence, it is our job to place the thermometer correctly and to read the information it presents to us correctly.

The thermometer (or instrument) transfers information, not beliefs. The information that is transferred is the information of the fact that the temperature is x degrees Celsius. In the same sense, assume a frog has the instinct to jump at and try to eat every small shade that moves by. Assume that small shades are created by the leaves of the trees, and the frog jumps and tries to eat them. We would not suggest the frog is broken. The information that was transferred was insufficient for the frog to distinguish the dark spot of a shadow from a fly (adapted from Bernecker and Dretske 2009).

Does this mean the frog is broken or not functioning properly? I think not; the frog simply jumps and tries to eat it. It does not see these spots as more than they are, unless it believes they are flies. The only difference is that the shadows cannot be eaten. There is nothing wrong as long as there is no expectation (or goal) that all shadows are flies. Similarly, we are not wrong in stating that the medium in the thermometer stopped at the symbol for 40 degrees Celsius, as long as there is no expectation (or goal) that the symbol represents the temperature in the aquarium, which is 25 degrees Celsius.

The only thing that the liquid in the thermometer (for us) and the black dots (for the frog) have in common is their relation to the state-of-affairs in the world. The information in itself is not ‘wrong’ or ‘right’, ‘true’ or ‘false’. In the context of information theory, information is transferred through cables or wireless signals (Stone 2015). Assume that a person calls or texts me: the voice I hear or the text I get is not the person actually speaking or writing. Even if I were to hear the person speaking directly, the movement of the air reaches my ears, where the movement of the … transfers it into signals which the brain interprets. It could be said that through our senses we cannot truly experience the world. The flow that might reflect the state-of-affairs to me is then suggested to be information.

This information also does not have to be believed to be, or stand in, some relation to the world. Assume I cannot speak Mandarin. The person in front of me can write Mandarin perfectly and gives me a text. In a corresponding book I look up the proper text for a response and write this back (akin to the Chinese Room argument). In this sense, this might be similar to what Large Language Models (LLMs) do. Furthermore, I would not know whether I myself do anything different, under the assumption that the universe is purely deterministic (Lamme (2011) presents ‘the frog’ example as well). Yet, the argument shows that we do not have to hold beliefs about the relation of the information to the world at all.

Assume I am unaware that the thermometer of my aquarium is broken, yet it happens to indicate the correct temperature. After reading the temperature I would say I ‘know’ the correct temperature. Everybody unaware that the thermometer is broken would assume I am justified in suggesting the temperature reading is correct.

Of course, some approaches, such as Bayesian epistemology, interpret belief in terms of degrees of credence, assigning probabilities to propositions. This works well for events that can be framed in terms of empirical frequency or proportions, such as the likelihood of drawing a black marble from a jar. However, this probabilistic interpretation becomes less coherent when applied to abstract entities. For instance, I prefer to express the information I have about the slope of a linear regression in probabilities. Yet, the slope is a theoretical construct derived from data; it does not ‘exist’ in the same way a marble does. Thus, beliefs about such constructs are not directly about the world but about beliefs of what these models represent of the data. This raises questions about what these beliefs actually represent and how they relate to the world.

Contemporary debates on causality, such as those involving Judea Pearl’s framework (Pearl 2009), often result in finger-pointing between philosophers, statisticians, and scientists (Pearl and Mackenzie 2019). These disputes frequently bypass a fundamental question: what exactly are ‘cause’ and ‘effect’ in the physical world? Seemingly the question is bypassed because it prevents progress (Pearl and Mackenzie 2019). Yet, without clear definitions, critiques across disciplines often lack any substance. If we are arguing about something that has no content and no definition, it is impossible to discuss it. The same holds for ‘knowledge’. If I claim to ‘know’ that I wrote this text, then I am expressing that I remember writing it, i.e., I possess information tied to that act of writing. Distinguishing ‘knowledge’ from ‘held information’ (i.e., slope=y/x) leads to greater confusion rather than clarity.

The classical definition of knowledge as Justified True Belief (JTB), though long challenged by Gettier cases, complicates the matter further. A more straightforward solution might be to discard JTB and the idea of defining knowledge (Lemos (2021) does not consider this to be beneficial) and instead treat ‘knowledge’ as a historical term. For example, consider this Gettier-like scenario: the thermometer of the previous example is in the aquarium and the aquarium is 25 degrees Celsius (True). I read the thermometer indicating 25 degrees (Justified) and I believe it is 25 degrees Celsius (Belief), but the thermometer is broken. Clearly I would be wrong at any other temperature, but since the water temperature is so constant this is not the case. Can we then consider this ‘knowledge’? For the information provided, it matters how we acquired a piece of information and what source supports it. Importantly, information need not be true or justified in a strict sense. While we may question whether information is reliable enough to inform a model of the world, there is no objective requirement for such reliability, only a normative one tied to specific goals.

If we insist that knowledge must be justified, then we are assuming that a sentence must be either a ‘true reflection’ or an ‘adequate representation’ of reality. Hence, to have sense, the sentence has to stand in relation to something. In such a framework, justification is the relational bridge between information and a particular purpose. But this leads directly to the problem of infinite regress: we must then define what we mean by ‘true’ and ‘adequate’, and ensure that these definitions are not themselves subject to the same ambiguities and vagueness. Otherwise, they may be applied inappropriately to cases or conditions to which they do not belong.

Take, for instance, the cliché ‘correlation is not causation’. A correlation alone is not a case or condition to which the term ‘causation’ belongs. Thus, ‘cause’ and ‘effect’ are part of a linguistic framework (a language game, in Wittgensteinian terms). We must define the conditions under which these terms are justified. Pearl’s (2009) work, then, can be viewed as an attempt to formalize the syntactic structure under which the terms ‘cause’ and ‘effect’ serve as descriptors, but descriptors of what exactly? Pearl (2009) refers to the ontological structure of reality: “because causal relationships are ontological, describing objective physical constraints in our world”. This would mean ‘cause’ and ‘effect’ are physical. However, I could perfectly well argue that the property P of object x interacts with the property Q of object y, as in \(P(x) \wedge Q(y) \rightarrow R(x,y)\), for example precipitating the \(Ca^{2+}\) salt \(CaCO_3\) from water by adding \(Na_2CO_3\) via \[ Ca^{2+}(aq) + CO_{3}^{2-}(aq) \rightarrow CaCO_3(s) \] The implication sign is then nothing more than a way to describe the ‘cause’, and the ‘effect’ would be \(R(x, y)\). This would mean ‘cause’ and ‘effect’ are linguistic games. We could just as well suggest that cause and effect are meta-physical, transcendental entities that are there but cannot be observed by us (as in Platonism).

Therefore we need to clarify what ‘cause’ and ‘effect’ are:

Descriptive (1.): Cause and effect are a language to describe that observation \(y\) is often or always followed by \(x\).

Physical (2.): Cause and effect are physical cases/objects.

Metaphysical/Transcendental (3.): Cause and effect exist beyond empirical observations.

Each position carries its own philosophical commitments and pitfalls. My point is not to resolve the debate (I will end it here, as I am not here to discuss causality), but to highlight the ambiguity and vagueness in what the terms ‘cause’ and ‘effect’ signify. Perhaps, as with ‘knowledge’, we could also abandon these terms in most cases, as they lead to miscommunication about what is meant and what expectations arise. A consequence of this ambiguity and vagueness is the use of statistical tools (p-values, Bayes factors and intervals) to imply causes via ‘significant effects’. The issue is not the tools themselves, but the assumptions smuggled in through their interpretation.

As a consequence, to believe something about the world (given a proposition) is to make an ontological (and epistemic) commitment. We express our beliefs through symbols (words, numbers, figures) and thereby assume a relation between these symbols and the world. One possible philosophical position is to adopt a form of extreme realism (e.g., Platonism), where symbols such as the correlation coefficient reflect imperfect versions of ideal forms accessible only through reason. This view implies that relationships between variables like \(x\) and \(y\) are metaphysical in nature, and our human faculties only approximate them. The language we use reveals our ontological stance. In this context, I highlight two prevalent orientations:

Nihilism: An indifference to philosophical coherence or ontological reflection, often guided only by hedonistic utility (adapted from Gertz, 2019).

Dogmatism: An appeal to authority or fixed ideology (whether realist, anti-realist, pragmatist, etc.) used to justify beliefs and actions (Sienkiewicz, 2019).

I intentionally exclude Skepticism from this classification. I contend that no one is a true skeptic in practice, though skepticism remains a valuable methodological tool for examining beliefs, ontologies, and openness to alternate viewpoints. As such, skepticism—though not adequately defined here—serves as a recurring theme throughout this text.

Consequently, I adopt a more traditional, dichotomous view of belief: we either hold a proposition to be true or we do not. This view aligns most closely with representationalism, which I take as foundational. The core of this position can be summarized formally:

\[ \text{At least part of what it is for a moral agent S to believe proposition } \mathcal{P}_i \text{ is for S} \text{ to take } \mathcal{P}_i \text{ as an adequate representation of the world }\omega \]

In contrast, dispositionalism claims that to believe \(\mathcal{P}_i\) is to be disposed to assert it, act on it, or treat it as a premise in deliberation (Silva 2024). Yet this still presumes that the agent regards \(\mathcal{P}_i\) as an adequate representation of \(\omega\). One would not be disposed to act on a proposition unless one first represents it as true or significant in relation to the world. Therefore, I argue that even dispositionalist accounts depend on, and cannot escape, an underlying representationalist framework.

3.5.5 Knowledge

\[\emptyset\]

3.6 Truth

3.7 Meaning

3.8 Objective language and normativity

Language, and the beliefs shaped by it, play a role in sharing information. The objectivity of the language and the chosen words convey the beliefs, intent, status, knowledge, and moral and political position of the author. Language is therefore used to convey meaning. Broadly, three levels of objective use of language can be distinguished.

(1.) Objective language refers to statements about objects such as stones, fish and rivers, and subjective language to statements about feelings and emotions such as anger, happiness, good or despair.

(2.) Objective language is used when one speaks of objects other than what is in the mind of a moral agent. For example, ‘I saw a fish swimming’ or ‘your dad was angry’ are objective, while subjective language would be ‘I think your dad is angry’ or ‘I think I feel sad’.

(3.) Objective language is used in so far as it is not coloured by emotions. For example, ‘Temperature is negatively correlated to invertebrate species number’ is objective (see also the lexicon used by Markowitz and Hancock 2016), while subjective language conveying the same information would be ‘Worldwide anthropogenically altered temperature increase causes extreme negative effects on biodiversity’.

Most statements in ecology are largely objective in the sense of 2. but subjective in the sense of 3. Also, while objectivity in the sense of 1. is often implied, utterances contain subjectivity connected to 3. Hence, ‘good’ ecological quality and ‘natural’ or ‘balanced’ ecosystems are in themselves judgments on ‘naturalness’ and ‘goodness’ (Shrader-Frechette and Mccoy 1994). Yet, superficial reasoning does not make ends meet. It needs full proof of why this moral (normative) judgment is required and why articles presenting such claims are more desirable than the ones that pursue a sense of objectivity.

Additionally, 2. poses difficulties, in that we end up in peril when talking about our data and models. Assume we sample the number of species in relation to an independent variable (e.g., temperature); then the set of numbers is subjective in the sense of 2. and 3. if we would speak about the outside world. A correlation coefficient is not the world, and the data behind it are not the world. There are no data and correlation coefficients floating around in the universe. At least, that is the belief to which I ontologically commit myself, unless it can be shown (and proven) that the correlation coefficient is part of the world independent of the mind of the moral agent (realism versus anti-realism).

We end up only being able to use 1. by speaking of them as what their definitions imply; they are not the meta-physical or ethical concepts we believe they ‘ought’ to represent of the world (causes, effects, importance, meaning, value, reality). This means that all conclusions derived are subjective and even invalid, as there are no causes, effects, importance, meaning, value or reality involved.

The issue is that the object language (the set of numbers and the model) is not the meta-language but is interpreted as such. We thus end up with the issue of model reification, where the sets and models are not the objects they describe but are treated as if they are reality (meta-physical entities). The earlier sentence ‘Temperature is negatively correlated to invertebrate species number’ is then subjective, because in the objective sense what can be assessed is that ‘the estimated correlation coefficient is negative’. Hence, we choose to believe that what ought to be the case is that invertebrate species number is negatively related to temperature; we cannot state that it is negatively related, because we never observed ‘the decline’ or ‘negative relation’ with species richness. We only observed a number (the correlation coefficient) that represents a relational predicate. The point is that observing a relational predicate, followed by observing a negative estimate, does not have to dogmatically generate a belief about the world.

3.9 Cogency of arguments

As explained before, deduction on empirical data is not possible, so no valid or sound argument can be generated. The universal quantifier ‘\(\forall\)’ is (largely?) unsupported, and therefore we are left with \(\exists x(P(x) \rightarrow Q(x))\), where ‘\(\exists\)’ reads ‘some’. This indicates that propositions are merely acceptable. In most cases \(\exists\) can be left out of the notation, as long as it is explicitly assumed. This gives \(𝓟_i: (P(x)\rightarrow Q(x))\) or \(P \rightarrow Q\), which is referred to as a conjecture. A conjecture is a proposition which is accepted as true, \(v(𝓟_i)=T\) (True), or as false, \(v(𝓟_i)=F\) (False), via a valuation function \(v\).

Empirical science develops theories built upon conjectures, where a conjecture has to be supported by every party in an argument. If a conjecture cannot be accepted by one of the parties, discussion is meaningless and the source of the rejection by that party should be resolved (Popper 1968).

Arguments are either weakly or strongly cogent (weak or strong arguments) as the conjectures cannot be guaranteed in empirical sciences. Strong arguments are then arguments where most conjectures are acceptable (e.g., temperature is 0 degrees when the thermometer works and indicates 0 degrees).

The arguments proposed are indicated as \(𝒜\). An argument \(𝒜: 𝓟_1, …, 𝓟_{n-1} \models 𝓟_n\) should entail the truth of the conjectures within the derived conclusion. When this is the case, such an argument is valid, \(valid(𝒜)\), when all parties agree on the truth \(v(𝓟_i)=T\) (True) of the conjectures involved.
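The agreement condition can be sketched in a few lines of Python. This is only an illustration of the idea that acceptance depends on every party's valuation; the function name `accepted_by_all`, the party names, and the choice to count an unvalued conjecture as not accepted are all assumptions made here.

```python
def accepted_by_all(premises, valuations):
    """True if every party values every premise as true.

    valuations maps each party to its own valuation function v,
    represented as a dict {premise: True/False}. A premise a party
    has not valued is treated as not accepted (an assumption).
    """
    return all(
        valuations[party].get(premise, False)
        for party in valuations
        for premise in premises
    )


premises = ["P1", "P2"]
valuations = {
    "party_1": {"P1": True, "P2": True},
    "party_2": {"P1": True, "P2": False},  # party_2 rejects P2
}
print(accepted_by_all(premises, valuations))  # False: no agreement on P2
print(accepted_by_all(["P1"], valuations))    # True: both parties accept P1
```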

The problem is that the empirical sciences have no axiomatic foundation from which the proof of an argument can be logically deduced. Consider that I believe the conclusion (\(𝓟_3\)) of the following argument. \(𝓟_1\): All humans die; \(𝓟_2\): I am human; conclusion \(𝓟_3\): therefore I die (\(𝓟_1, 𝓟_2 \therefore 𝓟_3\)). The conclusion forms a conjecture in my belief system and that of others. However, this does not ensure the soundness of the argument, where soundness refers to the factuality of all propositions. Since we have not known all humans who ever lived or are going to live, the argument cannot be shown to be sound. Moreover, it is questionable whether reasoning deals with valid conclusions at all; one can better speak about the cogency of the argument.

The agreement among parties then forms, rather informally, a belief system that is a function of acceptable (cogent) conjectures (Usó-Doménech and Nescolarde-Selva 2016). The conjectures are tied together for each individual moral agent, closely resembling the nodes and edges in a network (Carthwright et al. 1996; van Fraassen 1980).

The moral agent then entertains the relations between propositions and arguments. Counterintuitively, the moral agent also appoints a property to an element x. Hence, a colour-blind person would not see the specific colours that a person who is not colour-blind would see, and therefore would not appoint the same colour property. Or, if a person does not know the function of a wrench, that person does not appoint the property wrench to it. If a property is not appointed to an element, then the cogency (or validity) of an argument does not follow, because without a property on x the element x remains inert and undefined. Therefore, \(v(𝓟_i)=T\) or \(v(𝓟_i)=F\) remains undecided, because \((x \rightarrow Q(x)), x \models Q(x)\) does not follow. Hence, each element is appointed a property, each proposition is valued and woven into a cogent or non-cogent (strong or weak) argument, and together these are formulated as a belief system.

3.10 Belief structure of reasoning

As highlighted earlier, the agreement among the beliefs of individuals and groups forms what we might call a belief system — a structure dependent on a set of acceptable propositions called conjectures. The way our beliefs are organized directly influences how we reason.

Figure 2: Exemplified structure of a belief system

A ‘string’ of beliefs can be traced from a foundational or ‘hinge’ proposition (labeled ‘A’ in Fig. 2), which supports subsequent beliefs. If this foundational belief turns out to be false, all dependent beliefs (B, C, D, etc.) are undermined. This reflects the principle that if any proposition in a chain of implications \(𝓟_1, …, 𝓟_{n-1}\) is false, then the conclusion \(𝓟_n\) in \(𝓟_1, …, 𝓟_{n-1}\models 𝓟_n\) is no longer supported.
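The dependency between a hinge proposition and the beliefs resting on it can be sketched as a small directed graph. In the Python below the edges in `supports` are hypothetical and are not read off Figure 2, and the function name `undermined_by` is likewise an assumption; the sketch only shows how losing one belief propagates along a string.

```python
def undermined_by(hinge, supports):
    """Return every belief that directly or indirectly rests on `hinge`."""
    undermined, stack = set(), [hinge]
    while stack:
        current = stack.pop()
        for dependent in supports.get(current, []):
            if dependent not in undermined:
                undermined.add(dependent)
                stack.append(dependent)
    return undermined


# Hypothetical support relations: A supports B, B supports C and D, D supports E.
supports = {"A": ["B"], "B": ["C", "D"], "D": ["E"]}

print(sorted(undermined_by("A", supports)))  # ['B', 'C', 'D', 'E']
print(sorted(undermined_by("D", supports)))  # ['E']
```

Giving up the hinge A removes the support for everything downstream, whereas giving up D only affects E.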

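For example, if \(A(x) \rightarrow B(x)\) and the valuation \(v(A(x))=F\), then \(B(x)\) loses its support, written here as \(A(x) \rightarrow \neg B(x)\). Strictly speaking, \(\neg B(x)\) does not follow deductively (inferring it would be denying the antecedent); what is lost is the justification for \(B(x)\). This view reflects what Harman (1986) refers to as the foundational theory of belief revision: a belief in a proposition \(𝓟_n\) needs to be supported by another proposition \(𝓟_{n-1}\).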
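What a false supporting belief does and does not settle can be checked with a truth table. The Python sketch below treats \(A\) and \(B\) as plain truth values under material implication (an assumption of the sketch; the function names are hypothetical) and lists which values of \(B\) remain possible once \(A\) is false.

```python
from itertools import product


def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q


def possible_b_values():
    """Values of B compatible with 'A -> B' being true and A being false."""
    return sorted(
        {b for a, b in product([False, True], repeat=2) if implies(a, b) and not a}
    )


print(possible_b_values())  # [False, True]: B is left open, not refuted
```

Both values of \(B\) survive, so a false \(A\) removes the reason for believing \(B\) without establishing \(\neg B\).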

It turns out that people do not follow this system most of the time and actually seek to reconstruct their belief system minimally (Harman, 1986). This is termed the ‘maxim of minimal mutilation’ (Ferrari and Carrara 2025). For example, suppose we have \(F(x) \rightarrow H(x)\) and \(v(F(x))=F\). This leads to \(F(x) \rightarrow \neg H(x)\), threatening our belief in \(H(x)\). To preserve our belief in \(H(x)\) we might adopt a new proposition \(G(x) \rightarrow H(x)\), provided it does not conflict with existing beliefs. This is the basis of coherence theory in belief revision (Harman, 1986): people tend to make minimal changes that preserve the overall structure and coherence of their belief system. This can also be referred to all. We can draw parallels to a system van Fraassen (1980) sketches.

Consider a practical example. Suppose I believe I am bad at cooking because people dislike my food: \(DislikeFood(x) \rightarrow BadCooking(x)\). However, I later realize that some people dislike vegetarian food specifically. In contrast, vegetarians enjoy my cooking. The original belief no longer holds, because for them the premise fails: \(\neg DislikeFood(x)\), so \(BadCooking(x)\) is no longer supported. Therefore I need to revise all my beliefs that follow from \(BadCooking\). According to the foundational theory of belief revision, I need to drop the belief that I am bad at cooking. According to the coherence theory of belief revision, I could rationalize that vegetarians “have no taste”, preserving my original belief. This mirrors the logical shift from \(F(x) \rightarrow \neg H(x)\) to \(G(x) \rightarrow H(x)\).

A hybrid view can integrate both the foundational and coherence models. Beliefs can be seen as clusters (e.g., D, E, F, H, I, J), connected via strings of core beliefs (A, B, D). Different sets of strings and clusters can be connected. For example, the left cluster connects to the right cluster (A2, …, J2) via bridging beliefs.

Some beliefs (propositions) are accepted tentatively — neither fully embraced nor rejected. However, in practice, people often fully accept propositions because suspending judgment creates cognitive dissonance. Dewey (1997) notes that suspending belief is uncomfortable; full acceptance gives us peace of mind and reduces cognitive load (Harman, 1986).

Yet, full acceptance of a belief implies the need for justification. We must ask: Why do we accept this proposition? And how do we justify it?

3.11 Skepticism

I will here focus my skepticism on skepticism against philosophy (or meta-physics), although I am convinced that most things we believe are meta-physical. Skepticism proposes a system in which the properties, propositions, or arguments are ‘attacked’. Considering the structure of a valid semantic argument \(𝓟_1, …, 𝓟_{n-1} \models 𝓟_n\), an attack tries to invoke disagreement, so that the properties within a proposition, and the propositions built on them, cannot be considered true or false. Since a valid argument is a believable argument, being unable to valuate the truth or falsehood of a proposition makes it impossible to assess its validity \(𝓟_1, …, 𝓟_{n-1} \models 𝓟_n\) or invalidity \(𝓟_1, …, 𝓟_{n-1} \not\models 𝓟_n\). Since neither truth nor falsehood can be assessed, one cannot judge whether the conclusion follows. Being unable to close an argument leads to suspension of judgement on \(𝓟_n\). What remains unclear is whether the skeptic aims at the suspension of judgement on the factuality of the propositions, the validity of the argument, or both. I would assume both are a target, because eventually we want to talk about the factual universe, not about the validity of an argument.

The dogmatist, as opposed to the skeptic, always claims to have ‘knowledge’ or ‘truth’ but is never able to show it upon attack. Nobody is really a full skeptic and nobody a full dogmatist. An absurd skeptic is therefore someone unable to accept the suspension of judgement, always in pursuit of refinement. The absurd skeptic searches for a different kind of meaning by ever regressing propositional knowledge, perhaps to the annoyance of peers and the willing ignorance of others, due to the painful suspension of judgement (Dewey, 1979). The absurd skeptic is willing to challenge its beliefs by entertaining ideas beyond dogma. It is a revolt against both stagnation (the status quo) and dogmatism.

3.11.1 First line of attack

I will place the following sections under the first line of attack. These sections highlight different methods (3 modes) of causing doubt in the believed conjecture. Hence, if the conjecture cannot be believed to hold, then it follows that \(𝓟_1, …, 𝓟_{n-1} \not\models 𝓟_n\).

The problem is the attachment of identity to knowledge (the belief that a proposition is true, forming a conjecture). If one of the conjectures no longer holds, then any belief that follows from it is also valued \(v(𝓟_i)=F\). This attacks the identity of the moral agent, often leading to conflict. As such, the logical counter-argument cannot be accepted, which in my experience often invokes an appeal to common practice or authority. I believe this to be the case because we believe the social hierarchy is tied to identity. To admit that most of the arguments we pose are invalid means giving up on social hierarchy.

3.11.2 Mode of disagreement

The generation of an initial proposition \(𝓟_1\) and a counter proposition \(𝓟_2\) generates a source of disagreement. For example, species x (P(x)) occurs more in freshwater (Q(x)) than in brackish water (~Q(x)). The latter can be utilized in the mode of disagreement either on the basis of a dogmatic disagreement between ‘experts’ about what brackish is, or on the basis that there is no demarcation line, as salt in surface water is a concentration and not a dichotomous, anthropologically defined boundary. This falls back on the mode of relativity. Since experts decide whether x is Q(x) or ~Q(x), the ground of it is relative. Either the expert provides full proof without contradiction, or the expert cannot use the belief system without falling back on an appeal to authority.

If the grounds for \(v(𝓟_1)=T\) are not known, then \(𝓟_1, …, 𝓟_{n-1} \not\models 𝓟_n\). What is questioned is the valuation function \(v(...)\) assigning a ‘truth’ value to a proposition, and on what grounds the ecologist decides \(v(𝓟_i)=T\) or \(v(𝓟_i)=F\). The state of such a proposition is called a ‘final variable’ by De Groot (1998), describing the acceptance of \(v(𝓟_i)=T\) without requiring any further normative justification; in similar fashion Wittgenstein referred to it as a ‘hinge proposition’. Hence, a source of disagreement is almost sure to appear when searching for the foundation or support of a ‘basic belief’ (in foundationalism source).

Equipollence

I consider the mode of disagreement a way to start the ‘attack’. I do not specifically consider it a mode. The principle of equipollence is, for example, a way to start or end an ‘attack’. Perhaps equipollence is to be situated on similar grounds as the mode of hypothesis, which will be addressed later. A skeptical ‘attack’ often starts by proposing an equipollent argument: an argument with the same structure and weight, but an absurd conclusion. Such arguments can often be used to highlight an informal fallacy, such as an appeal to authority, common practice or reduction to the absurd. If an equally weighted equipollent argument has the same persuasive power and cannot be accepted on the same grounds, suspense of judgement should reasonably follow on the original argument.

Equipollent arguments are, however, not easily related to only the formal structure of a logically valid argument. A logically valid argument such as \(C(x) \rightarrow W(x), \neg W(x) \models \neg C(x)\) has the same form as \(U(x) \rightarrow G(x), \neg G(x) \models \neg U(x)\), and both are valid. In plain language the former argument could be: a cactus can only grow (C) when it has water (W), there is no water, therefore it will not grow (modus tollens). This is as valid as the argument: my unicorn can only fly (U) when eating grass (G), my unicorn did not eat grass, therefore it cannot fly. An equipollent argument thus attacks not only the validity of the argument, but mainly the context of the predicates or hidden premises. If I give the cactus salt brine it will grow, which is absurd; in this case the argument is incomplete. As the argument is not complete, it does not guarantee the conclusion under all conditions. It is then reasonable to suggest it is internally valid iff these propositions make up the context of the argument the moral agent refers to, but this is also absurd because that is not how the world works. Hence, the skeptic also attacks the soundness of the argument by pointing to the non-factual status of the propositions. Just as equipollent arguments highlight the incompleteness of premises, the problem of induction demonstrates that a finite set of observations cannot determine universal truths. This reinforces the skeptic’s claim that absolute certainty is unattainable.
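The point that the cactus and the unicorn argument share one valid form can be checked mechanically. The following is a minimal sketch in Python, assuming a propositional simplification of the predicates (one truth value per letter instead of \(C(x)\), \(W(x)\)); the helper names `entails` and `implies` are hypothetical and only serve the illustration. It enumerates every valuation and checks whether the premises can be true while the conclusion is false.

```python
from itertools import product


def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b


def entails(premises, conclusion, n_atoms):
    """Semantic validity: no valuation makes all premises true
    and the conclusion false."""
    for values in product([True, False], repeat=n_atoms):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Modus tollens: C -> W, not W |= not C (the form shared by cactus and unicorn)
modus_tollens = entails(
    premises=[lambda c, w: implies(c, w), lambda c, w: not w],
    conclusion=lambda c, w: not c,
    n_atoms=2,
)

# Affirming the consequent: C -> W, W |= C (an invalid form, for contrast)
affirming_consequent = entails(
    premises=[lambda c, w: implies(c, w), lambda c, w: w],
    conclusion=lambda c, w: c,
    n_atoms=2,
)

print(modus_tollens, affirming_consequent)
```

The check only concerns the form. It says nothing about whether unicorns exist or whether brine counts as water, which is exactly the gap the equipollent argument exploits.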

3.11.3 Mode of hypothesis

The mode of hypothesis asserts that if it is taken as evident that observed worldwide biodiversity decline (B) is caused by humans (H), then it is equally evident that no worldwide biodiversity decline (~B) occurs if no humans are present (~H): \(𝓟_1: \forall x (B(x) \rightarrow H(x))\) or \(𝓟_2: \forall x (\neg B(x) \rightarrow \neg H(x))\). If one has not proposed the counter proposition (hypothesis) and did not consider observations on \(𝓟_1\), how can one then, unbiased (with a straight face and conscience), suggest that \(𝓟_2\) is not equally likely?

Any proposition (\(𝓟_1\)) accepted as a conjecture in an argument where a counter proposition (\(𝓟_2\)) has not been addressed can only be tautological. Since \(𝓟_2\) is a possibility once it has been asserted, the commitment to believe \(𝓟_1\) might be wrong. Since this commitment is potentially wrong, it does not guarantee the conclusion, and suspense of judgement should reasonably follow.

3.11.4 Mode of circularity

If we can define support for \(𝓟_1\) via \(𝓟_2\) up to \(𝓟_n\), where \(𝓟_n\) is in turn support for \(𝓟_1\), we have a circular argument. For example, \(𝓟_1: (D0(x)\rightarrow F(x))\), \(𝓟_2: (F(x) \rightarrow C(x))\) and \(𝓟_3: (C(x) \rightarrow D0(x))\). Here \(𝓟_1\): when it is 0 degrees D0(x), water (x) freezes F(x). \(𝓟_2\): when the water is frozen F(x) it crystallizes C(x). And \(𝓟_3\): when the water is crystallized C(x) it is 0 degrees D0(x). Thus, when it is 0 degrees water freezes, and when the water is frozen it is 0 degrees \((D0(x)\leftrightarrow F(x) \leftrightarrow C(x))\). We cannot verify beyond the propositions. The same argument is used to defend a dogmatic boundary for ‘significance’, as highlighted by Wasserstein and Lazar (2016).

Q: Why do so many colleges and grad schools teach p = .05? A: Because that’s still what the scientific community and journal editors use. Q: Why do so many people still use p = .05? A: Because that’s what they were taught in college or grad school.

The same issue applies to other forms of boundaries. Hence, a BayesFactor01 < .05 could be raised to the same dogmatic status.

3.11.5 Mode of infinite regress

The claim to any proposition \(𝓟_1\) requires support by \(𝓟_2\), and \(𝓟_2\) by \(𝓟_3\), \(...\), \(𝓟_{\infty}\). Assume the following example. \(𝓟_1\): grass is truly green; \(𝓟_2\): green light consists of particles having a particular wavelength; \(𝓟_3\): the particles have this wavelength because \(...\). When we regress to infinity we cannot support \(𝓟_1\) because \(𝓟_{\infty}\) is unknown. Therefore we cannot suggest that the grass is truly green, because we do not know what greenness truly consists of: \(𝓟_{\infty}, ..., 𝓟_2 \not\models 𝓟_1\). Since the soundness of \(𝓟_1\) cannot be supported by a known foundation, suspense of judgement should reasonably follow. This does not mean infinite regress is a ‘bad’ thing. It highlights the weak basis on which arguments are founded. Another example. \(𝓟_1\): a BayesFactor10>20 is strong support; \(𝓟_2\): >20 indicates >95%; \(𝓟_3\): we use >95% because \(...\). Even a pragmatic reason for 95% would need a supporting proposition. For example, 95% is used because 99% is too unstable in the iterations of the Markov chain Monte Carlo; we can then invoke the mode of hypothesis (why is 96% not used?) or the mode of relativity (what does 95% represent of reality?).

This highlights that any belief in a proposition \(𝓟_n\) under some condition \(c \in C\) in an argument \(𝓟_1, …, 𝓟_{n-1} \models 𝓟_n\) can be suspended based on any other premise that is not certain or conclusive, and therefore \(𝓟_n\) does not follow.

More formally, for any claim from a subject (\(s \in S\)) there exists at least some proposition of disagreement over the claim under particular conditions, so that the disagreement can be exploited in the mode of hypothesis (MH), mode of circularity (MC) and mode of infinite regress (MI). \[ \forall s \forall 𝓟_n \exists c\ (Claim(s, 𝓟_n) \rightarrow Disagreement(Claim(s, 𝓟_n), c) \rightarrow \{MH \vee MC \vee MI\}) \\ MH: Pr(𝓟_n, s)/Pr(\neg𝓟_n, s) \notin \{0, \infty\} \rightarrow \neg Valid(𝓟_n) = Undecided \\ MC: (Valid(s, 𝓟_1, …, 𝓟_{n-1} \models 𝓟_n) \wedge 𝓟_n \in \{ 𝓟_1, …, 𝓟_{n-1} \}) \rightarrow \neg Valid(𝓟_n) = Circular \\ MI: \forall𝓟_m \exists𝓟_n (Valid(s, 𝓟_m \models 𝓟_n) \wedge 𝓟_m < 𝓟_n ) \rightarrow \neg Valid(𝓟_m)= \text{Non-foundational} \]

This formal expression can be more clearly visualized in a diagram as in Fig. 3 below.

Figure 3: Structure of the skeptical network that can be applied to suspend judgment on propositional claims.

There were previous chapters addressing ‘objective language’, the relation between ‘information, belief and facts’, ‘truth’ (still under construction) and ‘normativity’. They indicate the concepts that are normative such as what should be and what is.

4 Statistics (analytical and empirical Bayes)

In many cookbooks the theory and background are not provided, which makes it impossible to interpret and criticize our results. This section introduces some theory. However, if you are already familiar with it or find it too technical, feel free to skip it. Since this R-package primarily uses Bayesian inference, an introduction to Bayes Theorem and the methods used may be of interest, although it is not necessarily needed for using the R-package itself.

4.1 Classical statistics and estimation

All statistics focuses on estimating the parameter of interest \(\theta\) which in most (G)LMs is denoted as the parameter \(\mu\) or \(\beta\). For consistency I will use the parameter of interest \(\mu\).

The population parameter \(\mu\) is fixed but unknown. To investigate plausible values for \(\mu\), we collect measurements or samples. These observed data points, denoted as \(x = \{x_1, \dots, x_n\}\), are realizations of an underlying random variable \(X = \{X_1, \dots, X_n\}\), where each \(X_i \in \mathbb{R}\). We assume that these observations are independently and identically distributed (i.i.d.) from a common distribution.

\[X_i \stackrel{\text{iid}}{\sim} N(\mu, \sigma^2)\]

Since we do not have \(X\) but only a set of realizations, we need an estimator, which is the sample mean \(\bar{x}\). Hence, the sample mean would be

\[\bar{x}=\frac{\sum_{i=1}^n(x_i)}{n}\]

If \(x\) is indeed i.i.d. then \(\bar{x}\) serves as an unbiased estimator of \(\mu\). The sample mean therefore has the following properties.

\[\mathbb{E}[\bar{x}]=\mu \quad \text{and} \quad Var(\bar{x})=\frac{\sigma^2}{n}\]

The weak law of large numbers then states that the probability of a deviation from the population parameter larger than any \(\epsilon > 0\) decreases as the sample size \(n\) increases, so that the sample mean eventually converges to \(\mu\).

\[\lim_{n\to\infty} P\left(|\bar{X}_n-\mu|\geq \epsilon\right)=0\] Furthermore, according to the central limit theorem \[Z_n=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\] converges in distribution to the standard normal, where \(\mu\) is the assumed population value (for example \(\mu=0\) under a null hypothesis). The probability of observing \(Z\) within \(\pm 1.96\) over a long run of repetitions is

\[P(-1.96\leq Z \leq 1.96)\approx 0.95\]

Similarly, the probability that the intervals around \(\bar{x}\) cover \(\mu\) in a long run of repeated experiments is 95%, with \(c=0.05\):

\[1 - c = P\left(\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95\]

A visual explanation of this concept is provided in the following Shiny app: https://snwikaij.shinyapps.io/shiny/.
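The long-run reading of the 95% interval can also be checked with a small simulation. The sketch below is written in Python with the standard library only and is not the code behind the Shiny app. It assumes a known \(\sigma\), and the population values, sample size and the function name `coverage` are arbitrary choices for illustration. It repeatedly draws a sample, builds the interval \(\bar{x} \pm 1.96 \cdot \sigma/\sqrt{n}\) and counts how often it covers \(\mu\).

```python
import random
import statistics


def coverage(mu=0.5, sigma=5.0, n=30, repetitions=4000, z=1.96, seed=1):
    """Fraction of repeated experiments whose interval covers mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5  # sigma is treated as known
    covered = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            covered += 1
    return covered / repetitions


print(coverage())
```

The returned fraction should lie close to 0.95 and moves closer as the number of repetitions grows. A single interval either covers \(\mu\) or it does not; the 95% is a property of the procedure over repetitions.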

Furthermore, it is clear that statistics itself does nothing with causality; its focus is on error control. Causality starts by satisfying the theoretical conditions needed to arrive at beliefs in these concepts. Hence, error control and causality start a priori (Fisher 1949, Pearl 2009, Mayo 2018). Such a focus and framework is extremely useful if objectivity over repetitions is the goal. While classical statistics focuses on fixed but unknown parameters, error control and objectivity of information, Bayesian methods extend this perspective by introducing prior information and viewing parameters as random variables. This shift opens the door to more flexible and informative inference, as explained in the next section.

4.2 Bayes Theorem and probabilistic estimation

4.2.1 Bayes Theorem

Informally, Bayes Theorem would be notated \(\text{Posterior}\, \text{probability} = \frac{\text{Likelihood} \cdot \text{Prior}}{\text{Evidence}}\). More formally, Bayes theorem is often notated with A and B, where P indicates probability and ‘|’ means given or conditional on: \(P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\). Other expressions, such as \(P(\theta|Data, Info) = \frac{P(Data|\theta) \cdot P(\theta|Info)}{P(Data)}\), highlight that the posterior describes the information on \(\theta\) conditional on both the data and the prior information.

The derivation of Bayes theorem relies on the axioms of probability theory.

Premise 1)

\[ P(A | B) = \frac{P(A \cap B)}{P(B)} \] similarly

\[ P(B |A) = \frac{P(B \cap A)}{P(A)} \] Premise 2)

Also, the joint probability, expressed as a set-theoretic relationship on \(z\), indicates that the elements of both sets are the same.

\[ z = \{x : x \in A \wedge x \in B\} = A \cap B = B \cap A \] thus

\[ P(A \cap B) = P(B \cap A) \] Premise 3)

In accordance with the previous

\[ P(A| B) \cdot P(B) = P(A \cap B) \] and

\[ P(B | A) \cdot P(A) = P(B \cap A) \] Conclusion)

Therefore

\[ P(A | B) \cdot P(B) = P(B | A) \cdot P(A) \] \[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

4.2.2 Bayes Theorem in use

The previous expression can help us answer simple questions. Assume that there is a likelihood of 0.7, \(P(Species|<Threshold)\), that a species is found below a certain threshold. Furthermore, we also know that the environment is below the threshold only 0.3 or 30% of the time, \(P(<Threshold)\). How probable would it then be that we are below the threshold if we observe the species, \(P(<Threshold|Species)\)?

\[ P(Species|<Threshold) = 0.7\\ P(<Threshold) = 0.3\\ P(<Threshold|Species) = ?\\ \]

Expressing this in Bayes theorem would result in

\[ P(<Threshold|Species)=\frac{P(Species|<Threshold)\cdot P(<Threshold) }{P(Species)} \]

The only thing still required is \(P(Species)\), often called the ‘evidence’. This evidence is simply the total probability of observing the species, below and above the threshold. For this we can assume that the probabilities above the threshold are the complements of \(P(Species|<Threshold)\) and \(P(<Threshold)\).

\[ P(Species)=[P(Species|<Threshold)\cdot P(<Threshold)] + [(1-P(Species|<Threshold))\cdot(1-P(<Threshold))]\\ 0.42=[0.7\cdot0.3]+[0.3\cdot0.7] \]

Then it is simply filling in the blanks

\[ P(<Threshold|Species)=\frac{P(Species|<Threshold)\cdot P(<Threshold) }{P(Species)}=\frac{0.7\cdot0.3}{0.42}=0.5 \]

The answer is not very satisfying as the probability is simply a ‘coin toss’. This could be improved if we introduce more species with the same indicative potential, assuming the species occur independently given the state of the threshold.

\[ P(<Threshold|Species_1, Species_2)=\frac{P(Species|<Threshold)^2\cdot P(<Threshold) }{P(Species_1, Species_2)}=\frac{0.7^2\cdot0.3}{(0.7^2\cdot0.3)+(1-0.7)^2\cdot(1-0.3)}=0.7 \] We would need around five species to get an indicative potential of >0.95 (it would be 0.97).
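The arithmetic of this example can be reproduced with a few lines of Python. This is a minimal sketch under the assumptions used in the text: every species has the same likelihood of 0.7 below the threshold, the likelihood above the threshold is its complement, and species are independent given the state of the threshold. The function name `posterior_below_threshold` is hypothetical.

```python
def posterior_below_threshold(k, p_species_given_below=0.7, p_below=0.3):
    """P(<Threshold | k observed species) via Bayes theorem."""
    # Assumption from the text: P(Species | above) = 1 - P(Species | below)
    p_species_given_above = 1.0 - p_species_given_below
    numerator = p_species_given_below ** k * p_below
    evidence = numerator + p_species_given_above ** k * (1.0 - p_below)
    return numerator / evidence


for k in range(1, 6):
    print(k, round(posterior_below_threshold(k), 2))
```

With one species the posterior equals the ‘coin toss’ of 0.5, with two it is 0.7, and only the fifth species pushes it above 0.95.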

4.2.3 Bayes theorem and conjugate priors

The previous example works for simpler approximations, yet if we want to derive an interval for a particular parameter \(\theta\) we can approach this analytically using conjugate priors. A prior is conjugate to a likelihood if the resulting posterior is in the same family as the prior.

As introduced, statistics and estimation are about finding the value of \(\theta\), which is here assumed to be \(\mu\). Where in the frequentist framework this is considered fixed and unknown, in the Bayesian framework it is considered to be random ‘and approximately’ known. Of course, also in the Bayesian framework samples \(x\) are taken. Assume that we already know something about \(\mu\); it is then possible to exclude unreasonable values, or to restrict \(\mu\) to more acceptable values based on the information we have.

\[ P(\mu|Data) = \frac{P(Data|\mu) \cdot P(\mu)}{P(Data)} \]

For a simple mean and variance an analytical approach can be used to derive the posterior given the likelihood and prior via the following equations.

\[\mu_{posterior} =\frac{\frac{\mu_{prior} }{\sigma_{prior}^2} + \frac{\hat{x}_{data} }{\sigma_{data}^2}}{ \frac{1}{\sigma_{prior}^2} + \frac{1}{\sigma_{data}^2}} \\ \sigma_{posterior}=\sqrt{\frac{1}{\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{data}^2}}}\]

Derivation:

Premise 1)

Bayes rule can be simplified to \[P(\mu|Data) \propto P(Data|\mu) \cdot P(\mu) \\ N(\mu_{posterior}, \sigma_{posterior}^2) \propto N(\mu_{sample}, \sigma_{sample}^2)\cdot N(\mu_{prior}, \sigma_{prior}^2)\]

Premise 2)

The PDF for the normal distribution is \[f(x)=\frac{1}{\sigma\sqrt{2\pi}}\cdot exp(-\frac{1}{2}(\frac{x-\mu}{\sigma})^2)\]

Premise 3)

\[Prior: P(\mu_{prior})=\frac{1}{\sigma_{prior}\sqrt{2\pi}}\cdot exp(-\frac{1}{2}(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2) \\ Likelihood: P(Data|\mu_{sample})=\frac{1}{\sigma_{sample}\sqrt{2\pi}}\cdot exp(-\frac{1}{2}(\frac{\mu_{sample}-\theta}{\sigma_{sample}})^2) \]

Premise 4)

Both \(\frac{1}{\sigma_{prior}\sqrt{2\pi}}\) and \(\frac{1}{\sigma_{sample}\sqrt{2\pi}}\) are constants with respect to \(\theta\) and can be left out of the equation.

Premise 5)

Since both exponentials have the same base we can add the exponents \[(e^{a}\cdot e^{b}=e^{a+b})\] resulting in

\[exp(-\frac{1}{2}\cdot[(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2+(\frac{\mu_{sample}-\theta}{\sigma_{sample}})^2])\]

After which brackets can be moved \[exp(-\frac{1}{2}\cdot[\frac{(\theta-\mu_{prior})^2}{\sigma_{prior}^2}+\frac{(\mu_{sample}-\theta)^2}{\sigma_{sample}^2}])\]

Premise 6)

Expanding the bracketed terms

\[(a-b)^2=(a-b)\cdot(a-b)=a^2-ab-ab+b^2=a^2-2ab+b^2\] This means \[(\theta-\mu_{prior})^2=\theta^2-2\theta\mu_{prior}+\mu_{prior}^2\] and \[(\mu_{sample}-\theta)^2=\mu_{sample}^2-2\mu_{sample}\theta+\theta^2\] which can be replaced in premise 5 \[exp(-\frac{1}{2}\cdot[\frac{\theta^2-2\theta\mu_{prior}+\mu_{prior}^2}{\sigma_{prior}^2}+\frac{\mu_{sample}^2-2\mu_{sample}\theta+\theta^2}{\sigma_{sample}^2}])\]

Premise 7)

Separating each term by dividing by \(\sigma_{prior}^2\) and \(\sigma_{sample}^2\)

\[exp(-\frac{1}{2}\cdot[\frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2}+ \frac{\mu_{sample}^2}{\sigma_{sample}^2}+\frac{-2\mu_{sample}\theta}{\sigma_{sample}^2}+\frac{\theta^2}{\sigma_{sample}^2}])\]

Premise 8)

Group the terms by their power of \(\theta\)

\[\frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2}+ \frac{\mu_{sample}^2}{\sigma_{sample}^2}+\frac{-2\mu_{sample}\theta}{\sigma_{sample}^2}+\frac{\theta^2}{\sigma_{sample}^2}= \\ \theta^2(\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{sample}^2}) -2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{sample}}{\sigma_{sample}^2}) +(\frac{\mu_{prior}^2}{\sigma_{prior}^2}+\frac{\mu_{sample}^2}{\sigma_{sample}^2}) \] Since the last group is not dependent on \(\theta\) it is not in our focus \[ exp(-\frac{1}{2}\cdot[\frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2}+ \frac{\mu_{sample}^2}{\sigma_{sample}^2}+\frac{-2\mu_{sample}\theta}{\sigma_{sample}^2}+\frac{\theta^2}{\sigma_{sample}^2}])= \\ exp(-\frac{1}{2}\cdot[\theta^2(\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{sample}^2}) -2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{sample}}{\sigma_{sample}^2}) +not\ dependent\ on\ \theta]) \]

Premise 9)

The goal is to derive \(P(\mu|Data)\) from \(P(\mu|Data) \propto P(Data|\mu) \cdot P(\mu)\). The general exponential form of the normal distribution is given in Premise 2, and expanding its exponent as in premises 6, 7 and 8 leads to \[-\frac{1}{2}\cdot[\theta^2 (\frac{1}{\sigma^2})-2\theta(\frac{\mu}{\sigma^2})+C]= \\ -\frac{1}{2}\cdot[\theta^2A-2\theta B+C]\] The exponent of a normal distribution can always be written as \(-\frac{1}{2}\cdot[\theta^2A-2\theta B+C]\), meaning that \(A=\frac{1}{\sigma^2}\) and \(B=\frac{\mu}{\sigma^2}\). To obtain the standard deviation, \(A\) is rearranged to \(\sigma = \sqrt{\frac{1}{A}}\), and the mean follows as \(\mu=\frac{B}{A}=\frac{\frac{\mu}{\sigma^2}}{\frac{1}{\sigma^2}}\).

Conclusion)

In Premise 8 \[ exp(-\frac{1}{2}\cdot[\theta^2(\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{sample}^2}) -2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{sample}}{\sigma_{sample}^2})+C]) \] In Premise 9 \[ \sigma = \sqrt{\frac{1}{A}}, A=\frac{1}{\sigma^2}\\ \mu=\frac{B}{A}=\frac{\frac{\mu}{\sigma^2}}{\frac{1}{\sigma^2}} \] Which implies that \[ \sigma_{posterior}=\sqrt{\frac{1}{\frac{1}{\sigma_{prior}^2}+\frac{1}{\sigma_{sample}^2}}}\\ \mu_{posterior}=\frac{\frac{\mu_{prior}}{\sigma_{prior}^2} + \frac{\mu_{sample}}{\sigma_{sample}^2}}{\frac{1}{\sigma_{prior}^2} + \frac{1}{\sigma_{sample}^2}} \] Another way to obtain the posterior including the sample size is via: \[\mu_{posterior}=\frac{\frac{\mu_{prior}}{\sigma_{prior}^2}+\mu_{sample}\cdot\frac{n}{\sigma_{sample}^2}} {\frac{1}{\sigma_{prior}^2}+\frac{n}{\sigma_{sample}^2}}\]

Derivation:

Premise 1)

\[ Prior: P(\mu_{prior})=\frac{1}{\sigma_{prior}\sqrt{2\pi}}\cdot exp(-\frac{1}{2}(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2) \\ Likelihood: P(Data|\mu_{sample})=\prod_{i=1}^n \frac{1}{\sigma_{sample}\sqrt{2\pi}}\cdot exp(-\frac{1}{2}(\frac{x_i-\theta}{\sigma_{sample}})^2) \]

Premise 2)

Both \(\frac{1}{\sigma_{prior}\sqrt{2\pi}}\) and \(\frac{1}{\sigma_{sample}\sqrt{2\pi}}\) are constants with respect to \(\theta\) and can be left out of the equation.

Premise 3)

The likelihood is the product of the densities of \(n\)>1 observations. Since \(exp(a)\cdot exp(b) = exp(a+b)\), it follows that \(exp(a_1)\cdot ... \cdot exp(a_n)=exp(\sum_{i=1}^n a_i)\).

\[ exp(\sum_{i=1}^n-\frac{1}{2}\cdot(\frac{x_i-\theta}{\sigma_{sample}})^2)=exp(-\frac{1}{2}\cdot\sum_{i=1}^n(\frac{x_i-\theta}{\sigma_{sample}})^2) \] Premise 4)

As in premise 6 of the previous derivation we expand all terms and ignore terms independent of \(\theta\).

\[ \sum_{i=1}^n(x_i-\theta)^2=\sum_{i=1}^n(x_i^2-2x_i\cdot \theta +\theta^2) =\sum_{i=1}^nx_i^2-\sum_{i=1}^n2x_i\cdot\theta+\sum_{i=1}^n\theta^2= \sum_{i=1}^nx_i^2-2\theta\sum_{i=1}^nx_i+n\theta^2 \] Dropping \(\sum_{i=1}^nx_i^2\), which does not depend on \(\theta\), leaves \[ -2\theta\sum_{i=1}^nx_i+n\theta^2 \]

Premise 5)

Substitute the expression back into the equation.

\[ exp(-\frac{1}{2}\cdot[\frac{-2\theta\sum_{i=1}^nx_i+n\theta^2}{\sigma^2_{sample}}]) \]

Premise 6)

Using \(P(\mu|Data) \propto P(Data|\mu) \cdot P(\mu)\), the posterior can then be written as

\[ exp(-\frac{1}{2}\cdot[\frac{-2\theta\sum_{i=1}^nx_i+n\theta^2}{\sigma^2_{sample}}]) \cdot exp(-\frac{1}{2}\cdot(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2)=\\ exp(-\frac{1}{2}\cdot[\frac{-2\theta\sum_{i=1}^nx_i+n\theta^2}{\sigma^2_{sample}}+(\frac{\theta-\mu_{prior}}{\sigma_{prior}})^2]) \]

Premise 7)

Expand the numerator of the prior term and substitute it back into the previous equation. \[ (\theta-\mu_{prior})^2=\theta^2-2\theta\mu_{prior}+\mu_{prior}^2 \\ \frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2} \\ exp(-\frac{1}{2}\cdot[\frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2}+ \frac{-2\theta\sum_{i=1}^nx_i+n\theta^2}{\sigma^2_{sample}}]) \]

Premise 8)

Expand the last term and divide by \(\sigma^2_{sample}\)

\[ exp(-\frac{1}{2}\cdot[\frac{\theta^2}{\sigma_{prior}^2}+\frac{-2\theta\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{prior}^2}{\sigma_{prior}^2}- \frac{2\theta\sum_{i=1}^nx_i}{\sigma^2_{sample}}+\frac{n\theta^2}{\sigma^2_{sample}}]) \]

Premise 9)

Group the terms by their power of \(\theta\)

\[ exp(-\frac{1}{2}\cdot[\theta^2(\frac{1}{\sigma_{prior}^2}+\frac{n}{\sigma_{sample}^2}) -2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\sum_{i=1}^nx_i}{\sigma_{sample}^2}) +not\ dependent\ on\ \theta])\]

Since: \(\sum_{i=1}^nx_i=\mu_{sample}\cdot n\)

\[ exp(-\frac{1}{2}\cdot[\theta^2(\frac{1}{\sigma_{prior}^2}+\frac{n}{\sigma_{sample}^2}) -2\theta(\frac{\mu_{prior}}{\sigma_{prior}^2}+\frac{\mu_{sample}\cdot n}{\sigma_{sample}^2}) +not\ dependent\ on\ \theta]) \]

Conclusion)

Applying Premise 9 of the previous derivation (\(\sigma=\sqrt{1/A}\) and \(\mu=B/A\)) to the grouped terms, we arrive at

\[ \sigma_{posterior}=\sqrt{\frac{1}{\frac{1}{\sigma_{prior}^2}+\frac{n}{\sigma_{sample}^2}}}\\ \mu_{posterior}=\frac{\frac{\mu_{prior}}{\sigma_{prior}^2} + \frac{\mu_{sample}\cdot n}{\sigma_{sample}^2}}{\frac{1}{\sigma_{prior}^2} + \frac{n}{\sigma_{sample}^2}} \]

As might be clear, this is computationally less heavy than MCMC methods. For more than two parameters such an analytical approach becomes more cumbersome, and if conjugacy is not satisfied no closed-form solution is available. In this regard, the Laplace approximation is also computationally easy. Yet, the equations clearly formulate what happens in Bayes theorem.
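The two closed-form results can be written as one small function. The following is a minimal Python sketch, not the implementation used in EcoPostView; the function name `normal_posterior` and the example values are assumptions for illustration, and \(\sigma_{sample}\) is treated as known. With `n=1` it reproduces the first set of equations, with `n>1` the version including the sample size.

```python
import math


def normal_posterior(mu_prior, sd_prior, mu_sample, sd_sample, n=1):
    """Posterior mean and sd for a normal prior and normal likelihood
    with known sd_sample (conjugate update)."""
    precision_prior = 1.0 / sd_prior ** 2   # 1 / sigma_prior^2
    precision_data = n / sd_sample ** 2     # n / sigma_sample^2
    precision_post = precision_prior + precision_data  # A in Premise 9
    b = mu_prior * precision_prior + mu_sample * precision_data  # B in Premise 9
    return b / precision_post, math.sqrt(1.0 / precision_post)


# Prior N(0, 1) and a sample mean of 2 with sd 1
print(normal_posterior(0.0, 1.0, 2.0, 1.0))        # single observation
print(normal_posterior(0.0, 1.0, 2.0, 1.0, n=4))   # four observations
```

With equal prior and data precision the posterior mean lies exactly halfway between prior and sample mean. Increasing `n` pulls the posterior mean towards the sample mean and shrinks the posterior standard deviation.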

4.2.4 Approximate Bayesian Computation with rejection sampling

Approximate Bayesian Computation with rejection sampling (ABC-rejection) is a computationally expensive method for approximating the posterior distribution. However, when the number of parameters is relatively small, the posterior can still be approximated quite well. ABC-rejection is especially useful when the likelihood function cannot be computed or approximated accurately.

One example is the use of ABC to explore potential bias in the EcoPostView package. In a simplified case, assuming both the prior and the data-generating model are normally distributed, the ABC-rejection algorithm begins by simulating a parameter from the prior distribution.

\[ \mu_{i}^*\sim N(\mu_{prior},\sigma_{prior}^2) \\ \sigma_{i}^{2*}\sim Exp(rate) \]

The asterisk (\(^*\)) denotes that these parameters are temporary, and this will become important later. Next, a data-generating model is used to simulate data based on these temporary parameters. We assume the observed data is approximately normally distributed, though any model could be used. For each simulation \(i\), we generate \(n_{data}\) values.

\[ x_{i,j}\sim N(\mu_{i}^*, \sigma_{i}^{2*}), \quad j=1, ..., n_{data} \]

Depending on the parameter of interest (e.g., \(\mu\), \(\sigma\), mode, or median), a summary statistic is computed from the simulated data. In this example, we focus on estimating \(\mu\).

\[ \hat{x}_{sim, i}=\frac{\sum_{j=1}^{n_{data}}x_{i,j}}{n_{data}} \]

Each simulated mean \(\hat{x}_{sim, i}\) (typically out of 100,000 simulations) is compared to the observed mean \(\hat{x}_{data}\) using the Euclidean distance.

\[ E_{i}=\sqrt{(\hat{x}_{sim, i} - \hat{x}_{data})^2} \]

A tolerance threshold is then selected to determine which simulated values are accepted. Simulations with \(E_i > tolerance\) are rejected, while those with \(E_i \leq tolerance\) are retained. While a tolerance of zero would yield the most accurate posterior, it would typically result in rejecting all simulations. On the other hand, setting the tolerance too high would allow in too many poor matches.

Each accepted simulation corresponds to an accepted pair of simulated parameters \(\mu_{i}^*, \sigma_{i}^{2*}\). Since all \(\mu_{i}^*\) were originally drawn from the prior, the subset of accepted values approximates the posterior distribution of \(\mu\).
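The steps of the ABC-rejection algorithm described above can be sketched in Python with the standard library only. This is an illustration under stated assumptions and not the code used for EcoPostView: the function name `abc_rejection`, the prior values, the exponential rate, the tolerance and the number of simulations are all arbitrary choices.

```python
import math
import random
import statistics


def abc_rejection(x_bar_data, n_data, mu_prior=0.0, sd_prior=5.0, rate=1.0,
                  tolerance=0.1, n_sim=20000, seed=1):
    """Return the accepted (mu*, sigma^2*) pairs."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sim):
        # 1. draw temporary parameters from the priors
        mu_star = rng.gauss(mu_prior, sd_prior)
        var_star = rng.expovariate(rate)
        # 2. simulate n_data values from the data-generating model
        sim = [rng.gauss(mu_star, math.sqrt(var_star)) for _ in range(n_data)]
        # 3. summary statistic and Euclidean distance to the observed mean
        x_bar_sim = statistics.fmean(sim)
        distance = math.sqrt((x_bar_sim - x_bar_data) ** 2)
        # 4. keep the parameters only if the distance is within tolerance
        if distance <= tolerance:
            accepted.append((mu_star, var_star))
    return accepted


accepted = abc_rejection(x_bar_data=1.0, n_data=20)
posterior_mu = [mu for mu, _ in accepted]
print(len(posterior_mu), statistics.fmean(posterior_mu))
```

Only a small fraction of the simulations is accepted, which shows why the method is computationally expensive. Lowering `tolerance` sharpens the approximation but leaves fewer accepted draws.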

4.3 Introduction to Bayesian Model Averaging (BMA)

Instead of \(P\), the function ‘\(f\)’ is used to highlight that the probability is a mapping function, a ‘rule’ that maps \(x\) to \(y\) so that \(y=f(x)\). \[ f(\beta \mid Data, Info) = \frac{f(Data \mid \beta) \cdot f(\beta \mid Info)} {\int f(Data \mid \beta) \cdot f(\beta \mid Info)\, d\beta} \] The integral in the denominator is used to scale the posterior probability to one. This expression is sometimes simplified to \[f(\beta \mid Data, Info) \propto f(Data \mid \beta) \cdot f(\beta \mid Info)\] where the \(\propto\) symbol indicates ‘proportional to’, highlighting the idea of exchangeability. Therefore, the posterior is nothing more than a function that describes the probability \(y\) as a function of \(\beta\) conditional on \(Data\) and \(Info\) (\(y=f(\beta \mid Data, Info)\)). This cannot be solely conditional on the \(Data\), as the \(Data\) is not uncertain; it is our information or belief that is uncertain about a non-existing object \(\beta\) (unless Platonism is true).

In the previous part a single prior model was used. Bayesian Model Averaging (BMA) has the advantage that it allows multiple (\(m\)) functions to be utilized as priors. I specifically chose the use of \(f\) so that multiple priors, written as \(f_k\) in the equation below, can be seen as nothing more than multiple functions (or models). This, in my opinion, makes it easier to see that we only optimize between multiple functions; it sounds weird to say that we optimize between probabilities. Hence, multiple possible scenarios that could have been responsible for \(\beta\) can be introduced as below. \[ f(\beta \mid Data,Info) = \frac{f(Data \mid \beta) \cdot f_k(\beta \mid Info)}{\int \left( \sum_{k=1}^{m} f(Data \mid \beta) \cdot f_k(\beta \mid Info) \right) d\beta} \] Now it should be clear that each \(\beta\) contained within \(g(E(y \mid x_{ij})) = \sum_{j=1}^{v} \beta_j \cdot x_{ij}\) is being restricted by the prior models, while in frequentism it is unrestricted, with ‘complete indifference’ towards the possible values of \(\beta\). All these methods can be used in a meta-analysis.

4.4 Meta-analysis

A standard meta-analysis uses a measure of location (mean) and scale (precision) to estimate a pooled value based on all parameters. For a fixed-effect meta-analysis the pooled parameter is derived via the following equation. \[\theta_{pooled} = \frac{\sum_{i=1}^{k}(\theta_i\cdot w_i)}{\sum_{i=1}^kw_i}\] \(\theta_i\) is the extracted effect-size for a study \(i\). The \(w_i\) is the weight per study \(i\) for all \(k\) studies, derived from the precision \(1/se_i^2\) via the equation below. \[w_i = \frac{1}{se_i^2}\] The standard error for the pooled effect-size can then be derived via the formula given below.

\[se(\theta_{pooled})=\frac{1}{\sqrt{\sum_{i=1}^{k}w_i}}\] For a random-effects meta-analysis the variance between studies is modeled separately. In the metafor package REML (Restricted Maximum Likelihood) is used to estimate this between-study variance. However, it is also possible to use the DerSimonian and Laird method, where \(Q=\sum_{i=1}^{k}w_i(\theta_i-\theta_{pooled})^2\) is computed with the fixed-effect weights and pooled estimate. \[ \tau^2=max(0, \frac{Q-(k-1)}{\sum_{i=1}^{k}w_i-\frac{\sum_{i=1}^{k}w_i^2}{\sum_{i=1}^{k}w_i}})\ \\ w^*_i=\frac{1}{(\frac{1}{w_i}+\tau^2)} \\ \theta_{pooled} = \frac{\sum_{i=1}^{k}(\theta_i\cdot w^*_i)}{\sum_{i=1}^{k}(w^*_i)} \\ se(\theta_{pooled})=\frac{1}{\sqrt{\sum_{i=1}^{k}w^*_i}} \] If we now go back to how we analytically derived the posterior, we can devise a function that analytically performs a fixed-effect meta-analysis with ease. I have placed this in a function called ‘abmeta’. In simple cases it approximates the results of metafor and the meta function in EcoPostView relatively well. Of course the variance component differs slightly from that of metafor and the ‘meta’ function due to the different method of estimation.
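The fixed-effect and DerSimonian and Laird calculations can be sketched in a few lines of Python. This is an illustration only and neither the ‘abmeta’ function nor metafor; the function names `fixed_effect` and `random_effects_dl` and the three example studies are hypothetical. The sketch uses the standard form of the DerSimonian and Laird estimator with \(w_i = 1/se_i^2\) and \(Q=\sum w_i(\theta_i-\theta_{pooled})^2\).

```python
import math


def fixed_effect(theta, se):
    """Inverse-variance pooled estimate and its standard error."""
    w = [1.0 / s ** 2 for s in se]
    pooled = sum(t * wi for t, wi in zip(theta, w)) / sum(w)
    return pooled, 1.0 / math.sqrt(sum(w))


def random_effects_dl(theta, se):
    """Random-effects pooling with the DerSimonian and Laird tau^2."""
    k = len(theta)
    w = [1.0 / s ** 2 for s in se]
    pooled_fixed, _ = fixed_effect(theta, se)
    # Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (t - pooled_fixed) ** 2 for t, wi in zip(theta, w))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1.0 / (1.0 / wi + tau2) for wi in w]
    pooled = sum(t * wi for t, wi in zip(theta, w_star)) / sum(w_star)
    return pooled, 1.0 / math.sqrt(sum(w_star)), tau2


theta = [0.2, 0.5, 0.8]   # hypothetical effect-sizes
se = [0.1, 0.1, 0.1]      # hypothetical standard errors
print(fixed_effect(theta, se))
print(random_effects_dl(theta, se))
```

Because the three example studies disagree more than their standard errors suggest, \(\tau^2\) is larger than zero and the random-effects standard error is wider than the fixed-effect one, while the pooled estimate stays the same due to the equal weights.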

4.5 BMA and meta-analysis

In a meta-analysis we do not talk about a single \(\beta\) but about a set of estimates \(\beta=\{\beta_{i}, ..., \beta_{n}\}\), meaning that the likelihood becomes \(f(Meta-data\mid\{\beta_{i}, ..., \beta_{n}\})\). This flexibility allows these estimates to be either likelihood estimates (\(\hat{\beta}\)) or posterior estimates (\(\beta\)), and we end up with an expression that should capture the inference to an underlying pooled model parameter. \[ f(\beta_{pooled} \mid Meta-data,Info) = \frac{f(Meta-data \mid \{\beta_{i}, ..., \beta_{n}\}) \cdot f_k(\beta_{pooled} \mid Info)}{\int \left( \sum_{k=1}^{m} f(Meta-data \mid \{\beta_{i}, ..., \beta_{n}\}) \cdot f_k(\beta_{pooled} \mid Info) \right) d\beta_{pooled}} \] Assuming the pooled parameter \(\beta_{pooled}\) is derived via the equation laid out before, the variance of the pooled parameter can be derived analytically as given by Hoeting et al. (1999):

\[ SE(\beta_{pooled}) = \sqrt{\sum^m_{k=1}( w_{prior} \cdot (\beta_k^2+SE(\beta_k)^2))-\beta_{pooled}^2}\\ \]
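The formula for the pooled standard error can be illustrated numerically. The Python sketch below assumes that \(\beta_{pooled}\) is the weighted average \(\sum_k w_k \cdot \beta_k\) of the model-specific estimates and that the weights sum to one; both are assumptions of this illustration, as are the function name `bma_pool` and the example numbers.

```python
import math


def bma_pool(betas, ses, weights):
    """Model-averaged estimate and standard error over m models."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalise so the weights sum to one
    beta_pooled = sum(wi * b for wi, b in zip(w, betas))
    # weighted second moment: sum_k w_k * (beta_k^2 + SE(beta_k)^2)
    second_moment = sum(wi * (b ** 2 + s ** 2) for wi, b, s in zip(w, betas, ses))
    # guard against tiny negative values caused by floating point rounding
    se_pooled = math.sqrt(max(0.0, second_moment - beta_pooled ** 2))
    return beta_pooled, se_pooled


# Two equally weighted models that disagree, each without within-model error
print(bma_pool([1.0, 3.0], [0.0, 0.0], [0.5, 0.5]))
```

Even with zero within-model error the pooled standard error is not zero, because the disagreement between the models is itself a source of uncertainty.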

4.6 Sequential updating

Bayesian sequential updating refers to the practice of re-using the derived posterior of a previous model as the prior for the new model. For this the assumption of conditional independence between the the datasets is assumed. The parameter of interest is \(\theta\) based on a dataset \(Data_1\) and we derive the posterior. \[P(\theta|Data_1) = \frac{P(Data_1|\theta) \cdot P(\theta)}{P(Data_1)}\] The next would be \[P(\theta|Data_1, Data_2) = \frac{P(Data_2|\theta) \cdot P(\theta|Data_1)}{P(Data_2)}\] till \[P(\theta|Data_n) = \frac{P(Data_n|\theta) \cdot P(\theta|Data_1,\cdots,Data_{n-1})}{P(Data_n)}\]

For example, suppose we would like to know \(\mu\) for a population of interest. Our example population has \(\mu=0.5\) and \(\sigma=5\), and each study has an error of \(\alpha = 40\%\) where we assume \(\alpha=5\%\) (meaning that the heterogeneity is larger than expected). Our first prior is \(N(0, 5)\), after which each posterior is re-used as the prior for the next study, as shown in Fig. 2a below. More studies increase the precision of the estimated posterior.

If the focus instead lies on objectivity and error control across the different studies, and we assume the studies are iid, then the curve between studies follows that of Fig. 2b below.
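The long-run view can be sketched in base R as well. The block below is an illustration only and not the code behind Fig. 2b: the number of studies, the sample size per study and the seed are assumptions, and \(\sigma\) is treated as known.

```r
# Illustrative simulation of iid studies from the example population
set.seed(42)
mu    <- 0.5     # population mean from the example
sigma <- 5       # population sd from the example
n_studies   <- 30   # assumed number of studies
n_per_study <- 25   # assumed sample size per study

# Each study reports a mean with a 95% confidence interval
study_means <- replicate(n_studies, mean(rnorm(n_per_study, mu, sigma)))
se       <- sigma / sqrt(n_per_study)
ci_lower <- study_means - qnorm(0.975) * se
ci_upper <- study_means + qnorm(0.975) * se

# Share of intervals that contain the true mean (close to 0.95 in the long run)
coverage <- mean(ci_lower <= mu & mu <= ci_upper)

# Running mean over the studies, without any transfer of prior information
running_mean <- cumsum(study_means) / seq_along(study_means)
```

Here no study borrows information from an earlier one. Each interval is judged by its long-run coverage rather than by how much it sharpens a belief about \(\mu\).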

Put less formally, the Bayesian framework focuses more on the transfer of information and on precision, whereas the frequentist framework is more interested in objectivity, consistency and error among studies.

Figure 2: Sequential updating with credible intervals in the left panel (a) and a long run of means with confidence intervals in the right panel (b).

4.7 A short reflection on uncertainty

I do not believe statistics reflects uncertainty about events; rather, it reflects the information in the data under a particular model or the uncertainty about our belief in a parameter (\(\theta, \beta, \mu\), etc.). The latter concept is often vague and confusing because, if one assumes the parameter does not exist independently of the mind, then what exactly is uncertain: our belief? The claim to ‘objective probability’ is already compromised by the assumption that the parameter is objective. However, if the parameter does not exist outside the mind, the meaning of ‘objective’ in this context becomes questionable.

When people refer to objectivity, they often mean that the data itself is the most ‘objective’ part of the process. However, if some conditions are not met, such as when (1) the data is not randomly sampled from a population of interest, (2) the model is not selected in advance, (3) a sufficiently large sample size is not chosen based on the model, or (4) confounding variables are present, then even the data cannot be considered truly objective unless these limitations are explicitly acknowledged. Moreover, model selection procedures further contaminate the objectivity of the data, meaning that the estimated model parameters no longer fully reflect the objectivity of the data that is often implied in our conclusions (Gelman and Loken 2013; Tong 2019).

In Bayesian updating, the prior reflects the extent to which we are willing to sacrifice the objectivity of the likelihood by using information that cannot be formalized in the likelihood. This is captured by the relationship \(f(\theta \mid Data, Info) \propto f(Data \mid \theta) \cdot f(\theta \mid Info)\).

The posterior, therefore, is merely the weighted combination of the prior and the likelihood. It represents the relationship (e.g., \(0.25\) as \(0.5 \cdot 0.5\)) between the prior and the likelihood. There is no invalidity in a logical argument such as: (Premise 1) All unicorns are orange. (Premise 2) I have a unicorn. (Conclusion) Therefore, my unicorn is orange.

While this argument may be unsound, because unicorns do not exist, the reasoning itself is not flawed. The issue lies with the premises, not the structure of the argument. Hence, uncertainty does not exist in the ‘real’ world; it resides solely in our minds. We cannot be ‘wrong’ or ‘correct’ about \(f(\beta \mid \text{Data, Info})\) because it does not exist as a tangible entity. Even if it did, its existence would have no impact on reality because uncertainty is unrelated to the way reality operates. In the real world, events either occur or they do not. If my unicorn does not exist, I will never see it, and it was never orange in the first place.

We should also avoid treating models as a definitive representation of reality. Models are tools that convey information and serve as pragmatic instruments. The model itself is not the result; the strength of the results relies on the argument, and on how well the premises within the argument are clarified and supported by the model.

5 References

Baguley, Thom. 2009. “Standardized or Simple Effect Size: What Should Be Reported?” British Journal of Psychology 100(3): 603–17. doi: 10.1348/000712608X377117.

Bárdos, Dániel, and Adam Tamas Tuboly. 2025. Science, Pseudoscience, and the Demarcation Problem. Cambridge: Cambridge University Press. doi: 10.1017/9781009429597.

Barnes, Jonathan. 2007. The Toils of Scepticism. Digitally print. version. Cambridge: Cambridge Univ. Press.

Barwise, Jon, John Etchemendy, Gerard Allwein, Dave Barker-Plummer, and Albert Liu. 2002. Language, Proof, and Logic. Stanford, Calif: CSLI Publications.

Cartwright, Nancy, Jordi Cat, Lola Fleck, and Thomas E. Uebel. 1996. Otto Neurath: Philosophy between Science and Politics. Cambridge: Cambridge University Press.

Cramer, J. S. 1991. The Logit Model: An Introduction for Economists. London: Edward Arnold.

Csilléry, Katalin, Michael G.B. Blum, Oscar E. Gaggiotti, and Olivier François. 2010. “Approximate Bayesian Computation (ABC) in Practice.” Trends in Ecology & Evolution 25(7): 410–18. doi: 10.1016/j.tree.2010.04.001.

de Groot, W.T. 1998. “Problem-In-Context: A Framework for the Analysis, Explanation and Solution of Environmental Problems.” In Environmental Management in Practice: Vol 1: Instruments for Environmental Management, Oxford: Routledge, 22–43.

Dewey, John. 1997. How We Think. Mineola, N.Y: Dover Publications.

Ferrari, Filippo, and Massimiliano Carrara. 2025. Logic and Science: An Exploration of Logical Anti-Exceptionalism. Cambridge: Cambridge University Press. doi: 10.1017/9781009233897.

Feyerabend, Paul. 1975. Against Method: Outline of an Anarchistic Theory of Knowledge. 4th ed. London and New York: Verso.

Fisher, R. A. 1949. The Design of Experiments. 5th ed. Oliver and Boyd.

Gelman, Andrew, and Eric Loken. 2013. “The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is No ‘Fishing Expedition’ or ‘p-Hacking’ and the Research Hypothesis Was Posited Ahead of Time.” 348(3): 1–17. doi: 10.1007/978-3-658-12153-2_7.

Haack, Susan. 1978. Philosophy of Logics. 1st ed. Cambridge University Press. doi: 10.1017/CBO9780511812866.

Harman, Gilbert. 1986. Change in View: Principles of Reasoning. A Bradford Book. Cambridge, Massachusetts, and London, England: The MIT Press.

Hartig, Florian, Justin M. Calabrese, Björn Reineking, Thorsten Wiegand, and Andreas Huth. 2011. “Statistical Inference for Stochastic Simulation Models - Theory and Application: Inference for Stochastic Simulation Models.” Ecology Letters 14(8): 816–27. doi: 10.1111/j.1461-0248.2011.01640.x.

Heidegger, Martin. 1929. “What Is Metaphysics.” https://www.stephenhicks.org/wp-content/uploads/2013/03/heideggerm-what-is-metaphysics.pdf.

Hinne, Max, Quentin F. Gronau, Don Van Den Bergh, and Eric-Jan Wagenmakers. 2020. “A Conceptual Introduction to Bayesian Model Averaging.” Advances in Methods and Practices in Psychological Science 3(2): 200–215. doi: 10.1177/2515245919898657.

Hoeting, Jennifer A., David Madigan, Adrian E. Raftery, and Chris T. Volinsky. 1999. “Bayesian Model Averaging: A Tutorial.” Statistical Science 14(4): 382–417. doi: 10.1214/ss/1009212519.

Kale, Alex, Francis Nguyen, Matthew Kay, and Jessica Hullman. 2019. “Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data.” IEEE Transactions on Visualization and Computer Graphics 25(1): 892–902. doi: 10.1109/TVCG.2018.2864909.

Labukt, Ivar. 2021. “Is Logic Distinctively Normative?” Erkenntnis 86(4): 1025–43. doi: 10.1007/s10670-019-00142-1.

Lamme, V. 2011. Vrije Wil Bestaat Niet. Amsterdam: Prometheus.

Laudan, Larry. 1983. “The Demise of the Demarcation Problem.” In Physics, Philosophy and Psychoanalysis, Boston Studies in the Philosophy of Science, eds. R. S. Cohen and L. Laudan. Dordrecht: Springer Netherlands, 111–27. doi: 10.1007/978-94-009-7055-7_6.

Lee, Siu-Fan. 2017. Logic: A Complete Introduction. Great Britain: Hodder & Stoughton.

Maier, Maximilian, František Bartoš, and Eric-Jan Wagenmakers. 2023. “Robust Bayesian Meta-Analysis: Addressing Publication Bias with Model-Averaging.” Psychological Methods 28(1): 107–22. doi: 10.1037/met0000405.

Markowitz, David M., and Jeffrey T. Hancock. 2016. “Linguistic Obfuscation in Fraudulent Science.” Journal of Language and Social Psychology 35(4): 435–45. doi: 10.1177/0261927X15614605.

Mayo, D. G. 2018. Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars. Cambridge University Press.

Moreno, Santiago G., Alex J. Sutton, Ae Ades, Tom D. Stanley, Keith R. Abrams, Jaime L. Peters, and Nicola J. Cooper. 2009. “Assessment of Regression-Based Methods to Adjust for Publication Bias through a Comprehensive Simulation Study.” BMC Medical Research Methodology 9(1): 2. doi: 10.1186/1471-2288-9-2.

Pearl, J. 2009. Causality. Cambridge University Press.

Pearl, Judea, and Dana Mackenzie. 2019. The Book of Why: The New Science of Cause and Effect. London: Penguin Books.

Peters, Jaime L., Alex J. Sutton, David R. Jones, Keith R. Abrams, and Lesley Rushton. 2006. “Comparison of Two Methods to Detect Publication Bias in Meta-Analysis.” JAMA 295(6): 676. doi: 10.1001/jama.295.6.676.

Popper, Karl Raimund. 1968. The Logic of Scientific Discovery. Basic Books.

Puppo, Federico. 2019. Informal Logic: A “Canadian” Approach to Argument. Windsor Studies in Argumentation.

Shrader-Frechette, K. S., and Earl D. McCoy. 1994. “How the Tail Wags the Dog: How Value Judgments Determine Ecological Science.” Environmental Values 3(2): 107–20. doi: 10.3197/096327194776679764.

Sienkiewicz, Stefan. 2019. Five Modes of Scepticism: Sextus Empiricus and the Agrippan Modes. Oxford: Oxford University Press.

Smith, Peter. 2021. An Introduction to Formal Logic. Second edition, reprinted with corrections. Monee, IL: Logic Matters.

Stanley, T. D., and Hristos Doucouliagos. 2014. “Meta-Regression Approximations to Reduce Publication Selection Bias.” Research Synthesis Methods 5(1): 60–78. doi: 10.1002/jrsm.1095.

Stone, James V. 2015. Information Theory: A Tutorial Introduction. First edition. Sheffield, UK: Sebtel Press.

Tong, Christopher. 2019. “Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science.” The American Statistician 73(sup1): 246–61. doi: 10.1080/00031305.2018.1518264.

Tukey, John W. 1969. “Analysing Data: Sanctification or Detective Work?” American Psychologist 24(2): 83–91. doi: 10.1037/h0027108.

Usó-Doménech, J. L., and J. Nescolarde-Selva. 2016. “What Are Belief Systems?” Foundations of Science 21(1): 147–52. doi: 10.1007/s10699-015-9409-z.

Van Fraassen, Bas C. 1980. The Scientific Image. Oxford University Press.

Van Zwet, E. W., and E. A. Cator. 2021. “The Significance Filter, the Winner’s Curse and the Need to Shrink.” Statistica Neerlandica 75: 437–52. doi: 10.1111/stan.12241.

Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p -Values: Context, Process, and Purpose.” The American Statistician 70(2): 129–33. doi: 10.1080/00031305.2016.1154108.

Wooldridge, Jeffrey M. 2001. Econometric Analysis of Cross Section and Panel Data. Cambridge, Massachusetts, and London, England: The MIT Press.