---
title: "Omnibus predictive microbiology models"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Omnibus predictive microbiology models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
library(predmicror)
```

## Overview

Omnibus modelling combines primary predictive microbiology models with secondary covariate relationships in a single nonlinear mixed-effects model.

In `predmicror`, the omnibus helpers are designed to reuse the primary and secondary models already available in the package. This means that the same primary models used by `fit_growth()` and `fit_inactivation()` can also be used in grouped omnibus models.

The main workflow is:

1. choose a primary growth or inactivation model;
2. decide which primary-model parameters should depend on covariates;
3. define the grouping variable for random effects;
4. fit the model with `fit_omnibus_growth()` or `fit_omnibus_inactivation()`;
5. inspect fitted values, residuals, metrics, and validation results.

Omnibus models are especially useful when observations are grouped by experimental condition, strain, replicate, batch, study, or treatment.

## Omnibus inactivation model

The simplest omnibus inactivation workflow uses a primary inactivation model and allows one of its parameters to depend on an environmental covariate.

In this example, the primary model is `WeibullM()`. The parameter `sigma` is modelled as a function of temperature, while `Y0` and `alpha` are estimated as intercept-only fixed effects. The grouping variable is `Condition`.

```{r omnibus-inactivation-data}
set.seed(1)

inact_data <- do.call(rbind, lapply(1:4, function(g) {
  Time <- c(1, 2, 4, 6, 8, 10)
  Temp <- 55 + g
  sigma <- 5 + 0.4 * g

  data.frame(
    Condition = g,
    Time = Time,
    Temp = Temp,
    logN = WeibullM(Time, Y0 = 7, sigma = sigma, alpha = 1.1) +
      rnorm(length(Time), 0, 0.03)
  )
}))

head(inact_data)
```

```{r omnibus-inactivation-fit}
inact_fit <- fit_omnibus_inactivation(
  data = inact_data,
  primary = "WeibullM",
  time = "Time",
  response = "logN",
  group = "Condition",
  secondary = list(
    sigma = ~ Temp
  ),
  random = Y0 ~ 1,
  start = c(
    Y0 = 7,
    sigma = 1,
    sigma.Temp = 0.08,
    alpha = 1
  )
)

inact_fit
```

The fitted object can be inspected with the same lightweight diagnostic helpers used elsewhere in `predmicror`.

```{r omnibus-inactivation-diagnostics}
head(predmicror_augment(inact_fit))
fit_metrics(inact_fit)
```

```{r omnibus-inactivation-plot, fig.alt="Observed and fitted log10 counts for an omnibus Weibull inactivation model grouped by condition."}
aug_inact <- predmicror_augment(inact_fit)

plot(aug_inact$Time, aug_inact$logN,
  pch = 16,
  xlab = "Time",
  ylab = "log10 count"
)

for (g in unique(aug_inact$Condition)) {
  z <- aug_inact[aug_inact$Condition == g, ]
  z <- z[order(z$Time), ]
  lines(z$Time, z$.fitted, lwd = 2)
}
```

## Omnibus growth model

The same structure can be used for grouped growth data. Here the primary model is `HuangNLM()`, and the maximum specific growth rate `MUmax` is described as a function of temperature.

```{r omnibus-growth-data}
set.seed(1)

growth_data <- do.call(rbind, lapply(1:4, function(g) {
  Time <- c(0, 1, 2, 3, 4, 5, 6)
  Temp <- 20 + g
  MUmax <- 0.6 + 0.04 * g

  data.frame(
    Condition = g,
    Time = Time,
    Temp = Temp,
    lnN = HuangNLM(Time, Y0 = 2, Ymax = 8, MUmax = MUmax) +
      rnorm(length(Time), 0, 0.02)
  )
}))

head(growth_data)
```

```{r omnibus-growth-fit}
growth_fit <- fit_omnibus_growth(
  data = growth_data,
  primary = "HuangNLM",
  time = "Time",
  response = "lnN",
  group = "Condition",
  secondary = list(
    MUmax = ~ Temp
  ),
  random = Y0 ~ 1,
  start = c(
    Y0 = 2,
    Ymax = 8,
    MUmax = 0.1,
    MUmax.Temp = 0.025
  )
)

growth_fit
```

```{r omnibus-growth-diagnostics}
head(predmicror_augment(growth_fit))
fit_metrics(growth_fit)
```

```{r omnibus-growth-plot, fig.alt="Observed and fitted natural log counts for an omnibus Huang growth model grouped by condition."}
aug_growth <- predmicror_augment(growth_fit)

plot(aug_growth$Time, aug_growth$lnN,
  pch = 16,
  xlab = "Time",
  ylab = "ln count"
)

for (g in unique(aug_growth$Condition)) {
  z <- aug_growth[aug_growth$Condition == g, ]
  z <- z[order(z$Time), ]
  lines(z$Time, z$.fitted, lwd = 2)
}
```

## Using parameterised secondary models

Omnibus models can also use secondary models already implemented in `predmicror`. This is useful when a primary-model parameter should follow a mechanistic or cardinal-type relationship instead of a simple linear formula.

In the example below, the primary model is again `HuangNLM()`, but `MUmax` is now described using the cardinal temperature model `CMTI()`.

Because `CMTI()` returns a square-root growth-rate scale, `transform = "square"` is used to convert the secondary-model output to the growth-rate scale used by `MUmax`.

```{r omnibus-secondary-data}
set.seed(2)

cardinal_growth_data <- do.call(rbind, lapply(1:5, function(g) {
  Time <- c(0, 1, 2, 3, 4, 5, 6)
  Temp <- 12 + 3 * g
  MUmax <- CMTI(Temp, Tmax = 45, Tmin = 5, MUopt = 0.9, Topt = 30)^2

  data.frame(
    Condition = g,
    Time = Time,
    Temp = Temp,
    lnN = HuangNLM(Time, Y0 = 2, Ymax = 8, MUmax = MUmax) +
      rnorm(length(Time), 0, 0.02)
  )
}))

head(cardinal_growth_data)
```

```{r omnibus-secondary-fit}
cardinal_growth_fit <- fit_omnibus_growth(
  data = cardinal_growth_data,
  primary = "HuangNLM",
  time = "Time",
  response = "lnN",
  group = "Condition",
  secondary = list(
    MUmax = omnibus_secondary("CMTI", x = "Temp", transform = "square")
  ),
  random = Y0 ~ 1,
  start = c(
    Y0 = 2,
    Ymax = 8,
    Tmax = 45,
    Tmin = 5,
    MUopt = 0.9,
    Topt = 30
  )
)

cardinal_growth_fit
```

```{r omnibus-secondary-diagnostics}
head(predmicror_augment(cardinal_growth_fit))
fit_metrics(cardinal_growth_fit)
```

```{r omnibus-secondary-plot, fig.alt="Observed and fitted natural log counts for an omnibus Huang growth model using a cardinal temperature secondary model."}
aug_cardinal <- predmicror_augment(cardinal_growth_fit)

plot(aug_cardinal$Time, aug_cardinal$lnN,
  pch = 16,
  xlab = "Time",
  ylab = "ln count"
)

for (g in unique(aug_cardinal$Condition)) {
  z <- aug_cardinal[aug_cardinal$Condition == g, ]
  z <- z[order(z$Time), ]
  lines(z$Time, z$.fitted, lwd = 2)
}
```

## Validation metrics

Predictive microbiology studies often use bias and accuracy factors to summarise external validation performance.

```{r omnibus-validation-factors}
observed <- c(7, 6, 5)
predicted <- c(7.1, 5.9, 5.2)

bias_factor(observed, predicted)
accuracy_factor(observed, predicted)
```

A leave-one-condition-out validation workflow can be run with `validate_omnibus_leave_one_out()`. This refits the model after removing one group and predicts the left-out observations at the population level.

```{r omnibus-leave-one-out}
loo <- validate_omnibus_leave_one_out(
  object = inact_fit,
  group_value = 4,
  level = 0
)

loo$bias_factor
loo$accuracy_factor
head(loo$data)
```

## Practical notes

Omnibus models are more flexible than separate primary-model fits, but they require more care.

Good practice includes:

1. checking the response scale expected by the selected primary model;
2. starting with a simple random-effects structure;
3. estimating only parameters supported by the available data;
4. using sensible starting values;
5. comparing formula-based secondary relationships with parameterised secondary models;
6. validating predictions on held-out conditions when possible.

Formula-based secondary relationships are useful for exploratory modelling, for example `MUmax ~ Temp` or `sigma ~ Temp`. Parameterised secondary relationships are useful when an established secondary model is available in `predmicror`, such as `CMTI()`, `CMAW()`, `CMPH()`, or `CMInh()`.
