---
title: "Population Denominators from the Census with healthbR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Population Denominators from the Census with healthbR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

The **Censo Demográfico** (Demographic Census) is the main source of population denominators in Brazil. These denominators are essential for calculating mortality rates, disease incidence, and other epidemiological indicators.

The `healthbR` package provides direct access to Census population data via the IBGE SIDRA API, covering:

| Function | Description | Years |
|----------|-------------|-------|
| `censo_populacao()` | Population by sex, age, race, urban/rural situation | 1970-2022 |
| `censo_estimativa()` | Intercensal population estimates | 2001-2021 |
| `censo_sidra_data()` | Any Census SIDRA table | All available |

## Getting started

```{r setup}
library(healthbR)
library(dplyr)
```

### Check available years

```{r}
censo_years()
#> [1] "1970" "1980" "1991" "2000" "2010" "2022"
```

### Survey information

```{r}
censo_info(2022)
```

## Population by state

The most common use case is retrieving the population of each state to use as a denominator for health indicators.

```{r}
# total population by state, Census 2022
pop_state <- censo_populacao(year = 2022, territorial_level = "state")
pop_state
```

## Population by sex

```{r}
# population by sex, Brazil level
pop_sex <- censo_populacao(
  year = 2022,
  variables = "sex",
  territorial_level = "brazil"
)
pop_sex
```

## Age pyramids

```{r}
# population by age and sex
pop_age_sex <- censo_populacao(
  year = 2022,
  variables = "age_sex",
  territorial_level = "brazil"
)
pop_age_sex
```
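Once the counts are in hand, a pyramid can be drawn with base graphics. The chunk below is a minimal sketch on made-up numbers. The `age_group`, `male`, and `female` columns are assumptions for illustration, not the documented shape of the `censo_populacao()` result, so reshape your data to match before plotting.

```{r}
# hypothetical age-by-sex counts in thousands (made-up numbers)
pyramid <- data.frame(
  age_group = c("0-14", "15-29", "30-44", "45-59", "60+"),
  male      = c(210, 240, 230, 180, 140),
  female    = c(200, 235, 240, 195, 175)
)

# symmetric axis limit so both sides share the same scale
limit <- max(pyramid$male, pyramid$female)

# males are drawn to the left by negating their counts
barplot(
  -pyramid$male,
  horiz = TRUE, names.arg = pyramid$age_group, las = 1,
  xlim = c(-limit, limit), col = "steelblue", axes = FALSE,
  xlab = "Population (thousands)"
)

# females are added to the right on the same plot
barplot(
  pyramid$female,
  horiz = TRUE, add = TRUE, col = "tomato", axes = FALSE
)

# label both sides of the axis with positive values
ticks <- pretty(c(-limit, limit))
axis(1, at = ticks, labels = abs(ticks))
legend("topright", legend = c("Male", "Female"),
       fill = c("steelblue", "tomato"), bty = "n")
```

Negating one sex is only a plotting device; keep the original positive counts for any rate calculation.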

## Population by race/color

```{r}
# population by race, 2022
pop_race <- censo_populacao(
  year = 2022,
  variables = "race",
  territorial_level = "state"
)
pop_race
```

## Intercensal estimates

For years between censuses, IBGE publishes annual population estimates that serve as denominators:

```{r}
# population estimates 2015-2021
estimates <- censo_estimativa(
  year = 2015:2021,
  territorial_level = "state"
)
estimates
```
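When rates are computed for several years, each count should be matched to the estimate for the same place and year. The following sketch illustrates that join on made-up tables. The column names `state`, `year`, `n`, and `population` are assumptions, so check the names actually returned by `censo_estimativa()` before adapting it.

```{r}
library(dplyr)

# hypothetical yearly event counts (made-up numbers)
events <- tibble(
  state = c("A", "A", "B", "B"),
  year  = c(2015, 2016, 2015, 2016),
  n     = c(300, 330, 120, 110)
)

# hypothetical yearly population estimates (made-up numbers)
estimates_toy <- tibble(
  state      = c("A", "A", "B", "B"),
  year       = c(2015, 2016, 2015, 2016),
  population = c(1000000, 1010000, 500000, 505000)
)

# match each count to the estimate for the same state and year
rates <- events |>
  left_join(estimates_toy, by = c("state", "year")) |>
  mutate(rate_per_100k = n / population * 100000)

rates
```

Joining on both keys avoids silently dividing every year's count by a single year's population.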

## Example: calculating a mortality rate

A typical epidemiological workflow combines mortality data from SIM (the Mortality Information System) with Census denominators:

```{r}
# step 1: get population denominator
pop_2010 <- censo_populacao(
  year = 2010,
  variables = "total",
  territorial_level = "state"
)

# step 2: suppose you have death counts by state (from SIM or another source)
# deaths_by_state <- sim_data(year = 2010) |> count(state)

# step 3: calculate crude mortality rate
# mortality_rate <- deaths_by_state |>
#   left_join(pop_2010, by = "state") |>
#   mutate(rate_per_100k = (n / population) * 100000)
```
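The join-and-divide step can be tried end to end without downloading anything. The sketch below uses small made-up tables. The column names `state`, `n`, and `population` mirror the commented code above and are assumptions, not a documented output format, so confirm the names in your own results before joining.

```{r}
library(dplyr)

# hypothetical death counts by state (made-up numbers)
deaths_by_state <- tibble(
  state = c("A", "B", "C"),
  n     = c(1200, 450, 3100)
)

# hypothetical population denominators (made-up numbers)
pop_toy <- tibble(
  state      = c("A", "B", "C"),
  population = c(2000000, 900000, 4000000)
)

# crude mortality rate per 100,000 inhabitants
mortality_rate <- deaths_by_state |>
  left_join(pop_toy, by = "state") |>
  mutate(rate_per_100k = n / population * 100000)

mortality_rate
```

A `left_join()` keeps every row of the death counts, so a state missing from the denominator table shows up as `NA` in `population` rather than disappearing from the result.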

## Exploring Census tables

The Census module includes a catalog of SIDRA tables organized by theme:

```{r}
# list all available tables
censo_sidra_tables()

# filter by theme
censo_sidra_tables(theme = "disability")
censo_sidra_tables(theme = "indigenous")

# search by keyword
censo_sidra_search("quilombola")
censo_sidra_search("saneamento")
```

## Custom SIDRA queries

For full flexibility, use `censo_sidra_data()` to query any Census table:

```{r}
# population by race from table 9605
pop_race_raw <- censo_sidra_data(
  table = 9605,
  territorial_level = "state",
  year = 2022,
  variable = 93,
  classifications = list("86" = "allxt")
)
pop_race_raw
```

## Historical comparisons

```{r}
# compare population across census years
pop_2010 <- censo_populacao(year = 2010, territorial_level = "brazil")
pop_2022 <- censo_populacao(year = 2022, territorial_level = "brazil")

# or use estimates for intercensal years
pop_series <- censo_estimativa(
  year = 2001:2021,
  territorial_level = "brazil"
)
pop_series
```
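Two census totals are often summarised as a percent change and an average annual growth rate. The chunk below is a sketch of that arithmetic. The two totals are illustrative rounded values typed in by hand, not output from the package, and the geometric growth formula is one common choice rather than a method prescribed by `healthbR`.

```{r}
# illustrative, rounded national totals; replace with the values
# returned by censo_populacao() for 2010 and 2022
p_2010 <- 190.8e6
p_2022 <- 203.1e6
years  <- 2022 - 2010

# total percent change between the two censuses
pct_change <- (p_2022 / p_2010 - 1) * 100

# average annual geometric growth rate, in percent
annual_growth <- ((p_2022 / p_2010)^(1 / years) - 1) * 100

round(c(pct_change = pct_change, annual_growth = annual_growth), 2)
```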
