scottishsnow

Many reports from 1 RMarkdown file

I was at the EdinbR talk this week by the RStudio community lead, Curtis Kephart. It was really interesting, but I disagree with his suggestion to point and click through different parameters when you want to generate multiple reports from the same RMarkdown file. That might be acceptable for one or two reports, but with any more it quickly becomes tedious and error-prone. This blog post shows you how to loop (yes, an actual for loop!) through a variable to generate a separate report for each of its unique values.

For this walk-through I’m using the 2019 stackoverflow developer survey. You can get it here.

First, we need an RMarkdown file (.Rmd). This is largely the same as your usual .Rmd file, and I strongly encourage you to develop it like one. That is, write a single working .Rmd file first, and only then convert it into a template. Working like this makes debugging a whole lot easier. Here’s an example of a “normal” .Rmd:

---
title: "SO developers ages"
author: "Mike Spencer"
date: "26 January 2020"
output:
   pdf_document:
      toc: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=F, message=F, results='hide', warning=F, fig.height=5)
library(tidyverse)
library(RColorBrewer)
library(knitr)
```

```{r data}
df = read_csv("~/Downloads/survey_results_public.csv")
```

## Introduction

This file provides a summary of the 2019 stackoverflow developer survey.

The survey had `r nrow(df)` responses; the median age of respondents was `r median(df$Age, na.rm=T)`.

```{r histogram}
ggplot(df, aes(Age)) +
   geom_histogram() +
   labs(title = "Distribution of SO developer ages")
```

You can see it’s not far off what you get when you opt to start a new RMarkdown file in RStudio. I’ve abstracted the data reading to a separate file (it has some lengthy factor cleaning and is used in a few different situations), and I’m loading the knitr library so I can make tables with kable().

The next code chunk shows how the file is adapted to be used as a template for many outputs:

---
params:
   new_title: "My Title!"
title: "`r params$new_title`"
author: "Mike Spencer"
date: "26 January 2020"
output:
  html_document:
    toc: yes
---
 
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=F, message=F, results='hide', warning=F, fig.height=5)
library(tidyverse)
library(RColorBrewer)
library(knitr)
 
df1 = df %>% filter(Gender==v)
df1 = droplevels(df1)
```
 
## Introduction
 
This file provides a summary of the 2019 SO developer survey.
In particular it shows the answers of `r v`.
 
This subset had `r nrow(df1)` responses; the median age of respondents was `r median(df1$Age, na.rm=T)`.

```{r histogram}
ggplot(df1, aes(Age)) +
  geom_histogram() +
  labs(title = "Distribution of SO developer ages",
       subtitle = v)
```

Pretty similar, but there are some subtle differences. We’re now passing a title parameter to our .Rmd, and the template no longer reads the data itself: it expects df to be loaded already and subsets it to df1 using v. Loading the data and setting v both happen in the next file.
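
To see what the filter and droplevels() lines in the setup chunk do, here is a minimal sketch on a made-up data frame (the toy data and its values are illustrative assumptions, not survey results). It uses base R subsetting instead of dplyr so it runs without extra packages. Note that droplevels() only has an effect when the column is a factor; read_csv() on its own returns character columns, so this matters mainly if your data cleaning converts them.

```r
# Toy stand-in for the survey data; values are invented for illustration.
toy <- data.frame(
  Gender = factor(c("Man", "Woman", "Man", NA, "Woman")),
  Age    = c(25, 31, 40, 28, 35)
)

v <- "Woman"

# Base R equivalent of df %>% filter(Gender == v):
# keep matching rows and drop rows where Gender is missing.
toy1 <- toy[!is.na(toy$Gender) & toy$Gender == v, ]

# The subset still carries every original factor level...
levels(toy1$Gender)

# ...until droplevels() removes the ones that are no longer used.
toy1 <- droplevels(toy1)
levels(toy1$Gender)

# The same summaries the template reports inline
nrow(toy1)
median(toy1$Age, na.rm = TRUE)
```

Without droplevels(), the unused levels can reappear as empty groups in tables and plot legends.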

Finally, we need a separate script to loop through our variable and make some reports!

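library("rmarkdown")
library("readr")  # provides read_csv()

# Load the data once, outside the loop
df = read_csv("~/Downloads/survey_results_public.csv")

# Unique, non-missing values to loop over
slices = unique(df$Gender)[!is.na(unique(df$Gender))]

# Render one report per value; the template uses df and v
for(v in slices){
  render("~/test.Rmd",
         output_file=paste0("~/exploratory_", v, ".html"),
         params=list(new_title=paste("Exploratory analysis -", v)))
}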

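Note we’re explicitly loading the rmarkdown library here so we can use the render function. We’re also loading our data before the loop, so the file is read once and not once per report. The object v is not passed as a parameter: by default render evaluates the .Rmd in the environment it is called from, so the template can see both df and v, and it uses v to subset the data.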
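
Two optional refinements, sketched under assumptions and not taken from the script above. First, values of the looping variable can contain spaces or punctuation, so a hypothetical helper, safe_name(), turns them into tidier file names. Second, the value can be handed to the template explicitly through params. This assumes the template declares an extra slice entry under params: in its YAML header and filters on params$slice; neither is in the template shown earlier, and render() will complain about a parameter the template does not declare.

```r
# Hypothetical helper: replace runs of non-alphanumeric characters with "_"
# and trim any leading or trailing underscores.
safe_name <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}

# Hypothetical wrapper around rmarkdown::render().
# Assumes the template declares `slice` under params: in its YAML header.
render_slice <- function(v, template = "~/test.Rmd", out_dir = "~") {
  rmarkdown::render(
    template,
    output_file = paste0("exploratory_", safe_name(v), ".html"),
    output_dir  = out_dir,
    params      = list(new_title = paste("Exploratory analysis -", v),
                       slice     = v)
  )
}

# File name stems the helper produces for two made-up values
safe_name(c("Man", "Prefer not to say; other"))
```

Inside the loop, render_slice(v) would then take the place of the render() call.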

If you’ve made it this far you should now have the tools to make multiple reports with a lot less effort! Beware that it becomes very easy to make more outputs than anyone could possibly read. With great power comes, etc., etc.
