Passing dot to ggplot2

coding
R
tidyverse
Author

Paolo Bosetti

Published

2024-Sep-24

I’m starting with a little lifesaver of a trick that I recently learned: how to reuse the input data in a ggplot2 call.

Wait a minute: we’re actually seeing two useful hints!

Ok, let’s start with the problem. To give some context, suppose that you have this situation:

library(tidyverse)

tibble(
  x = 1:10,
  y = rnorm(10),
  z = rnorm(10)
) %>% 
  ggplot(aes(x, y)) +
  geom_point() +
  geom_line()

A simple plot, generated with data coming from an anonymous tibble: this is rather common in a tidyverse workflow. Now suppose that you want to add some layer, for example a title, that refers to the number of rows in the original tibble. We need to call the nrow function, but it must be called on an anonymous object that is not immediately available.

First hint: reuse layers

We are going to repeat the same plot as above a few times, and here comes the first hint: reuse layers.

It is possible, and indeed very handy, to reuse the same layers whenever you want different plots with the same look and/or sharing a set of common geometries. To do so, just put the layers in a list and add it to the plot with the + operator:

gglayers <- list(
  geom_point(),
  geom_line()
)

tibble(
  x = 1:10,
  y = rnorm(10),
  z = rnorm(10)
) %>% 
  ggplot(aes(x, y)) +
  gglayers
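The same idea can be sketched with two different datasets. This is only an illustration: the names common_layers, p1 and p2 are made up here, and the list also carries a theme to show that it is not limited to geometries.

```r
library(ggplot2)
library(tibble)

# Hypothetical names, chosen for this sketch only
common_layers <- list(
  geom_point(),
  geom_line(),
  theme_minimal()   # a list can hold themes and labels, not only geoms
)

# Two plots built from different data, sharing the same look
p1 <- ggplot(tibble(x = 1:10, y = rnorm(10)), aes(x, y)) + common_layers
p2 <- ggplot(tibble(x = 1:20, y = runif(20)), aes(x, y)) + common_layers
```

Each element of the list is added in turn, exactly as if it had been written out with its own +.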

Second hint: pass the dot

Now let’s come to the original issue: how to refer to the anonymous tibble in the ggplot2 layers that follow the pipe.

In a magrittr pipe, the special symbol . (called the placeholder) refers to the input data. This is very useful when you want to pass the data not to the first argument of a function (which gets the lhs of the pipe anyway), but to another argument. As reported in the %>% documentation examples:

"Ceci n'est pas une pipe" %>% gsub("une", "un", .)
[1] "Ceci n'est pas un pipe"
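Another typical case is a modelling function whose first argument is a formula rather than the data. The sketch below is just an illustration: the built-in mtcars dataset and the name fit are choices made here, not part of the example above.

```r
library(magrittr)

# lm() takes the formula first, so the data has to reach `data` via the dot
fit <- mtcars %>% lm(mpg ~ wt, data = .)

# Same coefficients as the plain, unpiped call
all.equal(coef(fit), coef(lm(mpg ~ wt, data = mtcars)))
#> [1] TRUE
```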

This pipe placeholder (the dot) is indeed handy and tempting, although it does not always work as expected. When you build complex ggplot2 plots, for example, you often need to extract some info from the original data (say, the number of rows) and use it in some labeling function, like this:

tibble(
  x = 1:10,
  y = rnorm(10),
  z = rnorm(10)
) %>% 
  ggplot(aes(x, y)) +
  gglayers +
  labs(title = paste("n =", nrow(.)))
Error in eval(expr, envir, enclos): object '.' not found

Of course you get an error: %>% binds more tightly than +, so the dot placeholder exists only inside the ggplot call, and it does not survive the + operators that follow.
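The reason is operator precedence, which can be checked without any plot. A minimal sketch, assuming that no object named . is defined in your session (msg is a name made up for this example):

```r
library(magrittr)

# %>% has higher precedence than +, so this parses as (4 %>% sqrt()) + 10
4 %>% sqrt() + 10
#> [1] 12

# The right-hand operand of + is evaluated outside the pipe,
# where the placeholder is not defined: the call fails and we keep the message
msg <- tryCatch(
  4 %>% sqrt() + .,
  error = function(e) conditionMessage(e)
)
msg
```

In the failing plot the same thing happens: labs() sits to the right of a +, hence outside the pipe.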

One way to circumvent this issue would be to assign the tibble to a variable, say dat, and use it in the labs function as nrow(dat). But this is not very elegant, and it is not very tidy either.
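For reference, the named-variable workaround would look roughly like this (the layers are written out in full so that the snippet runs on its own; p is a name made up here):

```r
library(ggplot2)
library(tibble)

# Store the data first, so that it can be referred to by name later on
dat <- tibble(
  x = 1:10,
  y = rnorm(10),
  z = rnorm(10)
)

p <- ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_line() +
  labs(title = paste("n =", nrow(dat)))   # title reads "n = 10"
```

It works, but it leaves an extra object in the workspace and breaks the flow of the pipe.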

A much better solution is to enclose the whole ggplot in its own scope with curly braces, and pass the dot to the labs function:

tibble(
  x = 1:10,
  y = rnorm(10),
  z = rnorm(10)
) %>% {
  ggplot(data = ., aes(x, y)) +
    gglayers +
    labs(title = paste("n =", nrow(.)))
}

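Note, though, that you need to explicitly pass the placeholder to ggplot as the data argument: when the rhs of the pipe is a braced expression, the lhs is no longer inserted automatically as the first argument.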
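The difference between a plain call and a braced expression can be seen on a plain vector. This is a small sketch with made-up names (repeated, avg), not tied to ggplot2:

```r
library(magrittr)

# Plain call: the dot appears only in a nested call, so the lhs is still
# inserted as first argument, i.e. rep(1:5, length(1:5))
repeated <- 1:5 %>% rep(length(.))
length(repeated)
#> [1] 25

# Braced expression: nothing is inserted, every use of the data
# must go through the dot explicitly
avg <- 1:5 %>% { sum(.) / length(.) }
avg
#> [1] 3
```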

That’s all, folks.