Tukey’s plots – Paolo Bosetti

See this previous post for more on plotting Tukey’s tests with GGplot2.

Let me discuss another take about Tukey’s plots.

Let’s suppose that we have a two-way experiment (i.e. with two factors) that requires us to do different Tukey’s tests for different levels of a factor.

The following dataset represents the result of a series of experiments for measuring the battery discharge time at different temperatures and for different dielectric material used in the batteries:

knitr::opts_chunk$set(
  fig.align = "center",
  # This for default device
  out.width = "12cm",
  # This two for high quality charts:
  fig.asp = 9/16
)
df <- read.table(
  "https://paolobosetti.quarto.pub/data/battery.dat",
  comment = "#",
  header = TRUE
) %>%
  mutate(Temperature = factor(Temperature),
         Material = factor(LETTERS[Material])) %>%
  select(!(RunOrder:StandardOrder))

df %>%
  pivot_wider(names_from = Temperature, values_from = Response) %>%
  rename_with( ~ paste0(., "°C"), `15`:`125`) %>%
  knitr::kable()

Material	Repeat	15°C	70°C	125°C
A	1	130	34	20
B	1	150	136	25
C	1	138	174	96
A	2	155	40	70
B	2	188	122	70
C	2	110	120	104
A	3	74	80	82
B	3	159	106	58
C	3	168	150	82
A	4	180	75	58
B	4	126	115	45
C	4	160	139	60

Let’s plot the data, with bars ranging from minimum to maximum values:

df %>%
  # mutate(Temperature = as.numeric(Temperature)) %>%
  group_by(Temperature, Material) %>%
  summarise(
    Life = mean(Response),
    min = min(Response),
    max = max(Response)
  ) %>%
  ggplot(aes(
    x = Temperature,
    y = Life,
    color = Material,
    group = Material
  )) +
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin = min, ymax = max),
                width = 0.1,
                position = position_dodge(width = 0.1)) +
  labs(y = "Life (hrs)", x = "Temperature (°C)")

Now we want to use a Tukey’s test for comparing those means and identify which ones are significantly different.

We start by building a list of TukeyHSD objects, using levels of Temperature factors as names for the list elements:

alpha <- 0.01
res <- map(levels(df$Temperature),
  ~ TukeyHSD(aov(Response~Material, data=filter(df, Temperature==.)), 
             conf.level = 1-alpha)
)
names(res) <- levels(df$Temperature)

Now res is a list with three elements: res[["15"]], res[["70"]] e res[["125"]]. Each element in turn has one element, which holds a matrix with the data we need to make the plot. For example:

res[["15"]][[1]]

      diff        lwr       upr     p adj
B-A  21.00  -70.20395 112.20395 0.6633090
C-A   9.25  -81.95395 100.45395 0.9205830
C-B -11.75 -102.95395  79.45395 0.8756855

I need a tibble that holds all the three matrices, with an additional column that holds the temperature. Something like:

res[["15"]][[1]] %>%
  fortify() %>%
  rownames_to_column("pair") %>%
  add_column(T = "15", .before = "pair")

   T pair   diff        lwr       upr     p adj
1 15  B-A  21.00  -70.20395 112.20395 0.6633090
2 15  C-A   9.25  -81.95395 100.45395 0.9205830
3 15  C-B -11.75 -102.95395  79.45395 0.8756855

but repeated for the other two levels of Temperature.

I can do this using the purrr::reduce() function, which is similar to map(), but accumulates the elements of the list it operates on into a container specified by the .init parameter, which we set initially to an empty tibble:

names(res) %>%
  reduce(                                       # needs two-args function
    .init = tibble(),                           # initial accumulator
    function(tbl, key) {                        # key is the Temp level
      bind_rows(             
        tbl,                                    # tbl is the accumulator
        res[[key]][[1]] %>%                     # extract the matrix
          fortify() %>%                         # convert to tibble
          rownames_to_column("pair") %>%
          add_column(T = key, .before = "pair") # add the T column
      )
    }) %>%
  mutate(Significant = `p adj` < alpha) %>%
  knitr::kable()

T	pair	diff	lwr	upr	p adj	Significant
15	B-A	21.00	-70.203952	112.20395	0.6633090	FALSE
15	C-A	9.25	-81.953952	100.45395	0.9205830	FALSE
15	C-B	-11.75	-102.953952	79.45395	0.8756855	FALSE
70	B-A	62.50	7.647739	117.35226	0.0045670	TRUE
70	C-A	88.50	33.647739	143.35226	0.0004209	TRUE
70	C-B	26.00	-28.852261	80.85226	0.2177840	FALSE
125	B-A	-8.00	-67.947867	51.94787	0.8673817	FALSE
125	C-A	28.00	-31.947867	87.94787	0.2261123	FALSE
125	C-B	36.00	-23.947867	95.94787	0.1062483	FALSE

Now I can plot the data. This time I am using the purrr shortcut for the inline function, replacing the function(tbl, key) {...} with ~ {...}:

names(res) %>%
  reduce(.init = tibble(), ~ {
    bind_rows(
      .x,
      res[[.y]][[1]] %>%
        fortify() %>%
        rownames_to_column("pair") %>%
        add_column(T = .y, .before = "pair")
    )
  }) %>%
  mutate(Significant = `p adj` < 0.01) %>%
  ggplot(aes(
    x = pair,
    y = diff,
    group = T,
    color = T
  )) +
  geom_point(position = position_dodge(width = 0.2)) +
  geom_hline(yintercept = 0, color = "red") +
  geom_errorbar(
    aes(
      ymin = lwr,
      ymax = upr,
      linetype = Significant
    ),
    width = 0.2,
    position = position_dodge(width = 0.2)
  ) +
  scale_linetype_manual(values = c(2, 1)) +
  coord_flip() +
  labs(y = "Life difference (hrs)", x = "Materials", color = "Temp. (°C)")

The plot shows that the only two cases where the dielectric material actually makes a significant difference in the battery discharge time are the B-A and C-A differences at 70°C.

That’s all, folks!