# Getting started with emmeans

Package **emmeans** (formerly known as **lsmeans**) is enormously useful for folks wanting to do post hoc comparisons among groups after fitting a model. It has a very thorough set of vignettes (see the vignette topics here), is very flexible with a ton of options, and works out of the box with a lot of different model objects (and can be extended to others š).

Iāve started recommending **emmeans** all the time to students fitting models in R. However, Iāve found that often times students struggle a bit to get started using the package, possibly due to the sheer amount of flexibility and information in the vignettes.

Iāve put together some basic examples for using **emmeans**, meant to be a complement to the vignettes. Specifically this post will demonstrate a few of the built-in options for some standard post hoc comparisons; I will write a separate post about custom comparisons in **emmeans**.

*Disclaimer: This post is about using a package in R and so unfortunately does not focus on appropriate statistical practice for model fitting and post hoc comparisons.*

# R packages

I will load **magrittr** for the pipe in addition to **emmeans**.

```
library(emmeans) # v. 1.3.3
library(magrittr) # v. 1.5
```

# The dataset and model

Iāve made a small dataset to use in this example. The response variable is `resp`

, which comes from the log-normal distribution, and the two crossed factors of interest are `f1`

and `f2`

. Each factor has two levels: a control called `c`

as well as a second non-control level.

```
dat = structure(list(f1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("a",
"c"), class = "factor"), f2 = structure(c(1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("1",
"c"), class = "factor"), resp = c(1.6, 0.3, 3, 0.1, 3.2, 0.2,
0.4, 0.4, 2.8, 0.7, 3.8, 3, 0.3, 14.3, 1.2, 0.5, 1.1, 4.4, 0.4,
8.4)), row.names = c(NA, -20L), class = "data.frame")
str(dat)
```

```
# 'data.frame': 20 obs. of 3 variables:
# $ f1 : Factor w/ 2 levels "a","c": 1 1 1 1 1 1 1 1 1 1 ...
# $ f2 : Factor w/ 2 levels "1","c": 1 2 1 2 1 2 1 2 1 2 ...
# $ resp: num 1.6 0.3 3 0.1 3.2 0.2 0.4 0.4 2.8 0.7 ...
```

The model I will use is a linear model with a log-transformed response and the two factors and their interaction as explanatory variables. This is the ātrueā model since I created these data so Iām skipping all model checks here (which I would not do in a real analysis).

Note I use `log(resp)`

in the model rather than creating a new log-transformed variable. This will allow me to demonstrate one of the convenient options available in `emmeans()`

later.

`fit1 = lm(log(resp) ~ f1 + f2 + f1:f2, data = dat)`

# Built in comparisons with emmeans()

The **emmeans** package has helper functions for commonly used post hoc comparisons (aka *contrasts*). For example, we can do pairwise comparisons via `pairwise`

or `revpairwise`

, treatment vs control comparisons via `trt.vs.ctrl`

or `trt.vs.ctrlk`

, and even consecutive comparisons via `consec`

.

The available built-in functions for doing comparisons are listed in the documentation for `?"contrast-methods"`

.

# All pairwise comparisons

One way to use `emmeans()`

, which I use a lot, is to use formula coding for the comparisons. This formula is defined in the `specs`

argument.

I will do all pairwise comparisons for all combinations of `f1`

and `f2`

. The built-in function `pairwise`

is put on the left-hand side of the formula in `specs`

and the factors with levels we want to compare among are on the right-hand side. Since Iām doing all pairwise comparisons, the combination of `f1`

and `f2`

are put in the formula.

The model object is passed to the first argument in `emmeans()`

, `object`

.

`emm1 = emmeans(fit1, specs = pairwise ~ f1:f2)`

Using the formula in this way returns an object with two parts. The first part, called `emmeans`

, is the estimated marginal means along with the standard errors and confidence intervals. We can pull these out with dollar sign notation, which I do below.

These results are all on the *model* scale, so in this case these are estimated mean log response for each `f1`

and `f2`

combination. Note the message that `emmeans()`

gives us about results being on the log scale in the output. It knows the model is on the log scale because I used `log(resp)`

as my response variable.

`emm1$emmeans`

```
# f1 f2 emmean SE df lower.CL upper.CL
# a 1 0.569 0.445 16 -0.374 1.512
# c 1 -0.102 0.445 16 -1.045 0.842
# a c -1.278 0.445 16 -2.221 -0.334
# c c 1.335 0.445 16 0.392 2.279
#
# Results are given on the log (not the response) scale.
# Confidence level used: 0.95
```

The second part of the output, called `contrasts`

, contains the comparisons of interest. It is this section that we are generally most interested in when answering a question about differences among groups.

These results are also on the model scale (and we get the same message in this section), and weāll want to put them on the original scale.

The comparisons are accompanied by statistical tests of the null hypothesis of āno differenceā, but lack confidence interval (CI) limits by default. Weāll need to get these.

The `emmeans()`

package automatically adjusts for multiple comparisons. Since we did all pairwise comparisons the package used a Tukey adjustment. The type of adjustment can be changed.

`emm1$contrasts`

```
# contrast estimate SE df t.ratio p.value
# a,1 - c,1 0.671 0.629 16 1.065 0.7146
# a,1 - a,c 1.847 0.629 16 2.934 0.0434
# a,1 - c,c -0.766 0.629 16 -1.217 0.6253
# c,1 - a,c 1.176 0.629 16 1.869 0.2795
# c,1 - c,c -1.437 0.629 16 -2.283 0.1438
# a,c - c,c -2.613 0.629 16 -4.152 0.0038
#
# Results are given on the log (not the response) scale.
# P value adjustment: tukey method for comparing a family of 4 estimates
```

# Back-transforming results

Since I used a log transformation I can express the results as multiplicative differences in medians on the original (data) scale. We can always back-transform estimates and CI limits by hand, but in `emmeans()`

we can use the `type`

argument for this. Using `type = "response"`

will return results on the original scale. This works when the transformation is explicit in the model (e.g., `log(resp)`

) and works similarly for link functions in generalized linear models.

Youāll see the message changes in the output once I do this, indicating things were back-transformed from the model scale. We also are reminded that the tests were done on the model scale.

In the `contrast`

column in the `contrasts`

section we can see the expression of the comparisons has changed from additive comparisons (via subtraction) shown above to multiplicative comparisons (via division).

`emmeans(fit1, specs = pairwise ~ f1:f2, type = "response")`

```
# $emmeans
# f1 f2 response SE df lower.CL upper.CL
# a 1 1.767 0.786 16 0.688 4.538
# c 1 0.903 0.402 16 0.352 2.321
# a c 0.279 0.124 16 0.108 0.716
# c c 3.800 1.691 16 1.479 9.763
#
# Confidence level used: 0.95
# Intervals are back-transformed from the log scale
#
# $contrasts
# contrast ratio SE df t.ratio p.value
# a,1 / c,1 1.9553 1.2306 16 1.065 0.7146
# a,1 / a,c 6.3396 3.9900 16 2.934 0.0434
# a,1 / c,c 0.4648 0.2926 16 -1.217 0.6253
# c,1 / a,c 3.2422 2.0406 16 1.869 0.2795
# c,1 / c,c 0.2377 0.1496 16 -2.283 0.1438
# a,c / c,c 0.0733 0.0461 16 -4.152 0.0038
#
# P value adjustment: tukey method for comparing a family of 4 estimates
# Tests are performed on the log scale
```

# Changing the multiple comparisons adjustment

The `adjust`

argument can be used to change the type of multiple comparisons adjustment. All available options are listed and described in the documentation for `summary.emmGrid`

under the section *P-value adjustments*.

One option is to skip multiple comparisons adjustments all together, using `adjust = "none"`

. If we use this the message about multiple comparisons disappears (since we didnāt use one).

```
emm1.1 = emmeans(fit1, specs = pairwise ~ f1:f2, type = "response", adjust = "none")
emm1.1
```

```
# $emmeans
# f1 f2 response SE df lower.CL upper.CL
# a 1 1.767 0.786 16 0.688 4.538
# c 1 0.903 0.402 16 0.352 2.321
# a c 0.279 0.124 16 0.108 0.716
# c c 3.800 1.691 16 1.479 9.763
#
# Confidence level used: 0.95
# Intervals are back-transformed from the log scale
#
# $contrasts
# contrast ratio SE df t.ratio p.value
# a,1 / c,1 1.9553 1.2306 16 1.065 0.3025
# a,1 / a,c 6.3396 3.9900 16 2.934 0.0097
# a,1 / c,c 0.4648 0.2926 16 -1.217 0.2412
# c,1 / a,c 3.2422 2.0406 16 1.869 0.0801
# c,1 / c,c 0.2377 0.1496 16 -2.283 0.0365
# a,c / c,c 0.0733 0.0461 16 -4.152 0.0008
#
# Tests are performed on the log scale
```

# Confidence intervals for comparisons

We will almost invariably want to report confidence intervals for any comparisons of interest. We need a separate function to get these. Here is an example using the `confint()`

function with the default 95% CI (the confidence level can be changed, see `?confint.emmGrid`

). I use the pipe to pass the `contrasts`

into the `confint()`

function.

```
emm1.1$contrasts %>%
confint()
```

```
# contrast ratio SE df lower.CL upper.CL
# a,1 / c,1 1.9553 1.2306 16 0.5150 7.424
# a,1 / a,c 6.3396 3.9900 16 1.6696 24.072
# a,1 / c,c 0.4648 0.2926 16 0.1224 1.765
# c,1 / a,c 3.2422 2.0406 16 0.8539 12.311
# c,1 / c,c 0.2377 0.1496 16 0.0626 0.903
# a,c / c,c 0.0733 0.0461 16 0.0193 0.278
#
# Confidence level used: 0.95
# Intervals are back-transformed from the log scale
```

The `confint()`

function returns confidence intervals but gets rid of the statistical tests. Some people will want to also report the test statistics and p-values. In this case, we can use `summary()`

instead of `confint()`

, with `infer = TRUE`

.

```
emm1.1$contrasts %>%
summary(infer = TRUE)
```

```
# contrast ratio SE df lower.CL upper.CL t.ratio p.value
# a,1 / c,1 1.9553 1.2306 16 0.5150 7.424 1.065 0.3025
# a,1 / a,c 6.3396 3.9900 16 1.6696 24.072 2.934 0.0097
# a,1 / c,c 0.4648 0.2926 16 0.1224 1.765 -1.217 0.2412
# c,1 / a,c 3.2422 2.0406 16 0.8539 12.311 1.869 0.0801
# c,1 / c,c 0.2377 0.1496 16 0.0626 0.903 -2.283 0.0365
# a,c / c,c 0.0733 0.0461 16 0.0193 0.278 -4.152 0.0008
#
# Confidence level used: 0.95
# Intervals are back-transformed from the log scale
# Tests are performed on the log scale
```

# Putting results in a data.frame

One of the really nice things about `emmeans()`

is that it makes it easy to get the results into a nice format for making tables or graphics of results. This is because the results can be converted into a data.frame via `as.data.frame()`

.

This is true for both `confint()`

and `summary()`

, although Iāll just show `confint()`

here. After getting the confidence intervals I pipe the results into `as.data.frame()`

. I assign this object a name so I can use it for, e.g., making a graph of results.

```
emm1.1_contrasts = emm1.1$contrasts %>%
confint() %>%
as.data.frame()
emm1.1_contrasts
```

```
# contrast ratio SE df lower.CL upper.CL
# 1 a,1 / c,1 1.95530312 1.23062914 16 0.51495216 7.4243989
# 2 a,1 / a,c 6.33957277 3.99000180 16 1.66960135 24.0717240
# 3 a,1 / c,c 0.46482555 0.29255202 16 0.12241730 1.7649695
# 4 c,1 / a,c 3.24224552 2.04060525 16 0.85388364 12.3109935
# 5 c,1 / c,c 0.23772557 0.14961978 16 0.06260784 0.9026577
# 6 a,c / c,c 0.07332127 0.04614696 16 0.01931002 0.2784051
```

If needed, the estimated marginal means can also be put into a data.frame.

```
emm1.1$emmeans %>%
as.data.frame()
```

```
# f1 f2 response SE df lower.CL upper.CL
# 1 a 1 1.7665334 0.7861763 16 0.6876870 4.537879
# 2 c 1 0.9034576 0.4020739 16 0.3517035 2.320806
# 3 a c 0.2786518 0.1240109 16 0.1084753 0.715802
# 4 c c 3.8004222 1.6913362 16 1.4794517 9.762542
```

# Within group comparisons

While we *can* do all pairwise comparisons, there are certainly plenty of situations where the research question dictates that we only want a specific set of comparisons. A common example of this is when we want to compare the levels of one factor within the levels of another. Here Iāll show comparisons among levels of `f1`

for each level of `f2`

.

The only thing that changes is the right-hand side of the `specs`

formula. The code `f1|f2`

translates to ācompare levels of `f1`

within each level of `f2`

ā.

```
emm2 = emmeans(fit1, specs = pairwise ~ f1|f2, type = "response")
emm2
```

```
# $emmeans
# f2 = 1:
# f1 response SE df lower.CL upper.CL
# a 1.767 0.786 16 0.688 4.538
# c 0.903 0.402 16 0.352 2.321
#
# f2 = c:
# f1 response SE df lower.CL upper.CL
# a 0.279 0.124 16 0.108 0.716
# c 3.800 1.691 16 1.479 9.763
#
# Confidence level used: 0.95
# Intervals are back-transformed from the log scale
#
# $contrasts
# f2 = 1:
# contrast ratio SE df t.ratio p.value
# a / c 1.9553 1.2306 16 1.065 0.3025
#
# f2 = c:
# contrast ratio SE df t.ratio p.value
# a / c 0.0733 0.0461 16 -4.152 0.0008
#
# Tests are performed on the log scale
```

You can see there is no message about a multiple comparisons adjustment in the above set of comparisons. This is because the package default is to correct for the number of comparisons *within* each group instead of across groups. In this case there is only a single comparison in each group.

If we consider the family of comparisons to be all comparisons regardless of group and want to correct for multiple comparisons, we can do so via `rbind.emmGrid`

.

Here is an example of passing `contrasts`

to `rbind()`

to correct for multiple comparisons. The default adjustment is Bonferroni, which can be much too conservative when the number of comparisons is large. You can control the multiple comparisons procedure via `adjust`

.

The results of `rbind()`

can also be used with `summary()`

, `confint()`

, and/or `as.data.frame()`

.

```
emm2$contrasts %>%
rbind()
```

```
# f2 contrast ratio SE df t.ratio p.value
# 1 a / c 1.9553 1.2306 16 1.065 0.6050
# c a / c 0.0733 0.0461 16 -4.152 0.0015
#
# P value adjustment: bonferroni method for 2 tests
# Tests are performed on the log scale
```

# Main effects comparisons

Even if we have multiple factors in the model, complete with an interaction term, we can still do āoverallā comparisons among groups if our research question indicated that main effects were an important thing to estimate.

Doing main effects in the presence of an interaction means we *average over* the levels of the other factor(s). The `emmeans()`

function gives both a warning about the interaction and a message indicating which factor was averaged over to remind us of this.

Here is the estimated main effect of `f1`

. Since we are only interested in overall comparisons of that factor it is the only factor given on the right-hand side of the `specs`

formula.

`emmeans(fit1, specs = pairwise ~ f1)`

`# NOTE: Results may be misleading due to involvement in interactions`

```
# $emmeans
# f1 emmean SE df lower.CL upper.CL
# a -0.354 0.315 16 -1.0215 0.313
# c 0.617 0.315 16 -0.0503 1.284
#
# Results are averaged over the levels of: f2
# Results are given on the log (not the response) scale.
# Confidence level used: 0.95
#
# $contrasts
# contrast estimate SE df t.ratio p.value
# a - c -0.971 0.445 16 -2.182 0.0443
#
# Results are averaged over the levels of: f2
# Results are given on the log (not the response) scale.
```

# Treatment vs control example

The **emmeans** package has built-in helper functions for comparing each group mean to the control mean. If the control group is the in the first row of the `emmeans`

section of the output, this set of comparisons can be requested via `trt.vs.ctrl`

.

Note the default multiple comparisons adjustment is a Dunnett adjustment.

`emmeans(fit1, specs = trt.vs.ctrl ~ f1:f2)`

```
# $emmeans
# f1 f2 emmean SE df lower.CL upper.CL
# a 1 0.569 0.445 16 -0.374 1.512
# c 1 -0.102 0.445 16 -1.045 0.842
# a c -1.278 0.445 16 -2.221 -0.334
# c c 1.335 0.445 16 0.392 2.279
#
# Results are given on the log (not the response) scale.
# Confidence level used: 0.95
#
# $contrasts
# contrast estimate SE df t.ratio p.value
# c,1 - a,1 -0.671 0.629 16 -1.065 0.5857
# a,c - a,1 -1.847 0.629 16 -2.934 0.0262
# c,c - a,1 0.766 0.629 16 1.217 0.4947
#
# Results are given on the log (not the response) scale.
# P value adjustment: dunnettx method for 3 tests
```

Using `trt.vs.ctrl`

means we ended up comparing each group mean to the āa 1ā group since it is in the first row. In the example Iām using the control group, āc cā, is actually the *last* group listed in the `emmeans`

section. When the control group is the last group in `emmeans`

we can use `trt.vs.ctrlk`

to get the correct set of comparisons.

`emmeans(fit1, specs = trt.vs.ctrlk ~ f1:f2)`

```
# $emmeans
# f1 f2 emmean SE df lower.CL upper.CL
# a 1 0.569 0.445 16 -0.374 1.512
# c 1 -0.102 0.445 16 -1.045 0.842
# a c -1.278 0.445 16 -2.221 -0.334
# c c 1.335 0.445 16 0.392 2.279
#
# Results are given on the log (not the response) scale.
# Confidence level used: 0.95
#
# $contrasts
# contrast estimate SE df t.ratio p.value
# a,1 - c,c -0.766 0.629 16 -1.217 0.4947
# c,1 - c,c -1.437 0.629 16 -2.283 0.0935
# a,c - c,c -2.613 0.629 16 -4.152 0.0021
#
# Results are given on the log (not the response) scale.
# P value adjustment: dunnettx method for 3 tests
```

That gives us what we want in this case. However, if the control group was some other group, like āc 1ā, we could use `trt.vs.ctrlk`

with the `ref`

argument to define which row in the `emmeans`

section represents the control group.

The āc 1ā group is the second row in the `emmeans`

so we can use `ref = 2`

to define this group as the control group.

`emmeans(fit1, specs = trt.vs.ctrlk ~ f1:f2, ref = 2)`

```
# $emmeans
# f1 f2 emmean SE df lower.CL upper.CL
# a 1 0.569 0.445 16 -0.374 1.512
# c 1 -0.102 0.445 16 -1.045 0.842
# a c -1.278 0.445 16 -2.221 -0.334
# c c 1.335 0.445 16 0.392 2.279
#
# Results are given on the log (not the response) scale.
# Confidence level used: 0.95
#
# $contrasts
# contrast estimate SE df t.ratio p.value
# a,1 - c,1 0.671 0.629 16 1.065 0.5857
# a,c - c,1 -1.176 0.629 16 -1.869 0.1937
# c,c - c,1 1.437 0.629 16 2.283 0.0935
#
# Results are given on the log (not the response) scale.
# P value adjustment: dunnettx method for 3 tests
```

Finally, if we want to reverse the order of subtraction in the treatment vs control comparisons we can use the `reverse`

argument.

`emmeans(fit1, specs = trt.vs.ctrlk ~ f1:f2, ref = 2, reverse = TRUE)`

```
# $emmeans
# f1 f2 emmean SE df lower.CL upper.CL
# a 1 0.569 0.445 16 -0.374 1.512
# c 1 -0.102 0.445 16 -1.045 0.842
# a c -1.278 0.445 16 -2.221 -0.334
# c c 1.335 0.445 16 0.392 2.279
#
# Results are given on the log (not the response) scale.
# Confidence level used: 0.95
#
# $contrasts
# contrast estimate SE df t.ratio p.value
# c,1 - a,1 -0.671 0.629 16 -1.065 0.5857
# c,1 - a,c 1.176 0.629 16 1.869 0.1937
# c,1 - c,c -1.437 0.629 16 -2.283 0.0935
#
# Results are given on the log (not the response) scale.
# P value adjustment: dunnettx method for 3 tests
```

# Alternative code for comparisons

The `emmeans()`

package offers the option to do comparisons in two steps instead of in one step the way I have been using it as above. I personally find this alternative most useful when doing custom comparisons, and I think itās good to introduce it now so it looks familiar. This alternative keeps the estimated marginal means and the comparisons of interest in separate objects, which may be attractive in some situations.

The first step is to use `emmeans()`

to calculate the marginal means of interest. We still use the formula in `specs`

with the factor(s) of interest on the right-hand side but no longer put anything on the left-hand side of the tilde.

We can still use `type`

in `emmeans()`

but cannot use `adjust`

(since we donāt adjust for multiple comparisons until weāve actually done comparisons š).

```
emm3 = emmeans(fit1, specs = ~ f1:f2, type = "response")
emm3
```

```
# f1 f2 response SE df lower.CL upper.CL
# a 1 1.767 0.786 16 0.688 4.538
# c 1 0.903 0.402 16 0.352 2.321
# a c 0.279 0.124 16 0.108 0.716
# c c 3.800 1.691 16 1.479 9.763
#
# Confidence level used: 0.95
# Intervals are back-transformed from the log scale
```

We then get the comparisons we want in a second step using the `contrast()`

function. We request the comparisons we want via `method`

. When using built-in comparisons like I am here, we give the comparison function name as a string (meaning in quotes). Also see the `pairs()`

function, which is for the special case of all pairwise comparisons.

We can use `adjust`

in `contrast()`

to change the multiple comparisons adjustment.

`contrast(emm3, method = "pairwise", adjust = "none")`

```
# contrast ratio SE df t.ratio p.value
# a,1 / c,1 1.9553 1.2306 16 1.065 0.3025
# a,1 / a,c 6.3396 3.9900 16 2.934 0.0097
# a,1 / c,c 0.4648 0.2926 16 -1.217 0.2412
# c,1 / a,c 3.2422 2.0406 16 1.869 0.0801
# c,1 / c,c 0.2377 0.1496 16 -2.283 0.0365
# a,c / c,c 0.0733 0.0461 16 -4.152 0.0008
#
# Tests are performed on the log scale
```

We can follow the `contrast()`

argument with `summary()`

or `confint()`

and then put the results into a data.frame for plotting/saving. Again, I think the real strength of `contrast()`

comes when we want custom comparisons, and Iāll demonstrate these in my next post.