This post was last updated on 2021-11-04.

Package emmeans (formerly known as lsmeans) is enormously useful for folks wanting to do post hoc comparisons among groups after fitting a model. It has a very thorough set of vignettes (see the vignette topics here), is very flexible with a ton of options, and works out of the box with a lot of different model objects (and can be extended to others 👍).

I’ve been consistently recommending emmeans to students fitting models in R. However, often times students struggle a bit to get started using the package, possibly due to the sheer amount of flexibility and information in the vignettes.

I’ve put together some basic examples for using emmeans, meant to be a complement to the vignettes. Specifically this post will demonstrate a few of the built-in options for some standard post hoc comparisons; I will write a separate post about custom comparisons in emmeans.

Disclaimer: This post is about using a package in R and so unfortunately does not focus on appropriate statistical practice for model fitting and post hoc comparisons.

R packages

I will load magrittr for the pipe in addition to emmeans.

library(emmeans) # v. 1.7.0
library(magrittr) # v. 2.0.1

The dataset and model

I’ve made a small dataset to use in this example.

The response variable is resp, which comes from the log-normal distribution, and the two crossed factors of interest are f1 and f2. Each factor has two levels: a control called c as well as a second non-control level.

dat = data.frame(resp = c(1.6,0.3,3,0.1,3.2,0.2,0.4,0.4,2.8,
                          0.7,3.8,3,0.3,14.3,1.2,0.5,1.1,4.4,0.4,8.4),
                 f1 = factor(c("a","a","a","a","a",
                               "a","a","a","a","a","c","c","c","c","c",
                               "c","c","c","c","c")),
                 f2 = factor(c("1","c","1","c","1",
                               "c","1","c","1","c","1","c","1","c","1",
                               "c","1","c","1","c")))

str(dat)
# 'data.frame': 20 obs. of  3 variables:
#  $ resp: num  1.6 0.3 3 0.1 3.2 0.2 0.4 0.4 2.8 0.7 ...
#  $ f1  : Factor w/ 2 levels "a","c": 1 1 1 1 1 1 1 1 1 1 ...
#  $ f2  : Factor w/ 2 levels "1","c": 1 2 1 2 1 2 1 2 1 2 ...

The model I will use is a linear model with a log-transformed response variable and the two factors and their interaction as explanatory variables. This is the “true” model since I created these data so I’m skipping all model checks (which I would not do in a real analysis).

Note I use log(resp) in the model rather than creating a new log-transformed variable. This will allow me to demonstrate one of the convenient options available in emmeans() later.

fit1 = lm(log(resp) ~ f1 + f2 + f1:f2, data = dat)

Built in comparisons with emmeans()

The emmeans package has helper functions for commonly used post hoc comparisons (aka contrasts). For example, we can do pairwise comparisons via pairwise or revpairwise, treatment vs control comparisons via trt.vs.ctrl or trt.vs.ctrlk, and even consecutive comparisons via consec.

The available built-in functions for doing comparisons are listed in the documentation for ?"contrast-methods".

All pairwise comparisons

One way to use emmeans() is via formula coding for the comparisons. The formula is defined in the specs argument.

In my first example I do all pairwise comparisons for all combinations of f1 and f2. The built-in function pairwise is put on the left-hand side of the formula of the specs argument. The factors with levels to compare among are on the right-hand side. Since I’m doing all pairwise comparisons, the combination of f1 and f2 are put in the formula.

The model object is passed to the first argument in emmeans(), object.

emm1 = emmeans(fit1, specs = pairwise ~ f1:f2)

Using the formula in this way returns an object with two parts. The first part, called emmeans, is the estimated marginal means along with the standard errors and confidence intervals. We can pull these out with dollar sign notation, which I demonstrate below.

These results are all on the model scale, so in this case these are estimated mean log response for each f1 and f2 combination. Note the message that emmeans() gives us about results being on the log scale in the output. It knows the model is on the log scale because I used log(resp) as the response variable.

emm1$emmeans
#  f1 f2 emmean    SE df lower.CL upper.CL
#  a  1   0.569 0.445 16   -0.374    1.512
#  c  1  -0.102 0.445 16   -1.045    0.842
#  a  c  -1.278 0.445 16   -2.221   -0.334
#  c  c   1.335 0.445 16    0.392    2.279
# 
# Results are given on the log (not the response) scale. 
# Confidence level used: 0.95

The second part of the output, called contrasts, contains the comparisons of interest. It is this section that we are generally most interested in when answering a question about differences among groups. You can see which comparison is which via the contrast column.

These results are also on the model scale (and we get the same message in this section), and we’ll want to put them on the original scale.

The comparisons are accompanied by statistical tests of the null hypothesis of “no difference”, but lack confidence interval (CI) limits by default. We’ll need to get these.

The emmeans() package automatically adjusts for multiple comparisons. Since we did all pairwise comparisons the package used a Tukey adjustment. The type of adjustment can be changed.

emm1$contrasts
#  contrast  estimate    SE df t.ratio p.value
#  a 1 - c 1    0.671 0.629 16   1.065  0.7146
#  a 1 - a c    1.847 0.629 16   2.934  0.0434
#  a 1 - c c   -0.766 0.629 16  -1.217  0.6253
#  c 1 - a c    1.176 0.629 16   1.869  0.2795
#  c 1 - c c   -1.437 0.629 16  -2.283  0.1438
#  a c - c c   -2.613 0.629 16  -4.152  0.0038
# 
# Results are given on the log (not the response) scale. 
# P value adjustment: tukey method for comparing a family of 4 estimates

Back-transforming results

Since I used a log transformation I can express the results as multiplicative differences in medians on the original (data) scale.

We can always back-transform estimates and CI limits by hand, but in emmeans() we can use the type argument for this. Using type = "response" will return results on the original scale. This works when the transformation is explicit in the model (e.g., log(resp)) and works similarly for link functions in generalized linear models.

You’ll see the message changes in the output once I do this, indicating things were back-transformed from the model scale. We also are reminded that the tests were done on the model scale.

In the contrast column in the contrasts section we can see the expression of the comparisons has changed from additive comparisons (via subtraction) shown above to multiplicative comparisons (via division).

emmeans(fit1, specs = pairwise ~ f1:f2, type = "response")
# $emmeans
#  f1 f2 response    SE df lower.CL upper.CL
#  a  1     1.767 0.786 16    0.688    4.538
#  c  1     0.903 0.402 16    0.352    2.321
#  a  c     0.279 0.124 16    0.108    0.716
#  c  c     3.800 1.691 16    1.479    9.763
# 
# Confidence level used: 0.95 
# Intervals are back-transformed from the log scale 
# 
# $contrasts
#  contrast   ratio     SE df null t.ratio p.value
#  a 1 / c 1 1.9553 1.2306 16    1   1.065  0.7146
#  a 1 / a c 6.3396 3.9900 16    1   2.934  0.0434
#  a 1 / c c 0.4648 0.2926 16    1  -1.217  0.6253
#  c 1 / a c 3.2422 2.0406 16    1   1.869  0.2795
#  c 1 / c c 0.2377 0.1496 16    1  -2.283  0.1438
#  a c / c c 0.0733 0.0461 16    1  -4.152  0.0038
# 
# P value adjustment: tukey method for comparing a family of 4 estimates 
# Tests are performed on the log scale

Changing the multiple comparisons adjustment

The adjust argument can be used to change the type of multiple comparisons adjustment. All available options are listed and described in the documentation for summary.emmGrid under the section P-value adjustments.

One option is to skip multiple comparisons adjustments all together, using adjust = "none". If we use this the message about multiple comparisons disappears (since we didn’t use one).

emm1.1 = emmeans(fit1, specs = pairwise ~ f1:f2, type = "response", adjust = "none")
emm1.1
# $emmeans
#  f1 f2 response    SE df lower.CL upper.CL
#  a  1     1.767 0.786 16    0.688    4.538
#  c  1     0.903 0.402 16    0.352    2.321
#  a  c     0.279 0.124 16    0.108    0.716
#  c  c     3.800 1.691 16    1.479    9.763
# 
# Confidence level used: 0.95 
# Intervals are back-transformed from the log scale 
# 
# $contrasts
#  contrast   ratio     SE df null t.ratio p.value
#  a 1 / c 1 1.9553 1.2306 16    1   1.065  0.3025
#  a 1 / a c 6.3396 3.9900 16    1   2.934  0.0097
#  a 1 / c c 0.4648 0.2926 16    1  -1.217  0.2412
#  c 1 / a c 3.2422 2.0406 16    1   1.869  0.0801
#  c 1 / c c 0.2377 0.1496 16    1  -2.283  0.0365
#  a c / c c 0.0733 0.0461 16    1  -4.152  0.0008
# 
# Tests are performed on the log scale

Confidence intervals for comparisons

We will almost invariably want to report confidence intervals for any comparisons of interest. We need a separate function to get these. Here is an example using the confint() function with the default 95% CI (the confidence level can be changed, see ?confint.emmGrid). I use the pipe to pass the contrasts into the confint() function.

emm1.1$contrasts %>%
     confint()
#  contrast   ratio     SE df lower.CL upper.CL
#  a 1 / c 1 1.9553 1.2306 16   0.5150    7.424
#  a 1 / a c 6.3396 3.9900 16   1.6696   24.072
#  a 1 / c c 0.4648 0.2926 16   0.1224    1.765
#  c 1 / a c 3.2422 2.0406 16   0.8539   12.311
#  c 1 / c c 0.2377 0.1496 16   0.0626    0.903
#  a c / c c 0.0733 0.0461 16   0.0193    0.278
# 
# Confidence level used: 0.95 
# Intervals are back-transformed from the log scale

The confint() function returns confidence intervals but gets rid of the statistical tests. Some people will want to also report the test statistics and p-values. In this case, we can use summary() instead of confint(), with infer = TRUE.

emm1.1$contrasts %>%
     summary(infer = TRUE)
#  contrast   ratio     SE df lower.CL upper.CL null t.ratio p.value
#  a 1 / c 1 1.9553 1.2306 16   0.5150    7.424    1   1.065  0.3025
#  a 1 / a c 6.3396 3.9900 16   1.6696   24.072    1   2.934  0.0097
#  a 1 / c c 0.4648 0.2926 16   0.1224    1.765    1  -1.217  0.2412
#  c 1 / a c 3.2422 2.0406 16   0.8539   12.311    1   1.869  0.0801
#  c 1 / c c 0.2377 0.1496 16   0.0626    0.903    1  -2.283  0.0365
#  a c / c c 0.0733 0.0461 16   0.0193    0.278    1  -4.152  0.0008
# 
# Confidence level used: 0.95 
# Intervals are back-transformed from the log scale 
# Tests are performed on the log scale

Putting results in a data.frame

One of the really nice things about emmeans() is that it makes it easy to get the results into a nice format for making tables or graphics of results. This is because the results are converted to a data.frame with confint() or summary().

If needed, the estimated marginal means can also be put into a data.frame. In this case we can use as.data.frame() to convert the emmeans to a data.frame for plotting or putting into a table of results. We can also use as.data.frame() directly on the contrasts above if we don’t need confint() or summary() (not shown).

emm1.1$emmeans %>%
     as.data.frame()
#   f1 f2  response        SE df  lower.CL upper.CL
# 1  a  1 1.7665334 0.7861763 16 0.6876870 4.537879
# 2  c  1 0.9034576 0.4020739 16 0.3517035 2.320806
# 3  a  c 0.2786518 0.1240109 16 0.1084753 0.715802
# 4  c  c 3.8004222 1.6913362 16 1.4794517 9.762542

Within group comparisons

While we can do all pairwise comparisons, there are certainly plenty of situations where the research question dictates that we only want a specific set of comparisons. A common example of this is when we want to compare the levels of one factor within the levels of another. Here I’ll show comparisons among levels of f1 for each level of f2.

The only thing that changes is the right-hand side of the specs formula. The code f1|f2 translates to “compare levels of f1 within each level of f2”.

emm2 = emmeans(fit1, specs = pairwise ~ f1|f2, type = "response")
emm2
# $emmeans
# f2 = 1:
#  f1 response    SE df lower.CL upper.CL
#  a     1.767 0.786 16    0.688    4.538
#  c     0.903 0.402 16    0.352    2.321
# 
# f2 = c:
#  f1 response    SE df lower.CL upper.CL
#  a     0.279 0.124 16    0.108    0.716
#  c     3.800 1.691 16    1.479    9.763
# 
# Confidence level used: 0.95 
# Intervals are back-transformed from the log scale 
# 
# $contrasts
# f2 = 1:
#  contrast  ratio     SE df null t.ratio p.value
#  a / c    1.9553 1.2306 16    1   1.065  0.3025
# 
# f2 = c:
#  contrast  ratio     SE df null t.ratio p.value
#  a / c    0.0733 0.0461 16    1  -4.152  0.0008
# 
# Tests are performed on the log scale

You can see there is no message about a multiple comparisons adjustment in the above set of comparisons. This is because the package default is to correct for the number of comparisons within each group instead of across groups. In this case there is only a single comparison in each group.

If we consider the family of comparisons to be all comparisons regardless of group and want to correct for multiple comparisons, we can do so via rbind.emmGrid.

Here is an example of passing contrasts to rbind() to correct for multiple comparisons. The default adjustment is Bonferroni, which can be much too conservative when the number of comparisons is large. You can control the multiple comparisons procedure via adjust.

The results of rbind() can also conveniently be used with summary(), confint(), and/or as.data.frame().

emm2$contrasts %>%
     rbind() 
#  f2 contrast  ratio     SE df null t.ratio p.value
#  1  a / c    1.9553 1.2306 16    1   1.065  0.6050
#  c  a / c    0.0733 0.0461 16    1  -4.152  0.0015
# 
# P value adjustment: bonferroni method for 2 tests 
# Tests are performed on the log scale

Main effects comparisons

Even if we have multiple factors in the model, complete with an interaction term, we can still do “overall” comparisons among groups if our research question indicated that main effects were important to estimate.

Doing main effects in the presence of an interaction means we average over the levels of the other factor(s). The emmeans() function gives both a warning about the interaction and a message indicating which factor was averaged over to remind us of this.

Here is the estimated main effect of f1. Since we are only interested in overall comparisons of that factor it is the only factor given on the right-hand side of the specs formula.

emmeans(fit1, specs = pairwise ~ f1)
# NOTE: Results may be misleading due to involvement in interactions
# $emmeans
#  f1 emmean    SE df lower.CL upper.CL
#  a  -0.354 0.315 16  -1.0215    0.313
#  c   0.617 0.315 16  -0.0503    1.284
# 
# Results are averaged over the levels of: f2 
# Results are given on the log (not the response) scale. 
# Confidence level used: 0.95 
# 
# $contrasts
#  contrast estimate    SE df t.ratio p.value
#  a - c      -0.971 0.445 16  -2.182  0.0443
# 
# Results are averaged over the levels of: f2 
# Results are given on the log (not the response) scale.

Treatment vs control example

The emmeans package has built-in helper functions for comparing each group mean to the control mean. If the control group is the in the first row of the emmeans section of the output, this set of comparisons can be requested via trt.vs.ctrl.

Note the default multiple comparisons adjustment is a Dunnett adjustment.

emmeans(fit1, specs = trt.vs.ctrl ~ f1:f2)
# $emmeans
#  f1 f2 emmean    SE df lower.CL upper.CL
#  a  1   0.569 0.445 16   -0.374    1.512
#  c  1  -0.102 0.445 16   -1.045    0.842
#  a  c  -1.278 0.445 16   -2.221   -0.334
#  c  c   1.335 0.445 16    0.392    2.279
# 
# Results are given on the log (not the response) scale. 
# Confidence level used: 0.95 
# 
# $contrasts
#  contrast  estimate    SE df t.ratio p.value
#  c 1 - a 1   -0.671 0.629 16  -1.065  0.5857
#  a c - a 1   -1.847 0.629 16  -2.934  0.0262
#  c c - a 1    0.766 0.629 16   1.217  0.4947
# 
# Results are given on the log (not the response) scale. 
# P value adjustment: dunnettx method for 3 tests

Using trt.vs.ctrl means we ended up comparing each group mean to the “a 1” group since it is in the first row. In the example I’m using the control group, “c c”, is actually the last group listed in the emmeans section. When the control group is the last group in emmeans we can use trt.vs.ctrlk to get the correct set of comparisons.

emmeans(fit1, specs = trt.vs.ctrlk ~ f1:f2)
# $emmeans
#  f1 f2 emmean    SE df lower.CL upper.CL
#  a  1   0.569 0.445 16   -0.374    1.512
#  c  1  -0.102 0.445 16   -1.045    0.842
#  a  c  -1.278 0.445 16   -2.221   -0.334
#  c  c   1.335 0.445 16    0.392    2.279
# 
# Results are given on the log (not the response) scale. 
# Confidence level used: 0.95 
# 
# $contrasts
#  contrast  estimate    SE df t.ratio p.value
#  a 1 - c c   -0.766 0.629 16  -1.217  0.4947
#  c 1 - c c   -1.437 0.629 16  -2.283  0.0935
#  a c - c c   -2.613 0.629 16  -4.152  0.0021
# 
# Results are given on the log (not the response) scale. 
# P value adjustment: dunnettx method for 3 tests

That gives us what we want in this case. However, if the control group was some other group, like “c 1”, we could use trt.vs.ctrlk with the ref argument to define which row in the emmeans section represents the control group.

The “c 1” group is the second row in the emmeans so we can use ref = 2 to define this group as the control group.

emmeans(fit1, specs = trt.vs.ctrlk ~ f1:f2, ref = 2)
# $emmeans
#  f1 f2 emmean    SE df lower.CL upper.CL
#  a  1   0.569 0.445 16   -0.374    1.512
#  c  1  -0.102 0.445 16   -1.045    0.842
#  a  c  -1.278 0.445 16   -2.221   -0.334
#  c  c   1.335 0.445 16    0.392    2.279
# 
# Results are given on the log (not the response) scale. 
# Confidence level used: 0.95 
# 
# $contrasts
#  contrast  estimate    SE df t.ratio p.value
#  a 1 - c 1    0.671 0.629 16   1.065  0.5857
#  a c - c 1   -1.176 0.629 16  -1.869  0.1937
#  c c - c 1    1.437 0.629 16   2.283  0.0935
# 
# Results are given on the log (not the response) scale. 
# P value adjustment: dunnettx method for 3 tests

Finally, if we want to reverse the order of subtraction in the treatment vs control comparisons we can use the reverse argument.

emmeans(fit1, specs = trt.vs.ctrlk ~ f1:f2, ref = 2, reverse = TRUE)
# $emmeans
#  f1 f2 emmean    SE df lower.CL upper.CL
#  a  1   0.569 0.445 16   -0.374    1.512
#  c  1  -0.102 0.445 16   -1.045    0.842
#  a  c  -1.278 0.445 16   -2.221   -0.334
#  c  c   1.335 0.445 16    0.392    2.279
# 
# Results are given on the log (not the response) scale. 
# Confidence level used: 0.95 
# 
# $contrasts
#  contrast  estimate    SE df t.ratio p.value
#  c 1 - a 1   -0.671 0.629 16  -1.065  0.5857
#  c 1 - a c    1.176 0.629 16   1.869  0.1937
#  c 1 - c c   -1.437 0.629 16  -2.283  0.0935
# 
# Results are given on the log (not the response) scale. 
# P value adjustment: dunnettx method for 3 tests

Alternative code for comparisons

The emmeans() package offers the option to do comparisons in two steps instead of in one step the way I have been using it so far. I personally find this alternative most useful when doing custom comparisons, and I think it’s useful to introduce it now so it looks familiar. This alternative keeps the estimated marginal means and the comparisons of interest in separate objects, which can be attractive in some situations.

The first step is to use emmeans() to calculate the marginal means of interest. We still use the formula in specs with the factor(s) of interest on the right-hand side but no longer put anything on the left-hand side of the tilde.

We can still use type in emmeans() but cannot use adjust (since we don’t adjust for multiple comparisons until we’ve actually done comparisons 😉).

emm3 = emmeans(fit1, specs = ~ f1:f2, type = "response")
emm3
#  f1 f2 response    SE df lower.CL upper.CL
#  a  1     1.767 0.786 16    0.688    4.538
#  c  1     0.903 0.402 16    0.352    2.321
#  a  c     0.279 0.124 16    0.108    0.716
#  c  c     3.800 1.691 16    1.479    9.763
# 
# Confidence level used: 0.95 
# Intervals are back-transformed from the log scale

We then get the comparisons we want in a second step using the contrast() function. We request the comparisons we want via method. When using built-in comparisons like I am here, we give the comparison function name as a string (meaning in quotes). Also see the pairs() function, which is for the special case of all pairwise comparisons.

We can use adjust in contrast() to change the multiple comparisons adjustment.

contrast(emm3, method = "pairwise", adjust = "none")
#  contrast   ratio     SE df null t.ratio p.value
#  a 1 / c 1 1.9553 1.2306 16    1   1.065  0.3025
#  a 1 / a c 6.3396 3.9900 16    1   2.934  0.0097
#  a 1 / c c 0.4648 0.2926 16    1  -1.217  0.2412
#  c 1 / a c 3.2422 2.0406 16    1   1.869  0.0801
#  c 1 / c c 0.2377 0.1496 16    1  -2.283  0.0365
#  a c / c c 0.0733 0.0461 16    1  -4.152  0.0008
# 
# Tests are performed on the log scale

We can follow the contrast() argument with summary() or confint() to get the output we want and put them into a data.frame for plotting/saving. Again, I think the real strength of contrast() comes when we want custom comparisons, and I’ll demonstrate these in my next post on custom contrasts.

Just the code, please

Here’s the code without all the discussion. Copy and paste the code below or you can download an R script of uncommented code from here.

library(emmeans) # v. 1.7.0
library(magrittr) # v. 2.0.1

dat = data.frame(resp = c(1.6,0.3,3,0.1,3.2,0.2,0.4,0.4,2.8,
                          0.7,3.8,3,0.3,14.3,1.2,0.5,1.1,4.4,0.4,8.4),
                 f1 = factor(c("a","a","a","a","a",
                               "a","a","a","a","a","c","c","c","c","c",
                               "c","c","c","c","c")),
                 f2 = factor(c("1","c","1","c","1",
                               "c","1","c","1","c","1","c","1","c","1",
                               "c","1","c","1","c")))

str(dat)

fit1 = lm(log(resp) ~ f1 + f2 + f1:f2, data = dat)

emm1 = emmeans(fit1, specs = pairwise ~ f1:f2)

emm1$emmeans
emm1$contrasts

emmeans(fit1, specs = pairwise ~ f1:f2, type = "response")

emm1.1 = emmeans(fit1, specs = pairwise ~ f1:f2, type = "response", adjust = "none")
emm1.1

emm1.1$contrasts %>%
     confint()

emm1.1$contrasts %>%
     summary(infer = TRUE)

emm1.1$emmeans %>%
     as.data.frame()

emm2 = emmeans(fit1, specs = pairwise ~ f1|f2, type = "response")
emm2

emm2$contrasts %>%
     rbind() 

emmeans(fit1, specs = pairwise ~ f1)

emmeans(fit1, specs = trt.vs.ctrl ~ f1:f2)

emmeans(fit1, specs = trt.vs.ctrlk ~ f1:f2)

emmeans(fit1, specs = trt.vs.ctrlk ~ f1:f2, ref = 2)

emmeans(fit1, specs = trt.vs.ctrlk ~ f1:f2, ref = 2, reverse = TRUE)

emm3 = emmeans(fit1, specs = ~ f1:f2, type = "response")
emm3

contrast(emm3, method = "pairwise", adjust = "none")