When working with counts, having many zeros does not necessarily indicate zero inflation. I demonstrate this by simulating data from the negative binomial and generalized Poisson distributions. I then show one way to check if the data has excess zeros compared to the number of zeros expected based on the model.
Analyzing positive data with 0 values can be challenging, since a direct log transformation isn't possible. I discuss some of the things to consider when deciding on an analysis strategy for such data and then explore the effect of the value of the constant, c, when using log(y + c) as the response variable.
Checking for model fit from generalized linear mixed models (GLMM) can be challenging. The DHARMa package helps with this by giving simulated residuals but doesn't work with all model types. I show how to use tools in DHARMa to extend it for use with unsupported models fit with glmmTMB() and zeroinfl().
I currently work as a consulting statistician, advising natural and social science researchers on statistics, statistical programming, and study design. I create and teach R workshops for applied science graduate students who are just getting started in R, where my goal is to make their transition to a programming language as smooth as possible. See my workshop materials at my website.
The small multiples plot: how to combine ggplot2 plots with one shared axis
Load R packages The set-up Using facets for small multiples Using cowplot to combine plots Using egg to combine plots Adding plot labels with tag_facet() in egg There are a variety of ways to combine ggplot2 plots with a single shared axis. However, things can get tricky if you want a lot of control over all plot elements. I demonstrate three different approaches for this: 1. Using facets, which is built in to ggplot2 but doesn’t allow much control over the non-shared axes.
Embedding subplots in ggplot2 graphics
The idea of embedded plots for visualizing a large dataset that has an overplotting problem recently came up in some discussions with students. I first learned about embedded graphics from package ggsubplot. You can still see an old post about that package and about embedded graphics in general, with examples. However, ggsubplot is no longer maintained and doesn’t work with current versions of ggplot2. I poked around a bit, and found that annotation_custom() is the go-to function for embedding plots in a ggplot2 graphic.
Custom contrasts in emmeans
Following up on a previous post, where I demonstrated the basic usage of package emmeans for doing post hoc comparisons, here I’ll demonstrate how to make custom comparisons (aka contrasts). These are comparisons that aren’t encompassed by the built-in functions in the package. Remember that you can explore the available built-in emmeans functions for doing comparisons via ?"contrast-methods". Reasons for custom comparisons There are a variety of reasons you might need custom comparisons instead of some of the standard, built-in ones.
Getting started with emmeans
Package emmeans (formerly known as lsmeans) is enormously useful for folks wanting to do post hoc comparisons among groups after fitting a model. It has a very thorough set of vignettes (see the vignette topics here), is very flexible with a ton of options, and works out of the box with a lot of different model objects (and can be extended to others 👍). I’ve started recommending emmeans all the time to students fitting models in R.
Lots of zeros or too many zeros?: Thinking about zero inflation in count data
Load packages and dataset Negative binomial with many zeros Generalized Poisson with many zeros Lots of zeros or excess zeros? Simulate negative binomial data Checking for excess zeros An example with excess zeros In a recent lecture I gave a basic overview of zero-inflation in count distributions. My main take-home message to the students that I thought worth posting about here is that having a lot of zero values does not necessarily mean you have zero inflation.
How to plot fitted lines with ggplot2
Load packages and dataset Plotting separate slopes with geom_smooth() Extracting predicted values with predict() Plotting predicted values with geom_line() Add confidence intervals for lm objects Using a new dataset with predict() Plotting fitted lines from an lme object Confidence intervals for lme objects What if there is no predict() function? Most analyses aren’t really done until we’ve found a way to visualize the results graphically, and I’ve recently been getting some questions from students on how to plot fitted lines from models.
Analysis essentials: An example directory structure for an analysis using R
There are a lot of practical skills involved in doing an analysis that are essential but that I rarely (never?) see included in the curriculum, statistics or otherwise. These are skills like how to organize your data, how to approach QAQC, and how to set up a naming algorithm for files. We all need to do these things, but too often we end up learning these skills by muddling through on our own.
The log-0 problem: analysis strategies and options for choosing c in log(y + c)
I periodically find myself having long conversations with consultees about 0’s. Why? Well, the basic suite of statistical tools many of us learn first involves the normal distribution (for the errors). The log transformation tends to feature prominently for working with right-skewed data. Since log(0) returns -Infinity, a common first reaction is to use log(y + c) as the response in place of log(y), where c is some constant added to the y variable to get rid of the 0 values.
Getting started simulating data in R: some helpful functions and how to use them
I’ve been trying to participate a little more in the R community outside of my narrow professional world, so when the co-organizer of the Eugene R Users Group invited me to come talk at one of their meet-ups I agreed (even though it involved public speaking! 😱). I started out thinking I’d talk about doing simulations. But could I do that in 45 minutes? Maybe not. After much pondering I ended up settling on the topic of how we start a simulation: by making data in R.
Automating exploratory plots with ggplot2 and purrr
Load R packages The set-up Create a plotting function Looping through one vector of variables Looping through both vectors Saving the plots Saving all plots to one PDF Saving groups of plots together Saving all plots separately Combining plots When you have a lot of variables and need to make a lot exploratory plots it’s usually worthwhile to automate the process in R instead of manually copying and pasting code for every plot.