In this post I show an example of how to automate the process of making many exploratory plots in ggplot2 with multiple continuous response and explanatory variables. To loop through both x and y variables involves nested looping. In the latter section of the post I go over options for saving the resulting plots, either together in a single document, separately, or by creating combined plots prior to saving.
I currently work as a consulting statistician, advising natural and social science researchers on statistics, statistical programming, and study design. I create and teach R workshops for applied science graduate students who are just getting started in R, where my goal is to make their transition to a programming language as smooth as possible. See my workshop materials at my website.
Lots of zeros or too many zeros?: Thinking about zero inflation in count data
In a recent lecture I gave a basic overview of zero-inflation in count distributions. My main take-home message to the students that I thought worth posting about here is that having a lot of zero values does not necessarily mean you have zero inflation. Zero inflation is when there are more 0 values in the data than the distribution allows for. But some distributions can have a lot of zeros!
How to plot fitted lines with ggplot2
Most analyses aren’t really done until we’ve found a way to visualize the results graphically, and I’ve recently been getting some questions from students on how to plot fitted lines from models. There are some R packages that are made specifically for this purpose; see packages effects and visreg, for example. If using the ggplot2 package for plotting, fitted lines from simple models can be graphed using geom_smooth(). However, once models get more complicated that convenient function is no longer useful.
Analysis essentials: An example directory structure for an analysis using R
There are a lot of practical skills involved in doing an analysis that are essential but that I rarely (never?) see included in the curriculum, statistics or otherwise. These are skills like how to organize your data, how to approach QAQC, and how to set up a naming algorithm for files. We all need to do these things, but too often we end up learning these skills by muddling through on our own.
The log-0 problem: analysis strategies and options for choosing c in log(y + c)
I periodically find myself having long conversations with consultees about 0’s. Why? Well, the basic suite of statistical tools many of us learn first involves the normal distribution (for the errors). The log transformation tends to feature prominently for working with right-skewed data. Since log(0) returns -Infinity, a common first reaction is to use log(y + c) as the response in place of log(y), where c is some constant added to the y variable to get rid of the 0 values.
Getting started simulating data in R: some helpful functions and how to use them
I’ve been trying to participate a little more in the R community outside of my narrow professional world, so when the co-organizer of the Eugene R Users Group invited me to come talk at one of their meet-ups I agreed (even though it involved public speaking! 😱). I started out thinking I’d talk about doing simulations. But could I do that in 45 minutes? Maybe not. After much pondering I ended up settling on the topic of how we start a simulation: by making data in R.
Automating exploratory plots with ggplot2 and purrr
When you have a lot of variables and need to make a lot exploratory plots it’s usually worthwhile to automate the process in R instead of manually copying and pasting code for every plot. However, the coding approach needed to automate plots can look pretty daunting to a beginner R user. It can look so daunting, in fact, that it can appear easier to manually make the plots (like in Excel) rather than using R at all.
Creating legends when aesthetics are constants in ggplot2
In general, if you want to map an aesthetic to a variable and get a legend in ggplot2 you do it inside aes(). If you want to set an aesthetic to a constant value, like making all the points purple, you do it outside aes(). However, there are situations where you might want to set an aesthetic for a layer to a constant but you also want a legend for that aesthetic.
Simulate! Simulate! - Part 3: The Poisson edition
One of the things I like about simulations is that, with practice, they can be a quick way to check your intuition about a model or relationship. My most recent example is based on a discussion with a student about quadratic effects. I’ve never had a great grasp on what the coefficients that define a quadratic relationship mean. Luckily there is this very nice FAQ page from the Institute for Digital Research and Education at UCLA that goes over the meaning of the coefficients in detail, with examples.
Time after time: calculating the autocorrelation function for uneven or grouped time series
I first learned how to check for autocorrelation via autocorrelation function (ACF) plots in R in a class on time series However, the examples we worked on were all single, long term time series with no missing values and no groups. I figured out later that calculating the ACF when the sampling through time is uneven or there are distinct time series for independent sample units takes a bit more thought.
A closer look at replicate() and purrr::map() for simulations
I’ve done a couple of posts so far on simulations, here and here, where I demonstrate how to build a function for simulating data from a defined linear model and then explore long-run behavior of models fit to the simulated datasets. The focus of those posts was on the general simulation process, and I didn’t go into much detail on the specific R code. In this post I’ll focus in on the code I use for repeatedly simulating data and extracting output, specifically talking about the function replicate() and the map family of functions from package purrr.