Very statisticious
https://aosmith.rbind.io/
Recent content on Very statisticious
Hugo -- gohugo.io
Mon, 13 May 2019 00:00:00 +0000

The small multiples plot: how to combine ggplot2 plots with one shared axis
https://aosmith.rbind.io/2019/05/13/small-multiples-plot/
Mon, 13 May 2019 00:00:00 +0000
Contents: Load R packages; The set-up; Using facets for small multiples; Using cowplot to combine plots; Using egg to combine plots; Adding plot labels with tag_facet() in egg
There are a variety of ways to combine ggplot2 plots with a single shared axis. However, things can get tricky if you want a lot of control over all plot elements.
I demonstrate three different approaches for this:
1. Using facets, which is built into ggplot2 but doesn’t allow much control over the non-shared axes.

Embedding subplots in ggplot2 graphics
https://aosmith.rbind.io/2019/04/22/embedding-subplots/
Mon, 22 Apr 2019 00:00:00 +0000
The idea of embedded plots for visualizing a large dataset that has an overplotting problem recently came up in some discussions with students. I first learned about embedded graphics from package ggsubplot. You can still see an old post about that package and about embedded graphics in general, with examples. However, ggsubplot is no longer maintained and doesn’t work with current versions of ggplot2.
I poked around a bit and found that annotation_custom() is the go-to function for embedding plots in a ggplot2 graphic.

Custom contrasts in emmeans
https://aosmith.rbind.io/2019/04/15/custom-contrasts-emmeans/
Mon, 15 Apr 2019 00:00:00 +0000
Following up on a previous post, where I demonstrated the basic usage of package emmeans for doing post hoc comparisons, here I’ll demonstrate how to make custom comparisons (aka contrasts). These are comparisons that aren’t encompassed by the built-in functions in the package.
Remember that you can explore the available built-in emmeans functions for doing comparisons via ?"contrast-methods".
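To make the distinction between built-in and custom comparisons concrete, here is a minimal sketch. It assumes the emmeans package is installed; the `warpbreaks` dataset, the one-factor model, and the particular contrast are illustrative assumptions and are not taken from the post.

```r
library(emmeans)

# Assumption: a simple one-factor model on the built-in warpbreaks data
fit <- lm(breaks ~ tension, data = warpbreaks)

# Estimated marginal means for the three tension levels (L, M, H)
emm <- emmeans(fit, specs = ~ tension)

# Built-in comparison: all pairwise differences among the levels
pairs_builtin <- contrast(emm, method = "pairwise")

# Custom comparison: level L versus the average of levels M and H.
# The coefficients are given in the order of the factor levels.
custom <- contrast(emm, method = list("L vs mean(M, H)" = c(1, -0.5, -0.5)))

pairs_builtin
custom
```

The custom contrast is passed as a named list of coefficient vectors, one coefficient per estimated marginal mean, so it is worth printing `emm` first to confirm the order of the levels.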
Reasons for custom comparisons
There are a variety of reasons you might need custom comparisons instead of some of the standard, built-in ones.

Getting started with emmeans
https://aosmith.rbind.io/2019/03/25/getting-started-with-emmeans/
Mon, 25 Mar 2019 00:00:00 +0000
Package emmeans (formerly known as lsmeans) is enormously useful for folks wanting to do post hoc comparisons among groups after fitting a model. It has a very thorough set of vignettes (see the vignette topics here), is very flexible with a ton of options, and works out of the box with a lot of different model objects (and can be extended to others 👍).
I’ve started recommending emmeans all the time to students fitting models in R.

Lots of zeros or too many zeros?: Thinking about zero inflation in count data
https://aosmith.rbind.io/2019/03/06/lots-of-zeros/
Wed, 06 Mar 2019 00:00:00 +0000
Contents: Load packages and dataset; Negative binomial with many zeros; Generalized Poisson with many zeros; Lots of zeros or excess zeros?; Simulate negative binomial data; Checking for excess zeros; An example with excess zeros
In a recent lecture I gave a basic overview of zero-inflation in count distributions. My main take-home message to the students that I thought worth posting about here is that having a lot of zero values does not necessarily mean you have zero inflation.

How to plot fitted lines with ggplot2
https://aosmith.rbind.io/2018/11/16/plot-fitted-lines/
Fri, 16 Nov 2018 00:00:00 +0000
Contents: Load packages and dataset; Plotting separate slopes with geom_smooth(); Extracting predicted values with predict(); Plotting predicted values with geom_line(); Add confidence intervals for lm objects; Using a new dataset with predict(); Plotting fitted lines from an lme object; Confidence intervals for lme objects; What if there is no predict() function?
Most analyses aren’t really done until we’ve found a way to visualize the results graphically, and I’ve recently been getting some questions from students on how to plot fitted lines from models.

Analysis essentials: An example directory structure for an analysis using R
https://aosmith.rbind.io/2018/10/29/an-example-directory-structure/
Mon, 29 Oct 2018 00:00:00 +0000
There are a lot of practical skills involved in doing an analysis that are essential but that I rarely (never?) see included in the curriculum, statistics or otherwise. These are skills like how to organize your data, how to approach QAQC, and how to set up a naming algorithm for files. We all need to do these things, but too often we end up learning these skills by muddling through on our own.

The log-0 problem: analysis strategies and options for choosing c in log(y + c)
https://aosmith.rbind.io/2018/09/19/the-log-0-problem/
Wed, 19 Sep 2018 00:00:00 +0000
I periodically find myself having long conversations with consultees about 0’s. Why? Well, the basic suite of statistical tools many of us learn first involves the normal distribution (for the errors). The log transformation tends to feature prominently for working with right-skewed data. Since log(0) returns -Infinity, a common first reaction is to use log(y + c) as the response in place of log(y), where c is some constant added to the y variable to get rid of the 0 values.

Getting started simulating data in R: some helpful functions and how to use them
https://aosmith.rbind.io/2018/08/29/getting-started-simulating-data/
Wed, 29 Aug 2018 00:00:00 +0000
I’ve been trying to participate a little more in the R community outside of my narrow professional world, so when the co-organizer of the Eugene R Users Group invited me to come talk at one of their meet-ups I agreed (even though it involved public speaking! 😱).
I started out thinking I’d talk about doing simulations. But could I do that in 45 minutes? Maybe not. After much pondering I ended up settling on the topic of how we start a simulation: by making data in R.

Automating exploratory plots with ggplot2 and purrr
https://aosmith.rbind.io/2018/08/20/automating-exploratory-plots/
Mon, 20 Aug 2018 00:00:00 +0000
Contents: Load R packages; The set-up; Create a plotting function; Looping through one vector of variables; Looping through both vectors; Saving the plots; Saving all plots to one PDF; Saving groups of plots together; Saving all plots separately; Combining plots
When you have a lot of variables and need to make a lot of exploratory plots, it’s usually worthwhile to automate the process in R instead of manually copying and pasting code for every plot.

Creating legends when aesthetics are constants in ggplot2
https://aosmith.rbind.io/2018/07/19/legends-constants-for-aesthetics-in-ggplot2/
Thu, 19 Jul 2018 00:00:00 +0000
In general, if you want to map an aesthetic to a variable and get a legend in ggplot2 you do it inside aes(). If you want to set an aesthetic to a constant value, like making all the points purple, you do it outside aes().
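A minimal sketch of that inside-versus-outside distinction, assuming ggplot2 is installed. The `mtcars` dataset and the chosen variables are illustrative assumptions, not the example used in the post.

```r
library(ggplot2)

# Mapping: color depends on a variable, so it goes inside aes() and a legend is drawn
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Setting: color is a constant, so it goes outside aes() and no legend is drawn
p_constant <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "purple")

p_mapped
p_constant
```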
However, there are situations where you might want to set an aesthetic for a layer to a constant but you also want a legend for that aesthetic.

Simulate! Simulate! - Part 3: The Poisson edition
https://aosmith.rbind.io/2018/07/18/simulate-poisson-edition/
Wed, 18 Jul 2018 00:00:00 +0000
One of the things I like about simulations is that, with practice, they can be a quick way to check your intuition about a model or relationship.
My most recent example is based on a discussion with a student about quadratic effects.
I’ve never had a great grasp on what the coefficients that define a quadratic relationship mean. Luckily there is this very nice FAQ page from the Institute for Digital Research and Education at UCLA that goes over the meaning of the coefficients in detail, with examples.

Time after time: calculating the autocorrelation function for uneven or grouped time series
https://aosmith.rbind.io/2018/06/27/uneven-group-autocorrelation/
Wed, 27 Jun 2018 00:00:00 +0000
Contents: Simulate data with autocorrelation; Fit model and extract residuals; Problems with naively using acf(); Calculate the maximum lag; Order the dataset by time; Pad the dataset with NA; Plot autocorrelation function of appropriately-spaced residuals; Add confidence interval to the ACF plot
I first learned how to check for autocorrelation via autocorrelation function (ACF) plots in R in a class on time series. However, the examples we worked on were all single, long-term time series with no missing values and no groups.

A closer look at replicate() and purrr::map() for simulations
https://aosmith.rbind.io/2018/06/05/a-closer-look-at-replicate-and-purrr/
Tue, 05 Jun 2018 00:00:00 +0000
I’ve done a couple of posts so far on simulations, here and here, where I demonstrate how to build a function for simulating data from a defined linear model and then explore long-run behavior of models fit to the simulated datasets. The focus of those posts was on the general simulation process, and I didn’t go into much detail on the specific R code. In this post I’ll focus in on the code I use for repeatedly simulating data and extracting output, specifically talking about the function replicate() and the map family of functions from package purrr.

Simulate! Simulate! - Part 2: A linear mixed model
https://aosmith.rbind.io/2018/04/23/simulate-simulate-part-2/
Mon, 23 Apr 2018 00:00:00 +0000
I feel like I learn something every time I start simulating new data to update an assignment or explore a question from a client via simulation. I’ve seen instances where residual autocorrelation isn’t detectable when I know it exists (because I simulated it) or I have skewed residuals and/or unequal variances when I simulated residuals from a normal distribution with a single variance. Such results are often due to small sample sizes, which even in this era of big data still aren’t so unusual in ecology.

Unstandardizing coefficients from a GLMM
https://aosmith.rbind.io/2018/03/26/unstandardizing-coefficients/
Mon, 26 Mar 2018 00:00:00 +0000
Winter term grades are in and I can once again scrape together some time to write blog posts! 🎉
The last post I did about making added variable plots led me to think about other “get model results” topics, such as the one I’m talking about today: unstandardizing coefficients.
I find this comes up particularly for generalized linear mixed models (GLMM), where models don’t always converge if explanatory variables are left unstandardized.

Making many added variable plots with purrr and ggplot2
https://aosmith.rbind.io/2018/01/31/added-variable-plots/
Wed, 31 Jan 2018 00:00:00 +0000
Last week two of my consulting meetings ended up on the same topic: making added variable plots.
In both cases, the student had a linear model of some flavor that had several continuous explanatory variables. They wanted to plot the estimated relationship between each variable in the model and the response. This could easily lead to a lot of copying and pasting of code, since they want to do the same thing for every explanatory variable in the model.

Reversing the order of a ggplot2 legend
https://aosmith.rbind.io/2018/01/19/reversing-the-order-of-a-ggplot2-legend/
Fri, 19 Jan 2018 00:00:00 +0000
It’s always nice to get good questions in a workshop. It can help everybody, including the instructor, get a bit of extra learnin’ in.
Every spring I give a ggplot2 workshop for graduate students in my college. The first half is focused on the terminology and understanding the basics of how to put a plot together (I remember as a beginner feeling like I was throwing darts at things to see what stuck when deciding if something should go inside or outside aes() 🎯 ).

Simulate! Simulate! - Part 1: A linear model
https://aosmith.rbind.io/2018/01/09/simulate-simulate-part1/
Tue, 09 Jan 2018 00:00:00 +0000
Confession: I love simulations.
In simulations you get to define everything about a model and then see how that model behaves over the long run. It’s like getting the luxury of taking many samples instead of only the one real one we have resources for in an actual study.
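As a rough sketch of what "define everything, then watch the long-run behavior" can look like, the base R code below simulates many samples from a known linear model and collects the estimated slope from each. The true parameter values, the sample size, and the helper name `simulate_slope()` are all hypothetical choices for illustration, not the code from the post.

```r
set.seed(16)  # arbitrary seed so the sketch is reproducible

# Assumed "true" model: y = 2 + 0.5 * x + e, with e ~ Normal(0, sd = 1)
simulate_slope <- function(n = 20, b0 = 2, b1 = 0.5, sigma = 1) {
  x <- runif(n, min = 0, max = 10)
  y <- b0 + b1 * x + rnorm(n, mean = 0, sd = sigma)
  coef(lm(y ~ x))[["x"]]  # return the estimated slope from this one sample
}

# Repeat the "study" many times to see how the estimator behaves in the long run
slopes <- replicate(1000, simulate_slope())

mean(slopes)  # should land close to the true slope of 0.5
sd(slopes)    # spread of the slope estimates across repeated samples
```

Changing `n` or `sigma` and rerunning shows how sample size and residual variation affect the spread of the estimates.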
I find simulations incredibly useful in understanding statistical theory and assumptions of linear models. When someone tells me with great certainty “I don’t need to meet that assumption because [fill in the blank]” or asks “Does it matter that [something complicated]?

Fiesta 2017 gas mileage
https://aosmith.rbind.io/2018/01/03/fiesta-2017-gas-mileage/
Wed, 03 Jan 2018 00:00:00 +0000
It’s January - time to see how we did on car gas mileage last year!
We purchased a used 2011 Ford Fiesta in 2016. It replaced a 1991 Honda CRX, which was running like a champ but was starting to look a little, well, limited in its safety features. 👅
I drive ~70 commuting miles each day, and we wanted a car we could afford that was competitive with the CRX on gas mileage.

Combining many datasets in R
https://aosmith.rbind.io/2017/12/31/many-datasets/
Sun, 31 Dec 2017 00:00:00 +0000
At least once a year I meet with a graduate student who has many separate datasets that need to be combined into a single file. The data are usually from a series of data loggers (e.g., iButtons or RFID readers) that record data remotely over a specified time period. The researcher periodically downloads the data from each data logger and then redeploys it for further data collection.
I’m going to set up the background for my particular use case before jumping into the R code to perform this sort of task.

Using DHARMa for residual checks of unsupported models
https://aosmith.rbind.io/2017/12/21/using-dharma-for-residual-checks-of-unsupported-models/
Thu, 21 Dec 2017 00:00:00 +0000
Why use simulations for model checking?
One of the difficult things about working with generalized linear models (GLM) and generalized linear mixed models (GLMM) is figuring out how to interpret residual plots. We don’t really expect residual plots from a GLMM to look like one from a linear model, sure, but how do we tell when something looks “bad”?
This is the situation I was in several years ago, working on an analysis involving counts from a fairly complicated study design.