Very statisticious
https://aosmith.rbind.io/
Recent content on Very statisticious
Hugo -- gohugo.io
Wed, 06 Mar 2019 00:00:00 +0000

Lots of zeros or too many zeros?: Thinking about zero inflation in count data
https://aosmith.rbind.io/2019/03/06/lots-of-zeros/
Wed, 06 Mar 2019 00:00:00 +0000
In a recent lecture I gave a basic overview of zero-inflation in count distributions. My main take-home message to the students that I thought worth posting about here is that having a lot of zero values does not necessarily mean you have zero inflation.
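The claim that many zeros need not imply zero inflation can be checked with a quick simulation. The sketch below is an illustration only, not code from the post; the sample size, means, and dispersion value are arbitrary assumptions. It compares the observed proportion of zeros in simulated counts with the proportion the generating distribution itself predicts.

```r
# Illustrative values only: sample size and parameters are assumptions
set.seed(16)
n <- 10000

# Poisson counts with a small mean: many zeros, but no zero inflation
y_pois <- rpois(n, lambda = 0.5)
obs_pois <- mean(y_pois == 0)        # observed proportion of zeros
exp_pois <- dpois(0, lambda = 0.5)   # proportion the Poisson itself predicts

# Negative binomial counts with a larger mean but strong overdispersion
y_nb <- rnbinom(n, mu = 5, size = 0.2)
obs_nb <- mean(y_nb == 0)
exp_nb <- dnbinom(0, mu = 5, size = 0.2)

# Observed and expected proportions of zeros, side by side
round(c(obs_pois = obs_pois, exp_pois = exp_pois,
        obs_nb = obs_nb, exp_nb = exp_nb), 3)
```

In both cases more than half of the simulated values are zero, yet the observed proportion sits close to what the distribution predicts, so neither dataset is zero inflated.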
Zero inflation is when there are more 0 values in the data than the distribution allows for. But some distributions can have a lot of zeros!

How to plot fitted lines with ggplot2
https://aosmith.rbind.io/2018/11/16/plot-fitted-lines/
Fri, 16 Nov 2018 00:00:00 +0000
Most analyses aren’t really done until we’ve found a way to visualize the results graphically, and I’ve recently been getting some questions from students on how to plot fitted lines from models. There are some R packages that are made specifically for this purpose; see packages effects and visreg, for example.
If using the ggplot2 package for plotting, fitted lines from simple models can be graphed using geom_smooth(). However, once models get more complicated that convenient function is no longer useful.

Analysis essentials: An example directory structure for an analysis using R
https://aosmith.rbind.io/2018/10/29/an-example-directory-structure/
Mon, 29 Oct 2018 00:00:00 +0000
There are a lot of practical skills involved in doing an analysis that are essential but that I rarely (never?) see included in the curriculum, statistics or otherwise. These are skills like how to organize your data, how to approach QAQC, and how to set up a naming algorithm for files. We all need to do these things, but too often we end up learning these skills by muddling through on our own.

The log-0 problem: analysis strategies and options for choosing c in log(y + c)
https://aosmith.rbind.io/2018/09/19/the-log-0-problem/
Wed, 19 Sep 2018 00:00:00 +0000
I periodically find myself having long conversations with consultees about 0’s. Why? Well, the basic suite of statistical tools many of us learn first involves the normal distribution (for the errors). The log transformation tends to feature prominently for working with right-skewed data. Since log(0) returns -Infinity, a common first reaction is to use log(y + c) as the response in place of log(y), where c is some constant added to the y variable to get rid of the 0 values.

Getting started simulating data in R: some helpful functions and how to use them
https://aosmith.rbind.io/2018/08/29/getting-started-simulating-data/
Wed, 29 Aug 2018 00:00:00 +0000
I’ve been trying to participate a little more in the R community outside of my narrow professional world, so when the co-organizer of the Eugene R Users Group invited me to come talk at one of their meet-ups I agreed (even though it involved public speaking! 😱).
I started out thinking I’d talk about doing simulations. But could I do that in 45 minutes? Maybe not. After much pondering I ended up settling on the topic of how we start a simulation: by making data in R.

Automating exploratory plots with ggplot2 and purrr
https://aosmith.rbind.io/2018/08/20/automating-exploratory-plots/
Mon, 20 Aug 2018 00:00:00 +0000
When you have a lot of variables and need to make a lot of exploratory plots it’s usually worthwhile to automate the process in R instead of manually copying and pasting code for every plot. However, the coding approach needed to automate plots can look pretty daunting to a beginner R user. It can look so daunting, in fact, that it can appear easier to manually make the plots (like in Excel) rather than using R at all.

Creating legends when aesthetics are constants in ggplot2
https://aosmith.rbind.io/2018/07/19/legends-constants-for-aesthetics-in-ggplot2/
Thu, 19 Jul 2018 00:00:00 +0000
In general, if you want to map an aesthetic to a variable and get a legend in ggplot2 you do it inside aes(). If you want to set an aesthetic to a constant value, like making all the points purple, you do it outside aes().
However, there are situations where you might want to set an aesthetic for a layer to a constant but you also want a legend for that aesthetic.

Simulate! Simulate! - Part 3: The Poisson edition
https://aosmith.rbind.io/2018/07/18/simulate-poisson-edition/
Wed, 18 Jul 2018 00:00:00 +0000
One of the things I like about simulations is that, with practice, they can be a quick way to check your intuition about a model or relationship.
My most recent example is based on a discussion with a student about quadratic effects.
I’ve never had a great grasp on what the coefficients that define a quadratic relationship mean. Luckily there is this very nice FAQ page from the Institute for Digital Research and Education at UCLA that goes over the meaning of the coefficients in detail, with examples.

Time after time: calculating the autocorrelation function for uneven or grouped time series
https://aosmith.rbind.io/2018/06/27/uneven-group-autocorrelation/
Wed, 27 Jun 2018 00:00:00 +0000
I first learned how to check for autocorrelation via autocorrelation function (ACF) plots in R in a class on time series. However, the examples we worked on were all single, long-term time series with no missing values and no groups. I figured out later that calculating the ACF when the sampling through time is uneven or there are distinct time series for independent sample units takes a bit more thought.

A closer look at replicate() and purrr::map() for simulations
https://aosmith.rbind.io/2018/06/05/a-closer-look-at-replicate-and-purrr/
Tue, 05 Jun 2018 00:00:00 +0000
I’ve done a couple of posts so far on simulations, here and here, where I demonstrate how to build a function for simulating data from a defined linear model and then explore long-run behavior of models fit to the simulated datasets. The focus of those posts was on the general simulation process, and I didn’t go into much detail on the specific R code. In this post I’ll focus in on the code I use for repeatedly simulating data and extracting output, specifically talking about the function replicate() and the map family of functions from package purrr.

Simulate! Simulate! - Part 2: A linear mixed model
https://aosmith.rbind.io/2018/04/23/simulate-simulate-part-2/
Mon, 23 Apr 2018 00:00:00 +0000
I feel like I learn something every time I start simulating new data to update an assignment or exploring a question from a client via simulation. I’ve seen instances where residual autocorrelation isn’t detectable when I know it exists (because I simulated it) or I have skewed residuals and/or unequal variances when I simulated residuals from a normal distribution with a single variance. Such results are often due to small sample sizes, which even in this era of big data still aren’t so unusual in ecology.

Unstandardizing coefficients from a GLMM
https://aosmith.rbind.io/2018/03/26/unstandardizing-coefficients/
Mon, 26 Mar 2018 00:00:00 +0000
Winter term grades are in and I can once again scrape together some time to write blog posts! 🎉
The last post I did about making added variable plots led me to think about other “get model results” topics, such as the one I’m talking about today: unstandardizing coefficients.
I find this comes up particularly for generalized linear mixed models (GLMM), where models don’t always converge if explanatory variables are left unstandardized.

Making many added variable plots with purrr and ggplot2
https://aosmith.rbind.io/2018/01/31/added-variable-plots/
Wed, 31 Jan 2018 00:00:00 +0000
Last week two of my consulting meetings ended up on the same topic: making added variable plots.
In both cases, the student had a linear model of some flavor that had several continuous explanatory variables. They wanted to plot the estimated relationship between each variable in the model and the response. This could easily lead to a lot of copying and pasting of code, since they want to do the same thing for every explanatory variable in the model.

Reversing the order of a ggplot2 legend
https://aosmith.rbind.io/2018/01/19/reversing-the-order-of-a-ggplot2-legend/
Fri, 19 Jan 2018 00:00:00 +0000
It’s always nice to get good questions in a workshop. It can help everybody, including the instructor, get a bit of extra learnin’ in.
Every spring I give a ggplot2 workshop for graduate students in my college. The first half is focused on the terminology and understanding the basics of how to put a plot together (I remember as a beginner feeling like I was throwing darts at things to see what stuck when deciding if something should go inside or outside aes() 🎯 ).

Simulate! Simulate! - Part 1: A linear model
https://aosmith.rbind.io/2018/01/09/simulate-simulate-part1/
Tue, 09 Jan 2018 00:00:00 +0000
Confession: I love simulations.
In simulations you get to define everything about a model and then see how that model behaves over the long run. It’s like getting the luxury of taking many samples instead of only the one real one we have resources for in an actual study.
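As a rough sketch of that idea (not the code from the post), the example below defines a simple linear model with assumed parameter values, simulates many datasets from it with `replicate()`, and looks at how the estimated slope behaves across them. The function name `sim_slope` and all parameter values are hypothetical.

```r
# Hypothetical helper: simulate one dataset from y = b0 + b1*x + e
# and return the estimated slope. All default values are assumptions.
sim_slope <- function(n = 20, b0 = 1, b1 = 2, sigma = 3) {
  x <- runif(n, min = 0, max = 10)
  y <- b0 + b1 * x + rnorm(n, mean = 0, sd = sigma)
  unname(coef(lm(y ~ x))[2])
}

set.seed(16)
slopes <- replicate(1000, sim_slope())

mean(slopes)  # long-run average of the estimates; should sit near b1 = 2
sd(slopes)    # spread of the estimates across repeated samples
```

Because the true slope is known, the spread of `slopes` shows directly how much an estimate from a single sample of 20 can vary.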
I find simulations incredibly useful in understanding statistical theory and assumptions of linear models. When someone tells me with great certainty “I don’t need to meet that assumption because [fill in the blank]” or asks “Does it matter that [something complicated]?

Fiesta 2017 gas mileage
https://aosmith.rbind.io/2018/01/03/fiesta-2017-gas-mileage/
Wed, 03 Jan 2018 00:00:00 +0000
It’s January - time to see how we did on car gas mileage last year!
We purchased a used 2011 Ford Fiesta in 2016. It replaced a 1991 Honda CRX, which was running like a champ but was starting to look a little, well, limited in its safety features. 👅
I drive ~70 commuting miles each day, and we wanted a car we could afford that was competitive with the CRX on gas mileage.

Combining many datasets in R
https://aosmith.rbind.io/2017/12/31/many-datasets/
Sun, 31 Dec 2017 00:00:00 +0000
At least once a year I meet with a graduate student who has many separate datasets that need to be combined into a single file. The data are usually from a series of data loggers (e.g., iButtons or RFID readers) that record data remotely over a specified time period. The researcher periodically downloads the data from each data logger and then redeploys it for further data collection.
I’m going to set up the background for my particular use case before jumping into the R code to perform this sort of task.

Using DHARMa for residual checks of unsupported models
https://aosmith.rbind.io/2017/12/21/using-dharma-for-residual-checks-of-unsupported-models/
Thu, 21 Dec 2017 00:00:00 +0000
Why use simulations for model checking?
One of the difficult things about working with generalized linear models (GLM) and generalized linear mixed models (GLMM) is figuring out how to interpret residual plots. We don’t really expect residual plots from a GLMM to look like one from a linear model, sure, but how do we tell when something looks “bad”?
This is the situation I was in several years ago, working on an analysis involving counts from a fairly complicated study design.