Occupancy modelling, GLMs, nls, SEM
Occupancy modelling
Rylee talked about a bullfrog occupancy model and concerns about the implied shape of the relationship between a predictor (distance) and occupancy probability.
We talked about the logit transformation that these kinds of models (like logistic regression on its own) are fit with and what this implies about the shape of the response-predictor relationship.
The logit transformation: http://en.wikipedia.org/wiki/Logit http://en.wikipedia.org/wiki/Logistic_regression
Basically, the logit link transforms data ranging from 0 to 1 onto the range of -infinity to infinity. This means that when the model output is transformed back to the probability scale (0 to 1), the relationship follows a logistic curve that gradually tails off at both ends. Plotting the predictor-response relationship while holding the other predictors constant should illustrate this.
GLMs etc.
There was some discussion about how overwhelming it can seem with all the possible data transformations (or link functions) and error distributions.
We talked about how, in reality, the vast majority if problems we encounter fall into one of a few categories:
-
An identity link and normal error distribution (i.e. linear regression) for continuous data that can be positive or negative. Usually this results from things adding together – think a + b = y not a * b = y. We can often transform response data to make a linear regression appropriate. By far the most common is to log-transform the response, which means we model a + b = log(y) or equivalently exp(a) * exp(b) = y. We’ve gone from an additive relationship to a multiplicative one. Many relationships in ecology are like this. Response data that must be greater than zero is one hint.
-
A log link and Gamma error distribution for continuous data that must be positive (this can be very similar to a linear regression with log-transformed response data and often either is appropriate)
-
A logit link with a binomial error distribution for modelling 0s and 1s (logistic regression)
-
A log link and a Poisson distribution for count data
-
A log link and a negative binomial distribution for count data that has more variability than the Poisson distribution allows (i.e. most ecological datasets)
Another semi-common one is the beta error distribution for continuous values ranging between 0 and 1 (say proportions), which is often modelled with a logit link.
Goodness of fit for nls() models
There isn’t an exact R^2 equivalent for many nonlinear models. R^2 compares the variance explained by a model to the variance explained by a null constant model (an intercept only). It’s not obvious with many nonlinear models what that null model should be since a constant model isn’t necessarily possible.
Best practice is probably to focus on model comparison, say with AIC, to evaluate relative model fit, and graphically present model predictions overlaid on the data to demonstrate goodness of fit.
There are some pseudo-R^2 measures that people sometimes use like looking at the correlation of predicted vs. observed points. Or even just plotting predicted vs. observed points. Even better, do some sort of cross-validation where you fit your model to some of the data and predict on the rest and then repeat. R has the boot::boot() function to help with that and the boot::cv.glm()
function specifically for GLM models, but the examples can be a bit hard to work through.
This blog post summarizes and links to many of the important points about goodness of fit for nls models: http://fishr.wordpress.com/2013/08/09/r-squared-for-a-vbgm/
Structural equation modelling
Joel talked about an analysis he’s starting on salmon, stream length, and predation pressure.
Beanplots are awesome: http://cran.r-project.org/web/packages/beanplot/index.html
We talked about how he could potentially break it up into separate multilevel models.
We also talked about structural equation modelling (SEM) (which I know very little about!). The lavaan package is the main structural equation modelling R package, as far as I know: http://lavaan.ugent.be/
But, they haven’t yet implemented multilevel structural equation modelling. If anyone needs some SEM pointers, I can connect you with Jarrett Byrnes (http://byrneslab.net) who has done some pretty extensive workshops on them.
Separately, this paper looks like it might be useful for these kinds of problems: Shipley, B. 2009. Confirmatory path analysis in a generalized multilevel context. Ecology. http://www.esajournals.org/doi/full/10.1890/08-1034.1