Multivariate methods

Here are a few more details about what we talked about yesterday at stats beer re. your data and their potential for multivariate analysis:

Ordinations (PCA, MDS, CA): they are a nice way to explore the multidimensional nature of your data and get a feel for relationships and correlations. There are several functions in R to do them, and they are usually easy to use. Beware of the scaling: scaling 1 approximates the distances among your objects (the closer two objects, i.e. fish, are in your ordination diagram, the more similar they are in terms of the variables you used to describe them), while scaling 2 approximates the correlations among your variables (the smaller the angle between two vectors on the ordination diagram, the more positively correlated they are; vectors at right angles are roughly uncorrelated). As we said, PCA can be a good way to summarize a large number of variables in a big dataset into a smaller number of meaningful axes, e.g. for light variables, or water chemistry, or discharge, etc.
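
To make the "many variables into a few axes" idea concrete, here is a rough sketch of a PCA on a correlation matrix, written in plain Python so that every step is visible. The water chemistry numbers are made up, the function names are mine, and only the first axis is extracted (by power iteration); in practice you would use an existing R function rather than anything like this.

```python
import math

# Hypothetical water chemistry for six sites (made-up numbers):
# columns are conductivity, alkalinity, calcium.
sites = [
    [120.0, 45.0, 18.0],
    [150.0, 60.0, 24.0],
    [200.0, 78.0, 30.0],
    [90.0, 35.0, 13.0],
    [260.0, 100.0, 41.0],
    [180.0, 70.0, 29.0],
]

def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(c) / n for c in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of rows that are already standardized."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_axis(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit eigenvector.

    Assumes the uniform start vector is not orthogonal to the leading
    axis, which holds here because all variables are positively correlated.
    """
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v

z = standardize(sites)
eigenvalue, loadings = first_axis(correlation_matrix(z))
explained = eigenvalue / len(loadings)  # trace of a correlation matrix = p
scores = [sum(a * b for a, b in zip(row, loadings)) for row in z]

print(f"PC1 explains {explained:.1%} of the total variance")
print("site scores on PC1:", [round(s, 2) for s in scores])
```

The site scores on that first axis are what you would then carry forward as a single "water chemistry" variable in later analyses.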

Canonical ordinations (redundancy analysis, canonical correspondence analysis): they are an extension of ordinations that includes regressions to constrain the variation in your set of dependent variables (by convention named Y, as compared to y in a unidimensional context) by a series of independent (explanatory) variables X. They are a step further than simple ordinations (which are not statistical tests, but simply “graphs” in a multidimensional space), since they formally test how much of the variation in Y is explained by the Xs. So you get an ordination diagram with two sets of vectors this time (the Ys and the Xs), as well as a series of coefficients and parameters, like in simple regressions, that tell you which of your Xs explain most of the variation in the Ys, plus an overall R² and p-value. It can be a powerful way to understand and describe a complex multivariate dataset, but it can also suffer from the same thing, i.e. being too complex to make sense ;-) Also, RDA assumes linear relationships between your Ys and Xs (you can always transform variables to achieve that, but that is also complex with a multivariate dataset), and it does not deal super well with qualitative variables. In your case though, you could probably use RDA to do your MANOVAs, which might be something worth exploring.
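
The core of RDA is "regress every Y on the Xs, then ask how much of the total variation in Y the fitted values account for". Below is a minimal sketch of that regression step, assuming a single explanatory variable so that the fit stays closed-form; a real RDA handles several Xs with multiple regression and then runs a PCA on the fitted values. The data and the function names are hypothetical.

```python
# Hypothetical data for six sites (made-up numbers).
discharge = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # the single X
responses = [                                  # the Ys, one list per variable
    [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],          # strongly tied to discharge
    [5.0, 3.0, 6.0, 4.0, 5.5, 4.5],            # mostly noise
]

def center(column):
    """Subtract the column mean from every value."""
    mean = sum(column) / len(column)
    return [v - mean for v in column]

def constrained_fraction(y_columns, x):
    """Share of the total variation in Y explained by a linear fit on x.

    Each Y is regressed on x separately; the fitted and total sums of
    squares are then pooled across all Y variables.
    """
    xc = center(x)
    ss_x = sum(v * v for v in xc)
    total = 0.0
    fitted = 0.0
    for column in y_columns:
        yc = center(column)
        slope = sum(a * b for a, b in zip(xc, yc)) / ss_x
        total += sum(v * v for v in yc)
        fitted += slope ** 2 * ss_x   # sum of squares of the fitted values
    return fitted / total

r2 = constrained_fraction(responses, discharge)
print(f"discharge explains {r2:.1%} of the variation in Y")
```

Note that the Ys are not standardized here, so the variable with the larger variance dominates the pooled R²; whether to standardize first is a choice you have to make for your own data.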

Multivariate regression trees: they are similar in a way to canonical ordinations in that they use a set of Ys and Xs, but they belong to the clustering techniques rather than being a formal hypothesis testing method. That can be nice when the data are too complex or messy to yield well to formal testing. You can use the trees to explore and sort through which of your independent variables are driving most of the differences in your Ys among objects, which can be useful as a first step leading to more formal testing (or not). The tree splits the objects based on your Xs, giving you a threshold value for each X variable used in a split, and keeps splitting until further branches no longer explain enough deviance (a stopping point you can set yourself based on certain criteria). And yes Josh, you can prune them ;-) If your dependent variables are binary (i.e. Yes/No, presence/absence), you can use classification trees. I find them nice because they give a clear graphical display, they don’t assume linearity among variables, and they deal well with both quantitative and qualitative variables. My bet is you could use them to sort through your various independent variables and factors, and to help you justify why you’d merge or not, or drop certain variables from further testing.
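
To show what a single split in such a tree does, here is a small sketch that looks for the threshold on one X that minimizes the within-group sum of squares pooled over all the Ys. This is only the first split of a tree, with made-up data and hypothetical function names; a real multivariate regression tree repeats the search over every X and every resulting group, and then gets pruned.

```python
def within_ss(rows):
    """Sum of squared deviations from the group centroid, over all Y columns."""
    if not rows:
        return 0.0
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return sum((v - m) ** 2 for row in rows for v, m in zip(row, means))

def best_split(x, y_rows):
    """Threshold on x that minimizes the total within-group SS of the Y rows."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        low, high = x[order[k - 1]], x[order[k]]
        if low == high:
            continue  # no threshold can separate tied x values
        left = [y_rows[i] for i in order[:k]]
        right = [y_rows[i] for i in order[k:]]
        ss = within_ss(left) + within_ss(right)
        if best is None or ss < best[1]:
            best = ((low + high) / 2, ss)
    return best

# Hypothetical data: depth (X) and two response variables (Ys) for six fish.
depth = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
traits = [[10, 1], [12, 2], [11, 1], [2, 9], [1, 11], [3, 10]]

threshold, ss_after = best_split(depth, traits)
explained = 1 - ss_after / within_ss(traits)
print(f"split at depth < {threshold}: {explained:.1%} of the variation explained")
```

The threshold is placed halfway between the two neighbouring X values, which is a common convention but still an assumption of this sketch.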

My bible for ordinations, RDAs, multivariate trees, and most other multivariate analyses is (some of you may guess ;) the Legendre book: http://adn.biol.umontreal.ca/~numericalecology/. The Numerical Ecology book has all the theoretical foundations and explanations for those techniques; libraries usually have it, or you can order it online. The second one on the page at the link above is a companion book which gives all the R code to do those techniques :-) Very handy. Pierre Legendre also has R code and functions available on his website (http://adn.biol.umontreal.ca/~numericalecology/Rcode/), including the ones I use for RDAs. Most of his functions run with permutations (random reshuffling of the data, which is related to but not the same as bootstrapping), which is pretty neat since it relaxes the normality assumption and so the need to normalize your data (which is also a pain in a multivariate context).
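
For anyone curious how a permutation test works under the hood, here is a minimal sketch. It is not Legendre's code: it uses a plain correlation as the test statistic instead of an RDA statistic, the data are made up, and the function names are hypothetical. The idea is to reshuffle the Y values many times (without replacement, unlike a bootstrap) and count how often the reshuffled statistic is at least as large as the observed one.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_permutations=999, seed=1):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(correlation(x, y))
    shuffled = list(y)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(correlation(x, shuffled)) >= observed:
            as_extreme += 1
    # The observed arrangement counts as one of the possible permutations.
    return (as_extreme + 1) / (n_permutations + 1)

# Hypothetical data: light level (X) and growth (Y) for eight fish.
light = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
growth = [2.3, 2.9, 4.1, 4.0, 5.6, 6.2, 6.8, 8.1]

p_value = permutation_p_value(light, growth)
print(f"permutation p-value: {p_value:.3f}")
```

Because the null distribution is built from the data themselves, no normal distribution is assumed for the test statistic, which is the point made above about not having to normalize.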