Practical 4-3: Add a package demo

Overview

In this session you will take the biodiversity data package you have created and write a demo that calculates a variety of diversity measures on the data.

Background

Now you have a package with your data in it, we can easily run analyses, simply starting each analysis script with the R command library(githubusernameBCI).

There are an enormous number of different ways of measuring diversity, to the extent that diversity itself has been declared a “non-concept” in the past (by Hurlbert, 1971), but nowadays they fit broadly within Whittaker’s framework of three basic types of diversity: alpha, beta and gamma diversity.

All of these types of diversity can be measured in several different ways. The standard approach is often called “species diversity”, where we treat all species as distinct and all of our measures of alpha and gamma diversity are counting in terms of effective numbers of species. However, we often need to consider similarity between species, and this results in a wide variety of measures, from closely related ones such as counts of “distinct species” (species that are taxonomically unrelated) or “distinct evolutionary histories” (species that have no common ancestor in a phylogeny), to ones that are more peripherally connected such as “distinct functional groups” (species that are functionally distinct). Leinster and Cobbold (2012) have developed a framework for working with all of these types of similarity consistently. Because beta diversity measures are comparing subcommunities, they are generally unitless (measuring fractions or percentages) or are measured in terms of numbers of subcommunities; how similar two subcommunities are seen to be still entirely depends on the type of similarity though, resulting in, for instance a count of “the effective number of phylogenetically / functionally / taxonomically distinct subcommunities”. The relationship between alpha, beta and gamma has been the source of many arguments – see, for example, Whittaker (1972), Jost (2007) and Reeve et al. (2016).

Finally, I used the phrase “effective number” several times above. At one extreme, when we disregard abundance and just look at presence-absence data, we measure “species richness” by simply counting the number of species present; at the opposite extreme, we have measures such as Berger-Parker which only look at dominance of the most abundant species. In between, we find measures such as Simpson and Shannon, where we weight our counting of species by their abundance. For instance, Shannon entropy is a measure of the “surprise” or uncertainty expected from the next observation, which is low if there are few species or one is dominant, but high when many are equally abundant. In all cases, the highest diversity is achieved when all species are equally abundant (and completely distinct, when we are considering similarity). We therefore talk about the “effective number of species” as:

the number of completely distinct, equally abundant species that would give us the same value of diversity as the observed community.

This acts as a reference to allow fair comparisons to be made. In the case of species richness, this is just the richness itself, but for instance in the case of Shannon entropy, the effective numbers equivalent is in fact the exponential of the entropy. For diversity measures that work in terms of these effective numbers, the extent to which we weight by abundance is determined by a parameter \(q\), which varies from 0, where relative abundance is ignored and only presence-absence is considered, to \(\infty\), where we only consider the most abundant species. This standard for dealing with diversities as effective numbers, while far from universal, has now been pursued for over 40 years and is broadly accepted – originally by Hill (1973), and subsequently, for instance, by Jost (2007), Leinster and Cobbold (2012), and Reeve et al. (2016).

Tasks

Setting up R

First, remember as you add in library(xxx) calls into your demo, you’ll need to add them in as dependencies to the package using usethis::use_package("xxx"), otherwise other people may not be able to run the demo. To test the demo as you go along you may also need to install the packages in advance using install.packages("xxx").

Create the demo files

As in the last practical series, we’re going to create a demo inside the package. In the Files tab, you may need to create a new demo folder. Then create an R script in that folder to contain your demo, call it diversity-of-BCI.R. Note in general you just have to make sure the name of the script has no spaces in it. Then (again, as before) create a new text file also in that folder called 00Index, with no file extension to contain the list of the demos in your package. As you should know by now, each demo you create is entered into one line of this file with the demo name followed by a one-line description. For example, here, the demo file diversity-of-BCI.R would have a line like:

diversity-of-BCI    Showing the diversity of the BCI forest plot.

For more details on how to create a demo go here.

Create the demo itself

Now you’re going to write the R script that will be the demo. The script will be measuring several kinds of alpha, beta and gamma diversity, as detailed below. But in order to do some work with measures of taxonomic diversity, the taxonomy will first need to match the species abundance data from the BCI dataset, exactly. Here I’m assuming that you created the package githubusernameBCI, and in it bci_taxa contains the taxonomy of all BCI species and bci_2010 contains the 2010 BCI census.

library(githubusernameBCI)
# Make a copy of the taxonomy as a data frame
taxonomy <- as.data.frame(bci_taxa)
# Make sure the species are ordered correctly
rownames(taxonomy) <- taxonomy$GenusSpecies
taxonomy <- taxonomy[rownames(bci_2010),]

To create a metacommunity to analyse in the rdiversity package, you will need to create an object of that type using metacommunity(), and then you can calculate, for example, the gamma diversity of the whole metacommunity with the function meta_gamma() and plot the output with plot():

library(rdiversity)
meta <- metacommunity(bci_2010)
results <- meta_gamma(meta, qs = seq(from = 0, to = 5)) 
plot(diversity ~ q, type = "l", data = results)

To create a metacommunity that takes account of the similarity between species, you need to input the similarity as a second argument to metacommunity(). First use tax2dist() to create a distance matrix from a taxonomy, then use dist2sim() to create a similarity matrix from the distance matrix:

library(magrittr)
taxSim <- taxonomy %>%
  tax2dist(c(GenusSpecies=0, Genus=1, Family=2, Other=3)) %>%
  dist2sim("linear") 
metatax <- metacommunity(bci_2010, taxSim)
results4tax <- meta_gamma(metatax, qs = seq(from = 0, to = 5)) 
lines(diversity ~ q, col=2, data = results4tax)

Finally, to create a metacommunity to analyse in vegan, you need to transpose the axes of the abundance data matrix:

library(vegan)
# Swap rows and columns 
abundances <- t(bci_2010)
# Calculate species richness (gamma diversity)
specnumber(abundances, groups = 1) 

This loads the dataset and then calculates the species richness of the whole metacommunity. It’s important to always use the abundances object (created above) when using the vegan package, and always use the bci_2010 object when using rdiversity – they need the data in opposite formats!

Note the documentation and reference manual for the vegan package is available at https://cran.r-project.org/web/packages/vegan/index.html and the package itself is developed on GitHub at https://github.com/vegandevs/vegan.

Your own additions – this section is optional

Because we lost two days from the course, the material here wasn’t fully covered, so is optional if you have enough time to look further at the documentation for the rdiversity and vegan packages.

First, read the rdiversity package documentation and examples. You can find this by going to the GitHub repository (https://github.com/boydorr/rdiversity), and clicking on the docs: rdiversity badge, or by clicking the link to https://boydorr.github.io/rdiversity.

Then add to the script you have started above to calculate and plot the gamma diversity and the normalised alpha diversity for the 2010 BCI census, for a range of values of \(q\), say qs = 0:5; individuals of the same species have a taxonomic distance of zero and each species is taxonomically distinct. Then repeat these calculations using taxonomic similarity, showing that there are fewer taxonomically distinct species than there are species. Then do this again counting the effective number of genera and families. Note that this can be achieved by using tax2dist() with c(GenusSpecies=0, Genus=0, Family=1, Other=1) and c(GenusSpecies=0, Genus=0, Family=0, Other=1), respectively – these distance measurements say that two species in the same genus (or family) are zero distance apart, just like two individuals in the same species. As a result, the diversity measures following these instructions will simply count in effective numbers of genera or families, respectively, rather than species.

Moving on to beta diversity, we now want to find out which quadrats in the BCI study site are the most distinct. We can measure the average distinctiveness of all of the quadrats in the plot using raw_meta_beta(), but you can also ask for the distinctiveness of every quadrat individually using raw_sub_beta(). Calculate that for a single value of \(q\), say qs=1, and show a histogram of the distinctivenesses (the diversity column) of the subcommunities. Sort the data frame by distinctiveness and output just the highest values. Bearing in mind that the first two digits of the quadrat name are an x-coordinate, and the second two are a y-coordinate, is there any pattern about these distinct areas? If so write about it in the report.

A similar option in vegan is the function betadiver(). Look at the documentation in R for this function – you will see that it can calculate a very large range of different beta diversity measures – and calculate the beta diversities for the subcommunities using the "j" Jaccard and "sor" Sorensen similarity measures. This function calculates pairwise similarities between the quadrats as dist objects. You can convert these to a matrix and then calculate the average similarities to all of the quadrats by using as.matrix() and then rowMeans(). You should be able to show that some of the identified quadrats (with the lowest average similarity to the other quadrats) are the same, even though the exact beta diversity measure, and indeed even its direction, is different.

Remember to add some text chunks into the report to describe the results for when you compile a report rather than run the demo. Now reinstall the package and make sure the demo runs successfully too. Don’t forget to delete any html files that have been generated and don’t commit them to the repo.

GitHub

Now push all of your changes (your commits) to GitHub.

A note on generating a report from a demo

If you’ve installed a package directly using devtools::install_github() you can still generate a report from the demo as follows without cloning the repo:

rmarkdown::render(system.file("demo", "diversity-of-BCI.R", 
                              package = "mypartnersBCIpackage"),
                  output_dir = ".")

The report will be saved to the base folder of your current project (or whatever your current working directory is). Check that it looks okay by opening it in a web browser (and delete the html file afterwards).