Project 1: Species richness and Simpson index

Overview

In this series of practicals you will develop the code to calculate a few diversity measures. In this first exercise we will calculate the species richness and Simpson index of a population.

Background

R provides us with many built-in datasets, but sadly none of them are counts of species abundances, so you will use the famous Barro Colorado Island dataset, a 30+ year study of a 50-hectare forest plot in Panama. We will use the number of trees in the study site in 2010, which you have access to from the package you developed in Practical Series 4.

You will also construct a couple of fake datasets that you will use for comparison. Calculating the diversity of these populations (remembering that there are many, in fact a continuum of, diversity measures) will tell us how many observed species there are – the species richness – and give us measures of how evenly distributed the trees are amongst the tree species (the evenness of the species distribution) by various measures. Species richness is a very commonly used measure of diversity, which simply counts the number of non-zero species abundances; Simpson index is also commonly used: it simply calculates the species proportions from the counts (so the total sums to 1), squares them all individually, and add them all together – it tells us (amongst other things) how likely two randomly chosen individuals are to be from the same species.

Tasks

This project will be written in the githubusernameDiversity package that you set up in the last exercise, with functions being saved in the R folder and documented so that they work with the standard R help system, and scripts being saved as demos in the demo folder so that they can be run using the demo() function or compiled into reports to read the results with suitable reporting explaining what is happening. In each project exercise you will have to write some functions and (most likely) one script. So now create a demo script for this project and add an entry for it in the 00Index file in the same folder.

The study populations

We are taking the data from the 2010 census – the true abundances for the whole site can be calculated as follows:

library(githubusernameBCI) 
tree.pop <- rowSums(bci_2010)

Note that if you don’t want to use your own BCI package, you can use ours instead. It’s available as soniamitchellBCI in the SBOHVM organisation on GitHub just like yours. Either way you should install the package you are using by running devtools::install_github(), and then add it to your githubusernameDiversity package dependencies by running usethis::use_dev_package() appropriately.

The total counts of trees of different species in this forest plot are now available in the variable tree.pop. We can then do some simple extraction of information from the full species counts:

species.names <- names(tree.pop) 
num.sp <- length(tree.pop) 
num.trees <- sum(tree.pop) 
mean.trees <- mean(tree.pop)

The next two lines sample a single random quadrat and sum up 10 random quadrats, and store the results in quadrat.pop and quadrat10.pop, respectively:

quadrat.pop <- rowSums(sample_by_subcommunities(bci_2010, 1)) 
quadrat10.pop <- rowSums(sample_by_subcommunities(bci_2010, 10))

You will also use a handful of fake datasets for comparison:

one.pop <- c(num.trees, rep(0, num.sp - 1)) 
uneven.pop <- c(num.trees / 2, rep(0, num.sp - 1)) + rep(mean.trees / 2, num.sp) 
mixed.pop <- (1:num.sp) / sum(1:num.sp) * num.trees
even.pop <- rep(mean.trees, num.sp)
rand.pop <- as.vector(rmultinom(1, num.trees, rep(1 / num.sp, num.sp)))
rand50.pop <- as.vector(rmultinom(1, round(num.trees / 50), rep(1 / num.sp, num.sp)))

Remember that rep(x, size) repeats x, size times to make a vector of length size, and rmultinom() draws from the multinomial distribution, in this case, rand.pop is a population of 222,715 individuals drawn from 299 species, each equally abundant (\(\frac{1}{299}\)). Note that rmultinom() is is in the stats package, so you need to add this to the dependencies of your package now. We won’t tell you about any future dependencies you need to add, but note that you can check what package a function is in by looking in the help file for the function:

?rmultinom

If you run this in the console, the Multinom {stats} in the top left of the help page tells you that this function is in the stats package.

A species richness function

Then write and document a function that takes a vector of abundances like the data above as an argument and calculates the species richness from it. Remember that the species richness is the number of species whose abundance is not zero. The name of this function should start with the letters p1_, so if you decide to call it species_richness(), you should actually call it p1_species_richness(). This is because you will be making a series of similar functions throughout the project, so they will need to be called p1_species_richness(), p2_species_richness(), p3_species_richness(), etc. to distinguish them.

Hint: Applying a test to a vector of numbers, will apply the test to all of the numbers in the vector and produce a vector of booleans (TRUE or FALSE) values, so:

c(1, 3, 5, 7) > 4

[1] FALSE FALSE  TRUE  TRUE

1 and 3 are not > 4 and 5 and 7 are. And then sum() will count the number of TRUEs in such a vector of booleans. So:

over4s <- (c(1, 3, 5, 7) > 4)
sum(over4s)

[1] 2

If you think about it, you should be able to see how to use these two ideas to count the number of non-zero abundances in a population. You can see more about this in Practical A-1.

A Simpson index function

Then write and document a second function that will calculate the Simpson index of the population, remembering, first, that this measure works with the relative proportions of the species not the total counts and, second, that we can do calculations with the whole vector at once.

Running the code

Look at the values in the nine population vectors – you could try sorting them using sort(x.pop) and plotting them using plot(sort(x.pop)), barplot(sort(x.pop)) (or something more attractive?) to get a clearer idea of what they look like – and decide which you think have the richest species diversity.

Run the two functions on the populations provided as see whether the results of the two diversity measures agree with your assessments. Note that a lower value means a higher diversity for the Simpson index measure.

NB Remember that to make the functions you have written accessible to your demo, you need to document and install them:

devtools::document()
devtools::install()
library(githubusernameDiversity)
?p1_species_richness

You also need to install the package to use the demos:

devtools::install()
demo(package = "githubusernameDiversity")

Demo

Edit the demo that you have started above so that it loads in the functions (by calling library(githubusernameDiversity)), reads in the BCI data, and generates a report that shows some kind of plot of each of the series of counts and then calculates their species richness and Simpson index Check that your functions also generate appropriate help files (e.g. ?p1_species_richness).

Remember to commit the code once you are happy before moving on to the next exercise – you can always come back later and make further commits if you realise you need to change something. I won’t remind you about this in subsequent exercises.

Note on function names

As we said above, note that you will often need to refer back to functions in later exercises (as you did in the practicals) and you will be writing several very similar functions in successive exercises. You will need to make sure they do not have identical names or you won’t be able to call them all. The simplest way of doing this might be to label them by the project exercise number and call them p1_species_richness, p2_species_richness, etc. depending on what exercise you are in, but you are welcome to find a more elegant solution – anything that distinguishes them is fine.