In this series of practicals you will develop the code to calculate a few diversity measures. In this first exercise we will calculate the species richness and Simpson index of a population.
R provides us with many built-in datasets, but sadly none of them are counts of species abundances, so you will use the famous Barro Colorado Island dataset, a 30+ year study of a 50-hectare forest plot in Panama. We will use the number of trees in the study site in 2010, which you have access to from the package you developed in Practical Series 4.
You will also construct a couple of fake datasets that you will use for comparison. Calculating the diversity of these populations (remembering that there are many, in fact a continuum of, diversity measures) will tell us how many observed species there are – the species richness – and give us measures of how evenly distributed the trees are amongst the tree species (the evenness of the species distribution) by various measures. Species richness is a very commonly used measure of diversity, which simply counts the number of non-zero species abundances; Simpson index is also commonly used: it simply calculates the species proportions from the counts (so the total sums to 1), squares them all individually, and add them all together – it tells us (amongst other things) how likely two randomly chosen individuals are to be from the same species.
This project will be written in the
githubusernameDiversity package that you set up in the
last exercise, with functions being saved in the R folder and documented so that they work
with the standard R help system, and scripts being saved as demos in the
demo folder so that they can be run
using the demo()
function or compiled into reports to read
the results with suitable reporting explaining what is happening. In
each project exercise you will have to write some functions and (most
likely) one script. So now create a demo script for this project and
add an entry for it in the 00Index
file in the same folder.
We are taking the data from the 2010 census – the true abundances for the whole site can be calculated as follows:
library(githubusernameBCI)
tree.pop <- rowSums(bci_2010)
Note that if you don’t want to use your own BCI package, you can use ours instead. It’s available as
soniamitchellBCI
in the SBOHVM organisation on GitHub just like yours. Either way you should install the package you are using by runningdevtools::install_github()
, and then add it to yourgithubusernameDiversity
package dependencies by runningusethis::use_dev_package()
appropriately.
The total counts of trees of different species in this forest plot
are now available in the variable tree.pop
. We can then do
some simple extraction of information from the full species counts:
species.names <- names(tree.pop)
num.sp <- length(tree.pop)
num.trees <- sum(tree.pop)
mean.trees <- mean(tree.pop)
The next two lines sample a single random quadrat and sum up 10
random quadrats, and store the results in quadrat.pop
and
quadrat10.pop
, respectively:
quadrat.pop <- rowSums(sample_by_subcommunities(bci_2010, 1))
quadrat10.pop <- rowSums(sample_by_subcommunities(bci_2010, 10))
You will also use a handful of fake datasets for comparison:
one.pop <- c(num.trees, rep(0, num.sp - 1))
uneven.pop <- c(num.trees / 2, rep(0, num.sp - 1)) + rep(mean.trees / 2, num.sp)
mixed.pop <- (1:num.sp) / sum(1:num.sp) * num.trees
even.pop <- rep(mean.trees, num.sp)
rand.pop <- as.vector(rmultinom(1, num.trees, rep(1 / num.sp, num.sp)))
rand50.pop <- as.vector(rmultinom(1, round(num.trees / 50), rep(1 / num.sp, num.sp)))
Remember that rep(x, size)
repeats x
,
size
times to make a vector of length size
,
and rmultinom()
draws from the multinomial
distribution, in this case, rand.pop
is a
population of 222,715 individuals drawn from 299 species, each equally
abundant (\(\frac{1}{299}\)). Note that
rmultinom()
is is in the stats
package, so you
need to add this to the dependencies of your package now. We won’t tell
you about any future dependencies you need to add, but note that you can
check what package a function is in by looking in the help file for the
function:
?rmultinom
If you run this in the console, the Multinom {stats}
in the top left of the help page tells you that this function is in the
stats
package.
Then write and document a function that takes a vector of abundances
like the data above as an argument and calculates the species richness
from it. Remember that the species richness is the number of species
whose abundance is not zero. The name of this function should start with
the letters p1_
, so if you decide to call it
species_richness()
, you should actually call it
p1_species_richness()
. This is because you will be making a
series of similar functions throughout the project, so they will need to
be called p1_species_richness()
,
p2_species_richness()
, p3_species_richness()
,
etc. to distinguish them.
Hint: Applying a test to a vector of numbers, will apply the
test to all of the numbers in the vector and produce a vector of
booleans (TRUE
or FALSE
) values, so:
c(1, 3, 5, 7) > 4
[1] FALSE FALSE TRUE TRUE
1 and 3 are not > 4
and 5 and 7 are. And then
sum()
will count the number of TRUE
s in such a
vector of booleans. So:
over4s <- (c(1, 3, 5, 7) > 4)
sum(over4s)
[1] 2
If you think about it, you should be able to see how to use these two ideas to count the number of non-zero abundances in a population. You can see more about this in Practical A-1.
Then write and document a second function that will calculate the Simpson index of the population, remembering, first, that this measure works with the relative proportions of the species not the total counts and, second, that we can do calculations with the whole vector at once.
Look at the values in the nine population vectors – you could try
sorting them using sort(x.pop)
and plotting them using
plot(sort(x.pop))
, barplot(sort(x.pop))
(or
something more attractive?) to get a clearer idea of what they look like
– and decide which you think have the richest species diversity.
Run the two functions on the populations provided as see whether the results of the two diversity measures agree with your assessments. Note that a lower value means a higher diversity for the Simpson index measure.
NB Remember that to make the functions you have written accessible to your demo, you need to document and install them:
devtools::document()
devtools::install()
library(githubusernameDiversity)
?p1_species_richness
You also need to install the package to use the demos:
devtools::install()
demo(package = "githubusernameDiversity")
Edit the demo that you have started above so that it loads in the
functions (by calling library(githubusernameDiversity)
),
reads in the BCI data, and generates a report that shows some kind of
plot of each of the series of counts and then calculates their species
richness and Simpson index Check that your functions also generate
appropriate help files (e.g.
?p1_species_richness
).
Remember to commit the code once you are happy before moving on to the next exercise – you can always come back later and make further commits if you realise you need to change something. I won’t remind you about this in subsequent exercises.
As we said above, note that you will often need to refer back to
functions in later exercises (as you did in the practicals) and you will
be writing several very similar functions in successive exercises. You
will need to make sure they do not have identical names or you won’t be
able to call them all. The simplest way of doing this might be to label
them by the project exercise number and call them
p1_species_richness
, p2_species_richness
,
etc. depending on what exercise you are in, but you are welcome
to find a more elegant solution – anything that distinguishes them is
fine.