In this session you will add some functions to your biodiversity data package to allow you to subsample from your data. The intention is just to see how we add functions to a package, and to provide you with an example of what the documentation for a function in an R package looks like. You will also add the documentation for the package as a whole.
It’s rarely the case that you will have access to a fully sampled
dataset like that at BCI. To pretend that your dataset has incomplete
data, I am providing you with two functions that will subsample the data
in different ways. The first, sample_by_species()
, will
sample from the dataset as if you were recording only some of the
species (chosen at random) and ignoring the others. The second,
sample_by_subcommunities()
, will record the counts from
only some of the subcommunities (here quadrats) and ignore the others.
There are other ways of subsampling or recording data incompletely, and
we will investigate this in the project.
You simply need to put the code provided into R files in the R folder of your package, and edit the
DESCRIPTION file of your package
(using usethis::use_package()
if you like) so that all
dependencies are included. Then regenerate the documentation and
reinstall the package. First the sample_by_species()
function:
#' Subsample a dataset by species
#'
#' Sample a dataset as if we only recorded some species.
#'
#' @param dataset A matrix, data frame or tibble containing abundance or incidence data
#' @param count The number of species to retain
#' @return The subsampled dataset in the format it was passed in
#'
#' @export
#'
#' @examples
#' library(magrittr)
#' sample_by_species(bci_2010, count = 20) %>%
#' sample_by_subcommunities(count = 6)
#'
sample_by_species <- function(dataset, count) {
rows <- nrow(dataset)
if (count > rows) {
warning("Trying to pick more species than are present")
count = rows
}
sample.rows <- sample(rows, count)
dataset[sample.rows, , drop = FALSE]
}
and then the sample_by_subcommunities()
function:
#' Subsample a dataset by subcommunities
#'
#' Sample a dataset as if we only counted in some subcommunities.
#'
#' @param dataset A matrix, data frame or tibble containing abundance or incidence data
#' @param count The number of subcommunities to retain
#' @return The subsampled dataset in the format it was passed in
#'
#' @export
#'
#' @examples
#' library(magrittr)
#' sample_by_species(bci_2010, count = 20) %>%
#' sample_by_subcommunities(count = 6)
#'
sample_by_subcommunities <- function(dataset, count) {
cols <- ncol(dataset)
if (count > cols) {
warning("Trying to pick more subcommunities than are present")
count = cols
}
sample.cols <- sample(cols, count)
dataset[, sample.cols, drop = FALSE]
}
Once you have redocumented and reinstalled everything, and checked that it is all working, commit and push all of the changes to GitHub.