Practical 4-2: Provide some helper functions through the package

Overview

In this session you will add some functions to your biodiversity data package that let you subsample your data. The aim is simply to see how functions are added to a package, and to give you an example of what the documentation for a function in an R package looks like. You will also add the documentation for the package as a whole.

Background

You will rarely have access to a fully sampled dataset like the one from BCI. To simulate an incompletely sampled dataset, I am providing you with two functions that subsample the data in different ways. The first, sample_by_species(), samples from the dataset as if you had recorded only some of the species (chosen at random) and ignored the others; it keeps a random subset of the rows. The second, sample_by_subcommunities(), keeps the counts from only some of the subcommunities (here quadrats, stored as columns) and ignores the others. There are other ways of subsampling or recording data incompletely, and we will investigate these in the project.
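As a rough sketch of the two approaches, the snippet below subsamples a small made-up matrix by rows and then by columns. It assumes species are stored as rows and quadrats as columns, as in the functions provided in the Tasks section; the object names (toy, by_species, by_quadrat) and the counts are purely illustrative and are not part of the package.

```r
# Toy abundance matrix: 4 species (rows) by 3 quadrats (columns).
# All names and counts are invented for illustration.
set.seed(1)
toy <- matrix(
  c(5, 0, 2,
    1, 3, 0,
    0, 4, 4,
    2, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("species_", 1:4), paste0("quadrat_", 1:3))
)

# Keep 2 of the 4 species; every quadrat is retained
by_species <- toy[sample(nrow(toy), 2), , drop = FALSE]

# Keep 2 of the 3 quadrats; every species is retained
by_quadrat <- toy[, sample(ncol(toy), 2), drop = FALSE]

dim(by_species)  # 2 3
dim(by_quadrat)  # 4 2
```

Which species or quadrats are kept depends on the random seed, but the dimensions of the result do not.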

Tasks

Put the code provided into R files in the R folder of your package, then edit the DESCRIPTION file of your package (using usethis::use_package() if you like) so that all dependencies are listed. Note that the examples load magrittr for the %>% pipe, so it needs to be declared there. Then regenerate the documentation and reinstall the package. First the sample_by_species() function:

#' Subsample a dataset by species
#'
#' Sample a dataset as if we only recorded some species.
#'
#' @param dataset A matrix, data frame or tibble containing abundance or
#'   incidence data, with species as rows and subcommunities as columns
#' @param count The number of species to retain
#' @return The subsampled dataset in the format it was passed in
#'
#' @export
#'
#' @examples
#' library(magrittr)
#' sample_by_species(bci_2010, count = 20) %>%
#'   sample_by_subcommunities(count = 6)
#'
sample_by_species <- function(dataset, count) {
  rows <- nrow(dataset)
  if (count > rows) {
    warning("Trying to pick more species than are present")
    count <- rows
  }

  sample.rows <- sample(rows, count)
  dataset[sample.rows, , drop = FALSE]
}

and then the sample_by_subcommunities() function:

#' Subsample a dataset by subcommunities
#'
#' Sample a dataset as if we only counted in some subcommunities.
#'
#' @param dataset A matrix, data frame or tibble containing abundance or
#'   incidence data, with species as rows and subcommunities as columns
#' @param count The number of subcommunities to retain
#' @return The subsampled dataset in the format it was passed in
#'
#' @export
#'
#' @examples
#' library(magrittr)
#' sample_by_species(bci_2010, count = 20) %>%
#'   sample_by_subcommunities(count = 6)
#'
sample_by_subcommunities <- function(dataset, count) {
  cols <- ncol(dataset)
  if (count > cols) {
    warning("Trying to pick more subcommunities than are present")
    count <- cols
  }

  sample.cols <- sample(cols, count)
  dataset[, sample.cols, drop = FALSE]
}
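Both functions index with drop = FALSE. The short sketch below, using an invented matrix (toy) rather than the package data, suggests why: when only one species or subcommunity is retained, plain indexing on a matrix collapses the result to a vector, which would break the promise of returning the dataset in the format it was passed in.

```r
# Toy matrix: 3 species (rows) by 2 quadrats (columns); values are arbitrary
toy <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("sp_a", "sp_b", "sp_c"), c("q1", "q2"))
)

# Without drop = FALSE a single row collapses to a named vector
one_row_dropped <- toy[2, ]

# With drop = FALSE the result is still a 1 x 2 matrix
one_row_kept <- toy[2, , drop = FALSE]

is.matrix(one_row_dropped)  # FALSE
is.matrix(one_row_kept)     # TRUE
dim(one_row_kept)           # 1 2
```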

GitHub

Once you have redocumented and reinstalled everything, and checked that it is all working, commit and push all of the changes to GitHub.
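For reference, the redocument and reinstall steps might look something like this when run from the R console at the root of your package. This is only a sketch under some assumptions: that devtools and usethis are installed, that magrittr belongs under Suggests because it is used only in the examples, and that usethis::use_package_doc() is an acceptable way to create the package-level documentation file (the practical does not prescribe a method).

```r
# Run from the R console with the package project as the working directory.

# Declare magrittr, which the roxygen examples load for the %>% pipe.
# Assumption: Suggests is enough, since no package code calls magrittr.
usethis::use_package("magrittr", type = "Suggests")

# One possible way to add package-level documentation (an assumption here):
# this creates a <package>-package.R file in the R folder for you to edit.
usethis::use_package_doc()

# Regenerate the Rd files and NAMESPACE from the roxygen comments
devtools::document()

# Reinstall the package so the new functions and help pages are available
devtools::install()

# Optionally run R CMD check, which also runs the examples
devtools::check()
```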