By the end of this assignment, you should be able to achieve the following tasks in R:

These achievements belong to Learning Outcomes 2, 3, 4, 5, 6.

Click on any blue text to visit the external website.

Note: If you contact me for help or (better yet) open an issue in the public discussion forum, please include the code that is not working and also tell me what you have tried.

Preparation

Assignment

Your task here is simple. Write code chunks that

Then, describe the results briefly. Identify patterns and trends, potential outliers, and other results you find interesting. Think like the scientist you are!

You may need to review your Introduction to R coding exercise to meet some requirements, especially how to work with factors, how to select one column from a data frame, and how to change one value to another.

Remember to put all the data files in your data folder.

A few things:

Hint: tbl$x <- factor(tbl$x, levels = c(...), ordered = TRUE/FALSE). Just sayin’...
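If the hint feels abstract, here is a minimal sketch of it in action. The data frame tbl, its columns, and the level names are all made up for illustration and have nothing to do with the assignment data.

```r
# Hypothetical toy data: `tbl`, `x`, `y`, and the level names are invented.
tbl <- data.frame(x = c("low", "high", "medium", "low"))

# Ordered factor: the levels are ranked in the order given to `levels`.
tbl$x <- factor(tbl$x, levels = c("low", "medium", "high"), ordered = TRUE)

levels(tbl$x)       # levels in the order supplied
is.ordered(tbl$x)   # TRUE for an ordered factor
tbl$x < "high"      # rank comparisons work on ordered factors

# Unordered factor: `levels` only fixes the display order (e.g., on an axis).
tbl$y <- factor(c("b", "a", "b", "a"), levels = c("b", "a"), ordered = FALSE)
is.ordered(tbl$y)   # FALSE
```

Either way, the order you give to levels is the order the groups appear in on a plot axis or legend.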

Limpets

Let’s begin gently.

Fenberg and Roy 2012 studied how human harvesting of the owl limpet (Lottia gigantea) affected its life history. The study sites are along the coast of California.

Requirements:

  • Data file: limpets.csv.
  • Are the data tidy?
  • Use the col_types argument. The three column types are numeric, character, and character.
  • Make the Sites column an ordered factor with these levels:
    • PBL, KNRM, VBG, WP, PF, DP, SIO, CTZ1, CTZ2, CNM
    • PBL is the northernmost site (Pebble Beach). CNM is the southernmost site (Cabrillo National Monument).
  • Make a boxplot of length (in millimeters) for each site, colored by protected status.
  • Change the axis labels so they begin with capital letters. The y-axis should include the unit of measurement in parentheses.
  • In your description, tell which two sites have outliers and whether the protected sites tend to have larger or smaller limpets.
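The requirements above can be sketched roughly as follows. This is not the solution: it writes a tiny made-up file to a temporary location, and the column names (Length, Site, Protected) and all values are assumptions, so inspect the real limpets.csv before borrowing anything.

```r
library(readr)
library(ggplot2)

# Hypothetical stand-in for limpets.csv; names and values are invented.
toy_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Length,Site,Protected",
  "55.2,PBL,yes",
  "61.0,PBL,yes",
  "48.7,CNM,no",
  "43.1,CNM,no"
), toy_file)

# "ncc" is the compact form of number, character, character.
limpets_toy <- read_csv(toy_file, col_types = "ncc")

# The toy file holds only two sites, so only two levels are listed here.
limpets_toy$Site <- factor(limpets_toy$Site,
                           levels = c("PBL", "CNM"),
                           ordered = TRUE)

# One box per site, colored by protected status, with relabeled axes.
p <- ggplot(limpets_toy, aes(x = Site, y = Length, color = Protected)) +
  geom_boxplot() +
  labs(x = "Site", y = "Length (mm)", color = "Protected")
p
```

The compact string is just one way to use col_types. The longer cols() form does the same job and is easier to read when a file has many columns.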

Roseate Terns

Seward et al. 2018 studied metapopulation dynamics of roseate terns (Sterna dougallii) in northwestern Europe to determine how abundance changed at nine sites. The number of individuals was counted at each site every year between 1992 and 2016.

  • Data: roseate_terns.txt
  • Are the data tidy?
  • Use filter to remove sites with missing counts.
  • Make a line plot of population size over time.
  • Change the axis labels as appropriate (you have to start thinking about what is appropriate).
  • Which population(s) obviously increased in size between 1992 and 2016? Which population(s) obviously decreased in size during that time?
  • Some lines have breaks in them. That is, they are not continuous across all years. Why?
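Here is a minimal sketch of the general workflow, assuming (and it is only an assumption) that the data arrive in a wide layout with one column per site. The toy table, its column names, and its counts are invented. The sketch reads "remove sites with missing counts" as dropping rows whose count is NA, which is one possible interpretation.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical wide table: one row per year, one column per site.
terns_wide <- data.frame(
  year   = c(1992, 1993, 1994),
  site_a = c(10, 12, 15),
  site_b = c(NA, NA, NA),
  site_c = c(30, 28, 25)
)

# Reshape to one row per site-year, then drop rows with missing counts.
terns_toy <- terns_wide %>%
  pivot_longer(cols = -year, names_to = "site", values_to = "individuals") %>%
  filter(!is.na(individuals))

# One line per site.
p <- ggplot(terns_toy, aes(x = year, y = individuals, color = site)) +
  geom_line() +
  labs(x = "Year", y = "Number of individuals", color = "Site")
p
```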

Blacklip Abalone

Warwick et al. 1994 studied the population biology of Blacklip Abalone (Haliotis rubra) from the north coast and Bass Strait Islands of Tasmania.

  • Data: abalone.csv (ab-ah-LOW-knee; rhymes with bologna)
  • Follow the instructions carefully. This exercise walks through a few steps of a “typical” analysis. Make a separate code chunk for each instruction.

Chunk 1: Import the data, remove the first column, then make a boxplot that compares height among the three types.

Chunk 2: The boxplot for height shows a female and a male outlier. Perhaps the samples contained two very large, old individuals. Make a scatterplot to see if height appears to correlate with rings. Rings is a measure used to estimate age. Based on the graph, are the extraordinarily large individuals really old individuals?

Chunk 3: Let’s assume the outliers are coding errors. Filter the data to remove the two large individuals. Change Type to an ordered factor. Immatures must be first, as that makes sense in terms of age. The order of female and male after immature is up to you. Then, redo the scatterplot that you just made with the newly wrangled data.

What patterns emerge? Which type is the largest? Are all females and males larger than immatures?
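The filter-then-factor pattern in Chunk 3 might look something like this sketch. The toy data frame, the Type codes (I, F, M), and the cutoff of 0.4 are all hypothetical. Choose your own cutoff by looking at your boxplot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; the real file has more columns and rows.
abalone_toy <- data.frame(
  Type   = c("I", "F", "M", "F", "M"),
  Height = c(0.08, 0.14, 0.15, 1.10, 0.50),
  Rings  = c(6, 10, 11, 8, 10)
)

# Keep rows below an invented cutoff, then order the types with immatures first.
abalone_clean <- abalone_toy %>%
  filter(Height < 0.4) %>%
  mutate(Type = factor(Type, levels = c("I", "F", "M"), ordered = TRUE))

p <- ggplot(abalone_clean, aes(x = Rings, y = Height, color = Type)) +
  geom_point()
p
```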

Chunk 4: Are there really immatures with more than five rings with zero height? Srsly?

  • Print the records of the individuals with zero height.
  • Most likely, the two zero height values are mistakes made during data recording. This time, instead of filtering them, assign NA (missing data) to those two records. Replot the data to ensure the two observations are not included in the graph.
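Changing one value to another is the skill from your Introduction to R exercise. A minimal base R sketch, using an invented data frame rather than the abalone file, could look like this:

```r
# Hypothetical toy data; names and values are invented.
abalone_toy <- data.frame(
  Type   = c("I", "I", "F"),
  Height = c(0, 0.09, 0.14),
  Rings  = c(6, 7, 10)
)

# Print the records with zero height.
abalone_toy[abalone_toy$Height == 0, ]

# Select the Height column, pick the zero entries, and overwrite them with NA.
abalone_toy$Height[abalone_toy$Height == 0] <- NA

# Count how many heights are now missing.
sum(is.na(abalone_toy$Height))
```

When you replot, ggplot2 drops rows with missing values from a scatterplot and warns you that it did so.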

Chunk 5: Make two scatterplots of your choice, between any two pairs of continuous variables that make sense to show as scatterplots. Color, shape, or both should distinguish the three types.

Darters

This will be your most challenging import. Inspect the file! Think through the problem. If necessary, jot some notes on paper to outline the information you need to consider, such as how many lines to skip, the position of columns, etc. Use chunks logically and appropriately to accomplish the tasks described below.

Taylor (unpublished) studied the microhabitat use of darters in the genus Etheostoma (Family Percidae) from the Niangua River watershed in Missouri.

  • Data: darters.txt
  • Column names and widths are included in the file. You can use whatever column names you want, but adjust the instructions below accordingly.
  • Make riffle an unordered factor with levels 1 and 2.
  • Make major_type an ordered factor with levels s, fg, sg, lg, c
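For a file with fixed-width columns and header lines to skip, one option is read_fwf() from readr. The sketch below builds a small made-up file, so the two header lines, the widths (3, 12, 2), the column names, and the species labels are all assumptions. Get the real values by inspecting darters.txt.

```r
library(readr)

# Build a hypothetical fixed-width file: two header lines, then three
# columns of width 3 (id), 12 (species), and 2 (riffle).
toy_lines <- c(
  "Toy darter data (hypothetical)",
  "widths: id 3, species 12, riffle 2",
  sprintf("%3d%-12s%2d",
          c(1L, 2L, 10L),
          c("sp_a", "sp_b", "sp_c"),
          c(1L, 2L, 1L))
)
toy_file <- tempfile(fileext = ".txt")
writeLines(toy_lines, toy_file)

# Skip the header lines and describe the columns by width.
darters_toy <- read_fwf(
  toy_file,
  col_positions = fwf_widths(c(3, 12, 2),
                             col_names = c("id", "species", "riffle")),
  col_types = "ici",
  skip = 2
)

# Unordered factor with levels 1 and 2.
darters_toy$riffle <- factor(darters_toy$riffle, levels = c(1, 2))
darters_toy
```

If the result looks wrong, the usual culprits are the skip value or a width that is off by one character.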

Do these four steps together with the pipe.

  • Filter to remove rows for the species “zonale” and “tetrazonum”.
  • Remove mintype and minsub columns.
  • Rename majsub and majtype to major_substrate and major_type, respectively.
  • Arrange the data by id.
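Chaining the four steps with the pipe could look like the sketch below. The toy data frame and its values are invented (including the placeholder species sp_a and sp_b). Only the column names mentioned in the steps above are taken from the assignment.

```r
library(dplyr)

# Hypothetical toy data with the columns named in the steps above.
darters_toy <- data.frame(
  id      = c(4, 1, 3, 2),
  species = c("sp_a", "zonale", "sp_b", "tetrazonum"),
  majsub  = c(60, 70, 55, 80),
  minsub  = c(40, 30, 45, 20),
  majtype = c("sg", "lg", "fg", "c"),
  mintype = c("s", "sg", "s", "lg")
)

darters_clean <- darters_toy %>%
  filter(!species %in% c("zonale", "tetrazonum")) %>%         # drop two species
  select(-mintype, -minsub) %>%                               # remove two columns
  rename(major_substrate = majsub, major_type = majtype) %>%  # new = old
  arrange(id)                                                 # sort by id

darters_clean
```

Note that rename() takes new_name = old_name, which is easy to get backwards.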

Data were collected from two riffles, separated by several hundred meters. The plots below explore differences between riffles. Use facet_wrap() to make pairs of plots separated by riffle.

Plot 1: Plot length as a function of depth. Map species to color and shape. What differences do you see between the two riffles?

Plot 1 chunk here.
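As a rough sketch of the facet_wrap() pattern, here is a scatterplot split into one panel per riffle. The toy values and the species labels (sp_a, sp_b, sp_c) are made up and only illustrate the syntax.

```r
library(ggplot2)

# Hypothetical toy data; values and species labels are invented.
darters_toy <- data.frame(
  riffle  = factor(c(1, 1, 1, 2, 2, 2), levels = c(1, 2)),
  species = c("sp_a", "sp_b", "sp_c", "sp_a", "sp_b", "sp_c"),
  depth   = c(12, 20, 15, 30, 25, 18),
  length  = c(4.1, 5.3, 3.8, 4.6, 5.9, 4.0)
)

# Species mapped to both color and shape; one panel per riffle.
p <- ggplot(darters_toy,
            aes(x = depth, y = length, color = species, shape = species)) +
  geom_point() +
  facet_wrap(~ riffle)
p
```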

Plot 2: Make a boxplot of length for each of the three species. Which riffle shows the greatest number of outliers?

Plot 2 chunk here.

Plot 3: Make a boxplot of length for major substrate types for each species for each riffle. This will actually be six plots in one! To do this, use facet_grid(species ~ riffle) in place of facet_wrap(). How does the plot change if you switch the order of the variables (riffle ~ species) in the facet_grid() function?

Plot 3 chunk here.
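For reference, the facet_grid() call has the same shape as facet_wrap() but takes a two-sided formula. The sketch below uses invented toy data (two made-up species rather than three, with hypothetical lengths), so it shows the syntax only.

```r
library(ggplot2)

# Hypothetical toy data: every species-by-riffle combination appears.
darters_toy <- expand.grid(
  species    = c("sp_a", "sp_b"),
  riffle     = factor(c(1, 2)),
  major_type = c("s", "fg"),
  stringsAsFactors = FALSE
)
# Invented lengths, one per row, just so there is something to draw.
darters_toy$length <- seq(3.5, by = 0.3, length.out = nrow(darters_toy))

# One panel per species-by-riffle combination.
p <- ggplot(darters_toy, aes(x = major_type, y = length)) +
  geom_boxplot() +
  facet_grid(species ~ riffle)
p
```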