Intro

#Prevent warnings and messages from displaying in the pdf of this document
knitr::opts_chunk$set(message = FALSE, warning=FALSE)

Getting set up

Before getting started, we need to load some extra functions, or ‘packages’, that will let us organize and work with data. The built-in functions of R are called ‘base R’ and everything else comes from a package.

The tidyverse is a collection of packages that let you work with data in a more streamlined way than base R does.

#eval=FALSE means this won't be run when I knit my final report of this markdown

#Generally you can install the entire family of tidyverse packages with just
install.packages("tidyverse")

#If there are problems, you can install the ones we're using today individually

install.packages("dplyr")
install.packages("ggplot2")

Each time you start R, you have to load the packages that you want to use.

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr


#or for today

library(dplyr)
library(ggplot2)

#It's not necessary, but I'm also going to load the cowplot package, which loads a publication-ready theme for ggplot

library(cowplot)
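
To make the "more streamlined" idea concrete, here is a minimal sketch that does the same task twice: keep only the setosa rows of the built-in iris dataset and two of its columns. The names base_way and tidy_way are made up for this illustration.

```r
library(dplyr)

# Base R: subset rows with a logical test and pick columns by name
base_way <- iris[iris$Species == "setosa", c("Sepal.Length", "Species")]

# dplyr: the same steps written as a pipeline of verbs
tidy_way <- iris %>%
    filter(Species == "setosa") %>%
    select(Sepal.Length, Species)

head(tidy_way, 3)
```

Both versions give the same rows. The dplyr version reads top to bottom as "take iris, filter it, then select columns".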

Let’s check out some data. R comes with some built-in datasets that are all ready to use. One of these is called iris. It contains measurements on three types of irises. Many ggplot2 examples use either the iris or mtcars dataset.

iris
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
## 10           4.9         3.1          1.5         0.1     setosa
## 11           5.4         3.7          1.5         0.2     setosa
## 12           4.8         3.4          1.6         0.2     setosa
## 13           4.8         3.0          1.4         0.1     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 15           5.8         4.0          1.2         0.2     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 17           5.4         3.9          1.3         0.4     setosa
## 18           5.1         3.5          1.4         0.3     setosa
## 19           5.7         3.8          1.7         0.3     setosa
## 20           5.1         3.8          1.5         0.3     setosa
## 21           5.4         3.4          1.7         0.2     setosa
## 22           5.1         3.7          1.5         0.4     setosa
## 23           4.6         3.6          1.0         0.2     setosa
## 24           5.1         3.3          1.7         0.5     setosa
## 25           4.8         3.4          1.9         0.2     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 27           5.0         3.4          1.6         0.4     setosa
## 28           5.2         3.5          1.5         0.2     setosa
## 29           5.2         3.4          1.4         0.2     setosa
## 30           4.7         3.2          1.6         0.2     setosa
## 31           4.8         3.1          1.6         0.2     setosa
## 32           5.4         3.4          1.5         0.4     setosa
## 33           5.2         4.1          1.5         0.1     setosa
## 34           5.5         4.2          1.4         0.2     setosa
## 35           4.9         3.1          1.5         0.2     setosa
## 36           5.0         3.2          1.2         0.2     setosa
## 37           5.5         3.5          1.3         0.2     setosa
## 38           4.9         3.6          1.4         0.1     setosa
## 39           4.4         3.0          1.3         0.2     setosa
## 40           5.1         3.4          1.5         0.2     setosa
## 41           5.0         3.5          1.3         0.3     setosa
## 42           4.5         2.3          1.3         0.3     setosa
## 43           4.4         3.2          1.3         0.2     setosa
## 44           5.0         3.5          1.6         0.6     setosa
## 45           5.1         3.8          1.9         0.4     setosa
## 46           4.8         3.0          1.4         0.3     setosa
## 47           5.1         3.8          1.6         0.2     setosa
## 48           4.6         3.2          1.4         0.2     setosa
## 49           5.3         3.7          1.5         0.2     setosa
## 50           5.0         3.3          1.4         0.2     setosa
## 51           7.0         3.2          4.7         1.4 versicolor
## 52           6.4         3.2          4.5         1.5 versicolor
## 53           6.9         3.1          4.9         1.5 versicolor
## 54           5.5         2.3          4.0         1.3 versicolor
## 55           6.5         2.8          4.6         1.5 versicolor
## 56           5.7         2.8          4.5         1.3 versicolor
## 57           6.3         3.3          4.7         1.6 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 59           6.6         2.9          4.6         1.3 versicolor
## 60           5.2         2.7          3.9         1.4 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 62           5.9         3.0          4.2         1.5 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 64           6.1         2.9          4.7         1.4 versicolor
## 65           5.6         2.9          3.6         1.3 versicolor
## 66           6.7         3.1          4.4         1.4 versicolor
## 67           5.6         3.0          4.5         1.5 versicolor
## 68           5.8         2.7          4.1         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 70           5.6         2.5          3.9         1.1 versicolor
## 71           5.9         3.2          4.8         1.8 versicolor
## 72           6.1         2.8          4.0         1.3 versicolor
## 73           6.3         2.5          4.9         1.5 versicolor
## 74           6.1         2.8          4.7         1.2 versicolor
## 75           6.4         2.9          4.3         1.3 versicolor
## 76           6.6         3.0          4.4         1.4 versicolor
## 77           6.8         2.8          4.8         1.4 versicolor
## 78           6.7         3.0          5.0         1.7 versicolor
## 79           6.0         2.9          4.5         1.5 versicolor
## 80           5.7         2.6          3.5         1.0 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 83           5.8         2.7          3.9         1.2 versicolor
## 84           6.0         2.7          5.1         1.6 versicolor
## 85           5.4         3.0          4.5         1.5 versicolor
## 86           6.0         3.4          4.5         1.6 versicolor
## 87           6.7         3.1          4.7         1.5 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 89           5.6         3.0          4.1         1.3 versicolor
## 90           5.5         2.5          4.0         1.3 versicolor
## 91           5.5         2.6          4.4         1.2 versicolor
## 92           6.1         3.0          4.6         1.4 versicolor
## 93           5.8         2.6          4.0         1.2 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 95           5.6         2.7          4.2         1.3 versicolor
## 96           5.7         3.0          4.2         1.2 versicolor
## 97           5.7         2.9          4.2         1.3 versicolor
## 98           6.2         2.9          4.3         1.3 versicolor
## 99           5.1         2.5          3.0         1.1 versicolor
## 100          5.7         2.8          4.1         1.3 versicolor
## 101          6.3         3.3          6.0         2.5  virginica
## 102          5.8         2.7          5.1         1.9  virginica
## 103          7.1         3.0          5.9         2.1  virginica
## 104          6.3         2.9          5.6         1.8  virginica
## 105          6.5         3.0          5.8         2.2  virginica
## 106          7.6         3.0          6.6         2.1  virginica
## 107          4.9         2.5          4.5         1.7  virginica
## 108          7.3         2.9          6.3         1.8  virginica
## 109          6.7         2.5          5.8         1.8  virginica
## 110          7.2         3.6          6.1         2.5  virginica
## 111          6.5         3.2          5.1         2.0  virginica
## 112          6.4         2.7          5.3         1.9  virginica
## 113          6.8         3.0          5.5         2.1  virginica
## 114          5.7         2.5          5.0         2.0  virginica
## 115          5.8         2.8          5.1         2.4  virginica
## 116          6.4         3.2          5.3         2.3  virginica
## 117          6.5         3.0          5.5         1.8  virginica
## 118          7.7         3.8          6.7         2.2  virginica
## 119          7.7         2.6          6.9         2.3  virginica
## 120          6.0         2.2          5.0         1.5  virginica
## 121          6.9         3.2          5.7         2.3  virginica
## 122          5.6         2.8          4.9         2.0  virginica
## 123          7.7         2.8          6.7         2.0  virginica
## 124          6.3         2.7          4.9         1.8  virginica
## 125          6.7         3.3          5.7         2.1  virginica
## 126          7.2         3.2          6.0         1.8  virginica
## 127          6.2         2.8          4.8         1.8  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 129          6.4         2.8          5.6         2.1  virginica
## 130          7.2         3.0          5.8         1.6  virginica
## 131          7.4         2.8          6.1         1.9  virginica
## 132          7.9         3.8          6.4         2.0  virginica
## 133          6.4         2.8          5.6         2.2  virginica
## 134          6.3         2.8          5.1         1.5  virginica
## 135          6.1         2.6          5.6         1.4  virginica
## 136          7.7         3.0          6.1         2.3  virginica
## 137          6.3         3.4          5.6         2.4  virginica
## 138          6.4         3.1          5.5         1.8  virginica
## 139          6.0         3.0          4.8         1.8  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 141          6.7         3.1          5.6         2.4  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 143          5.8         2.7          5.1         1.9  virginica
## 144          6.8         3.2          5.9         2.3  virginica
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica

#?iris
#Description
#This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

#Format
#iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
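
Printing all 150 rows is rarely needed. As a rough sketch, these base R functions give a quicker overview of a data frame:

```r
# Peek at a data frame without printing every row
head(iris, n = 5)           # first 5 rows
dim(iris)                   # number of rows, then number of columns
str(iris)                   # type of each column plus a few example values
summary(iris$Petal.Length)  # min, quartiles, mean and max of one column
```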

#To look at the species in the iris dataset, we can use a few different syntaxes
#Get unique values of the iris column Species
unique(iris$Species)
## [1] setosa     versicolor virginica 
## Levels: setosa versicolor virginica
#or Take the iris column Species and send it to the unique function
iris$Species %>% unique
## [1] setosa     versicolor virginica 
## Levels: setosa versicolor virginica
#or  Take iris, select the Species column, and send it to the unique function
iris %>% 
    select(Species) %>%
    unique
##        Species
## 1       setosa
## 51  versicolor
## 101  virginica
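
The %>% symbol used above is called a pipe: it sends whatever is on its left into the function on its right. A small sketch showing that the piped and nested forms give the same answer, plus dplyr's count() as one more way to look at the species (the names nested, piped and species_counts are just for this illustration):

```r
library(dplyr)

# x %>% f() is another way of writing f(x)
nested <- unique(iris$Species)
piped  <- iris$Species %>% unique()
identical(nested, piped)

# count() tallies the number of rows for each value of a column
species_counts <- iris %>% count(Species)
species_counts
```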

To look at your own data, you’d do something like:

library(readr)
#Put the character that separates the columns in your file between the quotes of delim
my_data <- read_delim("/home/cmcwhite/experiment.txt", delim='')

The readr library functions read_delim() and read_csv() are preferable to the base R read.table() or read.csv() functions because they are less likely to mess up the formatting of your data.
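
Here is a runnable sketch of read_delim(), assuming a tab-separated file. The file is a made-up example written to a temporary location first, so nothing here depends on files on your own computer.

```r
library(readr)

# Hypothetical example file, written to a temporary location so the sketch runs anywhere
example_file <- tempfile(fileext = ".txt")
writeLines(c("ID\ttime\tmeasure",
             "sample1\t0\t1",
             "sample1\t6\t2"), example_file)

# delim = "\t" tells read_delim that the columns are separated by tabs
my_data <- read_delim(example_file, delim = "\t")
my_data
```

For your own data, replace example_file with the path to your file and set delim to whatever separates its columns.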

Data format

The iris dataset is conveniently already in a tidy format. What this means is that each observation has its own row.

This is wide format data

ID       0hr  6hr  12hr
sample1  1    2    3
sample2  2    4    6

For data analysis in R, we want to use a tidy format like:

ID       time  measure
sample1  0     1
sample1  6     2
sample1  12    3
sample2  0     2
sample2  6     4
sample2  12    6

What this does is let us easily add new descriptive columns to this table. If you look back at the wide format, there isn’t a clear place to put observations or descriptions about individual timepoints. Say that it hailed during sample2 collection at 6 hours. Where would you put this info in the above wide table? In tidy format, it can go in a column for weather.

ID       time  measure  weather   temperature
sample1  0     1        ‘clear’   80
sample1  6     2        ‘rain’    70
sample1  12    3        ‘cloudy’  75
sample2  0     2        ‘cloudy’  60
sample2  6     4        ‘hail’    50
sample2  12    6        ‘rain’    45

We’ll have lessons later about how to transform your data between wide and tidy formats using tidyr.
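
As a small preview, and only a sketch, this is roughly how the wide table above could be turned into the tidy one. It assumes tidyr version 1.0.0 or later, which is where pivot_longer() was introduced; the names wide and tidy are made up for this illustration.

```r
library(tidyr)

# The wide table from above; check.names = FALSE keeps names like "0hr" unchanged
wide <- data.frame(ID = c("sample1", "sample2"),
                   `0hr` = c(1, 2),
                   `6hr` = c(2, 4),
                   `12hr` = c(3, 6),
                   check.names = FALSE)

# Gather every column except ID into time/measure pairs
tidy <- pivot_longer(wide, cols = -ID, names_to = "time", values_to = "measure")

# Turn "0hr", "6hr", "12hr" into the numbers 0, 6, 12
tidy$time <- as.numeric(sub("hr", "", tidy$time))
tidy
```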

One more wide vs tidy example

The same information about the sky in wide form and in tidy form: A Metaphor

Wide form data

  • The sky has colors and august it was blue but this other time at sunset it was red but in the winter it’s grey

Tidy form data

  • The sky in August was blue
  • The sky in December was grey
  • The sky at Sunset was red

i.e.

  • gene experiment_condition1 value1
  • gene experiment_condition2 value2

Now for some actual plots!

There are many different types of plot that you can make with ggplot. We’ll start with a plain scatter plot. List of the current ggplot geoms:

http://docs.ggplot2.org/current/

The basic grammar of a ggplot is:

ggplot(dataname ex. iris, aes(assign x, y, colors, shape, size to columns)) +
    geom_plottype() +
    other attributes to add on +
    like, remove the legend +
    or change the color scheme

The aes stands for aesthetics

“Aesthetic mappings describe how variables in the data are mapped to visual properties”

Ok, first plots now

#Set up the data set, and tell what columns are the x and y axes
#This will make a scatter plot of petal width vs petal length. 
ggplot(data = iris, aes(x=Petal.Width, y=Petal.Length)) +
    geom_point()

#Now with color!
ggplot(iris, aes(x=Petal.Width, y=Petal.Length, color=Species)) +
    geom_point()

#Now with point size scaled to Sepal.Width!
#And slightly transparent!
ggplot(iris, aes(x=Petal.Width, y=Petal.Length, color=Species, size=Sepal.Width)) +
    geom_point(alpha=0.7)
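
In the plot above, size sits inside aes() while alpha sits inside geom_point(). A minimal sketch of the difference between mapping a property to a column and setting it to one fixed value (p_mapped and p_set are names made up for this illustration):

```r
library(ggplot2)

# Mapped: size is inside aes(), so each point's size comes from the Sepal.Width column
p_mapped <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, size = Sepal.Width)) +
    geom_point()

# Set: size is outside aes(), so every point gets the same fixed size
p_set <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length)) +
    geom_point(size = 3)

p_mapped
p_set
```

Mapped properties get a legend because they carry information from the data. Fixed settings do not.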

#Now with good colors
#Each ggplot is like a recipe where you add on things you want.
ggplot(iris, aes(x=Petal.Width, y=Petal.Length, color=Species, size=Sepal.Width)) +
    geom_point(alpha=0.7) +
    scale_color_manual(values =c("#1b9e77","#d95f02", "#7570b3")) 

#Adding a facet line will break data into subplots
ggplot(iris, aes(x=Petal.Width, y=Petal.Length, color=Species, size=Sepal.Width)) +
    geom_point(alpha=0.7) +
    scale_color_manual(values =c("#1b9e77","#d95f02", "#7570b3")) +
    facet_wrap(~Species, scales = "free_x")

#Change the aesthetics to plot different columns
ggplot(iris, aes(x=Species, y=Petal.Width, color=Species)) +
    geom_violin(alpha=0.7) +
    scale_color_manual(values =c("#1b9e77","#d95f02", "#7570b3"))

#Use the fill aesthetic to fill in the shape
#Fill colors are set with scale_fill_manual rather than scale_color_manual
ggplot(iris, aes(x=Species, y=Petal.Width, fill=Species)) +
    geom_violin(alpha=0.7) +
    scale_fill_manual(values =c("#1b9e77","#d95f02", "#7570b3"))

#We can use the same setup to make different types of plots by changing the geom.
ggplot(data = iris, aes(x=Species, y=Petal.Width, color=Sepal.Width)) +
    geom_point()

#With geom_point, the points are too stacked on top of each other, geom_jitter might work
ggplot(data = iris, aes(x=Species, y=Petal.Width, color=Sepal.Width)) +
    geom_jitter()
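
By default geom_jitter() adds a little random noise in both the x and y directions, so the plotted Petal.Width values shift slightly too. A sketch of one way to control this with the width and height arguments (the value 0.2 is just a starting point to adjust by eye):

```r
library(ggplot2)

# width controls the sideways spread; height = 0 leaves the y values untouched
p_jitter <- ggplot(data = iris, aes(x = Species, y = Petal.Width, color = Sepal.Width)) +
    geom_jitter(width = 0.2, height = 0)
p_jitter
```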

#Or geom_boxplot
ggplot(data = iris, aes(x=Species, y=Petal.Width, fill=Species)) +
    geom_boxplot()

#Or geom_violin
ggplot(data = iris, aes(x=Species, y=Petal.Width, fill=Species)) +
    geom_violin()

#Or geom_density
ggplot(iris, aes(x=Petal.Width, fill=Species)) +
    geom_density(alpha=0.7)

#For a final example, I'm switching to a timeseries dataset, ChickWeight
head(ChickWeight)
## Grouped Data: weight ~ Time | Chick
##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1
#Plot the growth of each chick
ggplot(ChickWeight, aes(x=Time, y=weight, color=Diet, group=Chick)) +
        geom_line()

ggplot(ChickWeight, aes(x=Time, y=weight, color=Diet, group=Chick)) +
        geom_line(size=2, alpha=0.4) +
        facet_wrap(~Diet)  +
        scale_color_manual(values =c("#1b9e77","#d95f02", "#7570b3", "#a13242")) 
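
With one line per chick the overall trend for each diet can be hard to read. As a sketch of one possible follow-up, dplyr can average the weights for each diet at each time point before plotting. The names avg_weight, mean_weight and p_avg are made up for this illustration.

```r
library(dplyr)
library(ggplot2)

# Average weight for each diet at each time point
# as.data.frame() drops the extra grouping classes that ChickWeight carries
avg_weight <- as.data.frame(ChickWeight) %>%
    group_by(Diet, Time) %>%
    summarise(mean_weight = mean(weight))

# One line per diet instead of one line per chick
p_avg <- ggplot(avg_weight, aes(x = Time, y = mean_weight, color = Diet)) +
    geom_line() +
    scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3", "#a13242"))
p_avg
```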

Exercises

For these exercises, try to make some plots using the mtcars dataset

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

#?mtcars

#A data frame with 32 observations on 11 variables.

#[, 1]   mpg     Miles/(US) gallon
#[, 2]   cyl     Number of cylinders
#[, 3]   disp    Displacement (cu.in.)
#[, 4]   hp  Gross horsepower
#[, 5]   drat    Rear axle ratio
#[, 6]   wt  Weight (1000 lbs)
#[, 7]   qsec    1/4 mile time
#[, 8]   vs  V/S
#[, 9]   am  Transmission (0 = automatic, 1 = manual)
#[,10]   gear    Number of forward gears
#[,11]   carb    Number of carburetors
  1. Make a scatterplot of mpg vs. horsepower

  2. Make the same plot as in (1), but with all the points red

  3. Make a histogram (geom_histogram) of mpg

  4. Make a boxplot of mpg for each number of cylinders. Add a jitter plot on top of it.

#workspace