#Prevent warnings and messages from displaying in the pdf of this document
knitr::opts_chunk$set(message = FALSE, warning=FALSE)
Before getting started, we need to load some extra functions, or ‘packages’, that will let us organize and work with data. The built-in functions of R are called ‘base R’, and everything else comes from a package.
The tidyverse is a collection of packages that let you work with data in a more streamlined way than base R does.
#eval=FALSE means this won't be run when I knit my final report of this markdown
#Generally you can install the entire family of tidyverse packages with just
install.packages("tidyverse")
#If there are problems, you can install the ones we're using today individually
install.packages("dplyr")
install.packages("ggplot2")
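If you are not sure whether a package is already installed, you can check before installing. The following is a minimal sketch using only base R; the helper name missing_packages is made up for this example and is not part of any package.

```r
# Hypothetical helper (not from any package): returns the names of
# packages in `pkgs` that are not installed on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("dplyr", "ggplot2"))
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to actually install them
}
```

requireNamespace() only checks that a package can be loaded; it does not attach it, so you still need library() afterwards.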
Each time you start R, you have to load the packages that you want to use.
library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#or for today
library(dplyr)
library(ggplot2)
#It's not necessary, but I'm also going to load the cowplot package, which loads a publication-ready theme for ggplot
library(cowplot)
Let’s check out some data. R comes with some built-in datasets that are all ready to use. One of these is called iris. This contains measurements on three species of irises. Many ggplot2 examples will use either the iris or mtcars dataset.
iris
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
## 51 7.0 3.2 4.7 1.4 versicolor
## 52 6.4 3.2 4.5 1.5 versicolor
## 53 6.9 3.1 4.9 1.5 versicolor
## 54 5.5 2.3 4.0 1.3 versicolor
## 55 6.5 2.8 4.6 1.5 versicolor
## 56 5.7 2.8 4.5 1.3 versicolor
## 57 6.3 3.3 4.7 1.6 versicolor
## 58 4.9 2.4 3.3 1.0 versicolor
## 59 6.6 2.9 4.6 1.3 versicolor
## 60 5.2 2.7 3.9 1.4 versicolor
## 61 5.0 2.0 3.5 1.0 versicolor
## 62 5.9 3.0 4.2 1.5 versicolor
## 63 6.0 2.2 4.0 1.0 versicolor
## 64 6.1 2.9 4.7 1.4 versicolor
## 65 5.6 2.9 3.6 1.3 versicolor
## 66 6.7 3.1 4.4 1.4 versicolor
## 67 5.6 3.0 4.5 1.5 versicolor
## 68 5.8 2.7 4.1 1.0 versicolor
## 69 6.2 2.2 4.5 1.5 versicolor
## 70 5.6 2.5 3.9 1.1 versicolor
## 71 5.9 3.2 4.8 1.8 versicolor
## 72 6.1 2.8 4.0 1.3 versicolor
## 73 6.3 2.5 4.9 1.5 versicolor
## 74 6.1 2.8 4.7 1.2 versicolor
## 75 6.4 2.9 4.3 1.3 versicolor
## 76 6.6 3.0 4.4 1.4 versicolor
## 77 6.8 2.8 4.8 1.4 versicolor
## 78 6.7 3.0 5.0 1.7 versicolor
## 79 6.0 2.9 4.5 1.5 versicolor
## 80 5.7 2.6 3.5 1.0 versicolor
## 81 5.5 2.4 3.8 1.1 versicolor
## 82 5.5 2.4 3.7 1.0 versicolor
## 83 5.8 2.7 3.9 1.2 versicolor
## 84 6.0 2.7 5.1 1.6 versicolor
## 85 5.4 3.0 4.5 1.5 versicolor
## 86 6.0 3.4 4.5 1.6 versicolor
## 87 6.7 3.1 4.7 1.5 versicolor
## 88 6.3 2.3 4.4 1.3 versicolor
## 89 5.6 3.0 4.1 1.3 versicolor
## 90 5.5 2.5 4.0 1.3 versicolor
## 91 5.5 2.6 4.4 1.2 versicolor
## 92 6.1 3.0 4.6 1.4 versicolor
## 93 5.8 2.6 4.0 1.2 versicolor
## 94 5.0 2.3 3.3 1.0 versicolor
## 95 5.6 2.7 4.2 1.3 versicolor
## 96 5.7 3.0 4.2 1.2 versicolor
## 97 5.7 2.9 4.2 1.3 versicolor
## 98 6.2 2.9 4.3 1.3 versicolor
## 99 5.1 2.5 3.0 1.1 versicolor
## 100 5.7 2.8 4.1 1.3 versicolor
## 101 6.3 3.3 6.0 2.5 virginica
## 102 5.8 2.7 5.1 1.9 virginica
## 103 7.1 3.0 5.9 2.1 virginica
## 104 6.3 2.9 5.6 1.8 virginica
## 105 6.5 3.0 5.8 2.2 virginica
## 106 7.6 3.0 6.6 2.1 virginica
## 107 4.9 2.5 4.5 1.7 virginica
## 108 7.3 2.9 6.3 1.8 virginica
## 109 6.7 2.5 5.8 1.8 virginica
## 110 7.2 3.6 6.1 2.5 virginica
## 111 6.5 3.2 5.1 2.0 virginica
## 112 6.4 2.7 5.3 1.9 virginica
## 113 6.8 3.0 5.5 2.1 virginica
## 114 5.7 2.5 5.0 2.0 virginica
## 115 5.8 2.8 5.1 2.4 virginica
## 116 6.4 3.2 5.3 2.3 virginica
## 117 6.5 3.0 5.5 1.8 virginica
## 118 7.7 3.8 6.7 2.2 virginica
## 119 7.7 2.6 6.9 2.3 virginica
## 120 6.0 2.2 5.0 1.5 virginica
## 121 6.9 3.2 5.7 2.3 virginica
## 122 5.6 2.8 4.9 2.0 virginica
## 123 7.7 2.8 6.7 2.0 virginica
## 124 6.3 2.7 4.9 1.8 virginica
## 125 6.7 3.3 5.7 2.1 virginica
## 126 7.2 3.2 6.0 1.8 virginica
## 127 6.2 2.8 4.8 1.8 virginica
## 128 6.1 3.0 4.9 1.8 virginica
## 129 6.4 2.8 5.6 2.1 virginica
## 130 7.2 3.0 5.8 1.6 virginica
## 131 7.4 2.8 6.1 1.9 virginica
## 132 7.9 3.8 6.4 2.0 virginica
## 133 6.4 2.8 5.6 2.2 virginica
## 134 6.3 2.8 5.1 1.5 virginica
## 135 6.1 2.6 5.6 1.4 virginica
## 136 7.7 3.0 6.1 2.3 virginica
## 137 6.3 3.4 5.6 2.4 virginica
## 138 6.4 3.1 5.5 1.8 virginica
## 139 6.0 3.0 4.8 1.8 virginica
## 140 6.9 3.1 5.4 2.1 virginica
## 141 6.7 3.1 5.6 2.4 virginica
## 142 6.9 3.1 5.1 2.3 virginica
## 143 5.8 2.7 5.1 1.9 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
#7 4.6 3.4 1.4 0.3 setosa
#8 5.0 3.4 1.5 0.2 setosa
#9 4.4 2.9 1.4 0.2 setosa
#10 4.9 3.1 1.5 0.1 setosa
#?iris
#Description
#This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
#Format
#iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
#To look at the species in the iris dataset, we can use a few different syntaxes
#Get unique values of the iris column Species
unique(iris$Species)
## [1] setosa versicolor virginica
## Levels: setosa versicolor virginica
#or Take the iris column Species and send it to the unique function
iris$Species %>% unique
## [1] setosa versicolor virginica
## Levels: setosa versicolor virginica
#or Take iris, select the Species column, and send it to the unique function
iris %>%
select(Species) %>%
unique
## Species
## 1 setosa
## 51 versicolor
## 101 virginica
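The three syntaxes above give the same species, but not the same kind of object. Here is a small sketch, assuming dplyr is installed, that checks this; distinct() is a dplyr function that was not used above and is shown only as one possible alternative.

```r
library(dplyr)

# Calling the function directly and piping into it give identical results
a <- unique(iris$Species)
b <- iris$Species %>% unique()
identical(a, b)

# select() keeps a data frame, so unique() returns a data frame with one
# row per species (and the original row names) rather than a factor
species_df <- iris %>%
  select(Species) %>%
  unique()
class(species_df)

# distinct() is the dplyr way to ask for unique rows
species_tbl <- iris %>% distinct(Species)
nrow(species_tbl)
```

This is why the third version printed row labels 1, 51, and 101: those are the rows of iris where each species first appears.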
To look at your own data, you’d do something like:
> library(readr)
> my_data <- read_delim("/home/cmcwhite/experiment.txt", delim='')
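The readr library functions read_delim() and read_csv() are preferable to the base R read.table() or read.csv() functions because they are less likely to mess up the formatting of your data. For example, they do not silently rewrite a column name such as 0hr into X0hr, and they never convert text columns to factors.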
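To try read_delim() without a data file of your own, here is a self-contained sketch. It writes a tiny tab-delimited file to a temporary location and reads it back. The file contents and the tab delimiter are assumptions made for this example; use whatever separator your own file has.

```r
library(readr)

# Write a small tab-delimited example file to a temporary path
example_path <- tempfile(fileext = ".txt")
writeLines(
  c("ID\ttime\tmeasure",
    "sample1\t0\t1",
    "sample1\t6\t2"),
  example_path
)

# delim tells read_delim() which character separates the columns
my_data <- read_delim(example_path, delim = "\t")
my_data
```

read_delim() guesses the type of each column, so time and measure come back as numbers and ID as text.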
The iris dataset is conveniently already in a tidy format. What this means is that each observation has its own row.
This is wide format data:

| ID | 0hr | 6hr | 12hr |
|---|---|---|---|
| sample1 | 1 | 2 | 3 |
| sample2 | 2 | 4 | 6 |
For data analysis in R, we want to use a tidy format like:

| ID | time | measure |
|---|---|---|
| sample1 | 0 | 1 |
| sample1 | 6 | 2 |
| sample1 | 12 | 3 |
| sample2 | 0 | 2 |
| sample2 | 6 | 4 |
| sample2 | 12 | 6 |
What this does is let us easily add new descriptive columns to this table. If you look back at the wide format, there isn’t a clear place to put observations or descriptions about individual timepoints. Say that it hailed during sample2 collection at 6 hours. Where would you put this info in the above wide table? In tidy format, it can go in a column for weather.
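
| ID | time | measure | weather | temperature |
|---|---|---|---|---|
| sample1 | 0 | 1 | ‘clear’ | 80 |
| sample1 | 6 | 2 | ‘rain’ | 70 |
| sample1 | 12 | 3 | ‘cloudy’ | 75 |
| sample2 | 0 | 2 | ‘cloudy’ | 60 |
| sample2 | 6 | 4 | ‘hail’ | 50 |
| sample2 | 12 | 6 | ‘rain’ | 45 |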
We’ll have lessons later about how to transform your data between wide and tidy formats using tidyr.
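As a preview, here is one way the wide table above could be turned into the tidy one. This is only a sketch: it assumes tidyr 1.0.0 or later (for pivot_longer()), and the object names wide and tidy are made up for this example.

```r
library(tidyr)

# The wide table from above; check.names = FALSE keeps names like "0hr" intact
wide <- data.frame(
  ID = c("sample1", "sample2"),
  `0hr` = c(1, 2),
  `6hr` = c(2, 4),
  `12hr` = c(3, 6),
  check.names = FALSE
)

# Gather every column except ID into a time column and a measure column
tidy <- pivot_longer(wide, cols = -ID, names_to = "time", values_to = "measure")

# Turn "0hr", "6hr", "12hr" into the numbers 0, 6, 12
tidy$time <- as.numeric(sub("hr", "", tidy$time))
tidy
```

Each row of tidy is now one measurement of one sample at one time, so extra columns such as weather have an obvious place to go.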
The same information about the sky in wide form and in tidy form: A Metaphor
[Figure: Wide form data]
[Figure: Tidy form data]
There are many different types of plots that you can make with ggplot. We’ll start with a plain scatter plot. List of the current ggplot geoms: http://docs.ggplot2.org/current/
The basic grammar of a ggplot is:
ggplot(dataname ex. iris, aes(assign x, y, colors, shape, size to columns)) +
geom_plottype() +
other attributes to add on +
like, remove the legend +
or change the color scheme
The aes stands for aesthetics
“Aesthetic mappings describe how variables in the data are mapped to visual properties”
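To make the grammar concrete, here is a small sketch that fills in each slot of the template with real code. The particular choices (a scatter plot, hiding the legend with theme(), a manual color scale) are just one possible combination, picked for illustration.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +  # data and aesthetic mappings
  geom_point() +                                                  # the plot type
  theme(legend.position = "none") +                               # remove the legend
  scale_color_manual(values = c("#1b9e77", "#d95f02", "#7570b3")) # change the color scheme

# A plot is an ordinary R object; printing it draws it
p
```

Because each piece is joined with +, you can build a plot up one line at a time and rerun it after each addition to see what changed.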
Ok, first plots now
#Set up the data set, and tell what columns are the x and y axes
#This will make a scatter plot of petal width vs petal length.
ggplot(data = iris, aes(x=Petal.Width, y=Petal.Length)) +
geom_point()
#Now with color!
ggplot(iris, aes(x=Petal.Width, y=Petal.Length, color=Species)) +
geom_point()
#Now with point size scaled to Sepal.Width!
#And slightly transparent!
ggplot(iris, aes(x=Petal.Width, y=Petal.Length, color=Species, size=Sepal.Width)) +
geom_point(alpha=0.7)
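Notice that alpha=0.7 sits outside aes() while color and size sit inside it. A rough rule of thumb: anything inside aes() is mapped to a column of the data, and anything outside it is one fixed value for every point. The sketch below contrasts the two; the object names p_mapped and p_set are made up for this example.

```r
library(ggplot2)

# Mapped: color comes from the Species column, so ggplot picks one color
# per species and adds a legend
p_mapped <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
  geom_point(alpha = 0.7)

# Set: color is a single fixed value for all points, so there is no legend
p_set <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length)) +
  geom_point(alpha = 0.7, color = "#1b9e77")

p_mapped
p_set
```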
#Now with good colors
#Each ggplot is like a recipe where you add on things you want.
ggplot(iris, aes(x=Petal.Width, y=Petal.Length, color=Species, size=Sepal.Width)) +
geom_point(alpha=0.7) +
scale_color_manual(values =c("#1b9e77","#d95f02", "#7570b3"))
#Adding a facet line will break data into subplots
ggplot(iris, aes(x=Petal.Width, y=Petal.Length, color=Species, size=Sepal.Width)) +
geom_point(alpha=0.7) +
scale_color_manual(values =c("#1b9e77","#d95f02", "#7570b3")) +
facet_wrap(~Species, scales = "free_x")
#Change the aesthetics to plot different columns
ggplot(iris, aes(x=Species, y=Petal.Width, color=Species)) +
geom_violin(alpha=0.7) +
scale_color_manual(values =c("#1b9e77","#d95f02", "#7570b3"))
#Use the fill aesthetic to fill in the shape
ggplot(iris, aes(x=Species, y=Petal.Width, fill=Species)) +
geom_violin(alpha=0.7) +
scale_fill_manual(values =c("#1b9e77","#d95f02", "#7570b3")) #fill colors are controlled by scale_fill_manual, not scale_color_manual
#We can use the same setup to make different types of plots by changing the geom.
ggplot(data = iris, aes(x=Species, y=Petal.Width, color=Sepal.Width)) +
geom_point()
#With geom_point, the points are too stacked on top of each other, geom_jitter might work
ggplot(data = iris, aes(x=Species, y=Petal.Width, color=Sepal.Width)) +
geom_jitter()
#Or geom_boxplot
ggplot(data = iris, aes(x=Species, y=Petal.Width, fill=Species)) +
geom_boxplot()
#Or geom_violin
ggplot(data = iris, aes(x=Species, y=Petal.Width, fill=Species)) +
geom_violin()
#Or geom_density
ggplot(iris, aes(x=Petal.Width, fill=Species)) +
geom_density(alpha=0.7)
#For a final example, I'm switching to a time series dataset, ChickWeight
head(ChickWeight)
## Grouped Data: weight ~ Time | Chick
## weight Time Chick Diet
## 1 42 0 1 1
## 2 51 2 1 1
## 3 59 4 1 1
## 4 64 6 1 1
## 5 76 8 1 1
## 6 93 10 1 1
#Plot the growth of each chick
ggplot(ChickWeight, aes(x=Time, y=weight, color=Diet, group=Chick)) +
geom_line()
ggplot(ChickWeight, aes(x=Time, y=weight, color=Diet, group=Chick)) +
geom_line(linewidth=2, alpha=0.4) + #In ggplot2 3.4.0 and later, line thickness is set with linewidth; older versions use size=2
facet_wrap(~Diet) +
scale_color_manual(values =c("#1b9e77","#d95f02", "#7570b3", "#a13242"))
For these exercises, try to make some plots using the mtcars dataset
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#...
#
#?mtcars
#A data frame with 32 observations on 11 variables.
#[, 1] mpg Miles/(US) gallon
#[, 2] cyl Number of cylinders
#[, 3] disp Displacement (cu.in.)
#[, 4] hp Gross horsepower
#[, 5] drat Rear axle ratio
#[, 6] wt Weight (1000 lbs)
#[, 7] qsec 1/4 mile time
#[, 8] vs V/S
#[, 9] am Transmission (0 = automatic, 1 = manual)
#[,10] gear Number of forward gears
#[,11] carb Number of carburetors
1. Make a scatterplot of mpg vs. horsepower
2. Make the same plot as in (1), but with all the points red
3. Make a histogram (geom_histogram) of mpg
4. Make a boxplot of mpg for each number of cylinders. Add a jitter plot on top of it.
#workspace