3 Data Management and Manipulation
install.packages(c("dplyr", "tidyr", "stringr", "lubridate"), repos = "https://cran.us.r-project.org")
library(dplyr) # load dplyr, which provides filter(), summarise() and group_by() used below
Read in the data:
urlfile03a <- "https://raw.githubusercontent.com/apicellap/data/main/compensation.csv"
compensation <- read.csv(url(urlfile03a))
head(compensation)
## Root Fruit Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed
Summarize the data in each variable:
summary(compensation)
##       Root            Fruit          Grazing
##  Min.   : 4.426   Min.   : 14.73   Length:40
##  1st Qu.: 6.083   1st Qu.: 41.15   Class :character
##  Median : 7.123   Median : 60.88   Mode  :character
##  Mean   : 7.181   Mean   : 59.41
##  3rd Qu.: 8.510   3rd Qu.: 76.19
##  Max.   :10.253   Max.   :116.05
3.1 Subsetting
Create new dataframe comprised of specific variable(s):
## Fruit
## 1 59.77
## 2 60.98
## 3 14.73
## 4 19.28
## 5 34.25
## 6 35.53
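The call that produced this output is not shown. A minimal sketch of one way to get it, assuming dplyr's select() and using a stand-in dataframe (comp_demo, a hypothetical name) built from the six rows printed by head() above:

```r
library(dplyr)

# Stand-in data: the first six rows of compensation as printed by head()
comp_demo <- data.frame(
  Root    = c(6.225, 6.487, 4.919, 5.130, 5.417, 5.359),
  Fruit   = c(59.77, 60.98, 14.73, 19.28, 34.25, 35.53),
  Grazing = rep("Ungrazed", 6)
)

# Keep only the Fruit column; the result is still a dataframe
fruit_only <- select(comp_demo, Fruit)
fruit_only
```

Several columns can be kept at once by listing them, for example select(comp_demo, Root, Fruit).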
Select all columns except one:
## Fruit Grazing
## 1 59.77 Ungrazed
## 2 60.98 Ungrazed
## 3 14.73 Ungrazed
## 4 19.28 Ungrazed
## 5 34.25 Ungrazed
## 6 35.53 Ungrazed
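The output above is consistent with dropping Root by prefixing it with a minus sign inside select(). A sketch under that assumption, again on a stand-in dataframe (comp_demo, hypothetical) taken from the head() rows:

```r
library(dplyr)

# Stand-in data: the first six rows of compensation as printed by head()
comp_demo <- data.frame(
  Root    = c(6.225, 6.487, 4.919, 5.130, 5.417, 5.359),
  Fruit   = c(59.77, 60.98, 14.73, 19.28, 34.25, 35.53),
  Grazing = rep("Ungrazed", 6)
)

# A leading minus sign drops the named column and keeps all others
no_root <- select(comp_demo, -Root)
no_root
```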
Create new dataframe comprised of a consecutive range of rows, starting at the second observation (all columns are kept):
## Root Fruit Grazing
## 1 6.487 60.98 Ungrazed
## 2 4.919 14.73 Ungrazed
## 3 5.130 19.28 Ungrazed
## 4 5.417 34.25 Ungrazed
## 5 5.359 35.53 Ungrazed
## 6 7.614 87.73 Ungrazed
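The code for this step is missing. Row ranges like this are commonly taken with dplyr's slice(); the sketch below assumes that function and uses a hypothetical range 2:4 on a stand-in dataframe, since the exact range used for the output above is not known:

```r
library(dplyr)

# Stand-in data: the first six rows of compensation as printed by head()
comp_demo <- data.frame(
  Root    = c(6.225, 6.487, 4.919, 5.130, 5.417, 5.359),
  Fruit   = c(59.77, 60.98, 14.73, 19.28, 34.25, 35.53),
  Grazing = rep("Ungrazed", 6)
)

# Keep rows 2 through 4 (range chosen for illustration only)
row_range <- slice(comp_demo, 2:4)
row_range
```

Note that the row labels in the result restart at 1, which matches the renumbered rows in the output above.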
Create new dataframe comprised of a specific set of rows:
## Root Fruit Grazing
## 1 6.487 60.98 Ungrazed
## 2 4.919 14.73 Ungrazed
## 3 6.930 64.34 Ungrazed
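Non-adjacent rows can be picked by passing a vector of row numbers. A sketch assuming slice() with c(); the indices 2, 3 and 6 are hypothetical, because the stand-in dataframe has only six rows and the row numbers behind the output above are not shown:

```r
library(dplyr)

# Stand-in data: the first six rows of compensation as printed by head()
comp_demo <- data.frame(
  Root    = c(6.225, 6.487, 4.919, 5.130, 5.417, 5.359),
  Fruit   = c(59.77, 60.98, 14.73, 19.28, 34.25, 35.53),
  Grazing = rep("Ungrazed", 6)
)

# Keep rows 2, 3 and 6 (indices chosen for illustration only)
picked_rows <- slice(comp_demo, c(2, 3, 6))
picked_rows
```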
Filter the data set down to the observations for which a condition is TRUE; here no observation has Fruit exactly equal to 80, so the result is empty:
filter(compensation, Fruit == 80)
## [1] Root Fruit Grazing
## <0 rows> (or 0-length row.names)
Grab observations when Fruit is not equal to 80:
## Root Fruit Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed
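The call for this step is not shown; it is presumably the same filter() pattern with the != operator. A sketch on a stand-in dataframe (comp_demo, hypothetical) built from the head() rows:

```r
library(dplyr)

# Stand-in data: the first six rows of compensation as printed by head()
comp_demo <- data.frame(
  Root    = c(6.225, 6.487, 4.919, 5.130, 5.417, 5.359),
  Fruit   = c(59.77, 60.98, 14.73, 19.28, 34.25, 35.53),
  Grazing = rep("Ungrazed", 6)
)

# != keeps every row whose Fruit value differs from 80
not_80 <- filter(comp_demo, Fruit != 80)
not_80
```

Because no Fruit value equals 80, every row is kept, which is why the output above looks the same as head(compensation).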
Grab any observations in which Fruit is less than or equal to 80 (<=); use < instead for strictly less than:
## Root Fruit Grazing
## 1 6.225 59.77 Ungrazed
## 2 6.487 60.98 Ungrazed
## 3 4.919 14.73 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 5.417 34.25 Ungrazed
## 6 5.359 35.53 Ungrazed
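A sketch of the comparison filters, assuming filter() with <= and <. The stand-in dataframe (comp_demo, hypothetical) mixes rows that appear in outputs elsewhere in this chapter so that some values fall above 80:

```r
library(dplyr)

# Stand-in data: four rows taken from outputs printed in this chapter
comp_demo <- data.frame(
  Root    = c(6.225, 4.919, 7.614, 10.253),
  Fruit   = c(59.77, 14.73, 87.73, 116.05),
  Grazing = c("Ungrazed", "Ungrazed", "Ungrazed", "Grazed")
)

# Less than or equal to 80
at_most_80 <- filter(comp_demo, Fruit <= 80)

# Strictly less than 80
below_80 <- filter(comp_demo, Fruit < 80)

at_most_80
```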
Grab any observations in which Fruit is greater than 95 OR less than 15:
## Root Fruit Grazing
## 1 4.919 14.73 Ungrazed
## 2 10.253 116.05 Grazed
## 3 6.106 14.95 Grazed
## 4 9.844 105.07 Grazed
## 5 9.351 98.47 Grazed
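In R, OR between two conditions is written with the | operator. A sketch assuming filter() was used; the stand-in dataframe (comp_demo, hypothetical) combines the five rows printed above with two head() rows that should be excluded:

```r
library(dplyr)

# Stand-in data: the five matching rows shown above plus two non-matching rows
comp_demo <- data.frame(
  Root    = c(6.225, 6.487, 4.919, 10.253, 6.106, 9.844, 9.351),
  Fruit   = c(59.77, 60.98, 14.73, 116.05, 14.95, 105.07, 98.47),
  Grazing = c("Ungrazed", "Ungrazed", "Ungrazed", "Grazed", "Grazed", "Grazed", "Grazed")
)

# | means OR: a row is kept if either condition is TRUE
extremes <- filter(comp_demo, Fruit > 95 | Fruit < 15)
extremes
```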
Grab any observations in which Fruit is greater than 50 AND less than 55:
## Root Fruit Grazing
## 1 6.248 52.92 Ungrazed
## 2 6.013 53.61 Ungrazed
## 3 5.928 54.86 Ungrazed
## 4 7.354 50.08 Grazed
## 5 8.158 52.26 Grazed
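AND is written with the & operator; in filter(), conditions separated by commas are also combined with AND. A sketch assuming filter(), on a stand-in dataframe (comp_demo, hypothetical) built from the five rows printed above plus two head() rows that fall outside the range:

```r
library(dplyr)

# Stand-in data: the five matching rows shown above plus two non-matching rows
comp_demo <- data.frame(
  Root    = c(6.225, 4.919, 6.248, 6.013, 5.928, 7.354, 8.158),
  Fruit   = c(59.77, 14.73, 52.92, 53.61, 54.86, 50.08, 52.26),
  Grazing = c("Ungrazed", "Ungrazed", "Ungrazed", "Ungrazed", "Ungrazed", "Grazed", "Grazed")
)

# & means AND: a row is kept only if both conditions are TRUE
mid_range <- filter(comp_demo, Fruit > 50 & Fruit < 55)

# Equivalent form: comma-separated conditions are joined with AND
mid_range_comma <- filter(comp_demo, Fruit > 50, Fruit < 55)

mid_range
```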
Order data by Fruit from lowest to highest observation:
## Root Fruit Grazing
## 1 4.919 14.73 Ungrazed
## 2 6.106 14.95 Grazed
## 3 4.426 18.89 Ungrazed
## 4 5.130 19.28 Ungrazed
## 5 4.975 24.25 Ungrazed
## 6 5.451 32.35 Ungrazed
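Sorting of this kind is typically done with dplyr's arrange(), which orders rows from lowest to highest by default. A sketch under that assumption, on a stand-in dataframe (comp_demo, hypothetical) built from the head() rows:

```r
library(dplyr)

# Stand-in data: the first six rows of compensation as printed by head()
comp_demo <- data.frame(
  Root    = c(6.225, 6.487, 4.919, 5.130, 5.417, 5.359),
  Fruit   = c(59.77, 60.98, 14.73, 19.28, 34.25, 35.53),
  Grazing = rep("Ungrazed", 6)
)

# Ascending order by Fruit (the default)
by_fruit <- arrange(comp_demo, Fruit)

# Descending order: wrap the column in desc()
by_fruit_desc <- arrange(comp_demo, desc(Fruit))

by_fruit
```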
Create a new dataframe that keeps only the observations with Fruit values above 80, and from those only the Root column:
## Root
## 1 7.614
## 2 7.001
## 3 10.253
## 4 9.039
## 5 8.988
## 6 8.975
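Two steps like these can be chained with the %>% pipe, which passes the result of one call into the next. This is a sketch of one possible form, not necessarily the call used for the output above; the stand-in dataframe (comp_demo, hypothetical) uses rows that appear in outputs in this chapter:

```r
library(dplyr)

# Stand-in data: rows taken from outputs printed in this chapter
comp_demo <- data.frame(
  Root    = c(6.225, 4.919, 7.614, 10.253, 9.844, 9.351),
  Fruit   = c(59.77, 14.73, 87.73, 116.05, 105.07, 98.47),
  Grazing = c("Ungrazed", "Ungrazed", "Ungrazed", "Grazed", "Grazed", "Grazed")
)

# Step 1: keep rows with Fruit above 80; step 2: keep only the Root column
high_fruit_roots <- comp_demo %>%
  filter(Fruit > 80) %>%
  select(Root)

high_fruit_roots
```

The order matters: select(Root) has to come after filter(), because once only Root is kept, Fruit is no longer available to filter on.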
3.2 Calculating summary statistics about groups of your data
Compute a summary statistic for each group in the dataframe:
summarise(
  group_by(compensation, Grazing), # group the dataframe by the Grazing variable
  meanFruit = mean(Fruit))         # new column meanFruit: the mean of Fruit within each group
## # A tibble: 2 × 2
## Grazing meanFruit
## <chr> <dbl>
## 1 Grazed 67.9
## 2 Ungrazed 50.9
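The nested call above can also be written with the %>% pipe, which some readers find easier to follow. A sketch comparing both forms on a small stand-in dataframe (comp_demo, hypothetical) built from rows printed earlier in this chapter:

```r
library(dplyr)

# Stand-in data: three Ungrazed and three Grazed rows from earlier outputs
comp_demo <- data.frame(
  Fruit   = c(59.77, 60.98, 14.73, 116.05, 14.95, 105.07),
  Grazing = c("Ungrazed", "Ungrazed", "Ungrazed", "Grazed", "Grazed", "Grazed")
)

# Nested form, as used in the text
nested <- summarise(group_by(comp_demo, Grazing), meanFruit = mean(Fruit))

# Piped form: group first, then summarise
piped <- comp_demo %>%
  group_by(Grazing) %>%
  summarise(meanFruit = mean(Fruit))

piped
```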
Compute additional summary statistics and store the result in a new dataframe:
mean.fruit <- summarise(
  group_by(compensation, Grazing),
  meanFruit = mean(Fruit),
  sdfruit = sd(Fruit)) # multiple statistics can be calculated within one summarise() call
mean.fruit
## # A tibble: 2 × 3
## Grazing meanFruit sdfruit
## <chr> <dbl> <dbl>
## 1 Grazed 67.9 25.0
## 2 Ungrazed 50.9 21.8
Count the observations in the Grazed group, to use as the sample size for a standard error:
x <- sum(with(compensation, Grazing == "Grazed")) # number of observations where Grazing is "Grazed"
x
## [1] 20
SE.mean.fruit <- summarise(
  group_by(compensation, Grazing),
  meanFruit = mean(Fruit),
  SEfruit = sd(Fruit) / sqrt(x)) # standard error = sd / sqrt(n); x (20) is the Grazed count, valid for both groups only because each has 20 of the 40 observations
SE.mean.fruit
## # A tibble: 2 × 3
## Grazing meanFruit SEfruit
## <chr> <dbl> <dbl>
## 1 Grazed 67.9 5.58
## 2 Ungrazed 50.9 4.87
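Dividing by a fixed count works here because both groups are the same size. A more general sketch uses dplyr's n(), which returns the number of rows in the current group, so each group's standard error uses its own sample size. The stand-in dataframe (comp_demo, hypothetical) is built from rows printed earlier in this chapter:

```r
library(dplyr)

# Stand-in data: three Ungrazed and three Grazed rows from earlier outputs
comp_demo <- data.frame(
  Fruit   = c(59.77, 60.98, 14.73, 116.05, 14.95, 105.07),
  Grazing = c("Ungrazed", "Ungrazed", "Ungrazed", "Grazed", "Grazed", "Grazed")
)

se_by_group <- summarise(
  group_by(comp_demo, Grazing),
  meanFruit = mean(Fruit),
  nFruit    = n(),                    # group size, counted per group
  SEfruit   = sd(Fruit) / sqrt(n()))  # standard error = sd / sqrt(group size)

se_by_group
```

With n() there is no need to count a group by hand beforehand, and the calculation stays correct if the groups have unequal sizes.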