
Best Practices in R

What I wish someone had told me when I started using R

Emil Hvitfeldt

2018-10-29

1 / 43

Overview

  • Before You Start Using R
  • Once You Start Writing R

All references and additional information can be found in this repository

https://github.com/EmilHvitfeldt/oRganized-talk

2 / 43

Before You Start Using R

3 / 43

Other alternatives: Emacs, Vim, Visual Studio.

Change Settings

Keyboard shortcut to open settings
⌘ + , in Mac OS,
ctrl + , in Windows

✓ - Uncheck "Restore .RData into workspace at startup"

✓ - Set "Save workspace to .RData on exit" to "Never"

Settings window

5 / 43

Change Appearance

Plenty of choices:

  • RStudio themes
  • Fonts
  • Font Sizes
  • Editor Themes

Play around! Find something you love!

Settings window

6 / 43

Pane layouts

Change the layout of the panes

Source on top?
Source down to the right?
It's all up to you!

Settings window

7 / 43

Pane layouts - My Setup

I like having both source and console open

8 / 43

Pane layouts - My Setup 2

... while still allowing me to have plots/help/viewer open

9 / 43

Mention recursive viewer pane.

RStudio Projects

Keep all files from one project together. Use RStudio projects.

  • Self contained
  • Avoid overlapping
  • Project orientated
10 / 43

keep all the files associated with a project together — input data, R scripts, analytic results, figures.
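The same kind of project can also be created from the console. A minimal sketch, assuming the usethis package is installed (it is not part of the talk); the project name `healthstudy` and the temporary location are placeholders:

```r
library(fs)

# Assumes usethis is installed; creates a folder with a .Rproj file in it.
# rstudio = TRUE asks for the .Rproj file even when not running inside RStudio,
# open = FALSE keeps the current session where it is.
project_path <- path(tempdir(), "healthstudy")
usethis::create_project(project_path, rstudio = TRUE, open = FALSE)

dir_ls(project_path, all = TRUE)
```

In RStudio, `open = TRUE` would open the new project in a fresh session instead.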

RStudio Projects - Creation 1 / 4

Keep all files from one project together. Use RStudio projects.

Click File > New Project

or use the Project menu in the upper-right corner of RStudio.

11 / 43

RStudio Projects - Creation 2 / 4


12 / 43

RStudio Projects - Creation 3 / 4


13 / 43

RStudio Projects - Creation 4 / 4


14 / 43

Folder Structure

name_of_project
|--raw_data
   |--WhateverData.xlsx
   |--report_2017.csv
|--output_data
   |--summary2017.csv
|--rmd
   |--01-analysis.Rmd
|--docs
   |--01-analysis.html
   |--01-analysis.pdf
|--scripts
   |--exploratory_analysis.R
   |--pdf_scraper.R
|--figures
   |--weather_2017.png
|--name_of_project.Rproj
|--run_all.R
15 / 43

Folder Structure

Everything has a spot where it belongs.

  • Raw data separate from cleaned data
  • Reports and scripts are separated
  • Generated and imported figures have their own place
  • Files are numbered using 2 digits
  • Reusable and easily understandable
library(fs)
# create the project folders (folders that already exist are left alone)
folder_names <- c("raw_data", "output_data", "rmd", "docs",
                  "scripts", "figures")
dir_create(folder_names)
16 / 43

never modify raw data, only read (forever untouched)
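One way to keep raw data untouched is to make the files read-only. A sketch using `fs::file_chmod()`; the folder and the small CSV are made up for illustration, and on Windows permission bits behave differently:

```r
library(fs)

# Made-up raw data file inside a throwaway folder
raw_dir <- path(tempdir(), "raw_data")
dir_create(raw_dir)
raw_file <- path(raw_dir, "report_2017.csv")
if (!file_exists(raw_file)) {
  write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), raw_file,
            row.names = FALSE)
}

# Remove write permission for everyone: the file can still be read
file_chmod(raw_file, "a-w")
file_info(raw_file)$permissions
```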

Paths

library(tidyverse)
# data import
data <- read_csv("/Users/Emil/Research/Health/amazing_data.csv")
## Error: '/Users/Emil/Research/Health/amazing_data.csv' does not exist.

Only use relative paths, never absolute paths

Introducing the here package.

library(here)
here()
## [1] "/Users/Emil/Research/Health"
data <- read_csv(here("amazing_data.csv"))
18 / 43
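`here()` also takes several path pieces, which fits the folder structure above. A sketch assuming the data lives in `raw_data/` (the original example keeps it at the project root):

```r
library(here)

# here() joins the pieces onto the project root with "/" separators,
# so the same code works on any machine that has a copy of the project
raw_path <- here("raw_data", "amazing_data.csv")
fig_path <- here("figures", "weather_2017.png")
raw_path
```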

Naming Things

tweet about naming

19 / 43
  • Organization
  • Ease of use
    There will be multiple slides about naming

Naming Things - Files

NO

report.pdf
reportv2.pdf
reportthisisthelastone.pages
Figure 2.png
3465-234szx.r
foo.R

YES

2018-10-01_01_report-for-cdc.pdf
01_data.rmd
01_data.pdf
02_data-filtering.rmd
02_data-filtering.pdf
20 / 43

Follow narrative from folder structure slide
Jenny Bryan, naming things

Naming Things - Files

We want file names to be "machine readable" and "human readable".

  • Avoid spaces, punctuation, special characters and case sensitivity
  • Deliberate use of delimiters
  • File name should describe the contents of the file
  • Put something numeric first
  • Left pad numbers with zeroes
  • Use ISO 8601 standard for dates (YYYY-MM-DD)
21 / 43

to preserve chronological and logical ordering.
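Why pad numbers and use ISO 8601: names are sorted as plain text, character by character. A small illustration with made-up names:

```r
# Plain-text sorting, which is what sort() does with character vectors
unpadded <- c("10_results.R", "2_clean.R")
sort(unpadded)   # "10_results.R" "2_clean.R"

padded <- c("10_results.R", "02_clean.R", "01_import.R")
sort(padded)     # "01_import.R" "02_clean.R" "10_results.R"

# ISO 8601 dates sort chronologically; month-first dates do not
iso_dates <- c("2018-10-01", "2017-12-31", "2018-02-23")
sort(iso_dates)  # "2017-12-31" "2018-02-23" "2018-10-01"

us_dates <- c("12-31-2017", "02-23-2018")
sort(us_dates)   # "02-23-2018" "12-31-2017": 2018 lands before 2017
```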

Naming Things - Files

library(fs)
dir_ls("data/", regexp = "health-study")
## 2018-02-23_health-study_power-100_group-A1.csv
## 2018-02-23_health-study_power-100_group-B1.csv
## 2018-02-23_health-study_power-100_group-C1.csv
## 2018-02-23_health-study_power-200_group-A1.csv
## 2018-02-23_health-study_power-200_group-B1.csv
## 2018-02-23_health-study_power-200_group-C1.csv
x <- path_file(dir_ls("data/", regexp = "health-study"))
stringr::str_split_fixed(x, "[_\\.]", 5)
## [,1] [,2] [,3] [,4] [,5]
## [1,] "2018-02-23" "health-study" "power-100" "group-A1" "csv"
## [2,] "2018-02-23" "health-study" "power-100" "group-B1" "csv"
## [3,] "2018-02-23" "health-study" "power-100" "group-C1" "csv"
## [4,] "2018-02-23" "health-study" "power-200" "group-A1" "csv"
## [5,] "2018-02-23" "health-study" "power-200" "group-B1" "csv"
## [6,] "2018-02-23" "health-study" "power-200" "group-C1" "csv"
22 / 43

Naming Things - Files

library(fs)
dir_ls("data/", regexp = "health-study")
## 2018-02-23_health-study_power-100_group-A1.csv
## 2018-02-23_health-study_power-100_group-B1.csv
## 2018-02-23_health-study_power-100_group-C1.csv
## 2018-02-23_health-study_power-200_group-A1.csv
## 2018-02-23_health-study_power-200_group-B1.csv
## 2018-02-23_health-study_power-200_group-C1.csv
library(tidyverse)
map_df(dir_ls("data/", regexp = "health-study"), read_csv)
# or
dir_ls("data/", regexp = "health-study") %>%
  map_df(read_csv)
23 / 43
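Machine-readable names also pay off after reading: the file name can be turned back into columns. A sketch with two made-up files in a temporary folder; the column names passed to `into` are my own labels, and `separate()` is one of several ways to split the name:

```r
library(fs)
library(readr)
library(purrr)
library(dplyr)
library(tidyr)

# Two made-up files that follow the naming scheme, written to a temporary folder
data_dir <- path(tempdir(), "data")
dir_create(data_dir)
file_names <- c("2018-02-23_health-study_power-100_group-A1.csv",
                "2018-02-23_health-study_power-200_group-B1.csv")
for (f in file_names) {
  write_csv(data.frame(subject = 1:2, score = c(0.5, 0.7)), path(data_dir, f))
}

health <- dir_ls(data_dir, regexp = "health-study") %>%
  set_names() %>%                                        # name each path by itself
  map_df(read_csv, col_types = cols(), .id = "file") %>% # .id records the source file
  mutate(file = basename(file)) %>%                      # drop the folder part
  separate(file, into = c("date", "study", "power", "group", "ext"),
           sep = "[_.]")                                 # split on _ and .
health
```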

Naming Things - Objects

  • Only use lowercase letters, numbers, and _
  • Use names that are not jargon, e.g. weight instead of K
  • Use informative names
# Bad
df
e
tuningVar
# Good
health_data
error
tuning_var
24 / 43

lowercase letters + numbers = alpha-numeric characters (ish)

Once You Start Writing R

25 / 43

What To Avoid - attach()

Never use attach()

attach(mtcars)
mean(mpg)
## [1] 20.09062

attach() loads lots of names into the search path, which makes it ambiguous which object a name refers to.

26 / 43
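The ambiguity is easy to trigger: if an object with the same name already exists in the global environment, it wins over the attached column. A small demonstration with a deliberately planted `mpg` object:

```r
# A stray object in the global environment, e.g. left over from earlier work
mpg <- 0

attach(mtcars)       # R reports that mpg is masked by .GlobalEnv
masked <- mean(mpg)  # 0: the global mpg is found before the attached column
detach(mtcars)

# Alternatives that say exactly where mpg comes from
explicit <- mean(mtcars$mpg)
scoped <- with(mtcars, mean(mpg))
rm(mpg)  # clean up the planted object
```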

What To Avoid - rm(list = ls())

Never use rm(list=ls())

It does not unload packages or reset options, ...

Restart the R session

CTRL+SHIFT+F10 for Windows
CMD+SHIFT+ALT+F10 for Mac OS

27 / 43
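A sketch of what `rm(list = ls())` leaves behind. To avoid wiping your own workspace, it runs on a throwaway environment standing in for the global one; objects whose names start with a dot survive, and so do changed options (attached packages, not shown, stay attached as well):

```r
# A throwaway environment standing in for the global workspace
workspace <- new.env()
assign("results", 1:10, envir = workspace)
assign(".cache", "hidden", envir = workspace)  # ls() skips names starting with "."

old <- options(digits = 3)  # a setting changed earlier in the session

rm(list = ls(envir = workspace), envir = workspace)

leftovers <- ls(workspace, all.names = TRUE)  # ".cache" is still there
digits_now <- getOption("digits")             # still 3
options(old)                                  # put the option back
```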

R Markdown

Most of your documents can be written in R Markdown.

rmarkdown

Using the basics of markdown with the addition of R code chunks.

28 / 43

Reference the folder structure. R Markdown can produce a web page, PDF, MS Word document, slide show, handout, book, dashboard, package vignette or other format; see also the rticles package.

No more copy-pasting results into your document.

R Markdown

R Markdown documents versus R scripts

Use R scripts for simple, self-contained tasks.

source() R scripts into your R Markdown document where you will do analyses, visualizations and reporting.

29 / 43

R Markdown

- 01-import.R
- 02-clean-names.R
- 03-tidy.R
- etc

Include at the start of your R Markdown file:

```{r load_scripts, include = FALSE}
library(here)
source(here("scripts", "01-import.R"))
source(here("scripts", "02-clean-names.R"))
source(here("scripts", "03-tidy.R"))
```
30 / 43
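With two-digit prefixes, the scripts can also be sourced in order without listing each one. A sketch of a hypothetical helper `source_numbered()` (not from any package), tried out on two throwaway scripts where the `02-` script depends on the `01-` one:

```r
library(fs)

# Hypothetical helper: run every script named like "01-something.R" in `dir`,
# in numeric order, evaluating the code in `envir`
source_numbered <- function(dir, envir = globalenv()) {
  scripts <- dir_ls(dir, regexp = "[0-9]{2}-[^/]*\\.R$")
  scripts <- sort(scripts)  # two-digit prefixes: alphabetical order = run order
  for (script in scripts) {
    source(script, local = envir)
  }
  invisible(scripts)
}

# Two throwaway scripts; 02 fails unless 01 has already run
script_dir <- path(tempdir(), "scripts")
dir_create(script_dir)
writeLines("x <- 1", path(script_dir, "01-import.R"))
writeLines("x <- x + 1", path(script_dir, "02-clean-names.R"))

env <- new.env()
source_numbered(script_dir, envir = env)
```

Inside the `load_scripts` chunk, `source_numbered(here("scripts"), envir = environment())` could take the place of the individual `source()` calls, at the cost of being less explicit about what runs.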

Naming Chunks

Names can be placed after the comma

```{r, chunk-label, results='hide', fig.height=4}

or before

```{r chunk-label, results='hide', fig.height=4}

In general it is recommended to use alphabetic characters with words separated by - and avoid other characters. - Yihui Xie

31 / 43

Naming Chunks

  • Make navigating the R Markdown document easier
  • Make your R Markdown easier to understand
  • Clarifies error reports or progress of knitting
  • Caching when moving chunks around
  • Feels good
32 / 43

The lower-left corner of the RStudio source pane has a menu for jumping between sections and chunks.

Caching of unnamed chunks is based on their numbering, so moving them around can invalidate the cache.

Setup Chunk

In a fresh R Markdown document you see this:

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

The setup chunk is run before any other code - use it to your advantage.

33 / 43

Chunk Options

Defaults will rarely work for you 100% of the time.

Set echo = TRUE, knit document with code.
Set echo = FALSE, reknit document with the code hidden.

  • eval
  • echo
  • results
  • collapse
  • fig.path
  • warning
  • error
  • message
34 / 43

Working with colleagues who know R vs. working with colleagues who don't
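Several of these options can be set at once in the setup chunk. A sketch; `audience_knows_r` is a made-up flag for switching between the two kinds of colleagues, and hiding messages and warnings is a choice to make per document:

```r
# Hypothetical switch: show code to colleagues who know R, hide it otherwise
audience_knows_r <- FALSE

knitr::opts_chunk$set(
  echo = audience_knows_r,  # show or hide code in every chunk
  message = FALSE,          # drop messages such as package startup notes
  warning = FALSE,          # drop warnings from the output
  fig.path = "figures/"     # prefix for figure file names
)
```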

Chunk Options

```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.path = "figures/")
```

This saves all figures in the figures folder, named after their chunk.

35 / 43

highlight use of fig.path option

fig.path: ('figure/'; character) prefix to be used for figure filenames (fig.path and chunk labels are concatenated to make filenames)

Styling Code

Use consistent style when writing code

http://style.tidyverse.org/

It's all about preference, but keep it consistent!!!

36 / 43

Give examples of styles to follow
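One way to apply the tidyverse style automatically is the styler package (an addition, not covered in the talk). A sketch assuming styler is installed:

```r
# Assumes styler is installed; style_text() restyles code following the
# tidyverse style guide (spacing, indentation, braces). It does not rename objects.
messy <- "tuningVar<-function(x,y){x+y}"
styled <- styler::style_text(messy)
styled
```

`styler::style_file()` and `styler::style_dir()` do the same for whole files and folders.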

Keep .Rprofile Clean

Your project can contain a file called .Rprofile.

This file runs at the start of every R session started in the project folder. Think of it as a configuration file.

options(stringsAsFactors = FALSE)
options(max.print = 100)

PANIC!!! Your code will only work for you!
Avoid putting output altering code in .Rprofile!

37 / 43

Use it to change options and load packages
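A sketch of an .Rprofile that stays on the safe side: it changes conveniences, not results. The mirror URL and the console width are example values:

```r
# .Rprofile sketch: settings that affect convenience, not the results of your code
options(repos = c(CRAN = "https://cloud.r-project.org"))  # default CRAN mirror

# Only touch interactive sessions, so knitting and Rscript runs are unaffected
if (interactive()) {
  options(width = 120)
}
```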

Comment Your Code

  • Functions: Arguments and purpose
  • Code: What or why, NOT how
# Takes a data.frame (data) and a vector of column names (names), and
# converts those columns from factor to character variables. Columns that
# are not factors are left unchanged.
factor_to_text <- function(data, names) {
  for (i in seq_along(names)) {
    if (is.factor(data[[names[i]]])) {
      data[[names[i]]] <- as.character(data[[names[i]]])
    }
  }
  data
}
38 / 43
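If the function later moves into a package, the same information fits the roxygen2 comment convention (`#'` lines with `@param` and `@return`). A sketch of the function above in that style, plus a quick check on `iris`:

```r
#' Convert factor columns to character
#'
#' @param data A data frame.
#' @param names Character vector of column names to convert.
#' @return `data`, with the listed factor columns converted to character.
#'   Columns that are not factors are returned unchanged.
factor_to_text <- function(data, names) {
  for (name in names) {
    if (is.factor(data[[name]])) {
      data[[name]] <- as.character(data[[name]])
    }
  }
  data
}

# Species is a factor in iris; Sepal.Length is numeric and stays as it is
iris_text <- factor_to_text(iris, c("Species", "Sepal.Length"))
```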

Should You Write a Function?

Once you’ve written the same code 3 times, write a function

mtcars %>%
  mutate(mpg = mpg / 2.5 + 2,
         cyl = cyl / 2.5 + 2,
         hp = hp / 2.5 + 2)

Then you realize you should have added 3 instead of 2. Spot the mistake in the fix:

mtcars %>%
  mutate(mpg = mpg / 2.5 + 2,
         cyl = cyl / 2.5 + 3,
         hp = hp / 2.5 + 3)
39 / 43

Should You Write a Function?

Once you’ve written the same code 3 times, write a function

library(tidyverse)
scale_fun <- function(x) x / 2.5 + 3
mtcars %>%
  mutate(mpg = scale_fun(mpg),
         cyl = scale_fun(cyl),
         hp = scale_fun(hp))
40 / 43
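In dplyr 1.0.0 or later, `across()` goes one step further and removes the repeated column names too. A sketch, assuming a recent dplyr:

```r
library(dplyr)

scale_fun <- function(x) x / 2.5 + 3

# across() applies one function to several columns, so the fix from the
# previous slide only ever has to be made in one place
scaled <- mtcars %>%
  mutate(across(c(mpg, cyl, hp), scale_fun))
```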

Testing

Use tests in your pipeline to check assumptions.

library(tidyverse)
data <- fancy_gov_data_api("health", year = 2018, month = 10)  # stand-in for your data source
data %>%
  group_by(county) %>%
  summarize(happiness = (- death + love) / population)
41 / 43

Say you are downloading data from a government API.

Testing

testthat is primarily used for unit testing in R packages.

It contains a large collection of expect_ functions.

  • expect_equal
  • expect_equivalent
  • expect_error
  • expect_length
  • expect_named
  • ...
42 / 43

Testing

Use tests in your pipeline to check assumptions.

library(tidyverse)
library(testthat)
data <- fancy_gov_data_api("health", year = 2018, month = 10)
expect_length(unique(data$county), 3007)
expect_lt(mean(is.na(data$love)), 0.1)  # less than 10% of love is missing
data %>%
  group_by(county) %>%
  summarize(happiness = (- death + love) / population)
43 / 43

Say you are downloading data from a government API.
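Outside `test_that()`, a failing expectation raises an error, so the pipeline stops before bad data reaches the summary. A sketch with a made-up stand-in for the API result; the column names and the 10% threshold are assumptions:

```r
library(testthat)

# Made-up stand-in for the API result: one row per county, one missing value
health <- data.frame(county = c("A", "B", "C", "D"),
                     love = c(1, NA, 3, 4))

# Collect the assumptions in one function; it stops at the first failure
check_health_data <- function(data) {
  expect_named(data, c("county", "love"))
  expect_equal(anyDuplicated(data$county), 0L)  # one row per county
  expect_lt(mean(is.na(data$love)), 0.1)        # under 10% missing
  invisible(data)
}

# 25% of love is missing, so the last expectation fails and raises an error
outcome <- tryCatch(check_health_data(health),
                    error = function(e) "failed")

# Dropping the incomplete row lets every check pass
complete <- health[!is.na(health$love), ]
checked <- check_health_data(complete)
```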
