class: center, middle, title-slide # Best Practices in R ## What I wish I someone told me when I started using R ### Emil Hvitfeldt ### 2018-10-29 --- ## Overview -- - Before You Start Using R -- - Once You Start Writing R -- All references and additional information can be found in this repository https://github.com/EmilHvitfeldt/oRganized-talk --- class: center # Before You Start Using R --- ## Use Rstudio https://www.rstudio.com/products/rstudio/download/#download .center[![Download appropiate version of Rstudio](images/rstudio-download.png)] ??? Other alternatives. Emacs, Vim, Visual Studio. --- ## Change Settings .pull-left[ .medium[**Keyboard shortcut to open settings**] `⌘ + ,` in Mac OS, `ctrl + ,` in Windows ✓ - Uncheck "Restore .RData into work space at start up" ✓ - Set "Save work space to .Rdata on exit" to "Never" ] .pull-right[![Settings window](images/settings-general.png)] --- ## Change Appearance .pull-left[ Plenty choices of - Rstudio themes - Fonts - Font Sizes - Editor Themes Play around! Find something you love! ] .pull-right[![Settings window](images/settings-appearance.png)] --- ## Pane layouts .pull-left[ Change the layout of the panes Source on top? Source down to the right? Its all up to you! ] .pull-right[![Settings window](images/settings-pane-layout.png)] --- ## Pane layouts - My Setup I like having both source and console open .center[![:scale 80%](images/my-pane-layout.png)] --- ## Pane layouts - My Setup 2 ... while still allowing me to have plots/help/viewer open .center[![:scale 80%](images/my-pane-layout-recursive.png)] ??? Mention recursive viewer pane. --- ## RStudio Projects Keep all files from one project together. Use RStudio projects. -- - Self contained -- - Avoid overlapping -- - Project orientated ??? keep all the files associated with a project together — input data, R scripts, analytic results, figures. --- ## RStudio Projects - Creation 1 / 4 Keep all files from one project together. Use RStudio projects. .pull-left[ Click File > New Project ![Up right tick](images/rstudio-project-1.png) ] .pull-right[ Or click on the upper right ![Up right tick](images/rstudio-project-2.png) ] --- ## RStudio Projects - Creation 2 / 4 .center[![1](images/project-creation-1.png)] --- ## RStudio Projects - Creation 3 / 4 .center[![1](images/project-creation-2.png)] --- ## RStudio Projects - Creation 4 / 4 .center[![1](images/project-creation-3.png)] --- ## Folder Structure -- ```r name_of_project |--raw_data |--WhateverData.xlsx |--report_2017.csv |--output_data |--summary2017.csv |--rmd |--01-analysis.Rmd |--docs |--01-analysis.html |--01-analysis.pdf |--scripts |--exploratory_analysis.R |--pdf_scraper.R |--figures |--weather_2017.png |--name_of_project.Rproj |--run_all.R ``` --- ## Folder Structure Everything has a spot where it belongs. - Raw data separate from cleaned data - Reports and scrips are separated - Generated and imported figures has its own place - Numbered using 2 digits - Reusable and easily understandable ```r library(fs) folder_names <- c("raw_data", "output_data", "rmd", "docs", "scripts", "figures") dir_create(fldr_names) ``` ??? never modify raw data, only read (forever untouched) --- ## Paths ```r library(tidyverse) # data import data <- read_csv("/Users/Emil/Research/Health/amazing_data.csv") ``` --- ## Paths ```r library(tidyverse) # data import data <- read_csv("/Users/Emil/Research/Health/amazing_data.csv") ``` ``` ## Error: '/Users/Emil/Research/Health/amazing_data.csv' does not exist. ``` -- __Only use relative paths, never absolute paths__ -- Introducing the *here* package. ```r library(here) here() ``` ``` ## [1] "/Users/Emil/Research/Health" ``` ```r library(here) data <- read_csv(here("amazing_data.csv")) ``` --- ## Naming Things -- .center[![tweet about naming](images/naming-tweet.png)] ??? - Organization - Ease of use There will be multi slides about naming --- ## Naming Things - Files ### NO ```r report.pdf reportv2.pdf reportthisisthelastone.pages Figure 2.png 3465-234szx.r foo.R ``` ### YES ```r 2018-10-01_01_report-for-cdc.pdf 01_data.rmd 01_data.pdf 02_data-filtering.rmd 02_data-filtering.pdf ``` ??? Follow narrative from folder structure slide jenny Bryan naming things --- ## Naming Things - Files We want file names to be "machine readable" and "human readable". -- - Avoid spaces, punctuation, special characters and case sensitivity - Deliberate use of delimiters - File name should describe the **contents** of the file - Put something numeric first - Left pad numbers with zeroes - Use ISO 8601 standard for dates (YYYY-MM-DD) ??? to preserve chronological and logical ordering. --- ## Naming Things - Files ```r library(fs) dir_ls("data/", regexp = "health-study") ``` ``` ## 2018-02-23_health-study_power-100_group-A1.csv ## 2018-02-23_health-study_power-100_group-B1.csv ## 2018-02-23_health-study_power-100_group-C1.csv ## 2018-02-23_health-study_power-200_group-A1.csv ## 2018-02-23_health-study_power-200_group-B1.csv ## 2018-02-23_health-study_power-200_group-C1.csv ``` -- ```r stringr::str_split_fixed(x, "[_\\.]", 5) ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] "2018-02-23" "health-study" "power-100" "group-A1" "csv" ## [2,] "2018-02-23" "health-study" "power-100" "group-B1" "csv" ## [3,] "2018-02-23" "health-study" "power-100" "group-C1" "csv" ## [4,] "2018-02-23" "health-study" "power-200" "group-A1" "csv" ## [5,] "2018-02-23" "health-study" "power-200" "group-B1" "csv" ## [6,] "2018-02-23" "health-study" "power-200" "group-C1" "csv" ``` ??? - Avoid spaces, punctuation, special characters and case sensitivity - Deliberate use of delimiters - File name should describe the **contents** of the file - Put something numeric first - Left pad numbers with zeroes - Use ISO 8601 standard for dates (YYYY-MM-DD) --- ## Naming Things - Files ```r library(fs) dir_ls("data/", regexp = "health-study") ``` ``` ## 2018-02-23_health-study_power-100_group-A1.csv ## 2018-02-23_health-study_power-100_group-B1.csv ## 2018-02-23_health-study_power-100_group-C1.csv ## 2018-02-23_health-study_power-200_group-A1.csv ## 2018-02-23_health-study_power-200_group-B1.csv ## 2018-02-23_health-study_power-200_group-C1.csv ``` -- ```r library(tidyverse) map_df(dir_ls("data/", regexp = "health-study"), read_csv) # or dir_ls("data/", regexp = "health-study") %>% map_df(read_csv) ``` ??? - Avoid spaces, punctuation, special characters and case sensitivity - Deliberate use of delimiters - File name should describe the **contents** of the file - Put something numeric first - Left pad numbers with zeroes - Use ISO 8601 standard for dates (YYYY-MM-DD) --- ## Naming Things - Objects - Only use lowercase letters, numbers, and `_` - Use names that are not jargony, **weight** instead of **K** - Use informative names -- ```r # Bad df e tuningVar # Good health_data error tuning_var ``` ??? lowercase letters + numbers = alpha-numeric characters (ish) --- class: center # Once You Start Writing R --- ## What To Avoid - attach() Never use `attach()` -- ```r attach(mtcars) mean(mpg) ``` ``` ## [1] 20.09062 ``` Loading lots of names into the search path, ambiguous selections. --- ## What To Avoid - attach() Never use `rm(list=ls())` -- Will not reset packages, reset options, ... Restart the R session CTRL+SHIFT+F10 for Windows CMD+SHIFT+ALT+F10 for Mac OS --- ## R Markdown Most of your documents can be written in R Markdown. .center[![rmarkdown](images/rmarkdown-flow.png)] Using the basics of markdown with the addition of R code chunks. ??? Reference Folder structure web page, PDF, MS Word document, slide show, handout, book, dashboard, package vignette or other format. rticles package no more copy pasting results into your document. --- ## R Markdown **R Markdown documents versus R scripts** Use R scripts for simple self contained tasks. `source()` R scripts into your R Markdown document where you will do analyses, visualizations and reporting. --- ## R Markdown ```r - 01-import.R - 02-clean-names.R - 03-tidy.R - etc ``` -- Include at the start of R Markdown file ```r {r load_scripts, include = FALSE} library(here) source(here("scripts", "01-import.R")) source(here("scripts", "02-clean-names.R")) source(here("scripts", "03-tidy.R")) ``` ??? --- ## Naming Chunks Names can be placed after the comma ````markdown ```{r, chunk-label, results='hide', fig.height=4} ```` or before ````markdown ```{r chunk-label, results='hide', fig.height=4} ```` > In general it is recommended to use alphabetic characters with words separated by - and avoid other characters. - Yihui Xie --- ## Naming Chunks - Make navigating the R Markdown document easier - Make your R Markdown easier to understand - Clarifies error reports or progress of knitting - Caching when moving chunks around - Feels good ??? Lower left corner of Rstudio have menu where sections and chunks can be selected with. Caching on unnamed chunks are based on numbering. --- ## Setup Chunk In a fresh R Markdown document you see this ````markdown ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ```` -- The setup chuck is run before another code - use to your advantage --- ## Chunk Options > Defaults will rarely work for you 100% of the time. -- Set `echo = TRUE`, knit document with code. Set `echo = FALSE`, reknit document with the code hidden. -- .small[ - eval - echo - results - collapse - fig.path - warning - error - message ] ??? Working with colleagues that knows R Working with colleagues that don't --- ## Chunk Options -- ````markdown ```{r setup, include=FALSE} knitr::opts_chunk$set(fig.path = "figures/") ```` Will save all figures in the the folder figures with the chunk name. ??? highlight use of fig.path option fig.path: ('figure/'; character) prefix to be used for figure filenames (fig.path and chunk labels are concatenated to make filenames) --- ## Styling Code Use consistent style when writing code -- [http://style.tidyverse.org/](http://style.tidyverse.org/) -- All about preferences but keep it consistent!!! ??? Give examples of styles to follow --- ## Keep .Rprofile Clean Your project contains a file called .Rprofile. This file runs first in every session. Think of it as configuration file. -- ```r options(stringsAsFactors = FALSE) options(max.print = 100) ``` -- _PANIC!!!_ Code will only work for you! Avoid putting output altering code in .Rprofile! ??? Use it to change options and load packages --- ## Comment Your Code - Functions: Arguments and purpose - Code: What or why, NOT how -- ```r # Takes a data.frame (data) and replaces the columns with the names # (names) and converts them from factor variable to character # variables. Keeps characters variables unchanged. factor_to_text <- function(data, names) { for (i in seq_along(names)) { if(is.factor(data[, names[i], drop = TRUE])) data[, names[i]] <- as.character.factor(data[, names[i], drop = TRUE]) } data } ``` --- ## Should You Write a Function? > Once you’ve written the same code 3 times, write a function ```r mtcars %>% mutate(mpg = mpg / 2.5 + 2, cyl = cyl / 2.5 + 2, hp = hp / 2.5 + 2) ``` -- You should have added 3 instead of 2 ```r mtcars %>% mutate(mpg = mpg / 2.5 + 2, cyl = cyl / 2.5 + 3, hp = hp / 2.5 + 3) ``` --- ## Should You Write a Function? > Once you’ve written the same code 3 times, write a function -- ```r scale_fun <- function(x) x / 2.5 + 3 mtcars %>% mutate(mpg = scale_fun(mpg), cyl = scale_fun(cyl), hp = scale_fun(hp)) ``` --- ## Testing Use tests in your pipeline to check assumptions. ```r library(tidyverse) data <- fancy_gov_data_api("health", year = 2018, month = 10) data %>% group_by(county) %>% summarize(happiness = (- death + love) / population) ``` ??? Say you are downloading data from government API. --- ## Testing __testhat__ is primarily used for unit testing in R packages. Contains large collection of `expect_` functions. - expect_equal - expect_equivalent - expect_error - expect_length - expect_named - ... --- ## Testing Use tests in your pipeline to check assumptions. ```r library(tidyverse) library(testthat) data <- fancy_gov_data_api("health", year = 2018, month = 10) expect_length(unique(data$county), 3007) expect_gt(mean(data$love == NA), 0.1) data %>% group_by(county) %>% summarize(happiness = (- death + love) / population) ``` ??? Say you are downloading data from government API.