All references and additional information can be found in this repository
https://github.com/EmilHvitfeldt/oRganized-talk
Other alternatives. Emacs, Vim, Visual Studio.
Keyboard shortcut to open settings:
⌘ + , in macOS
Ctrl + , in Windows
✓ - Uncheck "Restore .RData into workspace at startup"
✓ - Set "Save workspace to .RData on exit" to "Never"
Plenty choices of
Play around! Find something you love!
Change the layout of the panes
Source on top?
Source down to the right?
It's all up to you!
I like having both source and console open
... while still allowing me to have plots/help/viewer open
Mention recursive viewer pane.
Keep all files from one project together. Use RStudio projects.
That means everything associated with the project: input data, R scripts, analytic results, figures.
Click File > New Project
Or click on the upper right
name_of_project
|--raw_data
    |--WhateverData.xlsx
    |--report_2017.csv
|--output_data
    |--summary2017.csv
|--rmd
    |--01-analysis.Rmd
|--docs
    |--01-analysis.html
    |--01-analysis.pdf
|--scripts
    |--exploratory_analysis.R
    |--pdf_scraper.R
|--figures
    |--weather_2017.png
|--name_of_project.Rproj
|--run_all.R
Everything has a spot where it belongs.
library(fs)
# Folder names from the project layout
folder_names <- c("raw_data", "output_data", "rmd", "docs", "scripts", "figures")
# Create all folders in the current working directory
dir_create(folder_names)
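If fs is not installed, the same skeleton can be sketched in base R. This is only an illustration: `project_root` is an assumed throwaway location in the temporary directory, not a path from the talk.

```r
# Assumption: `project_root` is a throwaway directory used only for illustration.
project_root <- file.path(tempdir(), "name_of_project")
folder_names <- c("raw_data", "output_data", "rmd", "docs", "scripts", "figures")

for (folder in file.path(project_root, folder_names)) {
  # recursive = TRUE also creates project_root itself;
  # showWarnings = FALSE keeps reruns quiet when the folder already exists
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist directly under the project root
list.dirs(project_root, full.names = FALSE, recursive = FALSE)
```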
Never modify raw data; only read it (forever untouched).
library(tidyverse)
# data import
data <- read_csv("/Users/Emil/Research/Health/amazing_data.csv")
## Error: '/Users/Emil/Research/Health/amazing_data.csv' does not exist.
Only use relative paths, never absolute paths
Introducing the here package.
library(here)
here()
## [1] "/Users/Emil/Research/Health"
library(tidyverse)
library(here)
# here() builds the path from the project root, so it works on any machine
data <- read_csv(here("amazing_data.csv"))
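To get a feel for why this works on any machine, here is a rough base R sketch of the idea: walk up from the current folder until a folder with an .Rproj file is found. `find_project_root()` is a hypothetical helper written for this illustration; it is a simplification and not how the here package is actually implemented.

```r
# Hypothetical helper: walk up from `start` until a directory containing an
# .Rproj file is found. A simplified illustration, not here's actual code.
find_project_root <- function(start = getwd()) {
  current <- normalizePath(start, winslash = "/")
  repeat {
    if (length(list.files(current, pattern = "\\.Rproj$")) > 0) {
      return(current)
    }
    parent <- dirname(current)
    if (identical(parent, current)) {
      stop("No .Rproj file found above ", start)
    }
    current <- parent
  }
}

# Demo inside a temporary project, so nothing on your disk is assumed.
demo_root <- file.path(tempdir(), "Health")
dir.create(file.path(demo_root, "scripts"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_root, "Health.Rproj"))

# Start in a subfolder; the root is still found.
root <- find_project_root(file.path(demo_root, "scripts"))
data_path <- file.path(root, "amazing_data.csv")
```

The path in `data_path` is built from wherever the project lives, which is the property that makes relative paths portable.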
report.pdf
reportv2.pdf
reportthisisthelastone.pages
Figure 2.png
3465-234szx.r
foo.R
2018-10-01_01_report-for-cdc.pdf
01_data.rmd
01_data.pdf
02_data-filtering.rmd
02_data-filtering.pdf
Follow narrative from folder structure slide
Jenny Bryan: naming things
We want file names to be "machine readable" and "human readable".
to preserve chronological and logical ordering.
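A small sketch of why ISO dates and zero-padded numbers keep files in order. `make_file_name()` is a hypothetical helper, and the script names in the sorting demo are made up for illustration.

```r
# Hypothetical helper for illustration: build "<date>_<number>_<slug>.<ext>".
make_file_name <- function(date, number, slug, ext) {
  sprintf("%s_%02d_%s.%s", format(date, "%Y-%m-%d"), number, slug, ext)
}

report_name <- make_file_name(as.Date("2018-10-01"), 1, "report-for-cdc", "pdf")
report_name

# Without zero padding, alphabetical order is not numerical order.
# method = "radix" sorts by character code, independent of the locale.
unpadded <- sort(c("10_model.R", "2_clean.R", "1_import.R"), method = "radix")
padded   <- sort(c("10_model.R", "02_clean.R", "01_import.R"), method = "radix")
unpadded
padded
```

With padding, the alphabetical order that file browsers use matches the order in which the scripts are meant to run.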
library(fs)
dir_ls("data/", regexp = "health-study")
## 2018-02-23_health-study_power-100_group-A1.csv
## 2018-02-23_health-study_power-100_group-B1.csv
## 2018-02-23_health-study_power-100_group-C1.csv
## 2018-02-23_health-study_power-200_group-A1.csv
## 2018-02-23_health-study_power-200_group-B1.csv
## 2018-02-23_health-study_power-200_group-C1.csv
stringr::str_split_fixed(x, "[_\\.]", 5)
##      [,1]         [,2]           [,3]        [,4]       [,5]
## [1,] "2018-02-23" "health-study" "power-100" "group-A1" "csv"
## [2,] "2018-02-23" "health-study" "power-100" "group-B1" "csv"
## [3,] "2018-02-23" "health-study" "power-100" "group-C1" "csv"
## [4,] "2018-02-23" "health-study" "power-200" "group-A1" "csv"
## [5,] "2018-02-23" "health-study" "power-200" "group-B1" "csv"
## [6,] "2018-02-23" "health-study" "power-200" "group-C1" "csv"
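Going one step further, the pieces of a machine-readable file name can be turned into typed columns. This is a base R sketch under the assumption that "_" separates fields and "-" separates a key from its value; the column names are my own labels.

```r
files <- c(
  "2018-02-23_health-study_power-100_group-A1.csv",
  "2018-02-23_health-study_power-200_group-B1.csv"
)

# Drop the extension, then split each name on the "_" delimiter.
parts <- strsplit(tools::file_path_sans_ext(files), "_", fixed = TRUE)
meta <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(meta) <- c("date", "study", "power", "group")  # labels chosen for illustration

# Within a field, "-" separates the key from the value.
meta$date  <- as.Date(meta$date)
meta$power <- as.integer(sub("^power-", "", meta$power))
meta$group <- sub("^group-", "", meta$group)
meta
```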
library(tidyverse)
library(fs)
# Read every matching file and row-bind the results
map_df(dir_ls("data/", regexp = "health-study"), read_csv)
# or
dir_ls("data/", regexp = "health-study") %>%
  map_df(read_csv)
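The same pattern in base R, with the file name kept as a column so every row stays traceable. The two CSV files and their values are made up and written to a temporary folder purely for illustration; `read_one()` is a hypothetical helper.

```r
# Write two small CSV files to a temporary folder (made-up values).
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(0.5, 0.7)),
          file.path(data_dir, "2018-02-23_health-study_power-100_group-A1.csv"),
          row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(0.6, 0.9)),
          file.path(data_dir, "2018-02-23_health-study_power-100_group-B1.csv"),
          row.names = FALSE)

files <- list.files(data_dir, pattern = "health-study", full.names = TRUE)

# Read one file and record where its rows came from.
read_one <- function(path) {
  out <- read.csv(path)
  out$source_file <- basename(path)
  out
}

# Read all files and stack them into one data frame.
combined <- do.call(rbind, lapply(files, read_one))
combined
```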
# Bad
df
e
tuningVar
# Good
health_data
error
tuning_var
lowercase letters + numbers = alpha-numeric characters (ish)
Never use attach()
attach(mtcars)
mean(mpg)
## [1] 20.09062
attach() loads lots of names into the search path, which leads to ambiguous selections.
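A short sketch of the ambiguity, and of two explicit alternatives. The global `mpg` vector is made up for the demonstration.

```r
# Refer to the data explicitly instead of attaching it.
explicit_mean <- mean(mtcars$mpg)
with_mean     <- with(mtcars, mean(mpg))

# Why attach() is risky: an existing variable silently wins over the attached column.
mpg <- c(1, 2, 3)
attach(mtcars)
masked_mean <- mean(mpg)   # uses the mpg defined above, not mtcars$mpg
detach(mtcars)
rm(mpg)

c(explicit = explicit_mean, with = with_mean, masked = masked_mean)
```

R does print a masking message on `attach()`, but it is easy to miss in a long script.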
Never use rm(list=ls())
It only removes objects; it will not detach packages, reset options, ...
Restart the R session
CTRL+SHIFT+F10 for Windows
CMD+SHIFT+ALT+F10 for Mac OS
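A quick way to convince yourself. The sketch runs inside `local()` so it does not wipe your own workspace; the `digits` option and the tools package are arbitrary choices for the demonstration.

```r
library(tools)                      # attach a package so we can check it afterwards

demo <- local({
  options(digits = 3)               # change a global option
  x <- 1:10                         # create an object
  rm(list = ls())                   # remove every object in this environment
  list(
    x_exists       = exists("x", inherits = FALSE),  # the object is gone
    digits         = getOption("digits"),            # the option survived
    tools_attached = "package:tools" %in% search()   # the package survived
  )
})
demo

options(digits = 7)                 # restore R's default by hand
```

Only a restart gives a session where options and attached packages are back to their starting state.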
Most of your documents can be written in R Markdown.
Using the basics of markdown with the addition of R code chunks.
Reference the folder structure slide.
Output formats: web page, PDF, MS Word document, slide show, handout, book, dashboard, package vignette or other format.
Mention the rticles package.
No more copy-pasting results into your document.
R Markdown documents versus R scripts
Use R scripts for simple, self-contained tasks.
source() R scripts into your R Markdown document, where you will do analyses, visualizations and reporting.
- 01-import.R
- 02-clean-names.R
- 03-tidy.R
- etc
Include at the start of R Markdown file
{r load_scripts, include = FALSE}
library(here)
source(here("scripts", "01-import.R"))
source(here("scripts", "02-clean-names.R"))
source(here("scripts", "03-tidy.R"))
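Because the scripts are numbered, they can also be sourced in a loop. A minimal base R sketch, where the two tiny scripts are made up and written to a temporary folder so the example runs anywhere:

```r
# Create two tiny scripts in a temporary "scripts" folder (illustration only).
scripts_dir <- file.path(tempdir(), "scripts")
dir.create(scripts_dir, showWarnings = FALSE)
writeLines("raw <- data.frame(Value = c(1, NA, 3))",
           file.path(scripts_dir, "01-import.R"))
writeLines("names(raw) <- tolower(names(raw))",
           file.path(scripts_dir, "02-clean-names.R"))

# Numbered prefixes mean alphabetical order is execution order.
scripts <- sort(list.files(scripts_dir, pattern = "^[0-9]{2}-.*\\.R$",
                           full.names = TRUE))
for (script in scripts) {
  source(script, local = TRUE)   # evaluate in the current environment
}

names(raw)
```

Listing each `source()` call by hand, as in the chunk above, is more explicit; the loop is handy once the list of scripts grows.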
Names can be placed after the comma
```{r, chunk-label, results='hide', fig.height=4}
or before
```{r chunk-label, results='hide', fig.height=4}
In general it is recommended to use alphabetic characters with words separated by - and avoid other characters. - Yihui Xie
The lower left corner of the RStudio source pane has a menu for jumping between sections and chunks.
Caching of unnamed chunks is based on their numbering, so adding or removing a chunk shifts the numbers of the chunks after it.
In a fresh R Markdown document you see this
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
The setup chunk is run before any other code; use this to your advantage.
Defaults will rarely work for you 100% of the time.
Set echo = TRUE, knit document with code.
Set echo = FALSE, reknit document with the code hidden.
Working with colleagues who know R
Working with colleagues who don't
```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.path = "figures/")
```
This saves all figures in the folder figures, named after the chunk label.
highlight use of fig.path option
fig.path: ('figure/'; character) prefix to be used for figure filenames (fig.path and chunk labels are concatenated to make filenames)
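The concatenation can be sketched in a few lines. `figure_file()` is a hypothetical helper; the plot number and extension suffix are my assumption about what a typical knitr figure name looks like, so treat the exact suffix as illustrative.

```r
# Hypothetical helper mirroring the "prefix + chunk label" rule quoted above.
# Assumption: a plot-number and extension suffix such as "-1.png" is appended.
figure_file <- function(fig_path, chunk_label, plot_number = 1, ext = "png") {
  paste0(fig_path, chunk_label, "-", plot_number, ".", ext)
}

with_slash    <- figure_file("figures/", "weather-2017")
without_slash <- figure_file("figures", "weather-2017")

with_slash      # lands inside the figures folder
without_slash   # the prefix is glued onto the file name instead
```

Since fig.path is a prefix and not a folder name, the trailing slash is what puts the files inside the folder.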
Use consistent style when writing code
All about preferences but keep it consistent!!!
Give examples of styles to follow
Your project contains a file called .Rprofile.
This file runs first in every session. Think of it as a configuration file.
options(stringsAsFactors = FALSE)
options(max.print = 100)
PANIC!!! Code will only work for you!
Avoid putting output-altering code in .Rprofile!
Use it to change options and load packages
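One way to keep results independent of anyone's .Rprofile is to state the behaviour in the script itself. A minimal sketch with made-up data; the `digits` option is just an example of an output-altering setting.

```r
# Explicit argument: same result on every machine, whatever .Rprofile contains.
df <- data.frame(name = c("a", "b"), stringsAsFactors = FALSE)
class(df$name)

# Scoped option change: set it, use it, and restore the previous value.
old <- options(digits = 3)
formatted <- format(pi)
options(old)

formatted
```

`options()` returns the previous values, which is what makes the restore step a one-liner.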
# Takes a data.frame (data) and the names of some of its columns (names),
# and converts those columns from factor variables to character variables.
# Character variables are kept unchanged.
factor_to_text <- function(data, names) {
  for (i in seq_along(names)) {
    if (is.factor(data[, names[i], drop = TRUE])) {
      data[, names[i]] <- as.character(data[, names[i], drop = TRUE])
    }
  }
  data
}
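For comparison, a compact variant of the same idea together with a usage example. `factor_cols_to_text()` is my own rewrite and the `patients` data is made up; neither comes from the talk.

```r
# Compact variant of the same idea (hypothetical helper, not the original function).
factor_cols_to_text <- function(data, names) {
  is_fct <- vapply(data[names], is.factor, logical(1))
  if (any(is_fct)) {
    # Convert only the requested columns that really are factors
    data[names[is_fct]] <- lapply(data[names[is_fct]], as.character)
  }
  data
}

patients <- data.frame(   # made-up example data
  id = 1:3,
  county = factor(c("Kern", "Inyo", "Kern")),
  note = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

converted <- factor_cols_to_text(patients, c("county", "note"))
vapply(converted, class, character(1))
```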
Once you’ve written the same code 3 times, write a function
mtcars %>%
  mutate(mpg = mpg / 2.5 + 2,
         cyl = cyl / 2.5 + 2,
         hp = hp / 2.5 + 2)
You should have added 3 instead of 2
mtcars %>%
  mutate(mpg = mpg / 2.5 + 2,
         cyl = cyl / 2.5 + 3,
         hp = hp / 2.5 + 3)
scale_fun <- function(x) x / 2.5 + 3
mtcars %>%
  mutate(mpg = scale_fun(mpg),
         cyl = scale_fun(cyl),
         hp = scale_fun(hp))
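Once the logic lives in a function, the column names can live in a vector too. A base R sketch of that next step; `cols` and `scaled` are names chosen for illustration.

```r
scale_fun <- function(x) x / 2.5 + 3

# The columns to transform are listed once, in one place.
cols <- c("mpg", "cyl", "hp")

scaled <- mtcars
scaled[cols] <- lapply(scaled[cols], scale_fun)

head(scaled[cols], 3)
```

Changing the formula or the set of columns is now a one-line edit, so the "forgot to update one of them" mistake cannot happen.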
Use tests in your pipeline to check assumptions.
library(tidyverse)
data <- fancy_gov_data_api("health", year = 2018, month = 10)
data %>%
  group_by(county) %>%
  summarize(happiness = (- death + love) / population)
Say you are downloading data from government API.
testthat is primarily used for unit testing in R packages.
It contains a large collection of expect_*() functions.
library(tidyverse)
library(testthat)
data <- fancy_gov_data_api("health", year = 2018, month = 10)
# Check assumptions about the download before using it
expect_length(unique(data$county), 3007)
# Use is.na(): comparing with `== NA` always returns NA
expect_gt(mean(is.na(data$love)), 0.1)
data %>%
  group_by(county) %>%
  summarize(happiness = (- death + love) / population)
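The same kind of check can be sketched without testthat, using base R's `stopifnot()` in its place. The data frame below is a made-up stand-in for the downloaded data, and the thresholds are arbitrary.

```r
# Made-up stand-in for the downloaded data.
data <- data.frame(
  county = c("A", "B", "C", "D"),
  love = c(2, NA, 5, 1),
  population = c(100, 250, 80, 40)
)

# Comparing with NA never yields TRUE or FALSE, so the mean is NA as well.
naive_share <- mean(data$love == NA)

# is.na() gives the actual share of missing values.
share_missing <- mean(is.na(data$love))

# stopifnot() halts the script as soon as an assumption fails.
stopifnot(
  length(unique(data$county)) == 4,
  share_missing < 0.5,
  all(data$population > 0)
)
```

If the API one day returns fewer counties or mostly missing values, the script stops here and does not quietly produce a wrong summary.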