====== Getting and Cleaning Data ======

Course page: https://class.coursera.org/getdata-034

By Jeff Leek, PhD, Roger D. Peng, PhD, Brian Caffo, PhD

  * Basic concepts:
    * Find and extract raw data
    * Tidy data principles
    * Practical R packages
  * Interesting datasets from: https://data.baltimorecity.gov/
  * Pipeline: **Raw data -> Processing script -> tidy data** -> data analysis -> data communication
  * **Components of tidy data**
    * Raw data: can have multiple levels
    * Tidy data
    * Code book (metadata):
      * could be written in markdown
      * should have a section called "Study design" (e.g. how the raw data was collected)
      * must have a section called "Code book": a description of each variable and its units
    * Explicit and exact recipe to go from raw to tidy data (instruction list)
      * ideally an R script
      * input = raw data, output = processed data
      * the script takes no parameters
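
The recipe and code book items above can be sketched as a short R script. This is only a minimal illustration, not the course's own code: the file names, column names, sample values and the decision to drop incomplete rows are all hypothetical. The raw file is created inside the script so that the example runs on its own; in a real project it would already exist on disk.

```r
# Parameter-free processing script: raw data in, tidy data out.
# All paths, column names and values below are hypothetical.
work_dir      <- tempdir()
raw_path      <- file.path(work_dir, "raw_measurements.csv")
tidy_path     <- file.path(work_dir, "tidy_measurements.csv")
codebook_path <- file.path(work_dir, "CODEBOOK.md")

# Stand-in for the raw data (normally this file is delivered, not generated).
writeLines(c(
  "Subject ID,Height (cm),Weight (kg)",
  "1,170,65",
  "2,,80",
  "3,182,77"
), raw_path)

# Step 1: read the raw data exactly as delivered.
raw <- read.csv(raw_path, check.names = FALSE, stringsAsFactors = FALSE)

# Step 2: give every variable a descriptive, machine-friendly name.
tidy <- raw
names(tidy) <- c("subject_id", "height_cm", "weight_kg")

# Step 3: drop observations with missing measurements
# (an assumed cleaning rule; the code book should record such choices).
tidy <- tidy[complete.cases(tidy), ]

# Step 4: write the processed data, one variable per column
# and one observation per row.
write.csv(tidy, tidy_path, row.names = FALSE)

# Step 5: write a code book with the two sections named in the notes.
writeLines(c(
  "# Study design",
  "Placeholder: describe how the raw data was collected.",
  "",
  "# Code book",
  "- subject_id: participant identifier (no unit)",
  "- height_cm: height in centimetres",
  "- weight_kg: weight in kilograms"
), codebook_path)
```

Because the script takes no parameters, running it again on the same raw file should reproduce the same tidy file, which is the point of an exact recipe.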