====== Getting and Cleaning Data ======

Course page: https://class.coursera.org/getdata-034

By Jeff Leek, PhD, Roger D. Peng, PhD, Brian Caffo, PhD

Page last modified: 2020/07/10 12:11

  * Basic concepts:
    * Finding and extracting raw data
    * Tidy data principles
    * Practical R packages
  * Interesting datasets from: https://data.baltimorecity.gov/
  * Pipeline: **Raw data -> Processing script -> Tidy data** -> Data analysis -> Data communication
  * **Components of tidy data**
    * Raw data: can have multiple levels
    * Tidy data
    * A code book (metadata) should be produced:
      * It can be written in Markdown
      * It should have a section called "Study design" (e.g. how the raw data was collected)
      * It must have a section called "Code book": a description of each variable and its units
    * An explicit and exact recipe to go from raw to tidy data (instruction list):
      * Ideally an R script
      * Input = raw data, output = processed (tidy) data
      * The script takes no parameters
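
The "recipe" idea above (an R script with raw data as input, processed data as output, and no parameters) could be sketched roughly as follows. This is only an illustration in base R, not the course's own script: the file names, the ''station'' and ''temp_*'' columns, and the assumption that temperatures are in degrees Celsius are all hypothetical. The raw file is written to a temporary directory first so the sketch runs on its own.

```r
# Hypothetical raw data, written to a temp file so the sketch is self-contained.
raw_path  <- file.path(tempdir(), "raw_measurements.csv")
tidy_path <- file.path(tempdir(), "tidy_measurements.csv")
writeLines(c("station,temp_2019,temp_2020",
             "A,12.1,12.8",
             "B,9.4,NA"), raw_path)

# Processing script: fixed input and output paths, no parameters to tweak.
raw <- read.csv(raw_path, stringsAsFactors = FALSE)

# Reshape so that each row is one (station, year) observation
# and each column is one variable.
year_cols <- grep("^temp_", names(raw), value = TRUE)
tidy <- do.call(rbind, lapply(year_cols, function(col) {
  data.frame(station = raw$station,
             year    = as.integer(sub("^temp_", "", col)),
             temp_c  = raw[[col]],  # assumed unit: degrees Celsius
             stringsAsFactors = FALSE)
}))
tidy <- tidy[order(tidy$station, tidy$year), ]
rownames(tidy) <- NULL

# Output = processed data; missing values are kept as NA, not dropped.
write.csv(tidy, tidy_path, row.names = FALSE)
```

In this sketch the matching code book would describe ''station'', ''year'' and ''temp_c'' along with their units. Because the script has no parameters, rerunning it on the same raw file should reproduce the same tidy file.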