====== Data Science Toolbox ======

Course page: https://www.coursera.org/learn/data-scientists-tools

By Jeff Leek, PhD, Roger D. Peng, PhD, and Brian Caffo, PhD

  * Get help from R:
<code>
?rnorm                 # open the help page for rnorm
help.search("rnorm")   # search the installed help pages for "rnorm"
args("rnorm")          # show the function arguments
rnorm                  # print the function's code (name without parentheses)
</code>
  * Downloaded the R reference card.
  * To ask for help on a problem with R, state:
    - What steps reproduce the problem?
    - What did we expect to see?
    - What did we see instead?
    - What version of R and of the packages are we using?
    - What operating system are we using?
  * To ask for help on a data analysis question, state:
    - What question are we trying to answer?
    - What steps were used to answer it?
    - What did we expect to see?
    - What did we see instead?
    - What other solutions did we consider?
  * Places to get information on data science questions:
    * Stack Overflow
    * R mailing lists
    * Cross Validated
    * Google: "[data type] data analysis" or "[data type] R package"
  * To connect a local folder to a GitHub repo (this only registers the remote and does not download anything; ''git clone'' followed by the repo URL creates a full local copy in one step):
<code>
cd my_folder
git init                # turn the folder into a local git repository
git remote add origin https://github.com/roche-emmanue/my_repo.git   # register the GitHub repo as "origin"
</code>
  * Help on git and GitHub:
    * http://git-scm.com/doc
    * https://help.github.com/
    * Google / Stack Overflow
  * Basic Markdown:
    * Headings:
<code>
## This is a secondary heading
### This is a tertiary heading
</code>
    * Unordered lists:
<code>
* Item 1
* Item 2
* Item 3
</code>
  * Types of data science questions:
    - Descriptive:
      * Describe a set of data.
    - Exploratory:
      * Find relationships we didn't know about (without trying to confirm them).
    - Inferential:
      * Use a relatively small sample of data to say something about a larger population.
    - Predictive:
      * X **predicts** Y does not mean that X **causes** Y.
    - Causal:
      * Find out what happens to one variable when we change another variable.
    - Mechanistic:
      * Understand the exact changes in variables that lead to changes in other variables for individual objects.
  * What is data:
    * Values of variables belonging to a "set of items"; this set of items is called the **population**.
    * **Variables** = measurements of characteristics of those items.
      * **Qualitative** (categories, e.g. sex or country) or **quantitative** (numeric measurements, e.g. height or weight).
    * Data can come as raw files, APIs, video, audio, etc.
    * Data is the second most important thing in data science (the question is the most important).
  * To share large amounts of data: http://figshare.com
  * To avoid confounding, we can fix the confounding variable, stratify on it, or randomize when it cannot be fixed.
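The help checklist for R problems asks for the R version, the package versions and the operating system. A minimal sketch of how to collect them from within R with base functions (''sessionInfo()'', ''R.version.string'', ''packageVersion()''); the ''stats'' package is only an arbitrary example of a package to report:

```r
# Gather the environment details that the help checklist asks for.
info <- sessionInfo()

cat(R.version.string, "\n")                                    # R version
cat("Platform:", R.version$platform, "\n")                     # build platform
cat("OS:", info$running, "\n")                                 # operating system
cat("stats version:", format(packageVersion("stats")), "\n")   # one package's version

# Full report: R version, OS, locale and the attached/loaded packages.
print(info)
```

Pasting the output of ''print(sessionInfo())'' into a question usually answers the last two checklist items at once.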
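To make some of the question types more concrete, here is a rough sketch on R's built-in ''mtcars'' data. Using ''mean()'' for a descriptive summary, ''t.test()'' for an inferential statement and ''lm()''/''predict()'' for a prediction is an illustration chosen here, not something prescribed by the course, and treating the 32 cars as a sample from a larger population is an assumption:

```r
# mtcars ships with R (datasets package), so no download is needed.
cars <- datasets::mtcars

# Descriptive: summarise the data at hand, nothing more.
desc <- mean(cars$mpg)

# Inferential: treat the 32 cars as a sample and say something about a
# larger population, with uncertainty (95% confidence interval for the mean).
ci <- t.test(cars$mpg)$conf.int

# Predictive: use weight (wt, in 1000 lbs) to predict mpg for a car of 3000 lbs.
fit  <- lm(mpg ~ wt, data = cars)
pred <- predict(fit, newdata = data.frame(wt = 3))

print(desc)
print(ci)
print(pred)
# A good prediction says nothing about whether weight *causes* lower mpg;
# that would be a causal question.
```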
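In R, the qualitative/quantitative distinction usually maps to factors versus numeric vectors. A small sketch with a hypothetical toy data frame (''people'', ''sex'' and ''height'' are made-up names, and the values are invented for illustration):

```r
# Hypothetical toy data, for illustration only.
people <- data.frame(
  sex    = factor(c("F", "M", "F", "M")),  # qualitative: a set of categories
  height = c(162.5, 178.0, 170.2, 181.3)   # quantitative: numeric measurements (cm)
)

str(people)                 # shows sex as a factor and height as numeric
print(table(people$sex))    # qualitative variables are summarised by counts per category
print(mean(people$height))  # arithmetic only makes sense for quantitative variables
```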
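The effect of a confounder, and why stratifying on it helps, can be seen in a small simulation. This is a sketch under made-up assumptions: a hypothetical binary variable ''z'' drives both ''x'' and ''y'', while ''x'' has no effect on ''y''. Ignoring ''z'' suggests a clear relationship between ''x'' and ''y''; within each stratum of ''z'' it mostly disappears:

```r
set.seed(42)  # make the simulation reproducible
n <- 1000

# Hypothetical setup: binary confounder z drives both x and y,
# while x has no effect on y at all.
z <- rbinom(n, size = 1, prob = 0.5)
x <- 2 * z + rnorm(n)
y <- 3 * z + rnorm(n)

# Ignoring z: the slope of y on x is clearly positive.
overall_slope <- unname(coef(lm(y ~ x))["x"])

# Stratifying on z: fit the same model separately within each level of z.
strata <- split(data.frame(x = x, y = y), z)
stratum_slopes <- sapply(strata, function(d) unname(coef(lm(y ~ x, data = d))["x"]))

print(overall_slope)   # clearly positive
print(stratum_slopes)  # both near 0
```

Randomizing ''x'' (assigning it independently of ''z'') would instead break the link between ''z'' and ''x'' in the first place, which is why randomization is the fallback when a variable cannot be fixed.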