====== Data Science Toolbox ======

Course page: https://www.coursera.org/learn/data-scientists-tools
By Jeff Leek, PhD, Roger D. Peng, PhD, Brian Caffo, PhD


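  * Get help from R: <code>?rnorm                # open the help page of a function
help.search("rnorm")  # search all installed help pages (shortcut: ??rnorm)

# get the function arguments:
args("rnorm")

# see the source code (type the name without parentheses):
rnorm</code>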
  * R reference card downloaded.
  * To ask for help on a problem with R:
    - What steps reproduce the problem?
    - What did we expect to see?
    - What did we see instead?
    - Which versions of R and of the packages are used?
    - Which operating system is used?
  * To ask for help on a data analysis question:
    - What question are we trying to answer?
    - What steps were used to answer it?
    - What did we expect to see?
    - What did we see instead?
    - What other solutions did we consider?
  * Places to get information on data science questions:
    * Stack Overflow
    * R mailing list
    * Cross Validated
    * Google: "[data type] data analysis" or "[data type] R package"
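
The R version and operating system questions above can be answered from within R itself. A minimal sketch using only base R functions (`R.version.string`, `Sys.info()`, `packageVersion()`, `sessionInfo()`); `stats` is only an example package, so substitute the packages your code actually uses:

<code r>
# Collect the details a help request should include:
# R version, operating system and package versions.
r_version <- R.version.string                        # e.g. "R version 4.x.y (...)"
os_name   <- Sys.info()[["sysname"]]                 # e.g. "Linux", "Windows", "Darwin"
stats_ver <- as.character(packageVersion("stats"))   # version of one (example) package

cat(r_version, "\n")
cat("OS:", os_name, "\n")
cat("stats package:", stats_ver, "\n")

# Full report (platform, OS, locale, attached packages) to paste into a question:
print(sessionInfo())
</code>

The `sessionInfo()` output covers the R version, the platform, the operating system and the loaded packages in one block, so it is usually the simplest thing to paste.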

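  * To get a local copy of a GitHub repo, clone it: <code>cd my_folder
# Copy an existing GitHub repo (creates my_folder/my_repo with its history):
git clone https://github.com/roche-emmanue/my_repo.git
# Or turn the current folder into a repo linked to that remote:
git init
git remote add origin https://github.com/roche-emmanue/my_repo.git</code>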
  * Help on Git and GitHub:
    * http://git-scm.com/doc
    * https://help.github.com/
    * Google / Stack Overflow

  * Basic Markdown:
    * Headings with: <code>## This is a secondary heading
### This is a tertiary heading</code>
    * Unordered lists: <code>* Item 1
* Item 2
* Item 3</code>

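  * Types of data science questions:
    - Descriptive:
      * Describe a set of data, without interpreting it.
    - Exploratory:
      * Look for relationships in the data, without trying to confirm them.
    - Inferential:
      * Use a relatively small sample of data to say something about a larger population.
    - Predictive:
      * Use data on some objects to predict values for another object.
      * X **predicts** Y doesn't mean that X **causes** Y.
    - Causal:
      * Find out what happens to one variable when we change the value of another variable.
    - Mechanistic:
      * Understand the exact changes in variables that lead to changes in other variables for individual objects.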
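
As a rough illustration (not taken from the course), the first question types above map onto different base R calls. This sketch uses the built-in `mtcars` dataset; treating `mtcars` as a random sample from a larger population of cars is an assumption made only for the inferential step:

<code r>
# Built-in dataset: fuel consumption (mpg) and weight (wt, in 1000 lbs) of 32 cars.
data(mtcars)

# Descriptive: summarize the data we have, no generalization.
print(summary(mtcars$mpg))

# Exploratory: look for a relationship between weight and mpg (not confirmed).
r <- cor(mtcars$wt, mtcars$mpg)

# Inferential: 95% confidence interval for the mean mpg of a larger
# population of cars, assuming mtcars is a random sample from it.
ci <- t.test(mtcars$mpg)$conf.int

# Predictive: predict mpg for a car weighing 3000 lbs (wt = 3).
# A good prediction says nothing about whether weight *causes* lower mpg.
fit  <- lm(mpg ~ wt, data = mtcars)
pred <- predict(fit, newdata = data.frame(wt = 3))

print(r)
print(ci)
print(pred)
</code>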

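  * What is data:
    * Data are values of variables measured on a "set of items" (the **population**).
    * **Variables** = measurements or characteristics of an item.
    * Variables are **qualitative** (categories, e.g. country, sex, treatment) or **quantitative** (numeric measurements, e.g. height, weight, blood pressure).
    * Data can come as raw files, from an API, or as video or audio.
    * Data is the second most important thing; the question is the most important.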
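
In R, the qualitative/quantitative distinction usually shows up as the storage type of a column. A minimal sketch with the built-in `iris` dataset (chosen here only as an example): the 150 flowers are the set of items, `Species` is a qualitative variable stored as a factor, and `Sepal.Length` is a quantitative variable stored as numbers:

<code r>
# Built-in iris dataset: 150 flowers (the set of items), 5 variables.
data(iris)
str(iris)

# Qualitative variable: a factor with a fixed set of categories.
print(is.factor(iris$Species))
print(levels(iris$Species))     # "setosa" "versicolor" "virginica"
print(table(iris$Species))      # counts per category

# Quantitative variable: numeric, so arithmetic summaries make sense.
print(is.numeric(iris$Sepal.Length))
print(mean(iris$Sepal.Length))
</code>

Counting categories with `table()` suits qualitative variables, while means and standard deviations only make sense for quantitative ones.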

  * To share large amounts of data: http://figshare.com

  * To avoid confounding, we can fix the confounding variable, stratify on it, or, if it cannot be fixed, randomize the assignment of the variable under study so the confounder is balanced across groups.
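
A small simulation can show why these strategies work. This is a sketch under made-up assumptions, not from the course: a hypothetical binary confounder `z` drives both `x` and `y`, while `y` does not depend on `x` at all. The crude regression still finds an `x` effect; stratifying on `z` (fitting within each level, which is also what fixing `z` amounts to) or randomizing `x` removes it:

<code r>
set.seed(42)   # reproducible random numbers
n <- 10000

# Hypothetical confounder z (e.g. an age group) drives both x and y.
z <- rbinom(n, size = 1, prob = 0.5)
x <- rnorm(n, mean = 2 * z)   # x depends on z only
y <- rnorm(n, mean = 3 * z)   # y depends on z only, NOT on x

# Crude analysis: x looks associated with y, purely because of z.
crude <- coef(lm(y ~ x))[[2]]

# Stratify: estimate the x-y slope separately within each level of z.
# Fixing z (e.g. studying only z == 1) is the same as keeping one stratum.
strat <- sapply(0:1, function(k) {
  idx <- z == k
  coef(lm(y[idx] ~ x[idx]))[[2]]
})

# Randomize: x is assigned independently of z, as in a randomized experiment.
x_rand <- rnorm(n, mean = 1)
y_rand <- rnorm(n, mean = 3 * z)   # y still depends only on z
randomized <- coef(lm(y_rand ~ x_rand))[[2]]

print(c(crude = crude, stratum0 = strat[1], stratum1 = strat[2],
        randomized = randomized))
</code>

With these settings the crude slope lands near 0.75, while the stratified and randomized slopes stay close to 0, the true effect of `x` on `y`.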