Data Science Toolbox

Course page: https://www.coursera.org/learn/data-scientists-tools
By Jeff Leek, PhD, Roger D. Peng, PhD, Brian Caffo, PhD

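  • Get help from R:
    # Open the help page of a function:
    ?rnorm
    # Search all help pages for a keyword:
    help.search("rnorm")
    
    # Get the function arguments:
    args("rnorm")
    
    # See the code (type the function name without parentheses):
    rnorm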
  • R reference card downloaded.
  • To ask for help on a problem with R:
    1. What steps reproduce the problem?
    2. What did we expect to see?
    3. What did we see instead?
    4. What version of R and of the packages are we using?
    5. What operating system are we using?
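Items 4 and 5 of the list above can be collected from within R itself. This is a minimal sketch using base R only; the variable names (`r_version`, `os_name`, `session`) are illustrative and not from the course.

```r
# Collect the details a good R problem report needs.
r_version <- R.version.string        # the R version as a single string
os_name <- Sys.info()[["sysname"]]   # name of the operating system
session <- sessionInfo()             # R version, platform, and attached packages

print(r_version)
print(os_name)
print(session)
```

Pasting the printed `sessionInfo()` output into the question usually covers both the R version and the package versions at once.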
  • To ask for help on a data analysis question:
    1. What question are we trying to answer?
    2. What steps were used to answer the question?
    3. What did we expect to see?
    4. What did we see instead?
    5. What other solutions did we think about?
  • Places to get info on data science questions:
    • Stack Overflow
    • R mailing list
    • Cross Validated
    • Google: “[data type] data analysis” or “[data type] R package”
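  • To check out a local copy of a GitHub repo:
    cd my_folder
    git init
    git remote add origin https://github.com/roche-emmanue/my_repo.git
    # Download the content (the branch may be main instead of master):
    git pull origin master
    # Or do it all in one step: git clone https://github.com/roche-emmanue/my_repo.git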
  • Help on git and github from:
  • Basic Markdown:
    • Headings with:
      ## This is a secondary heading
      ### This is a tertiary heading
    • Unordered lists:
      * Item 1
      * Item 2
      * Item 3
  • Types of data science questions:
    1. Descriptive:
      • Just trying to describe the data, without generalizing beyond it
    2. Exploratory:
      • Trying to find relationships (but not really trying to confirm them)
    3. Inferential:
      • Take a small dataset and try to generalize from it to a larger population
    4. Predictive:
      • Use X to predict Y; X predicting Y doesn't mean that X causes Y
    5. Causal:
      • What happens if we change the value of one variable…
    6. Mechanistic
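The first four question types can be sketched on a single dataset. The choice of the built-in `mtcars` dataset and of the `mpg` and `wt` columns is an assumption made for illustration only; it does not come from the course.

```r
# Illustrative only: mtcars ships with base R; the column choices are arbitrary.
data(mtcars)

# Descriptive: summarize the data at hand.
mean_mpg <- mean(mtcars$mpg)

# Exploratory: look for a relationship without trying to confirm it.
cor_wt_mpg <- cor(mtcars$wt, mtcars$mpg)

# Inferential: use the sample to say something about a larger population
# (here, a 95% confidence interval for the population mean of mpg).
ci_mpg <- t.test(mtcars$mpg)$conf.int

# Predictive: use weight to predict mpg for a new car. This says nothing
# about whether weight causes the change in mpg.
fit <- lm(mpg ~ wt, data = mtcars)
pred_mpg <- predict(fit, newdata = data.frame(wt = 3))
```

Causal and mechanistic questions are left out on purpose: answering them generally needs a designed experiment or a model of the underlying process, not just a fitted summary.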
  • What is data:
    • Data are values of variables belonging to a “set of items”
    • “Set of items” = population
    • Variables = measurements of characteristics of an item
    • Variables are qualitative (discrete scale) or quantitative (continuous scale)
    • Could be a raw file / API / video / audio
    • Data is the second most important thing (the question is the most important thing)
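In R, the qualitative/quantitative distinction typically maps to factor and numeric columns. The data frame below is hypothetical (the names `people`, `country`, and `height_cm` and all values are made up for illustration).

```r
# Hypothetical data: three items from a population, two variables each.
people <- data.frame(
  country = factor(c("France", "Japan", "France")),  # qualitative variable
  height_cm = c(172.5, 160.0, 181.2)                 # quantitative variable
)

# Qualitative variables are summarized by counting categories.
country_counts <- table(people$country)

# Quantitative variables are summarized with numeric statistics.
mean_height <- mean(people$height_cm)
```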
  • To avoid confounding, we can fix the confounding variable, stratify on it, or, if it cannot be fixed, randomize.
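A small simulation can make the confounding point concrete. This is a sketch under stated assumptions: the variables `z`, `x`, and `y`, the effect sizes, and the seed are all invented, and adding `z` to the regression stands in for fixing or stratifying on the confounder (a model-based analogue, not the only way to do it).

```r
set.seed(42)
n <- 1000

# z is the confounder: it drives both x and y. y does not depend on x at all.
z <- rnorm(n)
x <- z + rnorm(n)
y <- 2 * z + rnorm(n)

# Ignoring z: x appears to be related to y.
naive <- coef(lm(y ~ x))[["x"]]

# Holding z fixed by including it in the model: the apparent effect vanishes.
adjusted <- coef(lm(y ~ x + z))[["x"]]

# Randomizing: x_rand is assigned independently of z, so z can no longer
# create a spurious relationship between x_rand and the outcome.
x_rand <- rnorm(n)
y_rand <- 2 * z + rnorm(n)
randomized <- coef(lm(y_rand ~ x_rand))[["x_rand"]]
```

In this setup the naive slope should come out clearly different from zero, while the adjusted and randomized slopes should both be close to zero.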