====== Data Science Toolbox ======
Course page: https://www.coursera.org/learn/data-scientists-tools
By Jeff Leek, PhD, Roger D. Peng, PhD, Brian Caffo, PhD
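* Get help from R:
# open the help page of a function:
?rnorm
# search the help pages for a term:
help.search("rnorm")
# get the function arguments:
args("rnorm")
# see the code (type the function name without parentheses):
rnorm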
* R reference card downloaded.
* To ask for help on a problem with R:
- What steps reproduce the problem?
- What did we expect to see?
- What did we see instead?
- Which version of R and of the packages?
- Which operating system?
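The last two items of this checklist can be collected from within R itself. The following is a minimal sketch using base R only; the helper name `describe_session` is hypothetical and only serves the illustration.

```r
# Hypothetical helper: gather the environment details a help request should state.
describe_session <- function() {
  info <- sessionInfo()   # R version, platform and attached packages
  sys  <- Sys.info()      # operating system details
  list(
    r_version = R.version.string,
    os        = paste(sys[["sysname"]], sys[["release"]]),
    packages  = names(info$otherPkgs)   # attached non-base packages (NULL if none)
  )
}

details <- describe_session()
print(details$r_version)
print(details$os)
print(details$packages)
```

Pasting the full output of `sessionInfo()` into the question is usually enough; the helper above only picks out the parts the checklist asks for.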
* To ask for help on a data analysis question:
- What is the question we are trying to answer?
- What steps were used to answer the question?
- What did we expect to see?
- What did we see instead?
- What other solutions did we think about?
* Places to get info on data science questions:
* Stack Overflow
* R mailing list
* Cross Validated
* Google: "[data type] data analysis" or "[data type] R package"
* To check out a local copy of a GitHub repo:
cd my_folder
git init
git remote add origin https://github.com/roche-emmanue/my_repo.git
# then fetch the content (assuming the default branch is named master):
git pull origin master
* Help on git and github from:
* http://git-scm.com/doc
* https://help.github.com/
* Google / Stack Overflow
* Basic Markdown:
* Headings with: ## This is a secondary heading
### This is a tertiary heading
* Unordered lists: * Item 1
* Item 2
* Item 3
* Types of data science questions:
- Descriptive:
* Just trying to describe the data
- Exploratory:
* Trying to find relationships between variables (without really trying to confirm them)
- Inferential:
* Take a small sample of data and try to generalize the result to a larger population.
- Predictive:
* X **predicts** Y does not mean that X **causes** Y
- Causal:
* What happens to one variable if we change the value of another one.
- Mechanistic
* What is data:
* The "set of items" the data belong to = **population**
* **Variables** = measurements of characteristics of an item.
* **qualitative** (discrete categories) or **quantitative** (measured on a numeric scale)
* Data can come as a raw file / API / video / audio
* Data is the second most important thing (the question is the most important thing)
* To share large amounts of data: http://figshare.com
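The qualitative/quantitative distinction above maps onto R types: a qualitative variable is usually stored as a factor, a quantitative one as a numeric vector. A small sketch follows; the data frame `patients` and all its values are made up for illustration.

```r
# Made-up example values, for illustration only.
patients <- data.frame(
  country   = factor(c("FR", "US", "FR", "JP")),  # qualitative variable
  height_cm = c(172.5, 180.1, 165.0, 158.3)       # quantitative variable
)

# A qualitative variable is summarized by counting its categories.
country_counts <- table(patients$country)

# A quantitative variable is summarized numerically, e.g. by its mean.
mean_height <- mean(patients$height_cm)

print(country_counts)
print(mean_height)
```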
* To avoid confounding, we can fix the confounding variable, stratify on it, or randomize the assignment if the variable cannot be fixed.
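Randomizing and stratifying can be sketched in a few lines of base R. This is only an illustration under assumed conditions: the study, the `subjects` data frame and the `age_group` confounder are all hypothetical.

```r
set.seed(42)  # arbitrary seed, only to make the example reproducible

# Hypothetical study: 8 subjects, with age group as a potential confounder.
subjects <- data.frame(
  id        = 1:8,
  age_group = rep(c("young", "old"), each = 4),
  stringsAsFactors = FALSE
)

# Randomize: shuffle a balanced set of labels across all subjects.
subjects$random_arm <- sample(rep(c("treatment", "control"), times = 4))

# Stratify: randomize separately inside each age group, so that both
# arms receive the same number of young and old subjects.
subjects$stratified_arm <- NA_character_
for (group in unique(subjects$age_group)) {
  rows <- which(subjects$age_group == group)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  subjects$stratified_arm[rows] <- sample(labels)
}

print(table(subjects$age_group, subjects$stratified_arm))
```

With plain randomization the age groups are balanced across arms only on average; the stratified assignment makes the balance exact in this small example.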