Statistical Computation for Programmers, Scientists, Quants, Excel Users, and Other Professionals
Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone, Second Edition, is the solution.
Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you’ll need to accomplish 80 percent of modern data tasks.
Lander’s self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You’ll download and install R; navigate and use the R environment; master basic program control, data import, manipulation, and visualization; and walk through several essential tests. Then, building on this foundation, you’ll construct several complete models, both linear and nonlinear, and use some data mining techniques. After all this, you’ll make your code reproducible with LaTeX, RMarkdown, and Shiny.
By the time you’re done, you won’t just know how to write R programs; you’ll be ready to tackle the statistical problems you care about most.
Coverage includes
- Explore R, RStudio, and R packages
- Use R for math: variable types, vectors, calling functions, and more
- Exploit data structures, including data.frames, matrices, and lists
- Read many different types of data
- Create attractive, intuitive statistical graphics
- Write user-defined functions
- Control program flow with if, ifelse, and complex checks
- Improve program efficiency with group manipulations
- Combine and reshape multiple datasets
- Manipulate strings using R’s facilities and regular expressions
- Create normal, binomial, and Poisson probability distributions
- Build linear, generalized linear, and nonlinear models
- Program basic statistics: mean, standard deviation, and t-tests
- Train machine learning models
- Assess the quality of models and variable selection
- Prevent overfitting and perform variable selection, using the Elastic Net and Bayesian methods
- Analyze univariate and multivariate time series data
- Group data via K-means and hierarchical clustering
- Prepare reports, slideshows, and web pages with knitr
- Display interactive data with RMarkdown and htmlwidgets
- Implement dashboards with Shiny
- Build reusable R packages with devtools and Rcpp
Register your product at informit.com/register for convenient access to downloads, updates, and corrections as they become available.
Chapter 1: Getting R
1.1 Downloading R
1.2 R Version
1.3 32-bit vs. 64-bit
1.4 Installing
1.5 Revolution R Community Edition
1.6 Conclusion
Chapter 2: The R Environment
2.1 Command Line Interface
2.2 RStudio
2.3 Revolution Analytics RPE
2.4 Conclusion
Chapter 3: R Packages
3.1 Installing Packages
3.2 Loading Packages
3.3 Building a Package
3.4 Conclusion
Chapter 4: Basics of R
4.1 Basic Math
4.2 Variables
4.3 Data Types
4.4 Vectors
4.5 Calling Functions
4.6 Function Documentation
4.7 Missing Data
4.8 Conclusion
Chapter 5: Advanced Data Structures
5.1 data.frames
5.2 Lists
5.3 Matrices
5.4 Arrays
5.5 Conclusion
Chapter 6: Reading Data into R
6.1 Reading CSVs
6.2 Excel Data
6.3 Reading from Databases
6.4 Data from Other Statistical Tools
6.5 R Binary Files
6.6 Data Included with R
6.7 Extract Data from Web Sites
6.8 Conclusion
Chapter 7: Statistical Graphics
7.1 Base Graphics
7.2 ggplot2
7.3 Conclusion
Chapter 8: Writing R Functions
8.1 Hello, World!
8.2 Function Arguments
8.3 Return Values
8.4 do.call
8.5 Conclusion
Chapter 9: Control Statements
9.1 if and else
9.2 switch
9.3 ifelse
9.4 Compound Tests
9.5 Conclusion
Chapter 10: Loops, the Un-R Way to Iterate
10.1 for Loops
10.2 while Loops
10.3 Controlling Loops
10.4 Conclusion
Chapter 11: Group Manipulation
11.1 Apply Family
11.2 aggregate
11.3 plyr
11.4 data.table
11.5 Conclusion
Chapter 12: Data Reshaping
12.1 cbind and rbind
12.2 Joins
12.3 reshape2
12.4 Conclusion
Chapter 13: Manipulating Strings
13.1 paste
13.2 sprintf
13.3 Extracting Text
13.4 Regular Expressions
13.5 Conclusion
Chapter 14: Probability Distributions
14.1 Normal Distribution
14.2 Binomial Distribution
14.3 Poisson Distribution
14.4 Other Distributions
14.5 Conclusion
Chapter 15: Basic Statistics
15.1 Summary Statistics
15.2 Correlation and Covariance
15.3 T-Tests
15.4 ANOVA
15.5 Conclusion
Chapter 16: Linear Models
16.1 Simple Linear Regression
16.2 Multiple Regression
16.3 Conclusion
Chapter 17: Generalized Linear Models
17.1 Logistic Regression
17.2 Poisson Regression
17.3 Other Generalized Linear Models
17.4 Survival Analysis
17.5 Conclusion
Chapter 18: Model Diagnostics
18.1 Residuals
18.2 Comparing Models
18.3 Cross-Validation
18.4 Bootstrap
18.5 Stepwise Variable Selection
18.6 Conclusion
Chapter 19: Regularization and Shrinkage
19.1 Elastic Net
19.2 Bayesian Shrinkage
19.3 Conclusion
Chapter 20: Nonlinear Models
20.1 Nonlinear Least Squares
20.2 Splines
20.3 Generalized Additive Models
20.4 Decision Trees
20.5 Random Forests
20.6 Conclusion
Chapter 21: Time Series and Autocorrelation
21.1 Autoregressive Moving Average
21.2 VAR
21.3 GARCH
21.4 Conclusion
Chapter 22: Clustering
22.1 K-means
22.2 PAM
22.3 Hierarchical Clustering
22.4 Conclusion
Chapter 23: Reproducibility, Reports and Slide Shows with knitr
23.1 Installing a LaTeX Program
23.2 LaTeX Primer
23.3 Using knitr with LaTeX
23.4 Markdown Tips
23.5 Using knitr and Markdown
23.6 pandoc
23.7 Conclusion
Chapter 24: Building R Packages
24.1 Folder Structure
24.2 Package Files
24.3 Package Documentation
24.4 Checking, Building and Installing
24.5 Submitting to CRAN
24.6 C++ Code
24.7 Conclusion
Appendix A: Real-Life Resources
A.1 Meetups
A.2 Stackoverflow
A.3 Twitter
A.4 Conferences
A.5 Web Sites
A.6 Documents
A.7 Books
A.8 Conclusion
Appendix B: Glossary