Resources
This is a work-in-progress collection of resources I find useful for data science work. I’ve currently split it into two sections. The first, “Machine Learning”, is oriented toward production ML work with Python (e.g. TensorFlow, PyTorch, neural networks). The second, “Statistics”, focuses on statistical analysis and modeling work with R (e.g. regression, visualizations).
Machine Learning
Blogs
- Hugging Face - popular framework for working with transformer models
- PyTorch Lightning - a convenient framework for working with PyTorch
- PyTorch
- Explosion AI - blog from the creators of the NLP library spaCy
- Jay Alammar - explanations of how ML algorithms like transformers work
- Sebastian Raschka
- The Batch - newsletter on AI from DeepLearning.AI
- The Sequence - AI newsletter
- Distill - Machine learning concepts explained intuitively
- Microsoft Research
Books
- Deep Learning with Python (François Chollet)
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (Aurélien Géron)
- Natural Language Processing with Transformers (Lewis Tunstall, Leandro von Werra, Thomas Wolf)
Other
- DeepLearning.AI Deep Learning Coursera Specialization - 5 courses starting with neural network basics and ending with computer vision and sequence models.
- ONNX Runtime
- Nvidia CUDA Toolkit
- DeepSpeed
Python
- Chris Albon - Great practical code snippets for pandas, sklearn, etc.
- Python Data Science Handbook
- Pandas Cookbook
Statistics
Blogs
- RStudio’s Blog
- Julia Silge - Lots of great walkthroughs of machine learning problems with tidymodels
- Andrew Gelman - Bayesian statistics
- TJ Mahr - R and Bayesian stats
- Simply Statistics
- David Robinson - R and machine learning
- Monica Alexander - Bayesian statistics and demography
- Frank Harrell - Biostatistics
- Richard McElreath
Books
- Introduction to Statistical Learning - A great practical statistics and data science textbook with a very broad scope. The second edition includes survival analysis and neural network models.
- Statistical Rethinking (Richard McElreath) - Very useful for understanding Bayesian statistical methods. The book is not freely available online, but many related resources are, such as Statistical rethinking with brms (Solomon Kurz) and the video lectures on YouTube.
- R for Data Science - An introduction to R
- Regression and Other Stories - An applied textbook on Bayesian statistics. Makes use of the rstanarm package. Not freely available online, but there are code examples provided on the website.
- Telling Stories With Data (Rohan Alexander) - Practical applications of R and Bayesian statistics
- Forecasting: Principles and Practice - Textbook on time series and forecasting methods
- Text Mining with R
Data Visualization
- data-to-viz - a general data visualization reference
- “Top 50” ggplot visualizations
- BBC R Cookbook - ggplot code reference
- Data Visualization: A Practical Introduction
Other
- RWeekly - a blog aggregator for keeping track of what’s going on in the R community.
- Tidymodels Documentation
- Data Science Digest
- Biostat Handbook - general statistics reference
- RStudio Cheatsheets
- R Markdown: The Definitive Guide
- R Markdown Cookbook
- Stan User’s Guide - A helpful reference for Bayesian modeling.