- Data Mining. PDF only.
- Importing Data. PDF.
- Keras. PDF.
Linear Algebra (with Numpy)
- Linear Algebra. PDF only.
- SciPy Linear Algebra. PDF.
- Machine Learning. PDF only.
  - Supervised Learning;
  - Unsupervised Learning;
  - Deep Learning;
  - Machine Learning Tips and Tricks;
  - Probabilities and Statistics;
  - Linear Algebra and Calculus.
- Super pense-bête Machine Learning. PDF only.
- Microsoft Azure Machine Learning. PDF.
- scikit-learn. PDF.
- NumPy/SciPy/Pandas Cheat Sheet. PDF.
- Numpy. PDF.
- Pandas DataFrame Notes. PDF only.
- Pandas. PDF.
- Pandas. PDF.
- Data Wrangling with Pandas. PDF.
- PySpark. PDF.
- PySpark SQL. PDF.
- Bokeh. PDF.
- Folium. PDF.
- Matplotlib Notes. PDF only.
- Matplotlib. PDF.
- Plotly. PDF only.
- Seaborn. PDF.
Over the past few years, as the buzz around data scientists, and apparently the demand for them, has continued to grow, people have become eager to learn how to join, advance and thrive in this seemingly lucrative profession. As someone who writes on analytics and occasionally teaches it, I am often asked: how do I become a data scientist?
Adding to the complexity of my answer is that data science seems to be a multi-disciplinary field, while university departments of statistics, computer science and management deal with data quite differently.
But to set the marketing-created jargon aside, a data scientist is simply a person who can write code in a few languages (primarily R, Python and SQL) for data querying, manipulation, aggregation and visualization, using enough statistical knowledge to give actionable insights back to the business for making decisions.
This rather practical definition of a data scientist is reinforced by the wording that accompanies “data scientist” postings on job websites. Ergo, here are some tools for learning the primary languages in data science: Python, R and SQL. A cheat sheet or reference card is a compilation of the most commonly used commands, meant to help you learn a language’s syntax at a faster rate.
The inclusion of SQL may surprise some (isn’t this the NoSQL era?), but it is there for a logical reason. Both Pig and Hive Query Language are closely associated with SQL, the original Structured Query Language. In addition, one can rely solely on the sqldf package within R (and the less widely used python-sql or python-sqlparse libraries for Pythonic data scientists), or even the Proc SQL commands within the old champion language SAS, and do most of what a data scientist is expected to do (at least in data munging).
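To make the point about SQL-only data munging concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in (not sqldf, python-sql or Proc SQL), and the `sales` table, its columns, the sample rows and the `summarize_sales` helper are all hypothetical, invented purely for illustration.

```python
import sqlite3

# Hypothetical sample data: (region, product, amount). Invented for illustration.
ROWS = [
    ("north", "widget", 10.0),
    ("north", "gadget", 30.0),
    ("south", "widget", 5.0),
    ("south", "widget", 15.0),
    ("east", "gadget", 8.0),
]


def summarize_sales(rows, min_amount=8.0):
    """Filter, aggregate and sort the rows using nothing but SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        query = """
            SELECT region,
                   COUNT(*)    AS n_orders,
                   SUM(amount) AS total,
                   AVG(amount) AS mean_amount
            FROM sales
            WHERE amount >= ?      -- querying / filtering
            GROUP BY region        -- aggregation
            ORDER BY total DESC    -- sorting
        """
        return conn.execute(query, (min_amount,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, n_orders, total, mean_amount in summarize_sales(ROWS):
        print(region, n_orders, total, mean_amount)
```

The same SELECT / WHERE / GROUP BY / ORDER BY pattern should carry over largely unchanged to the other SQL-flavoured tools mentioned above, although each has its own dialect quirks.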
For Python, this is a rather partial list, given that Python, the most general-purpose language in the data scientist’s quiver, can be used for many things. But for the data scientist, the numpy, scipy, pandas and scikit-learn packages seem the most pertinent.
Do all the thousands of R packages hold useful interest for the aspiring data scientist? No.
Accordingly, we chose the appropriate cheat sheets for you. Note that this is a curated list of lists. If anything can be assumed in the field of data science, it should be the null hypothesis that the data scientist is intelligent enough to make his own decisions based on data and its context. Three printouts are all it takes to speed up the aspiring data scientist’s journey.
Please suggest additional cheat sheets in the comments below.
Cheat Sheets for Python
- Python: www.astro.up.pt/~sousasag/Python_For_Astronomers/Python_qr.pdf
- NumPy, SciPy and Pandas: s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+SciPy,+NumPy+Cheat+Sheet.pdf
Cheat Sheets for R
- Short Reference Card: cran.r-project.org/doc/contrib/Short-refcard.pdf
- R Functions for Regression Analysis: cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
- Time Series: cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
- Data Mining: cran.r-project.org/doc/contrib/YanchangZhao-refcard-data-mining.pdf
- Quandl: s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+R+Cheat+Sheet.pdf
Cross Reference between R, Python (and Matlab)
Cheat Sheets for SQL
- SQL Joins: www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
- SQL and Hive: hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
- Cheat Sheets for Java: introcs.cs.princeton.edu/java/11cheatsheet/
- Linux Cheat Sheet: www.linuxstall.com/linux-command-line-tips-that-every-linux-user-should-know/
Ajay Ohri is a popular writer and blogger on analytics and data mining, and the author of the book R for Business Analytics (Springer, 2012).