# Numpy Cheat Sheet

Posted by admin on 1/29/2022

The NumPy library is the core library for scientific computing in Python. It provides a high-performance multidimensional array.

This post updates a previous very popular post 50+ Data Science, Machine Learning Cheat Sheets by Bhavya Geethika. If we missed some popular cheat sheets, add them in the comments below.

Cheatsheets on Python, R and Numpy, Scipy, Pandas

*Data science* is a multi-disciplinary field, with thousands of packages and hundreds of programming functions in use. An aspiring data enthusiast need not know them all. A cheat sheet or reference card is a compilation of the most commonly used commands, meant to help you learn a language’s syntax faster. Here are the most important ones, captured in a few compact pages.

Mastering *data science* involves understanding statistics and mathematics, having programming knowledge (especially in R, Python and SQL), and then combining all of these with business understanding and human instinct to derive the insights that drive decisions.

Here are the cheat sheets by category:

**Cheat sheets for Python:**

Python is a popular choice for beginners, yet still powerful enough to back some of the world’s most popular products and applications. Its design makes the programming experience feel almost as natural as writing in English. The Python basics and Python Debugger cheat sheets for beginners cover the important syntax needed to get started. Community-provided libraries such as NumPy, SciPy, scikit-learn and pandas are heavily relied on, and the NumPy/SciPy/Pandas Cheat Sheet provides a quick refresher on them.

- Python Cheat Sheet by DaveChild via cheatography.com
- Python Basics Reference sheet via cogsci.rpi.edu
- OverAPI.com Python cheatsheet
- Python 3 Cheat Sheet by Laurent Pointal

**Cheat sheets for R:**

The R ecosystem has expanded so much that a lot of reference material is needed. The R Reference Card covers most of the R world in a few pages. RStudio has also published a series of cheat sheets to make things easier for the R community. The data visualization with ggplot2 sheet seems to be a favorite, as it helps when you are creating graphs of your results.

At cran.r-project.org:

At Rstudio.com:

- R markdown cheatsheet, part 2

Others:

- DataCamp’s Data Analysis the data.table way

**Cheat sheets for MySQL & SQL:**

For a data scientist, the basics of SQL are as important as any other language. Both Pig and Hive Query Language are closely associated with SQL, the original Structured Query Language. SQL cheat sheets provide a five-minute guide to learning it, after which you can explore Hive and MySQL.

- SQL for dummies cheat sheet

**Cheat sheets for Spark, Scala, Java:**

Apache Spark is an engine for large-scale data processing. For certain applications, such as iterative machine learning, Spark can be up to 100x faster than Hadoop (using MapReduce). The essentials of Apache Spark cheatsheet explains its place in the big data ecosystem, walks through setup and creation of a basic Spark application, and explains commonly used actions and operations.

- Dzone.com’s Apache Spark reference card
- DZone.com’s Scala reference card
- Openkd.info’s Scala on Spark cheat sheet
- Java cheat sheet at MIT.edu
- Cheat Sheets for Java at Princeton.edu

**Cheat sheets for Hadoop & Hive:**

Hadoop emerged as an unconventional tool for solving what was thought to be unsolvable, by providing an open source software framework for the parallel processing of massive amounts of data. Explore the Hadoop cheat sheets to find useful commands for working with Hadoop on the command line. A combination of SQL and Hive functions is another one to check out.

**Cheat sheets for web application framework Django:**

Django is a free and open source web application framework, written in Python. If you are new to Django, you can go over these cheat sheets to pick up the key concepts quickly and then dive deeper into each one.

- Django cheat sheet part 1, part 2, part 3, part 4

**Cheat sheets for Machine learning:**

We often find ourselves spending time wondering which algorithm is best, and then going back to our big books for reference. These cheat sheets ask about both the nature of your data and the problem you're working to address, and then suggest an algorithm for you to try.

- Machine Learning cheat sheet at scikit-learn.org
- Scikit-Learn Cheat Sheet: Python Machine Learning from yhat (added by GP)
- Patterns for Predictive Learning cheat sheet at Dzone.com
- Equations and tricks Machine Learning cheat sheet at Github.com
- Supervised learning superstitions cheatsheet at Github.com

**Cheat sheets for Matlab/Octave**

MATLAB (MATrix LABoratory) was developed by MathWorks in 1984. MATLAB has been the most popular language for numeric computation in academia. It is suitable for tackling nearly every science and engineering task, with several highly optimized *toolboxes*. MATLAB is not open source; however, GNU Octave is a free alternative re-implementation that follows the same syntactic rules, so most code is compatible with MATLAB.

**Cheat sheets for Cross Reference between languages**


### Numpy And Pandas Cheat Sheet

NumPy is the library that gives Python its ability to work with data at speed. Originally launched in 1995 as ‘Numeric,’ NumPy is the foundation on which many important Python data science libraries are built, including Pandas, SciPy and scikit-learn.

## Key and Imports

In this cheat sheet, we use the following shorthand:

| Shorthand | Meaning |
| --- | --- |
| `arr` | A NumPy Array object |

You’ll also need to import numpy to get started:
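By convention, NumPy is imported under the alias `np`, which is the prefix the rest of this sheet assumes. A minimal sketch (the sample values are only for illustration):

```python
import numpy as np

# Build a small array; this plays the role of `arr` used throughout the sheet.
arr = np.array([1, 2, 3])
print(arr.shape)
```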

## Importing/exporting

| Command | Description |
| --- | --- |
| `np.loadtxt('file.txt')` | From a text file |
| `np.genfromtxt('file.csv',delimiter=',')` | From a CSV file |
| `np.savetxt('file.txt',arr,delimiter=' ')` | Writes to a text file |
| `np.savetxt('file.csv',arr,delimiter=',')` | Writes to a CSV file |
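As a rough illustration of how these calls fit together, the sketch below writes an array to a CSV file and reads it back. The temporary directory and the sample values are assumptions made so the example is self-contained; in practice you would point at your own file.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Write to a temporary CSV file (hypothetical path), then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "file.csv")
    np.savetxt(csv_path, arr, delimiter=",")
    loaded = np.loadtxt(csv_path, delimiter=",")
    # genfromtxt reads the same file; it also tolerates missing values.
    loaded_gen = np.genfromtxt(csv_path, delimiter=",")

print(np.array_equal(arr, loaded))
```

`np.genfromtxt` is usually the safer choice when the file may contain gaps, since missing entries become `nan` instead of raising an error.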

## Creating Arrays

| Command | Description |
| --- | --- |
| `np.empty((1, 2))` | Creates an empty `1` x `2` array. The value at each position is uninitialized (an arbitrary value that depends on what was already in memory). |
| `np.array([1,2,3])` | One-dimensional array. The keyword argument `dtype` converts elements to the specified type. |
| `np.array([(1,2,3),(4,5,6)])` | Two-dimensional array |
| `np.zeros(3)` | 1D array of length `3`, all values `0` |
| `np.ones((3,4))` | `3` x `4` array with all values `1` |
| `np.eye(5)` | `5` x `5` array of `0` with `1` on the diagonal (identity matrix) |
| `np.linspace(0,100,6)` | Array of `6` evenly spaced values from `0` to `100` |
| `np.arange(0,10,3)` | Array of values from `0` to less than `10` with step `3` (e.g. `[0,3,6,9]`) |
| `np.full((2,3),8)` | `2` x `3` array with all values `8` |
| `np.random.rand(4,5)` | `4` x `5` array of random floats between `0` and `1` |
| `np.random.rand(6,7)*100` | `6` x `7` array of random floats between `0` and `100` |
| `np.random.randint(5,size=(2,3))` | `2` x `3` array of random ints between `0` and `4` |
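The constructors above can be tried side by side. This is a small sketch; the variable names are chosen here for illustration only.

```python
import numpy as np

zeros = np.zeros(3)                      # three zeros
ones = np.ones((3, 4))                   # 3 x 4 array of ones
identity = np.eye(5)                     # 5 x 5 identity matrix
evenly = np.linspace(0, 100, 6)          # 0, 20, 40, 60, 80, 100
stepped = np.arange(0, 10, 3)            # 0, 3, 6, 9
filled = np.full((2, 3), 8)              # 2 x 3 array where every value is 8
as_float = np.array([1, 2, 3], dtype=float)  # dtype converts the elements

random_floats = np.random.rand(4, 5)             # floats in [0, 1)
random_ints = np.random.randint(5, size=(2, 3))  # ints from 0 to 4

print(evenly)
print(stepped)
```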

## Inspecting Properties

| Command | Description |
| --- | --- |
| `arr.size` | Returns number of elements in `arr` |
| `arr.shape` | Returns dimensions of `arr` (rows,columns) |
| `arr.dtype` | Returns type of elements in `arr` |
| `arr.astype(dtype)` | Convert `arr` elements to type `dtype` |
| `arr.tolist()` | Convert `arr` to a Python list |
| `np.info(np.eye)` | View documentation for `np.eye` |
| `np.copy(arr)` | Copies `arr` to new memory |
| `arr.view(dtype)` | Creates view of `arr` elements with type `dtype` |
| `arr.sort()` | Sorts `arr` in place |
| `arr.sort(axis=0)` | Sorts specific axis of `arr` |
| `two_d_arr.flatten()` | Flattens 2D array `two_d_arr` to 1D |
| `arr.T` | Transposes `arr` (rows become columns and vice versa) |
| `arr.reshape(3,4)` | Reshapes `arr` to `3` rows, `4` columns without changing data |
| `arr.resize((5,6))` | Changes `arr` shape to `5` x `6` in place and fills new values with `0` |
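A short sketch of several of these properties and methods on a 12-element array. The array contents and variable names are assumptions chosen for illustration.

```python
import numpy as np

arr = np.arange(12)                 # values 0..11 in one dimension
print(arr.size, arr.shape, arr.dtype)

grid = arr.reshape(3, 4)            # 3 rows, 4 columns, same data
as_float = grid.astype(float)       # new array with float elements
flat = grid.flatten()               # 1D copy of the 2D array
nested = grid.tolist()              # plain nested Python lists

independent = np.copy(arr)          # separate memory from arr
independent[0] = 99                 # does not change arr

unsorted = np.array([3, 1, 2])
unsorted.sort()                     # sorts in place and returns None
print(unsorted)
```

Note that `arr.sort()` modifies the array itself, while `np.sort(arr)` returns a sorted copy.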

## Adding/removing Elements

| Command | Description |
| --- | --- |
| `np.append(arr,values)` | Appends values to end of `arr` |
| `np.insert(arr,2,values)` | Inserts values into `arr` before index `2` |
| `np.delete(arr,3,axis=0)` | Deletes row on index `3` of `arr` |
| `np.delete(arr,4,axis=1)` | Deletes column on index `4` of `arr` |
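These functions all return new arrays and leave the input untouched. A minimal sketch, using made-up sample arrays:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
appended = np.append(arr, [6, 7])        # [1 2 3 4 5 6 7]
inserted = np.insert(arr, 2, [10, 11])   # [1 2 10 11 3 4 5]

grid = np.arange(20).reshape(4, 5)       # 4 rows, 5 columns
without_row = np.delete(grid, 3, axis=0) # drops row 3, shape becomes (3, 5)
without_col = np.delete(grid, 4, axis=1) # drops column 4, shape becomes (4, 4)

print(arr)  # the original is unchanged
```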

## Combining

| Command | Description |
| --- | --- |
| `np.vstack((arr1, arr2))` | Vertically stacks multiple arrays. Think of it as the second array’s items being added as new rows to the first array. |
| `np.hstack((arr1, arr2))` | Horizontally stacks multiple arrays. |
| `np.concatenate((arr1,arr2),axis=0)` | Adds `arr2` as rows to the end of `arr1`. It’s a general-purpose `vstack`. |
| `np.concatenate((arr1,arr2),axis=1)` | Adds `arr2` as columns to end of `arr1`. It’s a general-purpose `hstack`. |
| `np.split(arr,3)` | Splits `arr` into `3` equal sub-arrays |
| `np.hsplit(arr,5)` | Splits `arr` horizontally (column-wise) into `5` equal sub-arrays. To split at the `5`th index instead, pass a list: `np.hsplit(arr,[5])` |
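The sketch below stacks two small arrays and then splits a wider one. It also shows the difference between passing an integer to `np.hsplit` (number of equal sections) and passing a list (column indices to split at). The sample arrays are assumptions for illustration.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

stacked_rows = np.vstack((arr1, arr2))                 # shape (4, 2)
stacked_cols = np.hstack((arr1, arr2))                 # shape (2, 4)
same_as_vstack = np.concatenate((arr1, arr2), axis=0)
same_as_hstack = np.concatenate((arr1, arr2), axis=1)

arr = np.arange(12)
thirds = np.split(arr, 3)                # three arrays of length 4

wide = np.arange(20).reshape(2, 10)
five_parts = np.hsplit(wide, 5)          # five blocks of shape (2, 2)
left, right = np.hsplit(wide, [5])       # split before column index 5
```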

## Indexing

| Command | Description |
| --- | --- |
| `arr[5]` | Returns the element at index `5` |
| `arr[2,5]` | Returns the 2D array element on index `[2][5]` |
| `arr[1]=4` | Assigns array element on index `1` the value `4` |
| `arr[1,3]=10` | Assigns array element on index `[1][3]` the value `10` |
| `arr[0:3]` | Returns the elements at indices `0,1,2` (on a 2D array: returns rows `0,1,2`) |
| `arr[0:3,4]` | Returns the elements on rows `0,1,2` at column `4` |
| `arr[:2]` | Returns the elements at indices `0,1` (on a 2D array: returns rows `0,1`) |
| `arr[:,1]` | Returns the elements at index `1` on all rows |
| `arr<5` | Returns an array with boolean values |
| `(arr1<3) & (arr2>5)` | Returns an array with boolean values |
| `~arr` | Inverts a boolean array |
| `arr[arr<5]` | Returns array elements smaller than `5` |
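To make the indexing rules concrete, here is a small sketch on a 1D and a 2D array. The arrays are built with `np.arange` purely so the results are easy to predict.

```python
import numpy as np

arr = np.arange(10)                 # values 0..9
fifth = arr[5]                      # single element
first_three = arr[0:3]              # elements at indices 0, 1, 2
small = arr[arr < 5]                # boolean mask keeps values below 5

grid = np.arange(24).reshape(4, 6)  # 4 rows, 6 columns
element = grid[2, 5]                # row 2, column 5
column = grid[:, 1]                 # column 1 of every row
rows_at_col = grid[0:3, 4]          # rows 0, 1, 2 at column 4

grid[1, 3] = 100                    # assignment changes grid in place
mask = (grid > 3) & (grid < 8)      # each condition in its own parentheses
inverted = ~mask                    # flips True and False
```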

## Conditional Selecting

NumPy makes it possible to test whether rows match certain values using mathematical comparison operations like `<`, `>`, `>=`, `<=`, and `==`. For example, if we want to see which wines have a quality rating higher than `5`, we can do this:
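A minimal sketch of that comparison. The `wines` array here is a small hypothetical stand-in with made-up values, and the column layout (`alcohol`, `quality`) is an assumption for illustration; it is not the dataset used in the original post.

```python
import numpy as np

# Hypothetical stand-in for the wines data: one row per wine,
# columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.5, 8.0],
    [10.5, 7.0],
    [13.0, 10.0],
    [9.0, 8.0],
])
QUALITY = 1  # assumed column index of the quality rating in this toy array

is_above_five = wines[:, QUALITY] > 5
print(is_above_five)  # [False  True  True  True  True  True]
```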

We get a Boolean array that tells us which of the wines have a quality rating greater than `5`. We can do something similar with the other operators. For instance, we can see if any wines have a quality rating equal to `10`:
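A sketch of the equality test, again on a hypothetical `wines` array whose values and column layout are assumptions for illustration:

```python
import numpy as np

# Hypothetical data: columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.5, 8.0],
    [10.5, 7.0],
    [13.0, 10.0],
    [9.0, 8.0],
])
QUALITY = 1  # assumed column index

is_ten = wines[:, QUALITY] == 10
print(is_ten)        # [False False False False  True False]
print(is_ten.any())  # True, because at least one wine is rated 10
```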

One of the powerful things we can do with a Boolean array and a NumPy array is select only certain rows or columns in the NumPy array. For example, the code below will only select rows in `wines` where the quality is over `7`:
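A sketch of row selection with a Boolean array. The `wines` values and the column index are hypothetical; only the name `high_quality` comes from the text.

```python
import numpy as np

# Hypothetical data: columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.5, 8.0],
    [10.5, 7.0],
    [13.0, 10.0],
    [9.0, 8.0],
])
QUALITY = 1  # assumed column index

high_quality = wines[:, QUALITY] > 7
selected = wines[high_quality, :]   # rows where the mask is True, all columns
print(selected)
```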

We select only the rows where `high_quality` contains a `True` value, and all of the columns. This subsetting makes it simple to filter arrays for certain criteria. For example, we can look for wines with a lot of alcohol and high quality. In order to specify multiple conditions, we have to place each condition in parentheses, and separate conditions with an ampersand (`&`):
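A sketch of combining two conditions. The data, the column indices, and the thresholds (`10` for alcohol, `7` for quality) are assumptions picked for illustration.

```python
import numpy as np

# Hypothetical data: columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.5, 8.0],
    [10.5, 7.0],
    [13.0, 10.0],
    [9.0, 8.0],
])
ALCOHOL, QUALITY = 0, 1  # assumed column indices

# Parentheses are required: & binds more tightly than > in Python.
strong_and_good = (wines[:, ALCOHOL] > 10) & (wines[:, QUALITY] > 7)
print(wines[strong_and_good, :])
```

Without the parentheses, Python would evaluate `10 & wines[:, QUALITY]` first, which is why each condition needs its own pair.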

We can combine subsetting and assignment to overwrite certain values in an array:
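A sketch of masked assignment. The data, column indices, and the replacement value `20` are all assumptions for illustration.

```python
import numpy as np

# Hypothetical data: columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.5, 8.0],
    [10.5, 7.0],
    [13.0, 10.0],
    [9.0, 8.0],
])
ALCOHOL, QUALITY = 0, 1  # assumed column indices

strong_and_good = (wines[:, ALCOHOL] > 10) & (wines[:, QUALITY] > 7)

# Overwrite the quality of the matching rows; this modifies wines in place.
wines[strong_and_good, QUALITY] = 20
print(wines[:, QUALITY])
```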

## Reshaping NumPy Arrays

| Command | Description |
| --- | --- |
| `numpy.transpose(arr)` | Transpose the array. |
| `numpy.ravel(arr)` | Turn an array into a one-dimensional representation. |
| `numpy.reshape(arr, shape)` | Reshape an array to the shape we specify, e.g. `(3, 2)`. |
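Transposing and reshaping can produce the same shape with different element order, which the sketch below makes visible. The sample array is an assumption for illustration.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)          # [[0 1 2], [3 4 5]]

transposed = np.transpose(arr)            # rows become columns: shape (3, 2)
flat = np.ravel(arr)                      # one-dimensional: 0..5
reshaped = np.reshape(arr, (3, 2))        # shape (3, 2), elements kept in order
inferred = np.reshape(arr, (-1, 2))       # -1 lets NumPy work out the row count

print(transposed)
print(reshaped)
```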

## Scalar Math

If you do any of the basic mathematical operations (`/, *, -, +, **`) with an array and a value, it will apply the operation to each of the elements in the array. Note that exponentiation is written `**`; in Python and NumPy, `^` is bitwise XOR, not a power operator.

| Command | Description |
| --- | --- |
| `np.add(arr,1)` or `arr + 1` | Add `1` to each array element |
| `np.subtract(arr,2)` or `arr - 2` | Subtract `2` from each array element |
| `np.multiply(arr,3)` or `arr * 3` | Multiply each array element by `3` |
| `np.divide(arr,4)` or `arr / 4` | Divide each array element by `4` (dividing by zero gives `inf`, or `np.nan` for `0/0`, with a runtime warning) |
| `np.power(arr,5)` or `arr ** 5` | Raise each array element to the `5`th power |

Note that these operations don't change the original array. For example, adding 10 to the quality column of a `wines` array returns a new 1-dimensional array in which 10 has been added to each element, and leaves `wines` as it was.

If we instead did `+=`, we’d modify the array in place.
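The difference between returning a new array and modifying in place can be checked directly. A minimal sketch with made-up values:

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])

plus_one = arr + 1      # same as np.add(arr, 1); arr is unchanged
cubed = arr ** 3        # same as np.power(arr, 3); ** is the power operator
before = arr.copy()     # snapshot to compare against later

arr += 10               # in place: arr itself now holds the new values
print(before, arr)
```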

## Vector Math

All of the common operations (`/, *, -, +, **`) will work between arrays.

| Command | Description |
| --- | --- |
| `np.add(arr1,arr2)` | Elementwise add `arr2` to `arr1` |
| `np.subtract(arr1,arr2)` | Elementwise subtract `arr2` from `arr1` |
| `np.multiply(arr1,arr2)` | Elementwise multiply `arr1` by `arr2` |
| `np.divide(arr1,arr2)` | Elementwise divide `arr1` by `arr2` |
| `np.power(arr1,arr2)` | Elementwise raise `arr1` to the power of `arr2` |
| `np.array_equal(arr1,arr2)` | Returns `True` if the arrays have the same elements and shape |
| `np.sqrt(arr)` | Square root of each element in the array |
| `np.sin(arr)` | Sine of each element in the array |
| `np.log(arr)` | Natural log of each element in the array |
| `np.abs(arr)` | Absolute value of each element in the array |
| `np.ceil(arr)` | Rounds up to the nearest integer |
| `np.floor(arr)` | Rounds down to the nearest integer |
| `np.round(arr)` | Rounds to the nearest integer (values exactly halfway round to the nearest even integer) |
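A short sketch of elementwise operations between two arrays of the same shape, plus the rounding functions. The sample values are assumptions picked so the results are easy to verify by hand.

```python
import numpy as np

arr1 = np.array([1.0, 4.0, 9.0])
arr2 = np.array([2.0, 2.0, 2.0])

total = arr1 + arr2                 # [ 3.  6. 11.]
ratio = arr1 / arr2                 # [0.5 2.  4.5]
powered = arr1 ** arr2              # [ 1. 16. 81.]
roots = np.sqrt(arr1)               # [1. 2. 3.]
same = np.array_equal(arr1, arr2)   # False

values = np.array([-1.5, 0.5, 2.5])
rounded_up = np.ceil(values)        # [-1.  1.  3.]
rounded_down = np.floor(values)     # [-2.  0.  2.]
rounded = np.round(values)          # [-2.  0.  2.], halves go to the even integer
```

`np.ceil`, `np.floor` and `np.round` return floating-point arrays; use `.astype(int)` if integer elements are needed.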

## Statistics

| Command | Description |
| --- | --- |
| `np.mean(arr,axis=0)` | Returns mean along specific axis |
| `arr.sum()` | Returns sum of `arr` |
| `arr.min()` | Returns minimum value of `arr` |
| `arr.max(axis=0)` | Returns maximum value of specific axis |
| `np.var(arr)` | Returns the variance of array |
| `np.std(arr,axis=1)` | Returns the standard deviation of specific axis |
| `np.corrcoef(arr)` | Returns the correlation coefficient matrix of the array (a function, not an array method) |
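A sketch of these statistics on a small 2 x 3 array. The values are made up; note that `axis=0` works down the columns and `axis=1` works across each row.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 9.0]])

col_means = np.mean(arr, axis=0)    # one mean per column: [2.5 3.5 6. ]
total = arr.sum()                   # 24.0
smallest = arr.min()                # 1.0
col_max = arr.max(axis=0)           # one maximum per column: [4. 5. 9.]
variance = np.var(arr)              # population variance over all elements
row_std = np.std(arr, axis=1)       # one standard deviation per row
corr = np.corrcoef(arr)             # 2 x 2 matrix; each row is one variable

print(col_means, total, corr.shape)
```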


## Acknowledgement


The original post can be found at dataquest.io.