# Deep Learning Cheat Sheet

CS 229 - Machine Learning

By Afshine Amidi and Shervine Amidi

## Neural Networks

Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.

Architecture: The vocabulary around neural network architectures is described in the figure below:

[Figure: neural network architecture]

By noting $i$ the $i^{th}$ layer of the network and $j$ the $j^{th}$ hidden unit of the layer, we have:

\[\boxed{z_j^{[i]}={w_j^{[i]}}^Tx+b_j^{[i]}}\]

where we note $w$, $b$, $z$ the weight, bias and output respectively.

Activation function: Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:

| Sigmoid | Tanh | ReLU | Leaky ReLU |
|---|---|---|---|
| $g(z)=\displaystyle\frac{1}{1+e^{-z}}$ | $g(z)=\displaystyle\frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$ | $g(z)=\textrm{max}(0,z)$ | $g(z)=\textrm{max}(\epsilon z,z)$ with $\epsilon\ll1$ |
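The four activation functions listed above can be written directly from their formulas. This is a minimal scalar sketch using only the standard library; the function names and the default `epsilon=0.01` are illustrative assumptions, not values given in the cheat sheet.

```python
import math


def sigmoid(z):
    """Sigmoid: 1 / (1 + e^{-z}), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Tanh: (e^z - e^{-z}) / (e^z + e^{-z}), output in (-1, 1)."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))


def relu(z):
    """ReLU: max(0, z)."""
    return max(0.0, z)


def leaky_relu(z, epsilon=0.01):
    """Leaky ReLU: max(epsilon * z, z), with epsilon << 1 (assumed default)."""
    return max(epsilon * z, z)
```

For very large `|z|` the exponentials can overflow, so a production implementation would typically rely on a numerical library instead.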

Cross-entropy loss: In the context of neural networks, the cross-entropy loss $L(z,y)$ is commonly used and is defined as follows:

\[\boxed{L(z,y)=-\Big[y\log(z)+(1-y)\log(1-z)\Big]}\]

Learning rate: The learning rate, often noted $\alpha$ or sometimes $\eta$, indicates at which pace the weights get updated. It can be fixed or adaptively changed. Currently, the most popular method is Adam, which adapts the learning rate.
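As a small illustration of the cross-entropy loss defined above, the sketch below evaluates $L(z,y)$ for a single prediction $z\in(0,1)$ and label $y\in\{0,1\}$. The clipping constant `eps` is an assumption added here to avoid `log(0)`; it is not part of the definition.

```python
import math


def cross_entropy(z, y, eps=1e-12):
    """Binary cross-entropy L(z, y) = -[y log(z) + (1 - y) log(1 - z)]."""
    # Clip the prediction away from 0 and 1 (assumed safeguard, not in the formula).
    z = min(max(z, eps), 1.0 - eps)
    return -(y * math.log(z) + (1 - y) * math.log(1 - z))
```

A confident correct prediction gives a loss near zero, while a confident wrong one is penalized heavily.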

Backpropagation: Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight $w$ is computed using the chain rule and is of the following form:

\[\boxed{\frac{\partial L(z,y)}{\partial w}=\frac{\partial L(z,y)}{\partial a}\times\frac{\partial a}{\partial z}\times\frac{\partial z}{\partial w}}\]

As a result, the weight is updated as follows:

\[\boxed{w\longleftarrow w-\alpha\frac{\partial L(z,y)}{\partial w}}\]

Updating weights: In a neural network, weights are updated as follows:

• Step 1: Take a batch of training data.
• Step 2: Perform forward propagation to obtain the corresponding loss.
• Step 3: Backpropagate the loss to get the gradients.
• Step 4: Use the gradients to update the weights of the network.
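The four steps above can be sketched for the simplest possible network: a single sigmoid unit trained with the cross-entropy loss. Everything here (the `train` function, the toy data set, the batch size and the learning rate) is a hypothetical setup chosen for illustration. For this particular pairing of sigmoid and cross-entropy, the chain rule collapses to "prediction minus label" for the gradient with respect to the pre-activation.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def train(data, alpha=0.5, epochs=500, batch_size=2, seed=0):
    """Mini-batch gradient descent for one sigmoid unit (illustrative sketch)."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not shuffled
    w, b, loss = 0.0, 0.0, float("inf")
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            # Step 1: take a batch of training data.
            batch = data[start:start + batch_size]
            # Step 2: forward propagation, then the mean cross-entropy loss.
            preds = [sigmoid(w * x + b) for x, _ in batch]
            loss = -sum(
                y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, (_, y) in zip(preds, batch)
            ) / len(batch)
            # Step 3: backpropagate. With sigmoid + cross-entropy,
            # dL/d(pre-activation) = p - y, so dL/dw = (p - y) * x.
            grad_w = sum((p - y) * x for p, (x, y) in zip(preds, batch)) / len(batch)
            grad_b = sum(p - y for p, (_, y) in zip(preds, batch)) / len(batch)
            # Step 4: update the weights with learning rate alpha.
            w -= alpha * grad_w
            b -= alpha * grad_b
    return w, b, loss


# Toy data: negative inputs are class 0, positive inputs are class 1.
toy_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, final_loss = train(toy_data)
```

After training, `sigmoid(w * x + b)` should be close to 1 for positive inputs and close to 0 for negative ones.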

Dropout: Dropout is a technique meant to prevent overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability $p$ or kept with probability $1-p$.
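A minimal sketch of dropout on a list of activations follows. It uses the "inverted dropout" variant, which rescales the kept units by $1/(1-p)$ so that the expected activation is unchanged; that rescaling is an assumption of this example, since the text above only specifies the drop and keep probabilities. The function name and the `training` flag are also illustrative.

```python
import random


def dropout(activations, p, rng=random, training=True):
    """Drop each unit with probability p, keep it with probability 1 - p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # At inference time nothing is dropped.
        return list(activations)
    keep = 1.0 - p
    # Inverted dropout (assumed variant): scale kept units by 1 / (1 - p).
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Passing a seeded `random.Random` instance as `rng` makes the mask reproducible.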

## Convolutional Neural Networks

Convolutional layer requirement: By noting $W$ the input volume size, $F$ the size of the convolutional layer neurons, $P$ the amount of zero padding and $S$ the stride, the number of neurons $N$ that fit in a given volume is such that:

\[\boxed{N=\frac{W-F+2P}{S}+1}\]
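The usual form of this relation is $N=\frac{W-F+2P}{S}+1$, where $S$ denotes the stride. The helper below is a sketch of that arithmetic; the function name is hypothetical, and raising an error when the filter does not tile the input evenly is just one possible convention (some frameworks round down instead).

```python
def conv_output_size(W, F, P, S=1):
    """Number of neurons N = (W - F + 2P) / S + 1 along one dimension."""
    span = W - F + 2 * P
    if span < 0 or span % S != 0:
        # Assumed convention: reject configurations that do not fit exactly.
        raise ValueError("filter does not fit the padded input with this stride")
    return span // S + 1
```

For example, a 32-wide input with a 5-wide filter, padding 2 and stride 1 keeps its size: `conv_output_size(32, 5, 2, 1)` gives 32.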

Batch normalization: It is a step with learnable parameters $\gamma, \beta$ that normalizes the batch $\{x_i\}$. By noting $\mu_B, \sigma_B^2$ the mean and variance of the batch that we want to correct, it is done as follows:

\[\boxed{x_i\longleftarrow\gamma\frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\epsilon}}+\beta}\]

It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.
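The normalization formula above can be sketched for a batch of scalars as follows. This covers only the training-time forward pass; the function name and the default values of `gamma`, `beta` and `eps` are assumptions made for the example, and running statistics for inference are left out.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Apply x_i <- gamma * (x_i - mu_B) / sqrt(sigma_B^2 + eps) + beta."""
    m = len(xs)
    mu = sum(xs) / m                           # batch mean mu_B
    var = sum((x - mu) ** 2 for x in xs) / m   # batch variance sigma_B^2
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]
```

With `gamma=1` and `beta=0` the output has mean close to 0 and variance close to 1; other values shift and scale the normalized batch.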

## Recurrent Neural Networks

Types of gates: Here are the different types of gates that we encounter in a typical recurrent neural network:

| Input gate | Forget gate | Gate | Output gate |
|---|---|---|---|
| Write to cell or not? | Erase a cell or not? | How much to write to cell? | How much to reveal cell? |

LSTM: A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.

For a more detailed overview of the concepts above, check out the Deep Learning cheatsheets!

## Reinforcement Learning and Control

The goal of reinforcement learning is for an agent to learn how to evolve in an environment.

### Definitions

Markov decision processes: A Markov decision process (MDP) is a 5-tuple $(\mathcal{S},\mathcal{A},\{P_{sa}\},\gamma,R)$ where:

• $\mathcal{S}$ is the set of states
• $\mathcal{A}$ is the set of actions
• $\{P_{sa}\}$ are the state transition probabilities for $s\in\mathcal{S}$ and $a\in\mathcal{A}$
• $\gamma\in[0,1)$ is the discount factor
• $R:\mathcal{S}\times\mathcal{A}\longrightarrow\mathbb{R}$ or $R:\mathcal{S}\longrightarrow\mathbb{R}$ is the reward function that the algorithm wants to maximize

Policy: A policy $\pi$ is a function $\pi:\mathcal{S}\longrightarrow\mathcal{A}$ that maps states to actions.

Remark: we say that we execute a given policy $\pi$ if given a state $s$ we take the action $a=\pi(s)$.

Value function: For a given policy $\pi$ and a given state $s$, we define the value function $V^{\pi}$ as follows:

\[\boxed{V^\pi(s)=E\Big[R(s_0)+\gamma R(s_1)+\gamma^2 R(s_2)+...\,\Big|\,s_0=s,\pi\Big]}\]
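The quantity inside the expectation is the discounted sum of rewards along one trajectory. The sketch below computes it for a finite list of rewards; the function name is hypothetical, and averaging this value over many trajectories that start in $s$ and follow $\pi$ would give a Monte Carlo estimate of the value function.

```python
def discounted_return(rewards, gamma):
    """R(s_0) + gamma * R(s_1) + gamma^2 * R(s_2) + ... for one trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```

With `gamma` close to 0 only the immediate reward matters, while values close to 1 weight distant rewards almost as much as near ones.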

Bellman equation: The Bellman optimality equation characterizes the value function $V^{\pi^*}$ of the optimal policy $\pi^*$:

\[\boxed{V^{\pi^*}(s)=R(s)+\max_{a\in\mathcal{A}}\gamma\sum_{s'\in\mathcal{S}}P_{sa}(s')V^{\pi^*}(s')}\]

Remark: we note that the optimal policy $\pi^*$ for a given state $s$ is such that:

\[\boxed{\pi^*(s)=\underset{a\in\mathcal{A}}{\textrm{argmax}}\sum_{s'\in\mathcal{S}}P_{sa}(s')V^{\pi^*}(s')}\]

Value iteration algorithm: The value iteration algorithm has two steps:

1) We initialize the value:

\[\boxed{V_0(s)=0}\]

2) We iterate the value based on the values before:

\[\boxed{V_{i+1}(s)=R(s)+\max_{a\in\mathcal{A}}\left[\sum_{s'\in\mathcal{S}}\gamma P_{sa}(s')V_i(s')\right]}\]
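The two steps of value iteration, together with the argmax that recovers the optimal policy, can be sketched on a tiny MDP. The dictionary layout `P[s][a][s']`, the function names, the stopping tolerance `tol` and the two-state example are all assumptions made for illustration; values are initialized to zero, which is the common choice.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8, max_iter=10000):
    """Iterate V(s) <- R(s) + max_a sum_{s'} gamma * P_sa(s') * V(s')."""
    V = {s: 0.0 for s in states}  # step 1: initialize the value
    for _ in range(max_iter):
        # Step 2: update every state from the previous values.
        V_new = {
            s: R[s] + max(
                sum(gamma * p * V[s2] for s2, p in P[s][a].items())
                for a in actions
            )
            for s in states
        }
        delta = max(abs(V_new[s] - V[s]) for s in states)
        V = V_new
        if delta < tol:  # assumed stopping rule: values have stopped changing
            break
    return V


def greedy_policy(states, actions, P, V):
    """pi(s) = argmax_a sum_{s'} P_sa(s') * V(s')."""
    return {
        s: max(actions, key=lambda a: sum(p * V[s2] for s2, p in P[s][a].items()))
        for s in states
    }


# Hypothetical two-state MDP: being in "B" pays 1, being in "A" pays 0.
states = ["A", "B"]
actions = ["stay", "go"]
R = {"A": 0.0, "B": 1.0}
P = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 1.0}},
    "B": {"stay": {"B": 1.0}, "go": {"A": 1.0}},
}
V = value_iteration(states, actions, P, R, gamma=0.9)
policy = greedy_policy(states, actions, P, V)
```

In this example the agent should move to "B" and stay there, so the values converge to $V(B)=1/(1-0.9)=10$ and $V(A)=0.9\times10=9$.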

Maximum likelihood estimate: The maximum likelihood estimates for the state transition probabilities are as follows:

\[\boxed{P_{sa}(s')=\frac{\#\textrm{times took action }a\textrm{ in state }s\textrm{ and got to }s'}{\#\textrm{times took action }a\textrm{ in state }s}}\]
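The count ratio above translates directly into code. The sketch below assumes the agent's experience is available as a list of `(s, a, s_next)` tuples; that format and the function name are hypothetical. State-action pairs that were never observed simply do not appear in the result.

```python
from collections import Counter, defaultdict


def estimate_transitions(transitions):
    """Estimate P_sa(s') as count(s, a, s') / count(s, a)."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1
    return {
        sa: {s2: n / sum(c.values()) for s2, n in c.items()}
        for sa, c in counts.items()
    }
```

For instance, if action "go" in state "A" led to "B" twice and back to "A" once, the estimate for reaching "B" is 2/3.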

$Q$-learning: $Q$-learning is a model-free estimation of $Q$, which is done as follows:

\[\boxed{Q(s,a)\leftarrow Q(s,a)+\alpha\Big[R(s,a,s')+\gamma\max_{a'}Q(s',a')-Q(s,a)\Big]}\]
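A single application of this update rule can be sketched as follows. The table `Q` is stored as a dictionary keyed by `(state, action)` with unseen entries defaulting to 0; that storage choice, the function name and the default values of `alpha` and `gamma` are assumptions of this example. How the agent picks actions (for instance epsilon-greedy exploration) is outside the scope of the sketch.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


# Usage: start from an all-zero table and apply two observed transitions.
Q = defaultdict(float)
q_update(Q, "A", "go", 1.0, "B", ["stay", "go"], alpha=0.5, gamma=0.9)
q_update(Q, "B", "stay", 1.0, "A", ["stay", "go"], alpha=0.5, gamma=0.9)
```

Because the update only needs the observed reward and next state, no transition probabilities are required, which is what makes the method model-free.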

For a more detailed overview of the concepts above, check out the States-based Models cheatsheets!

Have a little time to learn TensorFlow 2.0 with your Machine Learning? In this article, I have put together the 10 best TensorFlow cheat sheets for you to hang on the wall above your desk. Whenever you need a reference, keep these handy cheat sheets available!

## Cheat Sheet 1: BecomingHuman.AI

becominghuman.ai has multiple cheat sheets, but I have found this one to be one of the best. Easy to read and understand, this cheat sheet is great for beginners and advanced TensorFlow learners alike!

Pros: Rated ‘E’ for everyone.

Cons: Color can be distracting

## Cheat Sheet 2: Altoros

Altoros is another great website for finding machine learning cheat sheets! They cover a wide range of subjects. This TensorFlow cheat sheet is 3 pages long, but it is totally worth it for the wealth of information you receive.

Pros: Easy to read and understand.

Cons: Condensed information can be difficult for some readers.

## Cheat Sheet 3: Tech Republic

This cheat sheet from Tech Republic is chock-full of information, including an introduction, how to begin, top competitors and additional resources. This is great if you are looking to become an IT pro.

Pros: Great for those who are looking to upgrade their skills.

Cons: It is a lot of reading material (8 pages, condensed)

## Cheat Sheet 4: Dummies

Sometimes, the best way to learn is from a dummy! TensorFlow for Dummies is a great way to get an introduction to TensorFlow, what it is and how it works. Perfect for beginners in machine learning!

Pros: Great information for beginners

Cons: Condensed information, no working examples

## Cheat Sheet 5: TensorFlow

To learn TensorFlow, one must go to Tensorflow.org! This website and cheat sheet will teach you everything you need to learn TensorFlow correctly and effectively. There is a lot of material here, so be prepared for a lengthy read! Great for beginners and advanced TensorFlow users!

Pros: Rated ‘E’ for everyone, the best way to learn TensorFlow.

Cons: Can be confusing to the absolute beginner.

## Cheat Sheet 6: GitHub

This is a really great cheat sheet written by Patrick on GitHub! He shows examples and syntax. This is a great sheet for all TensorFlow learners!

Pros: Rated ‘E’ for everyone.

Cons: None that I can see.

## Cheat Sheet 7: Stanford.edu

This cheat sheet shows you the ins and outs of TensorFlow: what it is, how it works and how it compares to other data science tools. Easily readable for beginners and advanced TensorFlow users alike.

Pros: Easy to understand

Cons: None that I can see

## Cheat Sheet 8: TensorFlow Core

This cheat sheet is from TensorFlow Core. It shows the API documentation for TensorFlow in Python, alongside other languages, with the correct explanation and syntax for the method you are trying to use.

Pros: Easy to read, rated ‘E’ for everyone.

Cons: None that I can see.

## Cheat Sheet 9: HackerNoon

To understand TensorFlow, you must understand Deep Learning. This cheat sheet is one to keep handy as a dog-eared reference in the desk drawer or right next to your working laptop.

Camron takes you from beginning to end, making TensorFlow and deep learning easier to understand with explanations and the best resources. This is a great resource for those who are serious about looking into Data Science as a career with Python and TensorFlow.

Pros: Tons of resources, rated ‘E’ for everyone.

Cons: A lot of research materials and reading.

## Cheat Sheet 10: Cheatography

This cheat sheet will show you the types of machine learning models you can build with TensorFlow! It has graphics, explanations and examples of what you need to know for TensorFlow, Machine Learning and Deep Learning!

Pros: Rated ‘E’ for everyone

Cons: None that I can see.

Thank you for joining me on another journey to find the top 10 best cheat sheets on TensorFlow! I hope that they are useful to you on your journey in Deep Learning and TensorFlow!
