Gradient Descent in a Nutshell

Gradient descent is by far the most popular optimization strategy used in machine learning and deep learning at the moment. It is used while training a model, can be combined with many learning algorithms, and is easy to understand and implement. Therefore, everyone who works with machine learning should understand its concept. After reading this post, you will understand how Gradient Descent works, which types of it are used today, and what their advantages and tradeoffs are.

Table of Contents:


What is a Gradient?

How it works

Learning Rate

How to make sure that it works properly

Types of Gradient Descent (Batch Gradient Descent, Stochastic Gradient Descent, Mini Batch Gradient Descent)


Introduction

Gradient Descent is used while training a machine learning model. It is an iterative optimization algorithm that tweaks a model's parameters step by step to move a given function toward a local minimum. If the function is convex, that local minimum is also the global minimum.

It is simply used to find the values of a function's parameters (coefficients) that minimize a cost function as far as possible.

You start by defining initial parameter values, and from there Gradient Descent iteratively adjusts those values, using calculus, so that they minimize the given cost function. But to understand the concept fully, you first need to know what a gradient is.
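
To make the loop described above concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration: the toy cost function (w - 3)^2, the names `cost`, `cost_gradient` and `gradient_descent`, the starting value, and the fixed `learning_rate` of 0.1. It is not how any particular library implements the algorithm.

```python
def cost(w):
    # Hypothetical toy cost function with its minimum at w = 3.
    return (w - 3.0) ** 2


def cost_gradient(w):
    # Derivative of the toy cost function with respect to w.
    return 2.0 * (w - 3.0)


def gradient_descent(start, learning_rate=0.1, n_steps=100):
    # Begin at the initial parameter value and repeatedly step
    # against the gradient, i.e. in the downhill direction.
    w = start
    for _ in range(n_steps):
        w = w - learning_rate * cost_gradient(w)
    return w


w_min = gradient_descent(start=0.0)
print(w_min, cost(w_min))
```

With these assumed settings the parameter ends up very close to 3, where the toy cost is smallest. A real model has many parameters instead of one, but the update rule has the same shape.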

What is a Gradient?

“A gradient measures how much the output of a function changes if you change the inputs a little bit.” - Lex Fridman (MIT)

It simply measures the change in error with regard to a change in each weight. You can also think of a gradient as the slope of a function. The higher the gradient, the steeper the slope and the faster a model can learn. But if the slope is zero, the model stops learning. Put more mathematically, a gradient is the vector of partial derivatives of a function with respect to its inputs.
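
The idea of "changing the inputs a little bit" can be tried out directly. The sketch below estimates each partial derivative by nudging one input at a time and measuring how the output changes (a central finite difference). The function `error`, the helper `numerical_gradient`, and the nudge size `eps` are hypothetical names and values picked for this example. In practice, gradients are usually computed analytically or by automatic differentiation rather than this way.

```python
def numerical_gradient(f, point, eps=1e-6):
    # Estimate the partial derivative of f with respect to each input
    # by nudging that input up and down by eps.
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def error(weights):
    # Hypothetical error function of two weights.
    w0, w1 = weights
    return w0 ** 2 + 3.0 * w1


grad = numerical_gradient(error, [2.0, 1.0])
print(grad)
```

For this assumed error function, the exact partial derivatives are 2 * w0 and 3, so at the point (2, 1) the estimate should be close to [4, 3]. The first weight has the steeper slope there, so a small change to it moves the error more.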