294. Gradient Descent

The gradient tells us the direction of steepest increase

For a function

the gradient of , written , is:

Example

Function

Compute

  • Differentiate w.r.t. (treat as constant):
  • Since is constant w.r.t. , its derivative is 0:
  • The derivative of w.r.t. is:
  • So,

Compute

  • Differentiate w.r.t. (treat as constant):
  • Since is constant w.r.t. , its derivative is 0:
  • The derivative of w.r.t. is:

So,

Combine to form gradient

Evaluate at a point

Interpretation

  • The gradient vector at points in the direction where the function increases most rapidly

  • Its magnitude measures the steepness of that increase

  • Moving one unit along the axis alone would increase the function by the partial derivative with respect to , which is 2

  • Moving one unit along the axis alone would increase the function by the partial derivative with respect to , which is 4

  • Moving from in the direction of , the function value will rise fastest and at a rate roughly proportional to per unit distance moved

294.0.1. Algorithm

Find the minimum of a function

In gradient descent, we move in the opposite direction of the gradient to minimize a function

Direction of maximum increase:

Direction of maximum decrease:

  1. Initialize: Start with an initial guess for the parameters
  2. Compute Gradient: Find the gradient of the function at the current parameters
  3. Update Parameters: Adjust the parameters by moving in the opposite direction of the gradient, scaled by the learning rate
  4. Repeat: Continue the process until the parameters converge to a minimum or the changes are minimal

Update Rule:

Where:

gradient_descent.py
import numpy as np

def gradient_descent(
  f,              # Function to minimize, f(x)
  grad_f,         # Gradient of f, grad_f(x)
  x0,             # Initial guess for x
  alpha=0.01,     # Learning rate
  max_iter=1000,  # Maximum number of iterations
  tol=1e-6        # Tolerance for stopping
):
    x = x0.copy()
    for _ in range(max_iter):
        grad = grad_f(x)
        if np.linalg.norm(grad) < tol:
            break
        x -= alpha * grad
    return x

def f(x):
  """Example: minimize f(x) = x^2 + 2x + 1"""
  return x**2 + 2*x + 1

def grad_f(x):
    """Gradient of f(x) = x^2 + 2x + 1 is f'(x) = 2x + 2"""
    return 2*x + 2

gradient_descent(
    f,
    grad_f,
    x_initial,
    alpha=0.1,
    max_iter=100
)
Example

Function

Gradient

Learning Rate

Iteration 1

  • Initial Parameter Values:
  • Gradient Calculation
  • Parameter Update:
  • Updated Parameter Values:

Iteration 2

  • Current Parameter Values
  • Gradient Calculation:
  • Parameter Update:
  • Updated Parameter Values:

Iteration 3

  • Current Values
  • Gradient Calculation:
  • Parameter Update:
  • Updated Parameter Values:
Example

Problem Setup

Minimize the quadratic function:

Gradient of

The gradient is:

Step 1: Compute

Take the derivative of each term with respect to (treat as a constant):

So:

Step 2: Find

Take the derivative of each term with respect to (treat as a constant):

So:

So the gradient of is:

Optimal Solution

Gradient Descent Iterations

Iteration 1

  • Initial guess:
  • Gradient at :
  • Line Search:
  • Minimizing:
  • Update Step:
  • Check Progress:
  1. New point:
  1. Function value decreases
  1. Gradient at new point:
  1. Gradient magnitude

Note: Gradient magnitude stays the same in this iteration due to the specific structure of this quadratic function

Iteration 2

  • New Point:
  • Gradient at :
  • Line search:
  • Minimize:
  • Update Step:
  • Improvement:
  1. New point:
  1. Function value decreases
  1. Gradient at new point:
  1. Gradient magnitude
Example

Problem Setup

Minimize the quadratic function:

Gradient of