294. Gradient Descent
The gradient tells us the direction of steepest increase
For a function
the gradient of , written , is:
- The magnitude of the gradient vector tells you how fast the function is increasing (steepness)
- The direction of the gradient vector points to where the function increases most rapidly (direction)
Example
Function
Compute
- Differentiate w.r.t. (treat as constant):
- Since is constant w.r.t. , its derivative is 0:
- The derivative of w.r.t. is:
- So,
Compute
- Differentiate w.r.t. (treat as constant):
- Since is constant w.r.t. , its derivative is 0:
- The derivative of w.r.t. is:
So,
Combine to form gradient
Evaluate at a point
Interpretation
-
The gradient vector at points in the direction where the function increases most rapidly
-
Its magnitude measures the steepness of that increase
-
Moving one unit along the axis alone would increase the function by the partial derivative with respect to , which is 2
-
Moving one unit along the axis alone would increase the function by the partial derivative with respect to , which is 4
-
Moving from in the direction of , the function value will rise fastest and at a rate roughly proportional to per unit distance moved
294.0.1. Algorithm
Find the minimum of a function
In gradient descent, we move in the opposite direction of the gradient to minimize a function
Direction of maximum increase:
Direction of maximum decrease:
- Initialize: Start with an initial guess for the parameters
- Compute Gradient: Find the gradient of the function at the current parameters
- Update Parameters: Adjust the parameters by moving in the opposite direction of the gradient, scaled by the learning rate
- Repeat: Continue the process until the parameters converge to a minimum or the changes are minimal
Update Rule:
Where:
- : current iterate (point in domain of )
- : step size or learning rate at iteration
- : gradient of the function with respect to
- : next iterate (point in domain of )
gradient_descent.py
import numpy as np
def gradient_descent(
f, # Function to minimize, f(x)
grad_f, # Gradient of f, grad_f(x)
x0, # Initial guess for x
alpha=0.01, # Learning rate
max_iter=1000, # Maximum number of iterations
tol=1e-6 # Tolerance for stopping
):
x = x0.copy()
for _ in range(max_iter):
grad = grad_f(x)
if np.linalg.norm(grad) < tol:
break
x -= alpha * grad
return x
def f(x):
"""Example: minimize f(x) = x^2 + 2x + 1"""
return x**2 + 2*x + 1
def grad_f(x):
"""Gradient of f(x) = x^2 + 2x + 1 is f'(x) = 2x + 2"""
return 2*x + 2
gradient_descent(
f,
grad_f,
x_initial,
alpha=0.1,
max_iter=100
)
Example
Function
Gradient
Learning Rate
Iteration 1
- Initial Parameter Values:
- Gradient Calculation
- Parameter Update:
- Updated Parameter Values:
Iteration 2
- Current Parameter Values
- Gradient Calculation:
- Parameter Update:
- Updated Parameter Values:
Iteration 3
- Current Values
- Gradient Calculation:
- Parameter Update:
- Updated Parameter Values:
Example
Problem Setup
Minimize the quadratic function:
Gradient of
The gradient is:
Step 1: Compute
Take the derivative of each term with respect to (treat as a constant):
So:
Step 2: Find
Take the derivative of each term with respect to (treat as a constant):
So:
So the gradient of is:
Optimal Solution
Gradient Descent Iterations
Iteration 1
- Initial guess:
- Gradient at :
- Line Search:
- Minimizing:
- Update Step:
- Check Progress:
- New point:
- Function value decreases
- Gradient at new point:
- Gradient magnitude
Note: Gradient magnitude stays the same in this iteration due to the specific structure of this quadratic function
Iteration 2
- New Point:
- Gradient at :
- Line search:
- Minimize:
- Update Step:
- Improvement:
- New point:
- Function value decreases
- Gradient at new point:
- Gradient magnitude
Example
Problem Setup
Minimize the quadratic function:
Gradient of