Calculus, AI, and linear algebra: a compact field guide


If you write or review ML code and want a fast, code-first refresher on the calculus and linear algebra behind gradients, Jacobians, and SVD, this is for you.

Most ML code is just calculus and linear algebra in disguise. Here is a concise refresher with runnable snippets.

Gradients in plain sight

A gradient is the vector of partial derivatives. For a scalar function f(x, y):

\nabla f = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]

Example: f(x, y) = x^2 + xy + 3y^2 yields \nabla f = [2x + y,\ x + 6y].

import numpy as np

def f(xy):
    x, y = xy
    return x**2 + x*y + 3*y**2

# analytic gradient
def grad(xy):
    x, y = xy
    return np.array([2*x + y, x + 6*y])

pt = np.array([2.0, -1.0])
print("f:", f(pt))
print("grad:", grad(pt))

Finite differences are a quick sanity check:

def finite_diff(fn, pt, eps=1e-5):
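    # central difference in each coordinate: (f(x + e_i*eps) - f(x - e_i*eps)) / (2*eps)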
    g = np.zeros_like(pt)
    for i in range(len(pt)):
        step = np.zeros_like(pt)
        step[i] = eps
        g[i] = (fn(pt + step) - fn(pt - step)) / (2 * eps)
    return g

print("finite diff:", finite_diff(f, pt))

Jacobians: vector outputs

For g: \mathbb{R}^n \to \mathbb{R}^m, the Jacobian stacks the gradients of the output components as its rows. A simple two-output function:

g(x, y) = \begin{bmatrix} x^2 + y \\ xy \end{bmatrix}

Its Jacobian is:

J = \begin{bmatrix} 2x & 1 \\ y & x \end{bmatrix}

def g(xy):
    x, y = xy
    return np.array([x**2 + y, x*y])

def jacobian(xy):
    x, y = xy
    return np.array([[2*x, 1], [y, x]])

pt = np.array([1.5, 0.5])
print("g(pt):", g(pt))
print("J(pt):\n", jacobian(pt))

Linear algebra fuel: projections and SVD

Principal component analysis (PCA) is, at its core, the singular value decomposition (SVD) of the centered data matrix: X = U\Sigma V^T. The leading right singular vectors (the top rows of V^T) are the principal directions.

rng = np.random.default_rng(7)
X = rng.normal(size=(6, 3))  # 6 samples, 3 features

# center
Xc = X - X.mean(axis=0, keepdims=True)

# SVD
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

print("singular values:", S)
print("first principal direction:", Vt[0])

# project to 2D
X2 = Xc @ Vt[:2].T
print("projected shape:", X2.shape)

The projection of a vector v onto a direction u is:

\text{proj}_u(v) = \frac{v \cdot u}{\lVert u \rVert^2}\, u

v = np.array([2.0, 1.0, -1.0])
u = Vt[0]  # principal direction
proj = (v @ u) / (u @ u) * u
print("projection:", proj)

The full PCA pipeline, as a diagram:

  graph LR;
    Data["High-dimensional data X"] --> Center["Center columns"];
    Center --> SVD["SVD: X = U Σ Vᵀ"];
    SVD --> PCs["Take top k rows of Vᵀ (principal directions)"];
    PCs --> Project["Project: X · V_kᵀ"];
    Project --> Embeddings["Lower-dimensional embeddings"];
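
Wrapped up, the pipeline in the diagram is only a few lines of NumPy. The helper name pca_project and its signature are my own for this sketch, not a library API:

def pca_project(X, k):
    # center columns, run SVD, keep the top-k principal directions, project
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]

emb, directions = pca_project(X, k=2)
print("embedding shape:", emb.shape)  # (6, 2)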

Why this matters for AI

  • Gradients drive optimizers (SGD, Adam); Jacobians underpin backprop (see the descent sketch after this list).
  • SVD/PCA reduces dimensionality and denoises embeddings.
  • Projections help in retrieval and similarity search by isolating informative axes.
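
To make the first point concrete, here is a bare-bones gradient-descent loop on the f and grad defined earlier. The step size of 0.1 is an arbitrary choice for this sketch, not a tuned value:

x = np.array([2.0, -1.0])
for _ in range(50):
    x = x - 0.1 * grad(x)  # step against the gradient
print("after 50 steps:", x, "f:", f(x))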

If you keep these primitives sharp, most model code becomes easier to reason about and debug.