Calculus, AI, and linear algebra: a compact field guide


If you write or review ML code and want a fast, code-first refresher on the calculus and linear algebra behind gradients, Jacobians, and SVD, this is for you.

Most ML code is just calculus and linear algebra in disguise. Here is a concise refresher with runnable snippets.

Gradients in plain sight

A gradient is the vector of partial derivatives. For a scalar function f(x, y):

\nabla f = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]

Example: f(x, y) = x^2 + xy + 3y^2 yields \nabla f = [2x + y,\ x + 6y].

import numpy as np

def f(xy):
    x, y = xy
    return x**2 + x*y + 3*y**2

# analytic gradient
def grad(xy):
    x, y = xy
    return np.array([2*x + y, x + 6*y])

pt = np.array([2.0, -1.0])
print("f:", f(pt))
print("grad:", grad(pt))

Finite differences are a quick sanity check:

def finite_diff(fn, pt, eps=1e-5):
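    # central difference in each coordinate: (f(x + e_i*eps) - f(x - e_i*eps)) / (2*eps)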
    g = np.zeros_like(pt)
    for i in range(len(pt)):
        step = np.zeros_like(pt)
        step[i] = eps
        g[i] = (fn(pt + step) - fn(pt - step)) / (2 * eps)
    return g

print("finite diff:", finite_diff(f, pt))

Jacobians: vector outputs

For g: \mathbb{R}^n \to \mathbb{R}^m, the Jacobian stacks the gradients of the output components as its rows. A simple two-output function:

g(x, y) = \begin{bmatrix} x^2 + y \\ xy \end{bmatrix}

Its Jacobian is:

J = \begin{bmatrix} 2x & 1 \\ y & x \end{bmatrix}

def g(xy):
    x, y = xy
    return np.array([x**2 + y, x*y])

def jacobian(xy):
    x, y = xy
    return np.array([[2*x, 1], [y, x]])

pt = np.array([1.5, 0.5])
print("g(pt):", g(pt))
print("J(pt):\n", jacobian(pt))

Linear algebra fuel: projections and SVD

Principal component analysis (PCA) is, at its core, the singular value decomposition (SVD) of the centered data matrix: X = U\Sigma V^T. The leading right singular vectors (the top rows of V^T) are the principal directions.

rng = np.random.default_rng(7)
X = rng.normal(size=(6, 3))  # 6 samples, 3 features

# center
Xc = X - X.mean(axis=0, keepdims=True)

# SVD
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

print("singular values:", S)
print("first principal direction:", Vt[0])

# project to 2D
X2 = Xc @ Vt[:2].T
print("projected shape:", X2.shape)

The projection of a vector v onto a direction u is:

\text{proj}_u(v) = \frac{v \cdot u}{\lVert u \rVert^2}\, u

v = np.array([2.0, 1.0, -1.0])
u = Vt[0]  # principal direction
proj = (v @ u) / (u @ u) * u
print("projection:", proj)

The full PCA pipeline, as a diagram:

  graph LR;
    Data["High-dimensional data X"] --> Center["Center columns"];
    Center --> SVD["SVD: X = U Σ Vᵀ"];
    SVD --> PCs["Take top k rows of Vᵀ (principal directions)"];
    PCs --> Project["Project: X · V_kᵀ"];
    Project --> Embeddings["Lower-dimensional embeddings"];
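
Wrapped up, the pipeline in the diagram is only a few lines of NumPy. The helper name pca_project and its signature are my own for this sketch, not a library API:

def pca_project(X, k):
    # center columns, run SVD, keep the top-k principal directions, project
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]

emb, directions = pca_project(X, k=2)
print("embedding shape:", emb.shape)  # (6, 2)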

Why this matters for AI

  • Gradients drive optimizers (SGD, Adam); Jacobians underpin backprop (see the descent sketch after this list).
  • SVD/PCA reduces dimensionality and denoises embeddings.
  • Projections help in retrieval and similarity search by isolating informative axes.
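
To make the first point concrete, here is a bare-bones gradient-descent loop on the f and grad defined earlier. The step size of 0.1 is an arbitrary choice for this sketch, not a tuned value:

x = np.array([2.0, -1.0])
for _ in range(50):
    x = x - 0.1 * grad(x)  # step against the gradient
print("after 50 steps:", x, "f:", f(x))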

If you keep these primitives sharp, most model code becomes easier to reason about and debug.