"Newton-Raphson: When Fast Convergence Becomes a Liability"
BusinessMath Quarterly Series
10 min read
Part 33 of 12-Week BusinessMath Series
What You’ll Learn
- How Newton-Raphson achieves quadratic convergence
- Why it’s the gold standard for small, smooth problems
- When numerical Hessians cause crashes and NaN propagation
- Portfolio optimization: A case study in Newton-Raphson failure
- Practical rules for choosing Newton-Raphson vs alternatives
- What “numerically unstable” really means in practice
The Promise: Quadratic Convergence
Newton-Raphson is the Ferrari of optimization algorithms:
- Converges in 3-5 iterations (vs 100+ for gradient descent)
- Uses second-order information (Hessian matrix)
- Near-optimal for smooth, unconstrained problems
Example: Finding the minimum of f(x, y) = (x - 1)² + (y - 2)²
import BusinessMath
import Foundation
// Simple quadratic objective
let simpleObjective: (VectorN
) -> Double = { v in let x = v[0] - 1.0 let y = v[1] - 2.0 return x*x + y*y } let nrOptimizer = MultivariateNewtonRaphson
>( maxIterations: 10, tolerance: 1e-8 ) do { let result = try nrOptimizer.minimize( function: simpleObjective, gradient: { try numericalGradient(simpleObjective, at: $0) }, hessian: { try numericalHessian(simpleObjective, at: $0) }, initialGuess: VectorN([0.0, 0.0]) ) print("Newton-Raphson on Simple Quadratic:") print(" Solution: [\(result.solution[0].number(6)), \(result.solution[1].number(6))]") print(" Iterations: \(result.iterations)") print(" Converged: \(result.converged)") print(" Final value: \(result.value.number(10))") }
Output:
Newton-Raphson on Simple Quadratic:
Solution: [1.000000, 2.000000]
Iterations: 1
Converged: true
Final value: 0.0000000000
Perfect! One iteration to machine precision. For smooth quadratics, Newton-Raphson is unbeatable.
The Problem: When Theory Meets Reality
Now let’s try Newton-Raphson on a real business problem: portfolio optimization with Sharpe ratio maximization.
The Crash
import BusinessMath
import Foundation
let assets = ["US Stocks", "Intl Stocks", "Bonds", "Real Estate"]
let expectedReturns = VectorN([0.10, 0.12, 0.04, 0.09])
let riskFreeRate = 0.03
let covarianceMatrix = [
[0.0400, 0.0150, 0.0020, 0.0180],
[0.0150, 0.0625, 0.0015, 0.0200],
[0.0020, 0.0015, 0.0036, 0.0010],
[0.0180, 0.0200, 0.0010, 0.0400]
]
// Portfolio Sharpe ratio (the objective that crashes Newton-Raphson)
let portfolioObjective: (VectorN
) -> Double = { weights in let expectedReturn = weights.dot(expectedReturns) var variance = 0.0 for i in 0..
>( maxIterations: 100, tolerance: 1e-6 ) do { print("Attempting Newton-Raphson on Portfolio Optimization...") print("(This will likely crash or timeout)\n") let result = try nrOptimizer.minimize( function: portfolioObjective, gradient: { try numericalGradient(portfolioObjective, at: $0) }, hessian: { try numericalHessian(portfolioObjective, at: $0) }, initialGuess: VectorN.equalWeights(dimension: 4) ) print("Somehow succeeded:") print(" Solution: \(result.solution.toArray().map { $0.percent() })") } catch { print("Newton-Raphson FAILED (as expected):") print(" Error: \(error)") }
What happens:
- Playground crash: Thread exits during execution
- Or: Takes >2 minutes and times out
- Or: Returns NaN values that propagate through calculation
Why Newton-Raphson Crashes: A Deep Dive
1. Computational Explosion
Hessian matrix for n variables:
- Requires n² second-order partial derivatives
- Each second derivative needs ~5 function evaluations
- 4 variables = 16 × 5 = 80 function calls per iteration
// What numericalHessian actually does internally:
func numericalHessian
( _ f: (V) -> Double, at x: V, h: Double = 1e-5 ) throws -> [[Double]] where V.Scalar == Double { let n = x.toArray().count var hessian = [[Double]](repeating: [Double](repeating: 0, count: n), count: n) for i in 0..
For portfolio optimization:
80 evaluations × 20 iterations = 1,600 Sharpe ratio calculationsEach calculation involves matrix multiplication and sqrtTotal time: minutes instead of seconds
2. Numerical Instability: Division by Near-Zero
The Sharpe ratio formula:
let sharpeRatio = (expectedReturn - riskFreeRate) / sqrt(variance)
What goes wrong:
// During Hessian computation, we perturb weights: weights[0] += h // Tiny perturbation (1e-5) // This might create an invalid portfolio: // [0.25001, 0.25, 0.25, 0.25] → variance changes unpredictably // If variance becomes tiny (0.0001): let risk = sqrt(0.0001) // = 0.01 let sharpe = 0.07 / 0.01 // = 7.0 (huge!) // Then another perturbation: weights[1] += h // Now variance = 0.00001 let risk2 = sqrt(0.00001) // = 0.003 let sharpe2 = 0.07 / 0.003 // = 23.3 (even bigger!) // The second derivative of 1/sqrt(variance) explodes // Result: ∂²f/∂w² ≈ 10^6 or NaN
3. Constraint Violations During Perturbation
The problem:
// Your constraints: weights sum to 1, all ≥ 0 let weights = VectorN([0.25, 0.25, 0.25, 0.25]) // Valid // During numerical differentiation: weights[0] -= h // = 0.24999 // Still valid, but sum ≠ 1.0 // Multiple perturbations compound: weights[0] -= h weights[1] -= h // Now sum = 0.99998, and covariance calculation is slightly off // Eventually: weights[2] = -0.00001 // INVALID! Negative weight // Matrix multiplication with negative weights → meaningless variance
4. Playground Execution Limits
// Playgrounds have hard limits: // - 2 minute timeout // - Limited memory for intermediate calculations // - Can't recover from NaN propagation // When Newton-Raphson encounters NaN: let hessian = try numericalHessian(portfolioObjective, at: weights) // hessian[2][3] = NaN // Matrix inversion fails: let hessianInverse = try invertMatrix(hessian) // Throws or returns garbage // Next iteration uses garbage: weights = weights - learningRate * hessianInverse * gradient // All NaN // Playground crashes with "Thread exited"
When to Use Newton-Raphson: Decision Tree
Is your problem smooth and twice-differentiable?
├─ NO → Don't use Newton-Raphson
│ Use: Gradient descent, genetic algorithms, simulated annealing
│
└─ YES → How many variables?
├─ > 10 variables → Don't use Newton-Raphson (too expensive)
│ Use: BFGS, L-BFGS, conjugate gradient
│
└─ ≤ 10 variables → Do you have analytical Hessian?
├─ NO (numerical Hessian) → Is objective numerically stable?
│ ├─ NO (involves 1/x, sqrt, exp) → Don't use Newton-Raphson
│ │ Use: BFGS (approximate Hessian)
│ │
│ └─ YES (simple polynomial) → Are there constraints?
│ ├─ YES → Don't use Newton-Raphson
│ │ Use: Constrained optimizer, penalty methods
│ │
│ └─ NO → ✓ USE NEWTON-RAPHSON
│ (Fast convergence, no issues)
│
└─ YES (analytical Hessian) → ✓ USE NEWTON-RAPHSON
(Best case scenario)
Safe Use Cases for Newton-Raphson
1. Simple Unconstrained Quadratics
✓ Perfect for Newton-Raphson:
// Least-squares regression: minimize ||Ax - b||²
let leastSquares: (VectorN
) -> Double = { x in let residual = matrixMultiply(A, x) - b return residual.dot(residual) } // Analytical gradient: ∇f = 2Aᵀ(Ax - b) let gradient: (VectorN
) -> VectorN
= { x in let residual = matrixMultiply(A, x) - b return 2.0 * matrixMultiply(A_transpose, residual) } // Analytical Hessian: H = 2AᵀA (constant!) let hessian: (VectorN
) -> [[Double]] = { _ in return 2.0 * matrixMultiply(A_transpose, A) } let result = try nrOptimizer.minimize( function: leastSquares, gradient: gradient, hessian: hessian, initialGuess: VectorN(repeating: 0.0, count: numVariables) ) // Converges in 1 iteration (Hessian is constant)
2. Small-Dimensional Root Finding
✓ Newton-Raphson for solving f(x) = 0:
// Find interest rate r where NPV = 0
// NPV(r) = Σ (cashFlow[t] / (1 + r)^t) - initialInvestment
let npvObjective: (VectorN
) -> Double = { v in let r = v[0] var npv = -initialInvestment for (t, cashFlow) in cashFlows.enumerated() { npv += cashFlow / pow(1.0 + r, Double(t + 1)) } return npv * npv // Minimize squared NPV (find root) } // Only 1 variable, smooth function, no constraints let result = try nrOptimizer.minimize( function: npvObjective, gradient: { try numericalGradient(npvObjective, at: $0) }, hessian: { try numericalHessian(npvObjective, at: $0) }, initialGuess: VectorN([0.10]) // Start at 10% IRR ) print("Internal Rate of Return: \(result.solution[0].percent(2))")
3. Maximum Likelihood with Analytical Derivatives
✓ When you have closed-form derivatives:
// Normal distribution MLE: maximize log-likelihood
// ℓ(μ, σ) = -n/2 log(2π) - n log(σ) - Σ(xᵢ - μ)²/(2σ²)
let logLikelihood: (VectorN
) -> Double = { params in let mu = params[0] let sigma = params[1] var sumSquares = 0.0 for x in data { sumSquares += (x - mu) * (x - mu) } return -Double(data.count) * log(sigma) - sumSquares / (2 * sigma * sigma) } // Analytical gradient and Hessian available from statistics textbook let gradient: (VectorN
) -> VectorN
= { params in // ... closed-form derivatives } let hessian: (VectorN
) -> [[Double]] = { params in // ... closed-form second derivatives } // Fast convergence with analytical derivatives let result = try nrOptimizer.minimize( function: { -logLikelihood($0) }, // Maximize = minimize negative gradient: gradient, hessian: hessian, initialGuess: VectorN([sampleMean, sampleStdDev]) )
Dangerous Use Cases: When Newton-Raphson Crashes
1. Portfolio Optimization (Sharpe Ratio)
✗ Crashes due to 1/sqrt(variance):
// Sharpe ratio: (return - rf) / sqrt(variance)
// Second derivative of 1/sqrt(x) → explodes near zero
// Result: NaN propagation, playground crash
// USE INSTEAD: BFGS, gradient descent, or constrained optimizers
let optimizer = InequalityOptimizer
>() // Safe alternative
2. Constrained Problems
✗ Perturbations violate constraints:
// Weights must sum to 1 and be ≥ 0
// Numerical Hessian perturbs weights → temporarily invalid
// Matrix calculations with invalid weights → garbage
// USE INSTEAD: ConstrainedOptimizer, penalty methods
let optimizer = ConstrainedOptimizer
>()
3. Large-Scale Problems (>10 variables)
✗ Hessian computation too expensive:
// 100 variables → 10,000 second derivatives
// 10,000 × 5 evaluations = 50,000 function calls per iteration
// Minutes to hours of computation
// USE INSTEAD: BFGS (approximates Hessian), L-BFGS (memory-efficient)
let optimizer = AdaptiveOptimizer
>() // Chooses BFGS for you
4. Non-Smooth Objectives
✗ Discontinuities break second derivatives:
// Transaction costs: cost = |turnover| * rate
// Absolute value is not differentiable at 0
// Numerical Hessian returns garbage near discontinuities
// USE INSTEAD: Genetic algorithms, simulated annealing
let optimizer = ParallelOptimizer
>(algorithm: .gradientDescent(learningRate: 0.01))
Practical Alternatives
When Newton-Raphson Fails → Use BFGS
BFGS approximates the Hessian using gradient information:
// BusinessMath doesn't expose BFGS directly yet, but AdaptiveOptimizer
// uses BFGS-like methods internally for medium-sized problems
let optimizer = AdaptiveOptimizer
>( preferAccuracy: true, // Use sophisticated methods maxIterations: 1000, tolerance: 1e-6 ) // For portfolio optimization: let result = try optimizer.optimize( objective: portfolioObjective, initialGuess: VectorN.equalWeights(dimension: 4), constraints: constraints ) // AdaptiveOptimizer detects: // - 4 variables → small enough for advanced methods // - Has constraints → uses InequalityOptimizer (not Newton-Raphson) // - Result: Safe, fast convergence without crashes
When You Need Guaranteed Stability → Gradient Descent
Gradient descent never crashes (just slower):
let safeOptimizer = MultivariateGradientDescent
>( learningRate: 0.01, // Conservative step size maxIterations: 2000, // More iterations needed tolerance: 1e-6 ) let result = try safeOptimizer.minimize( function: portfolioObjective, gradient: { try numericalGradient(portfolioObjective, at: $0) }, initialGuess: VectorN.equalWeights(dimension: 4) ) // Takes 100-200 iterations instead of 5 // But guaranteed to converge without crashing
Key Takeaways
The Good
Newton-Raphson is unbeatable when:
- Small problem (≤5 variables)
- Smooth, twice-differentiable objective
- No constraints
- Analytical Hessian available (or numerically stable)
- Need fastest possible convergence
Example: Least-squares regression, simple curve fitting, root finding
The Bad
Newton-Raphson crashes when:
- Numerical Hessian on unstable objectives (1/x, sqrt, exp)
- Constraints that get violated during perturbation
- Large problems (>10 variables) → too expensive
- Non-smooth objectives (absolute value, max/min)
Example: Portfolio optimization, constrained problems, non-convex objectives
The Solution
Let AdaptiveOptimizer choose for you:
let optimizer = AdaptiveOptimizer
>() // Automatically selects: // - Newton-Raphson for tiny, smooth problems (≤5 vars, unconstrained) // - Gradient descent for medium problems (10-100 vars) // - InequalityOptimizer for constrained problems // - Never crashes, always picks appropriate algorithm let result = try optimizer.optimize( objective: yourObjective, initialGuess: yourInitialGuess, constraints: yourConstraints )
Experiments to Try
- Crash Test: Try Newton-Raphson on portfolio optimization. Watch it fail. Then try AdaptiveOptimizer.
- Variable Scaling: Test Newton-Raphson on 2, 4, 8, 16 variables. When does it become impractical?
- Constraint Impact: Add constraints to a simple quadratic. See how perturbations violate them.
- Numerical Stability: Test Newton-Raphson on
f(x) = 1/x² near x=0. See NaN propagation.
Next Steps
Next Week: Week 10 explores Performance Benchmarking - systematically comparing algorithms on your specific problems, not just theory.
Coming Soon: Advanced optimization algorithms including BFGS, L-BFGS, conjugate gradient, and when to use each.
Series: [Week 9 of 12] | Topic: [Part 5 - Business Applications] | Completed: [5/6]
Topics Covered: Newton-Raphson • Numerical Hessians • Algorithm stability • When fast isn’t best • Practical algorithm selection
Key Insight: The fastest algorithm isn’t always the best algorithm. Stability matters more than speed for real-world problems.
Tagged with: optimization