Simulated Annealing: Global Optimization Without Gradients

BusinessMath Quarterly Series

20 min read

Part 36 of 12-Week BusinessMath Series

What You’ll Learn

Understanding simulated annealing for global optimization
Cooling schedules: exponential, linear, adaptive
Acceptance probability and the Metropolis criterion
When to use simulated annealing vs. gradient methods
Escaping local minima through controlled randomness
Parameter tuning: initial temperature, cooling rate, iterations

The Problem

Many business optimization problems have multiple local minima:

Production scheduling with setup costs (discontinuous objective)
Portfolio optimization with transaction costs and lot sizes
Facility location with discrete choices (city A vs. city B)
Hyperparameter tuning for machine learning models

Gradient-based methods get stuck in local minima. Need global search capability.

The Solution

Simulated annealing mimics the physical process of metal cooling: accept worse solutions with probability that decreases over time. This allows escaping local minima early while converging to global optimum later.

Pattern 1: Portfolio Optimization with Transaction Costs

Business Problem: Rebalance portfolio from suboptimal starting point, but minimize transaction costs (non-smooth objective).

import BusinessMath

// Portfolio with transaction costs (20 assets for clarity)
let numAssets = 20

// Create a suboptimal starting portfolio: heavily concentrated in first 5 assets
// These happen to be LOW return assets - clear opportunity to improve!
var currentWeights = [Double](repeating: 0.0, count: numAssets)
// First 5 assets: 60% of portfolio (12% each) - these are low-return!
for i in 0..<5 {
    currentWeights[i] = 0.12
}
// Remaining 15 assets: 40% of portfolio (2.67% each) - these include high-return!
for i in 5..
            
              = 5 && i < 15 && j >= 5 && j < 15) || (i >= 15 && j >= 15) return sameTier ? Double.random(in: 0.5...0.7) : Double.random(in: 0.2...0.4) } } func portfolioObjective(_ weights: VectorN
              
                ) -> Double { // 1. Portfolio expected return (we want to MAXIMIZE this) var portfolioReturn = 0.0 for i in 0..
                
                  >( config: config, searchSpace: searchSpace ) // Create initial guess from current weights let initialGuess = VectorN(currentWeights) // Define constraints (FIXED API - no 'constraint:' or 'tolerance:' parameters!) let constraints: [MultivariateConstraint
                  
                    >] = [ // Equality: Sum to 1 (must be fully invested) .equality { weights in (0..
                    
                       abs($1.change) }.prefix(8) print("\nTop 8 Position Changes:") print("───────────────────────────────────────────────────────────") print("Asset Return Old Weight New Weight Change Action") print("───────────────────────────────────────────────────────────") for change in changes { let direction = change.change > 0 ? "BUY " : "SELL" print(String(format: " %2d %5.2f%% %6.2f%% %6.2f%% %+6.2f%% %s", change.index, change.return * 100, change.oldWeight * 100, change.newWeight * 100, change.change * 100, direction)) } print("═══════════════════════════════════════════════════════════") print("\n💡 Key Insight:") print(" The optimizer balanced improving returns (moving to high-return assets)") print(" with minimizing transaction costs. Notice it didn't eliminate all") print(" low-return holdings - the transaction costs made that too expensive.")

Pattern 2: Configuration Comparison

Pattern: Compare different cooling configurations.

// Simple test function: Rastrigin in 2D (many local minima)
func rastrigin(_ x: VectorN
            
              ) -> Double { let A = 10.0 let n = 2.0 return A * n + (0..<2).reduce(0.0) { sum, i in sum + (x[i] * x[i] - A * cos(2 * .pi * x[i])) } } let searchSpace = [(-5.12, 5.12), (-5.12, 5.12)] // Test different configurations let configs: [(name: String, config: SimulatedAnnealingConfig)] = [ ("Fast", .fast), ("Default", .default), ("Thorough", .thorough), ("Custom (slow)", SimulatedAnnealingConfig( initialTemperature: 100.0, finalTemperature: 0.001, coolingRate: 0.98, // Slower cooling maxIterations: 20_000, perturbationScale: 0.1, reheatInterval: nil, reheatTemperature: nil, seed: 42 )) ] print("Configuration Comparison (Rastrigin Function)") print("═══════════════════════════════════════════════════════════") print("Config | Final Value | Iterations | Acceptance Rate") print("───────────────────────────────────────────────────────────") for (name, config) in configs { let optimizer = SimulatedAnnealing
              
                >( config: config, searchSpace: searchSpace ) let result = optimizer.optimizeDetailed( objective: rastrigin, initialSolution: VectorN([2.0, 3.0]) ) let rate = result.acceptanceRate * 100 print("\(name.padding(toLength: 15, withPad: " ", startingAt: 0)) | " + "\(result.fitness.formatted(.number.precision(.fractionLength(6))).padding(toLength: 11, withPad: " ", startingAt: 0)) | " + "\(String(format: "%10d", result.iterations)) | " + "\(rate.formatted(.number.precision(.fractionLength(1))))%") } print("\nRecommendation: Use .default for most problems, .thorough for difficult landscapes")

Pattern 3: Ackley Function (Multimodal Optimization)

Pattern: Global optimization for highly multimodal functions.

// Ackley function: highly multimodal with many local minima
// Global minimum at (0, 0) with value 0
func ackley(_ x: VectorN
            
              ) -> Double { let a = 20.0 let b = 0.2 let c = 2.0 * .pi let d = 2 // dimensions let sum1 = (0..
              
                >( config: .thorough, searchSpace: searchSpace ) print("Ackley Function Optimization (Highly Multimodal)") print("═══════════════════════════════════════════════════════════") // Start from a poor initial guess let initialGuess = VectorN([4.0, -3.5]) let result = sa.optimizeDetailed( objective: ackley, initialSolution: initialGuess ) print("Optimization Results:") print(" Solution: (\(result.solution[0].formatted(.number.precision(.fractionLength(4)))), " + "\(result.solution[1].formatted(.number.precision(.fractionLength(4)))))") print(" Function Value: \(result.fitness.formatted(.number.precision(.fractionLength(6)))) (target: 0.0)") print(" Iterations: \(result.iterations)") print(" Acceptance Rate: \((result.acceptanceRate * 100).formatted(.number.precision(.fractionLength(1))))%") print(" Converged: \(result.converged)") print(" Reason: \(result.convergenceReason)") // Distance from global optimum let distanceFromOptimum = sqrt(result.solution[0]*result.solution[0] + result.solution[1]*result.solution[1]) print(" Distance from global optimum: \(distanceFromOptimum.formatted(.number.precision(.fractionLength(4))))")

How It Works

Simulated Annealing Algorithm

Initialize: Set T = T_0, x = x_0
Generate Neighbor: x’ = random perturbation of x
Calculate ΔE: ΔE = f(x’) - f(x)
Accept/Reject:
- If ΔE < 0: Always accept (improvement)
- Else: Accept with probability P = exp(-ΔE / T)
Cool Down: T = α * T (exponential) or T = T - β (linear)
Repeat: Until T < T_min or max iterations

Acceptance Probability

Metropolis Criterion: P(accept worse solution) = exp(-ΔE / T)

Temperature	ΔE = 0.1	ΔE = 1.0	ΔE = 10.0
T = 10	99.0%	90.5%	36.8%
T = 1	90.5%	36.8%	0.005%
T = 0.1	36.8%	0.005%	~0%

Insight: Early (high T), accept almost anything. Late (low T), only accept improvements.

Cooling Schedule Impact

Problem: 100-variable portfolio optimization

Schedule	Final Value	Iterations	Quality
Fast (α=0.90)	0.0245	2,500	Good
Medium (α=0.95)	0.0238	5,800	Better
Slow (α=0.98)	0.0235	18,000	Best
Adaptive	0.0237	7,200	Better

Tradeoff: Slower cooling = better solution, more time

Real-World Application

Manufacturing: Batch Sizing with Setup Costs

Company: Electronics manufacturer optimizing production batch sizes for 15 products Challenge: Minimize total costs (inventory + setup) subject to demand and capacity

Problem Characteristics:

Continuous variables: Batch sizes (treated as continuous for optimization)
Setup costs: Fixed cost per production run (non-smooth)
Holding costs: Penalty for excess inventory (smooth)
Multiple local minima: ~100+ feasible configurations

Why Simulated Annealing:

Setup costs create discontinuous objective
Global search needed (many local minima)
Can escape poor local solutions

Implementation:

import BusinessMath

import BusinessMath

// Model with 15 products
let numProducts = 15

// Weekly demand per product (units/week)
let demand: [Double] = [
    100.0, 150.0, 80.0, 200.0, 120.0,  // Products 0-4
    90.0, 175.0, 110.0, 140.0, 95.0,   // Products 5-9
    160.0, 85.0, 130.0, 105.0, 145.0   // Products 10-14
]

// Fixed setup cost per production run ($)
let setupCost: [Double] = [
    500.0, 750.0, 400.0, 850.0, 600.0,  // Products 0-4
    450.0, 800.0, 550.0, 700.0, 480.0,  // Products 5-9
    720.0, 420.0, 650.0, 520.0, 680.0   // Products 10-14
]

// Holding cost per unit per week ($/unit/week)
let holdingCost: [Double] = [
    2.0, 3.5, 1.8, 4.0, 2.5,    // Products 0-4
    1.9, 3.8, 2.2, 3.0, 2.0,    // Products 5-9
    3.3, 1.7, 2.8, 2.1, 3.2     // Products 10-14
]

func totalCost(_ batchSizes: VectorN
            
              ) -> Double { var cost = 0.0 for i in 0..
              
                >( config: config, searchSpace: searchSpace ) print("\nRunning Simulated Annealing...") let result = sa.optimizeDetailed( objective: totalCost, initialSolution: naiveBatches ) print("\nOptimization Results:") print(" Converged: \(result.converged)") print(" Iterations: \(result.iterations)") print(" Final Temperature: \(result.finalTemperature.formatted(.number.precision(.fractionLength(4))))") print(" Acceptance Rate: \(result.acceptanceRate.percent(1))") let optimizedCost = result.fitness let costReduction = naiveCost - optimizedCost let percentReduction = (costReduction / naiveCost) * 100 let weeklySavings = costReduction print("\n═══════════════════════════════════════════════════════════") print("Cost Comparison:") print("═══════════════════════════════════════════════════════════") print("Naive Approach: $\(naiveCost.formatted(.number.precision(.fractionLength(2))))") print("Optimized (SA): $\(optimizedCost.formatted(.number.precision(.fractionLength(2))))") print("Cost Reduction: $\(costReduction.formatted(.number.precision(.fractionLength(2)))) (\(percentReduction.formatted(.number.precision(.fractionLength(1))))%)") print("Weekly Savings: $\(weeklySavings.formatted(.number.precision(.fractionLength(2))))") print("═══════════════════════════════════════════════════════════") // Show optimal batch sizes for top 5 products by demand let productInfo = (0..
                
                   $1.demand } print("\nOptimal Batch Sizes (Top 5 by Demand):") print("───────────────────────────────────────────────────────────") print("Product Demand Optimal Batch Runs/Week Setup $ Hold $") print("───────────────────────────────────────────────────────────") for info in productInfo.prefix(5) { let runsPerWeek = info.demand / info.optimalBatch let setupCostWeekly = runsPerWeek * info.setupCost let holdCostWeekly = (info.optimalBatch / 2.0) * info.holdingCost print(String(format: " %2d %5.0f %6.1f %5.2f $%5.0f $%5.0f", info.id, info.demand, info.optimalBatch, runsPerWeek, setupCostWeekly, holdCostWeekly)) } print("═══════════════════════════════════════════════════════════")

Typical Results:

Cost reduction: 10-15% vs. naive approach (batch size = demand)
Weekly savings: $6,000-$10,000 depending on product mix
Computation time: 30-60 seconds (acceptable for weekly planning)
Solution quality: Near-optimal (escapes local minima that gradient methods miss)

Try It Yourself

Click to expand full playground code

import Foundation
import BusinessMath

// Portfolio with transaction costs (20 assets for clarity)
let numAssets = 20

// Create a suboptimal starting portfolio: heavily concentrated in first 5 assets
// These happen to be LOW return assets - clear opportunity to improve!
var currentWeights = [Double](repeating: 0.0, count: numAssets)
// First 5 assets: 60% of portfolio (12% each) - these are low-return!
for i in 0..<5 {
    currentWeights[i] = 0.12
}
// Remaining 15 assets: 40% of portfolio (2.67% each) - these include high-return!
for i in 5..
              
                = 5 && i < 15 && j >= 5 && j < 15) || (i >= 15 && j >= 15) return sameTier ? Double.random(in: 0.5...0.7) : Double.random(in: 0.2...0.4) } } @MainActor func portfolioObjective(_ weights: VectorN
                
                  ) -> Double { // 1. Portfolio expected return (we want to MAXIMIZE this) var portfolioReturn = 0.0 for i in 0..
                  
                    >( config: config, searchSpace: searchSpace ) // Create initial guess from current weights let initialGuess = VectorN(currentWeights) // Define constraints let constraints: [MultivariateConstraint
                    
                      >] = [ // Equality: Sum to 1 (must be fully invested) .equality { weights in (0..
                      
                         abs($1.change) }.prefix(8) print("\nTop 8 Position Changes:") print("───────────────────────────────────────────────────────────") print("Asset Return Old Weight New Weight Change Action") print("───────────────────────────────────────────────────────────") for change in changes { let direction = change.change > 0 ? "BUY " : "SELL" print("\("\(change.index)".paddingLeft(toLength: 4))\(change.return.percent().paddingLeft(toLength: 9))\(change.oldWeight.percent(2).paddingLeft(toLength: 13))\(change.newWeight.percent().paddingLeft(toLength: 12))\(change.change.percent(2).paddingLeft(toLength: 8))\(direction.paddingLeft(toLength: 12))") } print("═══════════════════════════════════════════════════════════") print("\n💡 Key Insight:") print(" The optimizer balanced improving returns (moving to high-return assets)") print(" with minimizing transaction costs. Notice it didn't eliminate all") print(" low-return holdings - the transaction costs made that too expensive.") // MARK: - Configuration Comparison // Simple test function: Rastrigin in 2D (many local minima) func rastrigin(_ x: VectorN
                        
                          ) -> Double { let A = 10.0 let n = 2.0 return A * n + (0..<2).reduce(0.0) { sum, i in sum + (x[i] * x[i] - A * cos(2 * .pi * x[i])) } } let searchSpace_config = [(-5.12, 5.12), (-5.12, 5.12)] // Test different configurations let configs: [(name: String, config: SimulatedAnnealingConfig)] = [ ("Fast", .fast), ("Default", .default), ("Thorough", .thorough), ("Custom (slow)", SimulatedAnnealingConfig( initialTemperature: 100.0, finalTemperature: 0.001, coolingRate: 0.98, // Slower cooling maxIterations: 20_000, perturbationScale: 0.1, reheatInterval: nil, reheatTemperature: nil, seed: 42 )) ] print("Configuration Comparison (Rastrigin Function)") print("═══════════════════════════════════════════════════════════") print("Config | Final Value | Iterations | Acceptance Rate") print("───────────────────────────────────────────────────────────") for (name, config) in configs { let optimizer = SimulatedAnnealing
                          
                            >( config: config, searchSpace: searchSpace_config ) let result = optimizer.optimizeDetailed( objective: rastrigin, initialSolution: VectorN([2.0, 3.0]) ) let rate = result.acceptanceRate print("\(name.padding(toLength: 15, withPad: " ", startingAt: 0)) | " + "\(result.fitness.formatted(.number.precision(.fractionLength(6))).padding(toLength: 11, withPad: " ", startingAt: 0)) | " + "\(String(format: "%10d", result.iterations)) | " + "\(rate.percent(1))") } print("\nRecommendation: Use .default for most problems, .thorough for difficult landscapes") // MARK: - Ackley Function (Multimodal Optimization) // Ackley function: highly multimodal with many local minima // Global minimum at (0, 0) with value 0 func ackley(_ x: VectorN
                            
                              ) -> Double { let a = 20.0 let b = 0.2 let c = 2.0 * .pi let d = 2 // dimensions let sum1 = (0..
                              
                                >( config: .thorough, searchSpace: searchSpace_ackley ) print("Ackley Function Optimization (Highly Multimodal)") print("═══════════════════════════════════════════════════════════") // Start from a poor initial guess let initialGuess_ackley = VectorN([4.0, -3.5]) let result_ackley = sa_ackley.optimizeDetailed( objective: ackley, initialSolution: initialGuess_ackley ) print("Optimization Results:") print(" Solution: (\(result_ackley.solution[0].number(4)), " + "\(result_ackley.solution[1].number(4)))") print(" Function Value: \(result_ackley.fitness.number(6)) (target: 0.0)") print(" Iterations: \(result_ackley.iterations)") print(" Acceptance Rate: \(result_ackley.acceptanceRate.percent(1))") print(" Converged: \(result_ackley.converged)") print(" Reason: \(result_ackley.convergenceReason)") // Distance from global optimum let distanceFromOptimum = sqrt(result_ackley.solution[0]*result_ackley.solution[0] + result_ackley.solution[1]*result_ackley.solution[1]) print(" Distance from global optimum: \(distanceFromOptimum.number(4))") // MARK: - Real-World Example: Manufacturing Batch Sizing with Setup Costs print("\n\nManufacturing Batch Sizing Optimization") print("═══════════════════════════════════════════════════════════") print("\nProblem: 15 products with setup costs and inventory holding costs") // Model with 15 products let numProducts_batch = 15 // Weekly demand per product (units/week) let demand_batch: [Double] = [ 100.0, 150.0, 80.0, 200.0, 120.0, // Products 0-4 90.0, 175.0, 110.0, 140.0, 95.0, // Products 5-9 160.0, 85.0, 130.0, 105.0, 145.0 // Products 10-14 ] // Fixed setup cost per production run ($) let setupCost_batch: [Double] = [ 500.0, 750.0, 400.0, 850.0, 600.0, // Products 0-4 450.0, 800.0, 550.0, 700.0, 480.0, // Products 5-9 720.0, 420.0, 650.0, 520.0, 680.0 // Products 10-14 ] // Holding cost per unit per week ($/unit/week) let holdingCost_batch: [Double] = [ 2.0, 3.5, 1.8, 4.0, 2.5, // Products 0-4 1.9, 3.8, 2.2, 3.0, 2.0, // Products 5-9 3.3, 1.7, 2.8, 2.1, 3.2 // Products 10-14 ] func totalCost_batch(_ batchSizes: VectorN
                                
                                  ) -> Double { var cost = 0.0 for i in 0..
                                  
                                    >( config: config_batch, searchSpace: searchSpace_batch ) print("\nRunning Simulated Annealing...") let result_batch = sa_batch.optimizeDetailed( objective: totalCost_batch, initialSolution: naiveBatches ) print("\nOptimization Results:") print(" Converged: \(result_batch.converged)") print(" Iterations: \(result_batch.iterations)") print(" Final Temperature: \(result_batch.finalTemperature.number(4))") print(" Acceptance Rate: \(result_batch.acceptanceRate.percent(1))") let optimizedCost = result_batch.fitness let costReduction = naiveCost - optimizedCost let percentReduction = (costReduction / naiveCost) * 100 let weeklySavings = costReduction print("\n═══════════════════════════════════════════════════════════") print("Cost Comparison:") print("═══════════════════════════════════════════════════════════") print("Naive Approach: \(naiveCost.currency())") print("Optimized (SA): \(optimizedCost.currency())") print("Cost Reduction: \(costReduction.currency()) (\(percentReduction.number(1))%)") print("Weekly Savings: \(weeklySavings.currency())") print("═══════════════════════════════════════════════════════════") // Show optimal batch sizes for top 5 products by demand let productInfo = (0..
                                    
                                       $1.demand } print("\nOptimal Batch Sizes (Top 5 by Demand):") print("─────────────────────────────────────────────────────────────") print("Product Demand Optimal Batch Runs/Week Setup $ Hold $") print("─────────────────────────────────────────────────────────────") for info in productInfo.prefix(5) { let runsPerWeek = info.demand / info.optimalBatch let setupCostWeekly = runsPerWeek * info.setupCost let holdCostWeekly = (info.optimalBatch / 2.0) * info.holdingCost print("\("\(info.id)".paddingLeft(toLength: 7))\(info.demand.number(0).paddingLeft(toLength: 8))\(info.optimalBatch.number(1).paddingLeft(toLength: 15))\(runsPerWeek.number(2).paddingLeft(toLength: 11))\(setupCostWeekly.currency().paddingLeft(toLength: 9))\(holdCostWeekly.currency().paddingLeft(toLength: 11))") } print("═════════════════════════════════════════════════════════════")

→ Full API Reference: BusinessMath Docs – Simulated Annealing Tutorial

Experiments to Try

Temperature Tuning: Test initial temperatures 0.1, 1.0, 10.0, 100.0
Cooling Rates: Compare α = 0.90, 0.95, 0.98, 0.99
Neighbor Generation: Different perturbation strategies for portfolio
Hybrid Approach: SA for global search, then local refinement with BFGS

Next Steps

Next Week: Week 11 begins with Nelder-Mead Simplex (another gradient-free method), then Particle Swarm Optimization, concluding with Case Study #5: Real-Time Portfolio Rebalancing.

Final Week: Week 12 covers reflections (What Worked, What Didn’t) and Case Study #6: Investment Strategy DSL.

Series: [Week 10 of 12] | Topic: [Part 5 - Advanced Methods] | Case Studies: [4/6 Complete]

Topics Covered: Simulated annealing • Global optimization • Cooling schedules • Metropolis criterion • Discrete optimization

Playgrounds: [Week 1-10 available] • [Next week: Nelder-Mead and particle swarm]

Tagged with: metaheuristics