Simulated Annealing: Global Optimization Without Gradients

BusinessMath Quarterly Series

19 min read

Part 36 of 12-Week BusinessMath Series


What You’ll Learn

  • Understanding simulated annealing for global optimization
  • Cooling schedules: exponential, linear, adaptive
  • Acceptance probability and the Metropolis criterion
  • When to use simulated annealing vs. gradient methods
  • Escaping local minima through controlled randomness
  • Parameter tuning: initial temperature, cooling rate, iterations

The Problem

Many business optimization problems have multiple local minima:

  • Production scheduling with setup costs (discontinuous objective)
  • Portfolio optimization with transaction costs and lot sizes
  • Facility location with discrete choices (city A vs. city B)
  • Hyperparameter tuning for machine learning models

Gradient-based methods get stuck in local minima. Need global search capability.


The Solution

Simulated annealing mimics the physical process of metal cooling: accept worse solutions with probability that decreases over time. This allows escaping local minima early while converging to global optimum later.

Pattern 1: Portfolio Optimization with Transaction Costs

Business Problem: Rebalance portfolio from suboptimal starting point, but minimize transaction costs (non-smooth objective).

import BusinessMath

// Portfolio with transaction costs (20 assets for clarity)
let numAssets = 20

// Create a suboptimal starting portfolio: heavily concentrated in first 5 assets
// These happen to be LOW return assets - clear opportunity to improve!
var currentWeights = [Double](repeating: 0.0, count: numAssets)
// First 5 assets: 60% of portfolio (12% each) - these are low-return!
for i in 0..<5 {
    currentWeights[i] = 0.12
}
// Remaining 15 assets: 40% of portfolio (2.67% each) - these include high-return!
for i in 5..= 5 && i < 15 && j >= 5 && j < 15) ||
                       (i >= 15 && j >= 15)
        return sameTier ? Double.random(in: 0.5...0.7) : Double.random(in: 0.2...0.4)
    }
}

func portfolioObjective(_ weights: VectorN) -> Double {
    // 1. Portfolio expected return (we want to MAXIMIZE this)
    var portfolioReturn = 0.0
    for i in 0..>(
    config: config,
    searchSpace: searchSpace
)

// Create initial guess from current weights
let initialGuess = VectorN(currentWeights)

// Define constraints (FIXED API - no 'constraint:' or 'tolerance:' parameters!)
let constraints: [MultivariateConstraint>] = [
    // Equality: Sum to 1 (must be fully invested)
    .equality { weights in
        (0.. abs($1.change) }.prefix(8)

print("\nTop 8 Position Changes:")
print("───────────────────────────────────────────────────────────")
print("Asset  Return   Old Weight  New Weight  Change      Action")
print("───────────────────────────────────────────────────────────")
for change in changes {
    let direction = change.change > 0 ? "BUY " : "SELL"
    print(String(format: " %2d    %5.2f%%    %6.2f%%    %6.2f%%   %+6.2f%%    %s",
                 change.index, change.return * 100,
                 change.oldWeight * 100, change.newWeight * 100,
                 change.change * 100, direction))
}
print("═══════════════════════════════════════════════════════════")

print("\n💡 Key Insight:")
print("   The optimizer balanced improving returns (moving to high-return assets)")
print("   with minimizing transaction costs. Notice it didn't eliminate all")
print("   low-return holdings - the transaction costs made that too expensive.")

Pattern 2: Configuration Comparison

Pattern: Compare different cooling configurations.

// Simple test function: Rastrigin in 2D (many local minima)
func rastrigin(_ x: VectorN) -> Double {
    let A = 10.0
    let n = 2.0
    return A * n + (0..<2).reduce(0.0) { sum, i in
        sum + (x[i] * x[i] - A * cos(2 * .pi * x[i]))
    }
}

let searchSpace = [(-5.12, 5.12), (-5.12, 5.12)]

// Test different configurations
let configs: [(name: String, config: SimulatedAnnealingConfig)] = [
    ("Fast", .fast),
    ("Default", .default),
    ("Thorough", .thorough),
    ("Custom (slow)", SimulatedAnnealingConfig(
        initialTemperature: 100.0,
        finalTemperature: 0.001,
        coolingRate: 0.98,  // Slower cooling
        maxIterations: 20_000,
        perturbationScale: 0.1,
        reheatInterval: nil,
        reheatTemperature: nil,
        seed: 42
    ))
]

print("Configuration Comparison (Rastrigin Function)")
print("═══════════════════════════════════════════════════════════")
print("Config          | Final Value | Iterations | Acceptance Rate")
print("───────────────────────────────────────────────────────────")

for (name, config) in configs {
    let optimizer = SimulatedAnnealing>(
        config: config,
        searchSpace: searchSpace
    )

    let result = optimizer.optimizeDetailed(
        objective: rastrigin,
        initialSolution: VectorN([2.0, 3.0])
    )

    let rate = result.acceptanceRate * 100
    print("\(name.padding(toLength: 15, withPad: " ", startingAt: 0)) | " +
          "\(result.fitness.formatted(.number.precision(.fractionLength(6))).padding(toLength: 11, withPad: " ", startingAt: 0)) | " +
          "\(String(format: "%10d", result.iterations)) | " +
          "\(rate.formatted(.number.precision(.fractionLength(1))))%")
}

print("\nRecommendation: Use .default for most problems, .thorough for difficult landscapes")

Pattern 3: Ackley Function (Multimodal Optimization)

Pattern: Global optimization for highly multimodal functions.

// Ackley function: highly multimodal with many local minima
// Global minimum at (0, 0) with value 0
func ackley(_ x: VectorN) -> Double {
    let a = 20.0
    let b = 0.2
    let c = 2.0 * .pi
    let d = 2  // dimensions

    let sum1 = (0..>(
    config: .thorough,
    searchSpace: searchSpace
)

print("Ackley Function Optimization (Highly Multimodal)")
print("═══════════════════════════════════════════════════════════")

// Start from a poor initial guess
let initialGuess = VectorN([4.0, -3.5])

let result = sa.optimizeDetailed(
    objective: ackley,
    initialSolution: initialGuess
)

print("Optimization Results:")
print("  Solution: (\(result.solution[0].formatted(.number.precision(.fractionLength(4)))), " +
      "\(result.solution[1].formatted(.number.precision(.fractionLength(4)))))")
print("  Function Value: \(result.fitness.formatted(.number.precision(.fractionLength(6)))) (target: 0.0)")
print("  Iterations: \(result.iterations)")
print("  Acceptance Rate: \((result.acceptanceRate * 100).formatted(.number.precision(.fractionLength(1))))%")
print("  Converged: \(result.converged)")
print("  Reason: \(result.convergenceReason)")

// Distance from global optimum
let distanceFromOptimum = sqrt(result.solution[0]*result.solution[0] +
                               result.solution[1]*result.solution[1])
print("  Distance from global optimum: \(distanceFromOptimum.formatted(.number.precision(.fractionLength(4))))")

How It Works

Simulated Annealing Algorithm

  1. Initialize: Set T = T_0, x = x_0
  2. Generate Neighbor: x’ = random perturbation of x
  3. Calculate ΔE: ΔE = f(x’) - f(x)
  4. Accept/Reject:
    • If ΔE < 0: Always accept (improvement)
    • Else: Accept with probability P = exp(-ΔE / T)
  5. Cool Down: T = α * T (exponential) or T = T - β (linear)
  6. Repeat: Until T < T_min or max iterations

Acceptance Probability

Metropolis Criterion: P(accept worse solution) = exp(-ΔE / T)

TemperatureΔE = 0.1ΔE = 1.0ΔE = 10.0
T = 1099.0%90.5%36.8%
T = 190.5%36.8%0.005%
T = 0.136.8%0.005%~0%

Insight: Early (high T), accept almost anything. Late (low T), only accept improvements.

Cooling Schedule Impact

Problem: 100-variable portfolio optimization

ScheduleFinal ValueIterationsQuality
Fast (α=0.90)0.02452,500Good
Medium (α=0.95)0.02385,800Better
Slow (α=0.98)0.023518,000Best
Adaptive0.02377,200Better

Tradeoff: Slower cooling = better solution, more time


Real-World Application

Manufacturing: Batch Sizing with Setup Costs

Company: Electronics manufacturer optimizing production batch sizes for 15 productsChallenge: Minimize total costs (inventory + setup) subject to demand and capacity

Problem Characteristics:

  • Continuous variables: Batch sizes (treated as continuous for optimization)
  • Setup costs: Fixed cost per production run (non-smooth)
  • Holding costs: Penalty for excess inventory (smooth)
  • Multiple local minima: ~100+ feasible configurations

Why Simulated Annealing:

  • Setup costs create discontinuous objective
  • Global search needed (many local minima)
  • Can escape poor local solutions

Implementation:

import BusinessMath

import BusinessMath

// Model with 15 products
let numProducts = 15

// Weekly demand per product (units/week)
let demand: [Double] = [
    100.0, 150.0, 80.0, 200.0, 120.0,  // Products 0-4
    90.0, 175.0, 110.0, 140.0, 95.0,   // Products 5-9
    160.0, 85.0, 130.0, 105.0, 145.0   // Products 10-14
]

// Fixed setup cost per production run ($)
let setupCost: [Double] = [
    500.0, 750.0, 400.0, 850.0, 600.0,  // Products 0-4
    450.0, 800.0, 550.0, 700.0, 480.0,  // Products 5-9
    720.0, 420.0, 650.0, 520.0, 680.0   // Products 10-14
]

// Holding cost per unit per week ($/unit/week)
let holdingCost: [Double] = [
    2.0, 3.5, 1.8, 4.0, 2.5,    // Products 0-4
    1.9, 3.8, 2.2, 3.0, 2.0,    // Products 5-9
    3.3, 1.7, 2.8, 2.1, 3.2     // Products 10-14
]

func totalCost(_ batchSizes: VectorN) -> Double {
    var cost = 0.0

    for i in 0..>(
    config: config,
    searchSpace: searchSpace
)

print("\nRunning Simulated Annealing...")
let result = sa.optimizeDetailed(
    objective: totalCost,
    initialSolution: naiveBatches
)

print("\nOptimization Results:")
print("  Converged: \(result.converged)")
print("  Iterations: \(result.iterations)")
print("  Final Temperature: \(result.finalTemperature.formatted(.number.precision(.fractionLength(4))))")
print("  Acceptance Rate: \(result.acceptanceRate.percent(1))")

let optimizedCost = result.fitness
let costReduction = naiveCost - optimizedCost
let percentReduction = (costReduction / naiveCost) * 100
let weeklySavings = costReduction

print("\n═══════════════════════════════════════════════════════════")
print("Cost Comparison:")
print("═══════════════════════════════════════════════════════════")
print("Naive Approach:      $\(naiveCost.formatted(.number.precision(.fractionLength(2))))")
print("Optimized (SA):      $\(optimizedCost.formatted(.number.precision(.fractionLength(2))))")
print("Cost Reduction:      $\(costReduction.formatted(.number.precision(.fractionLength(2)))) (\(percentReduction.formatted(.number.precision(.fractionLength(1))))%)")
print("Weekly Savings:      $\(weeklySavings.formatted(.number.precision(.fractionLength(2))))")
print("═══════════════════════════════════════════════════════════")

// Show optimal batch sizes for top 5 products by demand
let productInfo = (0.. $1.demand }

print("\nOptimal Batch Sizes (Top 5 by Demand):")
print("───────────────────────────────────────────────────────────")
print("Product  Demand  Optimal Batch  Runs/Week  Setup $  Hold $")
print("───────────────────────────────────────────────────────────")

for info in productInfo.prefix(5) {
    let runsPerWeek = info.demand / info.optimalBatch
    let setupCostWeekly = runsPerWeek * info.setupCost
    let holdCostWeekly = (info.optimalBatch / 2.0) * info.holdingCost

    print(String(format: "  %2d     %5.0f      %6.1f      %5.2f    $%5.0f   $%5.0f",
                 info.id, info.demand, info.optimalBatch, runsPerWeek,
                 setupCostWeekly, holdCostWeekly))
}
print("═══════════════════════════════════════════════════════════")

Typical Results:

  • Cost reduction: 10-15% vs. naive approach (batch size = demand)
  • Weekly savings: $6,000-$10,000 depending on product mix
  • Computation time: 30-60 seconds (acceptable for weekly planning)
  • Solution quality: Near-optimal (escapes local minima that gradient methods miss)

Try It Yourself

Click to expand full playground code
import Foundation
import BusinessMath

// Portfolio with transaction costs (20 assets for clarity)
let numAssets = 20

// Create a suboptimal starting portfolio: heavily concentrated in first 5 assets
// These happen to be LOW return assets - clear opportunity to improve!
var currentWeights = [Double](repeating: 0.0, count: numAssets)
// First 5 assets: 60% of portfolio (12% each) - these are low-return!
for i in 0..<5 {
    currentWeights[i] = 0.12
}
// Remaining 15 assets: 40% of portfolio (2.67% each) - these include high-return!
for i in 5..= 5 && i < 15 && j >= 5 && j < 15) ||
                       (i >= 15 && j >= 15)
        return sameTier ? Double.random(in: 0.5...0.7) : Double.random(in: 0.2...0.4)
    }
}

@MainActor func portfolioObjective(_ weights: VectorN) -> Double {
    // 1. Portfolio expected return (we want to MAXIMIZE this)
    var portfolioReturn = 0.0
    for i in 0..>(
    config: config,
    searchSpace: searchSpace
)

// Create initial guess from current weights
let initialGuess = VectorN(currentWeights)

// Define constraints
let constraints: [MultivariateConstraint>] = [
    // Equality: Sum to 1 (must be fully invested)
    .equality { weights in
        (0.. abs($1.change) }.prefix(8)

print("\nTop 8 Position Changes:")
print("───────────────────────────────────────────────────────────")
print("Asset  Return   Old Weight  New Weight  Change      Action")
print("───────────────────────────────────────────────────────────")
for change in changes {
    let direction = change.change > 0 ? "BUY " : "SELL"
	print("\("\(change.index)".paddingLeft(toLength: 4))\(change.return.percent().paddingLeft(toLength: 9))\(change.oldWeight.percent(2).paddingLeft(toLength: 13))\(change.newWeight.percent().paddingLeft(toLength: 12))\(change.change.percent(2).paddingLeft(toLength: 8))\(direction.paddingLeft(toLength: 12))")
}
print("═══════════════════════════════════════════════════════════")

print("\n💡 Key Insight:")
print("   The optimizer balanced improving returns (moving to high-return assets)")
print("   with minimizing transaction costs. Notice it didn't eliminate all")
print("   low-return holdings - the transaction costs made that too expensive.")


// MARK: - Configuration Comparison

	// Simple test function: Rastrigin in 2D (many local minima)
	func rastrigin(_ x: VectorN) -> Double {
		let A = 10.0
		let n = 2.0
		return A * n + (0..<2).reduce(0.0) { sum, i in
			sum + (x[i] * x[i] - A * cos(2 * .pi * x[i]))
		}
	}

	let searchSpace_config = [(-5.12, 5.12), (-5.12, 5.12)]

	// Test different configurations
	let configs: [(name: String, config: SimulatedAnnealingConfig)] = [
		("Fast", .fast),
		("Default", .default),
		("Thorough", .thorough),
		("Custom (slow)", SimulatedAnnealingConfig(
			initialTemperature: 100.0,
			finalTemperature: 0.001,
			coolingRate: 0.98,  // Slower cooling
			maxIterations: 20_000,
			perturbationScale: 0.1,
			reheatInterval: nil,
			reheatTemperature: nil,
			seed: 42
		))
	]

	print("Configuration Comparison (Rastrigin Function)")
	print("═══════════════════════════════════════════════════════════")
	print("Config          | Final Value | Iterations | Acceptance Rate")
	print("───────────────────────────────────────────────────────────")

	for (name, config) in configs {
		let optimizer = SimulatedAnnealing>(
			config: config,
			searchSpace: searchSpace_config
		)

		let result = optimizer.optimizeDetailed(
			objective: rastrigin,
			initialSolution: VectorN([2.0, 3.0])
		)

		let rate = result.acceptanceRate
		print("\(name.padding(toLength: 15, withPad: " ", startingAt: 0)) | " +
			  "\(result.fitness.formatted(.number.precision(.fractionLength(6))).padding(toLength: 11, withPad: " ", startingAt: 0)) | " +
			  "\(String(format: "%10d", result.iterations)) | " +
			  "\(rate.percent(1))")
	}

	print("\nRecommendation: Use .default for most problems, .thorough for difficult landscapes")


// MARK: - Ackley Function (Multimodal Optimization)

// Ackley function: highly multimodal with many local minima
// Global minimum at (0, 0) with value 0
func ackley(_ x: VectorN) -> Double {
	let a = 20.0
	let b = 0.2
	let c = 2.0 * .pi
	let d = 2  // dimensions

	let sum1 = (0..>(
	config: .thorough,
	searchSpace: searchSpace_ackley
)

print("Ackley Function Optimization (Highly Multimodal)")
print("═══════════════════════════════════════════════════════════")

// Start from a poor initial guess
let initialGuess_ackley = VectorN([4.0, -3.5])

let result_ackley = sa_ackley.optimizeDetailed(
	objective: ackley,
	initialSolution: initialGuess_ackley
)

print("Optimization Results:")
print("  Solution: (\(result_ackley.solution[0].number(4)), " +
	  "\(result_ackley.solution[1].number(4)))")
print("  Function Value: \(result_ackley.fitness.number(6)) (target: 0.0)")
print("  Iterations: \(result_ackley.iterations)")
print("  Acceptance Rate: \(result_ackley.acceptanceRate.percent(1))")
print("  Converged: \(result_ackley.converged)")
print("  Reason: \(result_ackley.convergenceReason)")

// Distance from global optimum
let distanceFromOptimum = sqrt(result_ackley.solution[0]*result_ackley.solution[0] +
							   result_ackley.solution[1]*result_ackley.solution[1])
print("  Distance from global optimum: \(distanceFromOptimum.number(4))")


// MARK: - Real-World Example: Manufacturing Batch Sizing with Setup Costs

print("\n\nManufacturing Batch Sizing Optimization")
print("═══════════════════════════════════════════════════════════")
print("\nProblem: 15 products with setup costs and inventory holding costs")

// Model with 15 products
let numProducts_batch = 15

// Weekly demand per product (units/week)
let demand_batch: [Double] = [
	100.0, 150.0, 80.0, 200.0, 120.0,  // Products 0-4
	90.0, 175.0, 110.0, 140.0, 95.0,   // Products 5-9
	160.0, 85.0, 130.0, 105.0, 145.0   // Products 10-14
]

// Fixed setup cost per production run ($)
let setupCost_batch: [Double] = [
	500.0, 750.0, 400.0, 850.0, 600.0,  // Products 0-4
	450.0, 800.0, 550.0, 700.0, 480.0,  // Products 5-9
	720.0, 420.0, 650.0, 520.0, 680.0   // Products 10-14
]

// Holding cost per unit per week ($/unit/week)
let holdingCost_batch: [Double] = [
	2.0, 3.5, 1.8, 4.0, 2.5,    // Products 0-4
	1.9, 3.8, 2.2, 3.0, 2.0,    // Products 5-9
	3.3, 1.7, 2.8, 2.1, 3.2     // Products 10-14
]

func totalCost_batch(_ batchSizes: VectorN) -> Double {
	var cost = 0.0

	for i in 0..>(
	config: config_batch,
	searchSpace: searchSpace_batch
)

print("\nRunning Simulated Annealing...")
let result_batch = sa_batch.optimizeDetailed(
	objective: totalCost_batch,
	initialSolution: naiveBatches
)

print("\nOptimization Results:")
print("  Converged: \(result_batch.converged)")
print("  Iterations: \(result_batch.iterations)")
print("  Final Temperature: \(result_batch.finalTemperature.number(4))")
print("  Acceptance Rate: \(result_batch.acceptanceRate.percent(1))")

let optimizedCost = result_batch.fitness
let costReduction = naiveCost - optimizedCost
let percentReduction = (costReduction / naiveCost) * 100
let weeklySavings = costReduction

print("\n═══════════════════════════════════════════════════════════")
print("Cost Comparison:")
print("═══════════════════════════════════════════════════════════")
print("Naive Approach:      \(naiveCost.currency())")
print("Optimized (SA):      \(optimizedCost.currency())")
print("Cost Reduction:      \(costReduction.currency()) (\(percentReduction.number(1))%)")
print("Weekly Savings:      \(weeklySavings.currency())")
print("═══════════════════════════════════════════════════════════")

// Show optimal batch sizes for top 5 products by demand
let productInfo = (0.. $1.demand }

print("\nOptimal Batch Sizes (Top 5 by Demand):")
print("─────────────────────────────────────────────────────────────")
print("Product  Demand  Optimal Batch  Runs/Week  Setup $     Hold $")
print("─────────────────────────────────────────────────────────────")

for info in productInfo.prefix(5) {
	let runsPerWeek = info.demand / info.optimalBatch
	let setupCostWeekly = runsPerWeek * info.setupCost
	let holdCostWeekly = (info.optimalBatch / 2.0) * info.holdingCost

	print("\("\(info.id)".paddingLeft(toLength: 7))\(info.demand.number(0).paddingLeft(toLength: 8))\(info.optimalBatch.number(1).paddingLeft(toLength: 15))\(runsPerWeek.number(2).paddingLeft(toLength: 11))\(setupCostWeekly.currency().paddingLeft(toLength: 9))\(holdCostWeekly.currency().paddingLeft(toLength: 11))")
}
print("═════════════════════════════════════════════════════════════")

→ Full API Reference: BusinessMath Docs – Simulated Annealing Tutorial

Experiments to Try

  1. Temperature Tuning: Test initial temperatures 0.1, 1.0, 10.0, 100.0
  2. Cooling Rates: Compare α = 0.90, 0.95, 0.98, 0.99
  3. Neighbor Generation: Different perturbation strategies for portfolio
  4. Hybrid Approach: SA for global search, then local refinement with BFGS

Next Steps

Next Week: Week 11 begins with Nelder-Mead Simplex (another gradient-free method), then Particle Swarm Optimization, concluding with Case Study #5: Real-Time Portfolio Rebalancing.

Final Week: Week 12 covers reflections (What Worked, What Didn’t) and Case Study #6: Investment Strategy DSL.


Series: [Week 10 of 12] | Topic: [Part 5 - Advanced Methods] | Case Studies: [4/6 Complete]

Topics Covered: Simulated annealing • Global optimization • Cooling schedules • Metropolis criterion • Discrete optimization

Playgrounds: [Week 1-10 available] • [Next week: Nelder-Mead and particle swarm]


Tagged with: businessmath, swift, optimization, simulated-annealing, global-optimization, metaheuristic, gradient-free