BusinessMath Development Journey
Development Journey Series
When we began implementing BusinessMath’s TVM (Time Value of Money) functions, we faced a fundamental question: How do we ensure AI-generated code is correct?
When you set out to build a financial library, errors can cost real money. A bug in present value calculation could lead to bad retirement planning. An error in IRR could result in misallocated capital.
We needed a way to specify exactly what we wanted and verify that we got it.
We’re all coming around to the idea that AI is incredibly powerful at generating code, but we’ve all also heard of its dangerous tendency to “hallucinate”: code that looks reasonable but is subtly wrong.
We encountered those symptoms firsthand: plausible-looking code that quietly used the wrong formula.
A traditional approach (write code, then write tests) simply doesn’t make sense for AI collaboration. If we did it that way, by the time we got around to writing tests, we’d already be invested in understanding and debugging the AI’s output. We needed a better way.
Instead, we adopted a strict test-first development with a specific workflow designed for AI collaboration:
1. RED - Write a Failing Test
Before asking AI for any implementation, we wrote tests that specified exactly what we wanted:
@Test("Future value compounds correctly")
func testFutureValue() throws {
    let fv = calculateFutureValue(
        presentValue: 100.0,
        rate: 0.05,
        periods: 10.0
    )
    // Expected: 100 * (1.05)^10 = 162.89
    #expect(abs(fv - 162.89) < 0.01)
}
This test will fail—the function doesn’t exist yet. That’s the point.
2. GREEN - AI Implements from Specification
Now you give AI a clear specification:
“Implement calculateFutureValue so that this test passes. Use the compound interest formula: FV = PV × (1 + r)^n. Make it generic over types conforming to the Real protocol from swift-numerics.”
AI generates:
public func calculateFutureValue<T: Real>(
    presentValue: T,
    rate: T,
    periods: T
) -> T {
    return presentValue * T.pow(1 + rate, periods)
}
Run the test. It passes. Green!
3. REFACTOR - Improve with Safety Net
Now that tests pass, you can refactor fearlessly:
// Extract reusable compound interest calculation
private func compoundFactor<T: Real>(rate: T, periods: T) -> T {
    return T.pow(1 + rate, periods)
}

public func calculateFutureValue<T: Real>(
    presentValue: T,
    rate: T,
    periods: T
) -> T {
    return presentValue * compoundFactor(rate: rate, periods: periods)
}
Tests still pass. Refactor succeeded.
After implementing BusinessMath using strict test-first development, both our quality metrics and our time investment came out ahead. Three lessons stood out:
1. Failing Tests as Specifications
AI works best when given concrete, executable specifications. A failing test is the clearest possible spec.
Example: We wanted NPV calculation. Instead of saying “implement net present value,” we wrote:
@Test("NPV calculation matches known value")
func testNPV() throws {
    let cashFlows = [-100.0, 50.0, 50.0, 50.0]
    let npv = calculateNPV(rate: 0.10, cashFlows: cashFlows)
    // Manual calculation: -100 + 50/1.1 + 50/1.1^2 + 50/1.1^3 = 24.34
    #expect(abs(npv - 24.34) < 0.01)
}
AI immediately understood: discount each cash flow, sum them. Perfect implementation on first try.
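A minimal implementation matching that spec might look like the following sketch. It is Double-only for brevity (BusinessMath’s actual version is generic over swift-numerics’ `Real`), and the exact signature is inferred from the test:

```swift
import Foundation

// Sketch of an NPV implementation (Double-only; the library version
// is generic over `Real`). The cash flow at index 0 occurs at t = 0,
// so it is not discounted.
func calculateNPV(rate: Double, cashFlows: [Double]) -> Double {
    var npv = 0.0
    for (t, cashFlow) in cashFlows.enumerated() {
        npv += cashFlow / pow(1 + rate, Double(t))
    }
    return npv
}
```

Discount each cash flow, sum them: the loop is a direct transcription of the comment in the test.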
2. Tests Caught AI Errors Immediately
The first AI attempt at calculateFutureValue used simple interest: FV = PV × (1 + rate × periods).
Test failed. We saw the error instantly. Corrected the prompt. Next attempt used compound interest correctly.
Total debugging time: 30 seconds.
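To see why the test caught it instantly, compare the two formulas on the test’s own inputs (100 at 5% over 10 periods):

```swift
import Foundation

// Simple interest: what the AI's first attempt computed.
let simple = 100.0 * (1 + 0.05 * 10.0)   // 150.0
// Compound interest: what the test specified.
let compound = 100.0 * pow(1.05, 10.0)   // ~162.89
// The gap (~12.89) dwarfs the 0.01 tolerance, so #expect fails loudly.
```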
3. Generic Implementations Validated
Swift Numerics was our only real dependency, and it let us work generically over any “Real” number. Writing tests for multiple concrete types ensured the generics actually worked:
@Test("Future value works with Double")
func testFVDouble() {
    let fv: Double = calculateFutureValue(presentValue: 100.0, rate: 0.05, periods: 10.0)
    #expect(abs(fv - 162.89) < 0.01)
}

@Test("Future value works with Float")
func testFVFloat() {
    let fv: Float = calculateFutureValue(presentValue: 100.0, rate: 0.05, periods: 10.0)
    #expect(abs(fv - 162.89) < 0.1) // Looser tolerance for Float
}
Both passed. Generic implementation validated.
Not everything went smoothly. Two pitfalls stood out:
1. Vague Tests
A test has to be specific to be useful. A test-driven approach therefore works best when you have domain expertise and can give concrete guidance:
@Test("Present value works")
func testPV() {
    let pv = presentValue(futureValue: 1000.0, rate: 0.05, periods: 10.0)
    #expect(pv > 0) // Too vague!
}
AI would generate code that passes this test, but it wouldn’t necessarily be right. Asserting only that the value is positive does nothing to ensure it is the correct value.
Fix: Always test against known, calculated values.
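A sharpened version of the same test pins the result to a hand-calculated value. The presentValue helper below is a hypothetical Double-only sketch so the example is self-contained:

```swift
import Foundation

// Hypothetical Double-only sketch of presentValue; the library's
// version is generic over `Real`.
func presentValue(futureValue: Double, rate: Double, periods: Double) -> Double {
    return futureValue / pow(1 + rate, periods)
}

// Specific, not vague: assert the known, hand-calculated value.
let pv = presentValue(futureValue: 1000.0, rate: 0.05, periods: 10.0)
assert(abs(pv - 613.91) < 0.01)   // 1000 / (1.05)^10 = 613.91
```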
2. Missing Edge Cases
Just getting the right value on the happy path is not enough; you also have to think through and test edge cases such as zero rates and negative periods. AI would happily implement code that crashed or returned nonsense for these inputs.
Fix: Enumerate edge cases explicitly. Write tests for them all.
@Test("Future value with zero rate")
func testFVZeroRate() {
    let fv = calculateFutureValue(presentValue: 100.0, rate: 0.0, periods: 10.0)
    #expect(fv == 100.0) // No growth
}

@Test("Future value with negative periods throws")
func testFVNegativePeriods() {
    #expect(throws: FinancialError.self) {
        try calculateFutureValue(presentValue: 100.0, rate: 0.05, periods: -5.0)
    }
}
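The negative-periods test implies a throwing overload that validates inputs before computing. A sketch of what that might look like, assuming a FinancialError type like the one named in the test (Double-only for brevity):

```swift
import Foundation

// Assumed error type, named after the one in the test above.
enum FinancialError: Error {
    case invalidPeriods
}

// Throwing variant: reject nonsensical inputs instead of
// silently returning nonsense.
func calculateFutureValue(
    presentValue: Double,
    rate: Double,
    periods: Double
) throws -> Double {
    guard periods >= 0 else { throw FinancialError.invalidPeriods }
    return presentValue * pow(1 + rate, periods)
}
```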
We’re not yet at a point where you can simply trust AI to do what you’re thinking. But with strict test-first development, you can use AI not as a code generator but as a specification executor.
Without tests first: “Implement present value calculation” → AI guesses what you mean → You debug AI’s interpretation
With tests first: Failing test shows exactly what you want → AI implements to spec → Tests verify correctness
Key Takeaway: AI works best when given failing tests as specifications. Vague requests produce vague code. Concrete, executable specs produce correct code.
For your next project:
1. Write the Test First (RED)
2. Give AI the Test as Specification (GREEN)
3. Refactor with Confidence (REFACTOR)
Starting template:
# For each new function:
1. Write failing test with expected value
2. Prompt AI: "Implement [function name] to make this test pass: [paste test]"
3. Run test, verify it passes
4. Add edge case tests
5. Refactor if needed
This practice is demonstrated in the following technical posts:
Technical Examples:
- presentValue implemented test-first

Related Practices:
Problem: You’ve already invested in understanding AI’s code. Tests feel like busy work.
Solution: Discipline. Tests first, always. No exceptions.

Problem: #expect(result != nil) passes for wrong implementations.
Solution: Test against known, correct values. Do the math yourself first.

Problem: AI handles normal cases fine, but crashes on zero/negative/nil.
Solution: Explicitly enumerate edge cases. Write tests for all of them.
Technical foundation:
- #expect vs XCTAssert differences

Tools mentioned:
- Swift Numerics (Real, ElementaryFunctions)

Questions to consider:
Share your experience: Have you tried test-first development with AI? What worked? What didn’t?
Tagged with: ai-collaboration, tdd, testing, red-green-refactor, development journey