Test-First Development with AI
BusinessMath Development Journey
8 min read
Development Journey Series
The Context
When we began implementing BusinessMath's TVM (Time Value of Money) functions, we faced a fundamental question: how do we ensure AI-generated code is correct? When you set out to build a financial library, errors can cost real money. A bug in a present value calculation could lead to bad retirement planning. An error in IRR could result in misallocated capital.
We needed a way to specify exactly what we wanted and verify that we got it.
The Challenge
We're all coming around to the idea that AI is incredibly powerful at generating code, but we've all also heard of its dangerous tendency to "hallucinate." Code can look reasonable but be subtly wrong.

The symptoms we encountered:
- AI might confidently implement simple interest when we needed compound interest
- Generic type constraints would be almost correct but not quite right
- Edge cases (zero rate, negative periods) would be silently mishandled
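To make the first symptom concrete, here is an illustrative sketch (plain Swift, made-up numbers) of how far apart simple and compound interest drift over ten periods:

```swift
import Foundation

// Illustrative only: the kind of "looks reasonable, subtly wrong" error
// described above. Simple vs. compound interest on 100 at 5% over 10 periods.
let presentValue = 100.0
let rate = 0.05
let periods = 10.0

let simple = presentValue * (1 + rate * periods)      // 150.00 — plausible, but wrong for TVM
let compound = presentValue * pow(1 + rate, periods)  // 162.89 — what compound growth gives

print(simple, compound) // the gap widens with every additional period
```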
The Solution
Instead, we adopted strict test-first development with a specific workflow designed for AI collaboration:

The RED-GREEN-REFACTOR Cycle

1. RED - Write a Failing Test

Before asking AI for any implementation, we wrote tests that specified exactly what we wanted:
@Test("Future value compounds correctly")
func testFutureValue() throws {
    let fv = calculateFutureValue(
        presentValue: 100.0,
        rate: 0.05,
        periods: 10.0
    )
    // Expected: 100 * (1.05)^10 = 162.89
    #expect(abs(fv - 162.89) < 0.01)
}
This test will fail—the function doesn’t exist yet.
That’s the point.
2. GREEN - AI Implements from Specification
Now you give AI a clear specification:
"Implement calculateFutureValue so that this test passes. Use the compound interest formula: FV = PV × (1 + r)^n. Make it generic over types conforming to the Real protocol from swift-numerics."

AI generates:
public func calculateFutureValue<T: Real>(
    presentValue: T,
    rate: T,
    periods: T
) -> T {
    return presentValue * T.pow(1 + rate, periods)
}
Run the test.
It passes. Green!
3. REFACTOR - Improve with Safety Net
Now that tests pass, you can refactor fearlessly:
// Extract reusable compound interest calculation
private func compoundFactor<T: Real>(rate: T, periods: T) -> T {
    return T.pow(1 + rate, periods)
}

public func calculateFutureValue<T: Real>(
    presentValue: T,
    rate: T,
    periods: T
) -> T {
    return presentValue * compoundFactor(rate: rate, periods: periods)
}
Tests still pass. Refactor succeeded.
The Results
After implementing BusinessMath using strict test-first development:

Metrics that improved:
- 0 regression bugs across 247 tests after major refactorings
- 180+ bugs caught before they reached “implementation” status
- 3 API redesigns caught during test writing (before any code existed)
- Initial setup: ~2 hours (learning Swift Testing framework)
- Per-function overhead: ~5-10 minutes (writing tests first)
- ROI: Massive—debugging time dropped from hours to minutes
What Worked
1. Failing Tests as Specifications

AI works best when given concrete, executable specifications. A failing test is the clearest possible spec.
Example: We wanted NPV calculation. Instead of saying “implement net present value,” we wrote:
@Test("NPV calculation matches known value")
func testNPV() throws {
    let cashFlows = [-100.0, 50.0, 50.0, 50.0]
    let npv = calculateNPV(rate: 0.10, cashFlows: cashFlows)
    // Manual calculation: -100 + 50/1.1 + 50/1.1^2 + 50/1.1^3 = 24.34
    #expect(abs(npv - 24.34) < 0.01)
}
AI immediately understood: discount each cash flow, sum them. Perfect implementation on first try.
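For reference, an implementation along those lines might look like this. This is a sketch, not the actual BusinessMath source; it assumes the swift-numerics package for the Real protocol:

```swift
import Numerics

// Sketch of an NPV implementation matching the test above.
public func calculateNPV<T: Real>(rate: T, cashFlows: [T]) -> T {
    // Discount each cash flow by (1 + rate)^t, where t is its period
    // index (the flow at index 0 is undiscounted), then sum.
    var npv = T.zero
    for (t, cashFlow) in cashFlows.enumerated() {
        npv += cashFlow / T.pow(1 + rate, T(t))
    }
    return npv
}
```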
2. Tests Caught AI Errors Immediately
First AI attempt at calculateFutureValue used simple interest: FV = PV * (1 + rate * periods).
Test failed. We saw the error instantly. Corrected the prompt. Next attempt used compound interest correctly.
Total debugging time: 30 seconds.
3. Generic Implementations Validated
We used Swift Numerics as our only real dependency; it let us work generically over any Real number. Writing tests for multiple types ensured the generics worked:
@Test("Future value works with Double")
func testFVDouble() {
    let fv: Double = calculateFutureValue(presentValue: 100.0, rate: 0.05, periods: 10.0)
    #expect(abs(fv - 162.89) < 0.01)
}

@Test("Future value works with Float")
func testFVFloat() {
    let fv: Float = calculateFutureValue(presentValue: 100.0, rate: 0.05, periods: 10.0)
    #expect(abs(fv - 162.89) < 0.1) // Looser tolerance for Float
}
Both passed. Generic implementation validated.
What Didn’t Work
1. Vague Tests

A test has to be specific to be useful. A test-driven approach therefore works best when you have domain expertise and can give concrete guidance:
@Test("Present value works")
func testPV() {
    let pv = presentValue(futureValue: 1000.0, rate: 0.05, periods: 10.0)
    #expect(pv > 0) // Too vague!
}
AI would generate code that passes this test, but it wouldn't necessarily be right. Specifying only that the value is positive won't ensure that it is the correct value.
Fix: Always test against known, calculated values.
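For example, the vague test above could be tightened to pin the result to a hand-calculated value. The presentValue function here is an illustrative stand-in, included only so the snippet is self-contained; the real BusinessMath implementation may differ:

```swift
import Testing
import Numerics

// Illustrative presentValue so the test compiles on its own.
func presentValue<T: Real>(futureValue: T, rate: T, periods: T) -> T {
    return futureValue / T.pow(1 + rate, periods)
}

@Test("Present value matches a hand-calculated value")
func testPVKnownValue() {
    let pv = presentValue(futureValue: 1000.0, rate: 0.05, periods: 10.0)
    // Expected: 1000 / (1.05)^10 = 613.91
    #expect(abs(pv - 613.91) < 0.01)
}
```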
2. Missing Edge Cases
Just getting the right value is great, but you also have to think through and test against edge cases:
- What if rate is zero?
- What if periods is negative?
- What if present value is negative?
Fix: Enumerate edge cases explicitly. Write tests for them all.
@Test("Future value with zero rate")
func testFVZeroRate() {
    let fv = calculateFutureValue(presentValue: 100.0, rate: 0.0, periods: 10.0)
    #expect(fv == 100.0) // No growth
}

@Test("Future value with negative periods throws")
func testFVNegativePeriods() {
    #expect(throws: FinancialError.self) {
        try calculateFutureValue(presentValue: 100.0, rate: 0.05, periods: -5.0)
    }
}
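The negative-periods test presumes a throwing variant of calculateFutureValue that validates its inputs. A sketch of what that could look like follows; the error case name is taken from the test above, but the overall shape is an assumption, not the actual BusinessMath API:

```swift
import Numerics

enum FinancialError: Error {
    case negativePeriods // assumed case name, matching the test above
}

// Throwing variant that rejects negative periods explicitly
// instead of silently computing a discounted value.
public func calculateFutureValue<T: Real>(
    presentValue: T,
    rate: T,
    periods: T
) throws -> T {
    guard periods >= 0 else {
        throw FinancialError.negativePeriods
    }
    return presentValue * T.pow(1 + rate, periods)
}
```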
Key Takeaway
We're not yet in a place where you can just trust AI to do what you're thinking. But with test-first development, you can use AI not as a code generator, but as a specification executor.

Without tests first: "Implement present value calculation" → AI guesses what you mean → You debug AI's interpretation
With tests first: Failing test shows exactly what you want → AI implements to spec → Tests verify correctness
Key Takeaway: AI works best when given failing tests as specifications. Vague requests produce vague code. Concrete, executable specs produce correct code.
How to Apply This
For your next project:

1. Write the Test First (RED)
- Before asking AI for implementation, write the failing test
- Include expected values calculated manually or from a reference
- Cover edge cases explicitly

2. Let AI Implement from the Spec (GREEN)
- Paste the test into your AI prompt
- Say: "Implement this function to make the test pass"
- Run the test to verify

3. Refactor with a Safety Net (REFACTOR)
- Extract patterns, improve names, optimize
- Tests protect against regressions
- If tests still pass, the refactor succeeded
# For each new function:
1. Write failing test with expected value
2. Prompt AI: “Implement [function name] to make this test pass: [paste test]”
3. Run test, verify it passes
4. Add edge case tests
5. Refactor if needed
See It In Action
This practice is demonstrated in the following technical posts:

Technical Examples:
- Getting Started (Monday): Shows presentValue implemented test-first
- Time Series Foundation (Wednesday): Period arithmetic validated with tests
- Time Value of Money (Week 1 Friday case study): Multiple TVM functions integrated
- Documentation as Design (Week 2): Write docs before implementation
- Coding Standards (Week 5): Forbidden patterns caught by tests
Common Pitfalls
❌ Pitfall 1: Writing tests after implementation
Problem: You've already invested in understanding AI's code. Tests feel like busy work.
Solution: Discipline. Tests first, always. No exceptions.

❌ Pitfall 2: Tests that just check "doesn't crash"
Problem: #expect(result != nil) passes for wrong implementations.
Solution: Test against known, correct values. Do the math yourself first.
❌ Pitfall 3: Skipping edge cases
Problem: AI handles normal cases fine, but crashes on zero/negative/nil.
Solution: Explicitly enumerate edge cases. Write tests for all of them.

Further Reading
Technical foundation:
- Swift Testing framework documentation
- #expect vs XCTAssert differences
- Swift Testing: Modern testing framework for Swift
- Swift Numerics: Generic numeric protocols (Real, ElementaryFunctions)
Discussion
Questions to consider:
- How does test-first development change when AI is writing the implementation?
- What level of test coverage is “enough” for financial calculations?
- How do you balance test-first discipline with exploration/prototyping?
Series Progress:
- Week: 1/12
- Posts Published: 2/~48
- Methodology Posts: 1/12
- Practices Covered: Test-First Development
Tagged with: development-process