Prompt chains for controllers — breaking multi-step analyses apart

How to break a complex finance analysis into multiple steps and deploy the strongest model per step — with attention to context handoff and numeric verification between steps.

A prompt chain breaks a complex finance analysis into multiple sequential prompts, with a specific model assigned to each step and each step's output becoming context for the next. For controllers and finance managers this delivers three things: the strongest model per step, a traceable reasoning trail for audit, and numerical verification at the points where it matters most.

One prompt, one model, one answer — that's beginner mode. Finance work of any size (a variance analysis, a board-pack section, a due-diligence summary, a reconciliation between balance sheet and tax return) consists of multiple thinking steps. Stuffing them all into one prompt yields mediocre output across the board. Breaking the task apart and deploying the right model per step yields the best result per step — and in finance also a controllable trail of how the answer came about.

What is a prompt chain

A prompt chain is a sequence of prompts where the output of one step is the input of the next. Simple form, big impact for finance:

  1. Step 1 — Data pull and summary. "Analyze the variance between actuals and budget for April, per category above €5,000 absolute."
  2. Step 2 — Explanation per variance. "Using the variances from step 1, draft a per-category explanation based on earlier commentary and known patterns."
  3. Step 3 — Communication. "Based on step 2, write a board-pack section of at most one page with executive summary, headlines, and critical points."

Every step is a self-contained, bounded task. You can control quality per step. You can rerun one step without redoing everything. And: you can pick a different model per step.
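The three steps above can be sketched as a small loop. This is a minimal illustration, not a reference implementation: `Step`, `run_chain`, and `fake_call` are hypothetical names, the model labels are placeholders, and `fake_call` stands in for whatever API call you would actually make.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (model_name, prompt) -> model output text.
ModelCall = Callable[[str, str], str]


@dataclass
class Step:
    name: str
    model: str     # placeholder label, not a real model identifier
    template: str  # may contain {previous} for the prior step's output


def run_chain(steps: list[Step], call_model: ModelCall) -> list[dict]:
    """Run steps in order; each output is passed to the next step as {previous}."""
    records = []
    previous = ""
    for step in steps:
        prompt = step.template.format(previous=previous)
        output = call_model(step.model, prompt)
        # Keep input, model, and output per step so one step can be reviewed or rerun.
        records.append({"step": step.name, "model": step.model,
                        "prompt": prompt, "output": output})
        previous = output
    return records


def fake_call(model: str, prompt: str) -> str:
    # Stand-in for a real API call so the sketch runs offline.
    return f"[{model}] answer to: {prompt[:40]}"


steps = [
    Step("variance", "model-a",
         "Analyze the variance between actuals and budget for April."),
    Step("explanation", "model-b",
         "Using these variances, draft a per-category explanation:\n{previous}"),
    Step("board_pack", "model-c",
         "Based on this explanation, write a one-page board-pack section:\n{previous}"),
]
records = run_chain(steps, fake_call)
```

Because every step produces its own record, rerunning a weak step means calling the model again with that step's stored prompt rather than restarting the whole chain.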

Why chains instead of one mega-prompt

Three reasons that weigh extra for finance:

  • Full attention per subtask. A model that does one thing at a time (first analyze data, then explain, then write) does each better. Fewer errors, better accuracy.
  • Modular design. A chain isolates failure. If step 3 is weak, you know exactly where to intervene, not "somewhere in the prompt."
  • Traceability. You can log every step, judge it, archive it. Critical for finance: if an external auditor later asks "how did this number come about," you have per step the input, the prompt, and the output.

Rule of thumb: if you're asking for more than three cognitive operations in one prompt (analyze + explain + summarize + rewrite in board tone), that's a chain in disguise.

Multi-model orchestration for finance

Every model has strengths. For finance work:

  • Claude writes nuanced prose and does strong close reading. Good for commentary and contract analysis.
  • GPT-5 Thinking is strong at multi-step reasoning. Good for due diligence and complex variance analyses.
  • Gemini has the largest context window. Good for long dossiers (transfer-pricing documentation, full contract portfolios).
  • Perplexity is built for live web research with source citation. Good for current VAT rates, sector benchmarks.
  • Copilot sits in your Office environment and knows your SharePoint and Outlook. Good for the "last mile" — output in your Word template, mail to the right people.

A typical pattern for a customer-focused board pack:

  • Perplexity pulls current sector benchmarks with sources.
  • Claude Opus reads the benchmarks plus your monthly numbers (full context) and writes the substantive analysis.
  • GPT-5 turns the analysis into an executive summary and generates a chart.
  • Copilot places the final result in your board-pack template with house style and shares it via SharePoint.

None of these models handles all four steps equally well, so pick the right tool per step.

Context handoff — the weak point of every chain

The biggest risk in multi-model chains is context handoff. Every model starts without memory of what the previous one did. You literally copy the output of step 1 as input for step 2. If that output is vague, incomplete, or contains implicit assumptions, the chain starts to break down. Every handoff is a potential leak. Compare it to a game of telephone: each pass loses meaning.

Practical guidelines for finance:

  • Make context packets explicit. Ask the first model to structure its output as a context packet: key findings, assumptions (period, currency, filters), open questions, sources (which ledger account, which period). Not just a flowing text.
  • Add the next step's brief verbatim. "What follows is step 2, where another model has to produce an explanation per variance. Make sure your output contains everything needed for that — including exact amounts and ledger accounts."
  • Check the handoff. Read the output of step N with the question: "Could a colleague without the rest of the chain pick up directly from here?" If no — rewrite.
  • Numeric verification. At every handoff where numbers travel: check that the numbers stay consistent between steps. A variance of €127,000 in step 1 cannot suddenly become €172,000 in step 2 — that's a hallucination corrupting the whole chain.
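Two of these guidelines lend themselves to a short sketch: the explicit context packet and the numeric check at the handoff. The code below is an illustration under stated assumptions: the field names, the ledger account, and the helper names (`ContextPacket`, `amounts`, `unexplained_amounts`) are made up for the example, and the amount parser assumes English formatting (comma as thousands separator, dot as decimal point).

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class ContextPacket:
    """Structured handoff between two steps (field names are illustrative)."""
    key_findings: list[str]
    assumptions: dict[str, str]                       # period, currency, filters
    open_questions: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)  # ledger accounts, periods

    def render(self) -> str:
        """Render the packet as plain text to paste into the next prompt."""
        lines = ["KEY FINDINGS"] + [f"- {f}" for f in self.key_findings]
        lines += ["ASSUMPTIONS"] + [f"- {k}: {v}" for k, v in self.assumptions.items()]
        lines += ["OPEN QUESTIONS"] + [f"- {q}" for q in self.open_questions]
        lines += ["SOURCES"] + [f"- {s}" for s in self.sources]
        return "\n".join(lines)


# Assumes English formatting: comma as thousands separator, dot as decimal point.
AMOUNT = re.compile(r"€\s?(\d[\d,]*(?:\.\d+)?)")


def amounts(text: str) -> set[Decimal]:
    """Collect every euro amount mentioned in a text."""
    return {Decimal(m.replace(",", "")) for m in AMOUNT.findall(text)}


def unexplained_amounts(upstream: str, downstream: str) -> set[Decimal]:
    """Amounts in the downstream output that never appeared upstream."""
    return amounts(downstream) - amounts(upstream)


packet = ContextPacket(
    key_findings=["Personnel costs: €127,000 over budget",
                  "Marketing: €8,500 under budget"],
    assumptions={"period": "April", "currency": "EUR",
                 "filter": "variances above €5,000 absolute"},
    sources=["ledger account 4000 (made-up example), April"],
)
step2_output = "Personnel costs ran €172,000 over budget; marketing came in €8,500 under."
suspect = unexplained_amounts(packet.render(), step2_output)
```

Here `suspect` contains 172000, the transposed figure, which is the cue to stop the chain and check by hand. The check only catches amounts that appear from nowhere. Legitimately derived figures such as subtotals would also be flagged and need a human look.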

Parallel versus sequential

Not every chain is linear. Two patterns:

  • Sequential. Step 2 can only start once step 1 is done. Research → analysis → communication.
  • Parallel. Two or more models work simultaneously on the same question, and their outputs are merged. For example: three models each draft a variance commentary; a fourth (or you) selects the strongest elements and synthesizes. This is a form of self-consistency: multiple routes to the same answer reduce the chance of nonsense.

In practice you see mixes: a sequential chain with a parallel branch on one critical step (three variance drafts, one synthesis).
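The parallel branch (three drafts, one synthesis) might look roughly like this, using Python's standard `concurrent.futures`. The names `parallel_drafts`, `synthesize`, and `fake_call` are hypothetical, the model labels are placeholders, and `fake_call` replaces the real API calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical signature: (model_name, prompt) -> model output text.
ModelCall = Callable[[str, str], str]


def parallel_drafts(models: list[str], prompt: str,
                    call_model: ModelCall) -> dict[str, str]:
    """Send the same prompt to several models at once; return drafts keyed by model."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(call_model, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}


def synthesize(drafts: dict[str, str], synth_model: str,
               call_model: ModelCall) -> str:
    """Ask one more model to merge the drafts into a single commentary."""
    joined = "\n\n".join(f"Draft from {m}:\n{d}" for m, d in drafts.items())
    prompt = ("Combine the strongest elements of these variance commentaries. "
              "Do not introduce amounts that appear in none of the drafts.\n\n"
              + joined)
    return call_model(synth_model, prompt)


def fake_call(model: str, prompt: str) -> str:
    # Stand-in for a real API call so the sketch runs offline.
    return f"[{model}] {len(prompt)} chars received"


drafts = parallel_drafts(["model-a", "model-b", "model-c"],
                         "Draft a commentary on the April personnel variance.",
                         fake_call)
final = synthesize(drafts, "model-d", fake_call)
```

Threads fit this case because the work is waiting on network responses. The synthesis step can equally be a person reading the three drafts side by side.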

When a chain — and when not

A chain is powerful but costs time and tokens. Use it in finance when:

  • The analysis has clear phases (data pull, analysis, communication).
  • Accuracy matters: each phase has its own demands and its own quality criteria.
  • You want to reuse the output or rerun a step.
  • The full task is too large for one context window.
  • The analysis has to provide audit evidence — per step a controllable input and output.

Not when:

  • The task is small and fits in one prompt.
  • You want a quick first draft, not a finished product.
  • The extra quality doesn't justify the extra latency and cost.

Where this lands in practice

  • A CFO has Perplexity Deep Research pull a sector benchmark on EBITDA margins, Claude writes a competitive analysis with the company's own numbers alongside, GPT-5 writes the board summary. One report, three models, each in its strongest role — and a complete audit trail of where each number and each claim comes from.
  • A controller builds a chain for the monthly variance analysis: GPT-5 Thinking for data decomposition, Claude for the commentary, Copilot for the board-pack output. Every step is logged; on an audit question about a specific variance, the source can be retrieved in seconds.
  • A finance manager pastes three monthly reports as flat context into a Claude Project and asks for a trend analysis — simplest form, works fine at this scale, no orchestration needed.
  • An AP team builds a chain where step 1 extracts an inbound invoice (vision model), step 2 retrieves similar historical postings (Q&A layer on Exact), and step 3 drafts a journal entry in your booking standard. Three steps, three tool calls, one draft landing in the approval queue.

Audit-grade perspective

Prompt chains in finance should be logged: per step the input, prompt, model name, output, and any human intervention. Keep those logs in a central, append-only logging environment, not in loose files on the controller's laptop. This is the kind of record-keeping the EU AI Act requires for high-risk systems, and what an external auditor needs to test internal control around AI output. Build it in from day one, not as an afterthought.
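As an illustration of what one logged step could contain, the sketch below appends each record as a JSON line and chains it to the previous line with a SHA-256 hash, so later edits become detectable. It writes to a temporary local file purely to stay runnable. In practice the records would go to the central logging environment, and the function and field names here (`append_record`, `verify_log`, `prev_hash`) are assumptions for the example.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def append_record(log_path: Path, record: dict) -> str:
    """Append one step record as a JSON line, chained to the previous line's hash."""
    prev_hash = GENESIS
    if log_path.exists():
        lines = log_path.read_text(encoding="utf-8").splitlines()
        if lines:
            prev_hash = json.loads(lines[-1])["hash"]
    body = {**record,
            "logged_at": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({**body, "hash": digest}, sort_keys=True) + "\n")
    return digest


def verify_log(log_path: Path) -> bool:
    """Recompute every hash; False if any line was altered or reordered."""
    prev_hash = GENESIS
    for line in log_path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        claimed = entry.pop("hash")
        recomputed = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or recomputed != claimed:
            return False
        prev_hash = claimed
    return True


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "chain_log.jsonl"
    append_record(log, {"step": "variance", "model": "model-a",
                        "input": "April actuals vs budget",
                        "prompt": "Analyze the variance...",
                        "output": "Personnel: €127,000 over budget",
                        "human_intervention": None})
    append_record(log, {"step": "explanation", "model": "model-b",
                        "input": "output of step variance",
                        "prompt": "Draft a per-category explanation...",
                        "output": "Temp staffing explains the overrun",
                        "human_intervention": "controller edited wording"})
    intact = verify_log(log)
    # Simulate someone editing a figure after the fact.
    log.write_text(log.read_text(encoding="utf-8").replace("127,000", "172,000"),
                   encoding="utf-8")
    tampered_ok = verify_log(log)
```

A hash chain does not prevent edits; it only makes them visible on verification. Enforcing append-only behaviour is a property of the storage you choose, which this sketch does not cover.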

Saldus in practice

In Saldus these chains are visible via an "Agent Canvas" layer: per agent run you see the steps that were executed, which tool or model was used, and which output was produced. For the user this is a form of transparency ("why does the agent give this answer"); for the auditor it's audit evidence. For finance teams working with chains outside Saldus, the same transparency can be built with Langfuse or a comparable tracing tool — pick something either way; keeping chains only in chat windows is not enough for finance.
