From prompts to agents in finance — the maturity ladder

The maturity ladder for AI in finance has four levels: one-off prompts (ad-hoc use), templates (reusable prompt structures with context), workflows (predefined step chains), and agents (autonomous systems with mandate). For finance teams each rung makes different demands on people, processes, and governance — and delivers a different weight of results.

Most finance teams get stuck at the first level: a controller writing one-off prompts in ChatGPT, an AP clerk occasionally having Copilot draft an email. That's a fine start, but the real ROI in finance starts a few rungs higher.

For finance the ladder is the same as in other departments, with one important addition: the step to level 4 (agents with access to the books) demands considerably more governance than in other domains, because the ledger is the legal record and every posting carries legal and tax consequences.

Level 1 — better prompts

The individual controller or finance manager uses AI as a smart assistant: they ask for a rewrite of a variance commentary, a summary of an EU AI Act text, or a first draft of a board memo. Quality depends entirely on who's at the keyboard. Someone who has mastered prompting (supplying context, defining a role, giving examples, specifying output format) gets an order of magnitude more out of it than someone typing "make a summary."

This is the level where almost every finance team sits today. Limitations:

Not scalable: what one smart controller does, the others don't do automatically.
Not reproducible: the same task produces different output the same day — a bigger problem for finance than for other departments.
No audit trail: who wrote the prompt, what the model answered, what was done with it — disappears in a chat window nobody revisits.
Knowledge loss: when someone leaves, their prompts leave with them.

Level 1 is individual productivity. Fine for ad-hoc tasks, but no foundation for company-wide change — and for finance not enough for anything that needs to land in the close cycle or audit evidence.

Level 2 — reusable templates

Level 2 lives in tools such as Custom GPTs, Claude Projects, Gemini Gems, and Copilot Agents. A template is a pre-baked prompt with instructions, examples, tone of voice, and uploadable context files. Anyone on the finance team can use the same template and get more consistent output.

Examples of useful finance templates:

Variance-commentary drafter with your KPI definitions, prior reports as examples, and house style.
Board-pack formatter that turns monthly numbers and draft commentary into a structured overview.
AR email drafter that adapts its tone to each customer segment.
VAT question assistant with your tax handbook and historical correspondence as context.

Level 2 demands what level 1 doesn't: curation and governance. Who is allowed to create a template? Who reviews? What happens on a model update? Without those agreements, thirteen variants of the same variance-commentary drafter spring up, and the problem shifts from "individually smart" to "collectively messy." For finance this is extra sensitive: three controllers each with a different KPI definition in their template produce three different analyses on the same monthly numbers.
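A minimal sketch of how one curated KPI glossary could keep every controller's template on the same definitions. All names here (KPI_DEFINITIONS, VarianceTemplate, render) are hypothetical, and the KPI wordings are illustrative; this is not the format of any particular product.

```python
from dataclasses import dataclass

# Hypothetical shared glossary: one curated definition per KPI,
# so every rendered prompt uses the same wording.
KPI_DEFINITIONS = {
    "gross_margin": "(revenue - cost of goods sold) / revenue",
    "dso": "average accounts receivable / revenue * days in period",
}


@dataclass(frozen=True)
class VarianceTemplate:
    """A reusable prompt template with an owner and a version for review."""
    name: str
    version: str
    owner: str
    instructions: str
    kpis: tuple = ()

    def render(self, period: str, figures: dict) -> str:
        # Fail loudly if the template refers to a KPI nobody has defined.
        missing = [k for k in self.kpis if k not in KPI_DEFINITIONS]
        if missing:
            raise KeyError(f"undefined KPIs: {missing}")
        definitions = "\n".join(f"- {k}: {KPI_DEFINITIONS[k]}" for k in self.kpis)
        # Sort the figures so the same input always yields the same prompt.
        numbers = "\n".join(f"- {k}: {v}" for k, v in sorted(figures.items()))
        return (
            f"{self.instructions}\n\n"
            f"Period: {period}\n"
            f"KPI definitions:\n{definitions}\n"
            f"Figures:\n{numbers}\n"
        )


template = VarianceTemplate(
    name="variance-commentary-drafter",
    version="1.0",
    owner="group-control",
    instructions="Draft a variance commentary in house style.",
    kpis=("gross_margin", "dso"),
)
prompt = template.render("2024-03", {"gross_margin": 0.41, "dso": 47})
```

The owner and version fields are there to answer the governance questions above: who may change the template, and which version produced a given commentary.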

This is also the level where the first strategic mistake tends to happen: building a Custom GPT for something that should be a workflow. If no human input is really needed after step 1 (for example in bank reconciliation, where the match logic is the same everywhere), the task belongs in a workflow, not in a template.

Level 3 — finance workflows

A workflow is a chain of steps in which software and AI collaborate. A trigger (an inbound invoice, a calendar trigger on business day 1, a bank-transaction feed) starts a series of actions: pull data, call a model, make a decision, update a system, notify a human. Tools like n8n, Inngest, Make, and Power Automate are the orchestrators; models supply the "soft" pieces of the chain.

The essential difference from level 2: the human is no longer in every step. Input comes in, output gets written, without anyone needing to press a button. That's where finance actually frees up time.

Realistic finance workflows:

Inbound-invoice ingest: invoice in mail folder → OCR + extraction → classification + draft posting → in approval queue. The AP clerk clears 80% in seconds and spends the time on the doubtful cases.
Bank-reconciliation flow: daily bank statements → match proposals → autonomous match under threshold, queue above → controller reviews at the end of the day.
Variance-commentary flow: monthly numbers in → AI compares to budget and last year → generates draft commentary → controller reviews and publishes.
AR follow-up flow: daily scan → AI classifies open invoices → generates personalized reminders → AR clerk reviews and sends.

Level 3 demands process design and governance by design. Which steps are deterministic (software)? Which are context-sensitive (AI)? Where do you want a human check? What happens on an error? What does the audit log look like? Those are operations and compliance questions, not technology questions. The complexity doesn't sit in the tools; they're intuitive. The complexity sits in thinking the process through properly.
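The split between deterministic and context-sensitive steps can be sketched for the bank-reconciliation flow. This is a sketch under assumptions: the source does not say what the threshold measures, so the amount limit, the confidence score, their default values, and all names (MatchProposal, route, run_reconciliation) are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class MatchProposal:
    bank_line_id: str
    invoice_id: str
    amount: Decimal
    confidence: float  # 0.0 to 1.0, assumed to come from the model step


def route(proposal, amount_limit=Decimal("500.00"), min_confidence=0.95):
    """Deterministic routing step: fixed rules only, no model call.

    Assumption: amounts at or under the limit with high confidence are
    matched automatically; everything else goes to the controller.
    """
    if proposal.amount <= amount_limit and proposal.confidence >= min_confidence:
        return "auto_match"
    return "review_queue"


def run_reconciliation(proposals, audit_log):
    """Route every proposal and write one audit entry per decision."""
    queue = []
    for p in proposals:
        decision = route(p)
        audit_log.append({
            "bank_line": p.bank_line_id,
            "invoice": p.invoice_id,
            "amount": str(p.amount),
            "confidence": p.confidence,
            "decision": decision,
        })
        if decision == "review_queue":
            queue.append(p)
    return queue
```

Keeping the routing rule outside the model means the same proposal always lands in the same place, and the audit log records why.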

Level 4 — finance agents

An agent is an AI that autonomously picks tools, makes decisions, and executes multi-step plans — often over long stretches of time, sometimes in collaboration with other agents. The difference from a workflow: a workflow is a predefined chain; an agent decides what the next step is.

For finance:

Workflow: "on every new inbound invoice, do step 1, 2, 3." Deterministic skeleton, AI only in the soft steps.
Agent: "monitor the books; when something demands attention, decide yourself what's needed and resolve it within your scope; escalate only when you get stuck or step outside your mandate."

A few typical finance agents on the horizon:

Close agent: orchestrates the month-end close from bank reconciliation to board pack, with human-in-the-loop (HITL) approval on every posting.
Cash agent: maintains a rolling cash forecast, signals critical weeks, runs what-if scenarios on request.
VAT agent: monitors VAT reconciliation periodically, performs ICP (intra-Community supplies) checks, proposes corrections.
AR / Communicate agent: runs the AR flow with personalized communication and escalation.

Level 4 demands infrastructure and monitoring at a different level. How do you see what your agents are doing? How do you stop them when they go wrong? What are the cost limits per run? How do you guarantee auditability down to the individual action? This is no longer a productivity tool; it's a piece of digital finance staff running in production that needs oversight. See "Agentic AI and access to your ledger" for the risk side.
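The oversight questions above (what may the agent do, how is it stopped, what may a run cost, what gets logged) can be illustrated with a small guard that every agent action passes through. This is a sketch under assumptions: Mandate, Escalation, the action names, and the cost unit are all hypothetical and not taken from any agent framework.

```python
from dataclasses import dataclass, field


class Escalation(Exception):
    """Raised when the agent must hand over to a human."""


@dataclass
class Mandate:
    allowed_actions: frozenset
    max_cost_per_run: float          # hypothetical unit, e.g. model spend
    spent: float = 0.0
    stopped: bool = False            # kill switch set by an operator
    audit_log: list = field(default_factory=list)

    def authorize(self, action: str, estimated_cost: float) -> None:
        """Allow the action or escalate; every attempt is logged."""
        if self.stopped:
            outcome = "blocked: kill switch"
        elif action not in self.allowed_actions:
            outcome = "blocked: outside mandate"
        elif self.spent + estimated_cost > self.max_cost_per_run:
            outcome = "blocked: cost limit"
        else:
            outcome = "allowed"
            self.spent += estimated_cost
        self.audit_log.append(
            {"action": action, "cost": estimated_cost, "outcome": outcome}
        )
        if outcome != "allowed":
            raise Escalation(f"{action}: {outcome}")


mandate = Mandate(
    allowed_actions=frozenset({"read_ledger", "propose_correction"}),
    max_cost_per_run=2.0,
)
mandate.authorize("read_ledger", 0.5)  # inside the mandate, so it passes
```

A call such as mandate.authorize("post_journal_entry", 0.1) would raise Escalation here, because posting is not on the whitelist; the blocked attempt still ends up in the audit log.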

When do you move up a rung in finance?

Three heuristics, sharper than for other departments:

Frequency. Under 3 times per week, stay at level 1 or 2. From daily: workflow. From multiple times a day across multiple finance roles: agent.
Determinism. The more fixed the steps, the sooner to workflow. The more judgment per case (annual-report disclosures, provisions, M&A work), the longer you stay at template level.
Consequences. The greater the impact of an error (financial, tax, audit, customer-facing), the more human control you retain. For now, agents rarely belong in the decision path of high-stakes finance tasks; they belong alongside, with the human as final reviewer and signatory.
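The three heuristics can be combined into a rough decision rule. This is a sketch, not a scoring model from the source: the function name, the five-business-day week, the reading of "multiple times a day" as at least twice per business day, and the choice to keep the unstated range between three times a week and daily at template level are all assumptions.

```python
BUSINESS_DAYS_PER_WEEK = 5  # assumption: "daily" means every business day


def suggest_level(runs_per_week, roles_involved, steps_are_fixed, high_stakes):
    """Return (suggested level, human final review required)."""
    daily = runs_per_week >= BUSINESS_DAYS_PER_WEEK
    several_per_day = runs_per_week >= 2 * BUSINESS_DAYS_PER_WEEK

    if runs_per_week < 3:
        # Frequency: too rare to justify automation.
        level = "prompt or template"
    elif not steps_are_fixed or not daily:
        # Determinism: judgment-heavy work stays at template level.
        level = "template"
    elif several_per_day and roles_involved > 1 and not high_stakes:
        level = "agent"
    else:
        # Consequences: high-stakes tasks are capped at workflow level.
        level = "workflow"
    return level, high_stakes


print(suggest_level(20, 3, True, True))
```

With these assumptions, a high-stakes task that runs twenty times a week across three roles still comes out as a workflow with a human as final reviewer, in line with the third heuristic.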

For finance specifically: the step from level 3 to level 4 demands a full governance toolkit: authorization matrix, approval flow, audit trail, sandbox tenant, and an incident procedure. Without it, level 4 is premature regardless of how clever the tech is.

The typical pitfalls in finance

Skipping levels 2 and 3. Trying to build a "Close agent" before the team has mastered templates and workflows. Result: a brittle system that breaks on the first exception.
Stuck on level 1 for years. Waiting on "a strategy" before starting. Result: the team loses momentum, others overtake.
Level 4 without governance. An agent with write rights on the production Exact environment without an approval layer. Result: the approval layer gets built only after the first incident, when recovery is painful and visible.

Where this lands in practice

A 30-person finance function starts at level 2: a shared Claude Project for variance commentary, with KPI definitions and three example reports as context. Within a month the output is consistent, which lays the foundation for the next step.
A 60-FTE finance team builds at level 3: a workflow that extracts inbound purchase invoices, classifies them, and puts them in an approval queue. Saves half an FTE on routine work, audit trail fully automatic.
A 120-FTE scale-up runs a Cash agent at level 4 that updates the rolling forecast daily and prepares what-if scenarios for the CFO. The CFO only looks at the what-if outcomes; the updating is autonomous.

Choosing the right level is choosing where you want the first, second, and third gain. Nobody goes from nothing to a complete agent swarm. But anyone who doesn't know where the ladder leads invests at level 2 in templates for something that will have to become a workflow anyway — and builds twice.

Audit-grade perspective

The audit requirement shifts per level. Level 1: nothing formal, provided no Tier 3+ data lands in a consumer account. Level 2: documentation of templates and their source context. Level 3: workflow design, audit log per step, ownership registration. Level 4: everything from level 3 plus action whitelist, sandbox strategy, and periodic independent review of agent behavior. Audit requirements scale with autonomy.
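The per-level requirements can be written down as a checklist that tooling could verify. A minimal sketch: the dictionary and function names are hypothetical, and only the inheritance that the text states (level 4 includes everything from level 3) is encoded.

```python
# Requirements per level, taken from the paragraph above.
AUDIT_REQUIREMENTS = {
    1: ["no Tier 3+ data in a consumer account"],
    2: ["documentation of templates and their source context"],
    3: ["workflow design", "audit log per step", "ownership registration"],
    4: [
        "action whitelist",
        "sandbox strategy",
        "periodic independent review of agent behavior",
    ],
}

# Only level 4 is stated to include another level's requirements.
INHERITS_FROM = {4: 3}


def requirements_for(level):
    """Return the audit checklist for a level, including inherited items."""
    if level not in AUDIT_REQUIREMENTS:
        raise ValueError(f"unknown level: {level}")
    items = []
    parent = INHERITS_FROM.get(level)
    if parent is not None:
        items.extend(requirements_for(parent))
    items.extend(AUDIT_REQUIREMENTS[level])
    return items


for item in requirements_for(4):
    print("-", item)
```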

Saldus in practice

Saldus today delivers level 2 and 3 building blocks that are directly usable for finance work: a Q&A layer on the books (level 2-3), reporting building blocks, and an approval inbox for write actions (level 3). Level 4 (fully autonomous Close, Cash, or VAT agents) is on the roadmap and will be built first with launch customers on real finance data. For most teams the next useful step is level 2 or 3, not level 4.