Human in the loop in finance workflows — where do you cut the chain?

Human-in-the-loop (HITL) in finance workflows means deliberately building in human review moments at specific steps: not on every action, but where amount, risk, or impact demand it. For finance teams this works as a matrix: routine items below the threshold run autonomously, anything above requires approval, and red flags block. That way the four-eyes principle stays scalable without AI undermining it.

A finance workflow that runs from trigger to final result fully autonomously sounds appealing: the close runs at night, the board pack is ready in the morning, no manual work. For some steps, that is achievable. For other steps, any experienced controller would immediately say: a human stays in the chain here, end of discussion. The question is not "can I take the human out," but "where exactly in the chain should the human sit." That is a design choice per step — not an on/off switch for the whole workflow.

Human in the loop (HITL) means automation stops at one or more points in the workflow, a human looks at the output, and gives approval, correction, or rejection before the next step starts. It's the hinge between speed and responsibility — and in a finance context also between autonomous efficiency and what an external auditor or supervisor expects from you.

Why finance is different

Three reasons why HITL applies more often and more strictly in finance than in other departments.

The ledger is legal truth. A journal entry, a payment, a VAT return: those aren't internal notes but formal, legally binding statements that the tax authority and external auditor can review. An AI that books correctly in 95% of cases is subtly wrong in the other 5%, and in finance "subtly wrong" is a finding that surfaces six months later in an audit report.

Four-eyes is already an established discipline. Finance teams already know authorization matrices, double checks, and periodic controls. HITL isn't a strange new habit; it's the same discipline applied to a new input — the AI draft instead of the intern draft. Good news for adoption: it slots neatly into the culture.

Irreversibility stacks up. A sent email can be corrected. A booked payment, a filed VAT return, or a journal entry posted in a closed period cannot, or only with effort. In finance, irreversible is the norm more often than the exception.

Four oversight levels per step

For every activity in a finance workflow there's a choice between four oversight levels.

1. Fully autonomous

The AI executes, output flows on. Suitable for: internal classifications that get reviewed later anyway, summaries for your own use, non-financial data transformations (CSV formatting, adding a mailbox tag).

Example: an LLM categorizes inbound invoices by supplier type and routes them to the right AP clerk. If the classification is sometimes off, the AP clerk notices and corrects — no impact on the ledger.

2. Autonomous with post-review

The AI executes, output flows on, but a human reviews on a sample basis or periodically. Suitable for: activities where quality monitoring is needed but real-time review would kill the speed gain.

Example: an agent that matches all inbound bank transactions against open invoices daily. The match runs autonomously; the controller reviews 20 random matches at the end of the week to check the pattern holds. Sample-based testing is the norm at external auditors — we apply the same principle to internal automation.
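Drawing the weekly sample takes only a few lines. The following is a minimal sketch, assuming matches are identified by plain string IDs; the function name `draw_review_sample` and the ID format are hypothetical and not taken from any particular system.

```python
import random


def draw_review_sample(match_ids, sample_size=20, seed=None):
    """Pick a random subset of matches for the controller's periodic review."""
    # A fixed seed makes the draw reproducible, which helps when documenting the review.
    rng = random.Random(seed)
    if len(match_ids) <= sample_size:
        return list(match_ids)  # few matches this week: review all of them
    return rng.sample(list(match_ids), sample_size)


# Hypothetical week with 350 autonomous matches.
week_matches = [f"match-{i:04d}" for i in range(1, 351)]
sample = draw_review_sample(week_matches, sample_size=20, seed=42)
print(len(sample))  # 20
```

Recording the seed next to the sample lets someone else reproduce the same draw later.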

3. Human approval (HITL)

The AI produces a draft, the workflow pauses, a human approves or corrects, then the workflow continues. Suitable for: external communication, every journal entry and payment, VAT returns, formal reports.

Example: an AI drafts an AR follow-up email with a payment-plan proposal. The AR clerk sees the draft in a review interface, adjusts as needed, clicks "send." The customer doesn't know the original was AI-generated; the clerk remains accountable for the content.

4. Human in the driver's seat, AI-assisted

The human does the work; the AI is a suggestion engine. Suitable for: complex judgments, M&A work, IFRS disclosures with material assessments, conversations with the external auditor.

Example: a controller drafts a note on a provision; the AI suggests phrasing and checks the reasoning is consistent with the numbers. The AI isn't the workflow — the controller is.

Where do you cut the chain in finance?

Rule of thumb: put a HITL point just before every step with external impact, financial posting, or tax force. Inside the workflow, multiple AI steps can follow each other without intervention as long as the output stays between systems. The moment it heads for the ledger, a payment, a return, or third parties: human in.

In a month-end-close workflow that typically looks like this:

Pull and categorize bank transactions (agent, autonomous): internal, reversible.
Propose matches with open invoices (agent, autonomous): proposal, no posting.
Draft journal entries for unexplained items (agent, autonomous): draft in the approval inbox.
Approve postings (HITL): this is the cut point. Posted is posted.
Generate reconciliation report (agent, autonomous): output to the controller.
Variance commentary on deviations (agent, autonomous): draft for the board pack.
Publish board pack (HITL): controller reads and approves.

Two HITL points in seven steps. Not an approval dialog at every step — that makes the workflow unworkable and trains people to click approve mechanically.
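The rule of thumb and the close workflow above can be expressed as data. This is a sketch under stated assumptions: the `Step` class, its three flags, and the `run` function are hypothetical names, and treating the board pack as "external impact" is an interpretation of the text rather than something it states outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    posts_to_ledger: bool = False   # output becomes a journal entry or payment
    external_impact: bool = False   # output leaves the finance systems
    tax_force: bool = False         # output feeds a VAT return or other filing

    @property
    def needs_hitl(self) -> bool:
        # Any one of the three criteria is enough to put a human in.
        return self.posts_to_ledger or self.external_impact or self.tax_force


MONTH_END_CLOSE = [
    Step("Pull and categorize bank transactions"),
    Step("Propose matches with open invoices"),
    Step("Draft journal entries for unexplained items"),
    Step("Approve postings", posts_to_ledger=True),
    Step("Generate reconciliation report"),
    Step("Variance commentary on deviations"),
    Step("Publish board pack", external_impact=True),  # assumption: counts as external
]


def run(workflow, approve):
    """Run steps in order; stop at the first HITL point the reviewer rejects."""
    completed = []
    for step in workflow:
        if step.needs_hitl and not approve(step):
            break
        completed.append(step.name)
    return completed


hitl_points = [s.name for s in MONTH_END_CLOSE if s.needs_hitl]
print(hitl_points)  # ['Approve postings', 'Publish board pack']
```

Deriving the HITL points from properties of each step, rather than hard-coding them, keeps the cut points explainable: every pause in the workflow can be traced back to one of the three criteria.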

Four review patterns that work in finance

Inbox review. The draft arrives as a normal email or message to the reviewer, who uses reply-to-send or edit-and-send. Low barrier to entry, and it fits the existing workflow of a controller already living in Outlook. Suitable for draft mails to customers and suppliers.

Approval queue. A dashboard with open items awaiting approval: draft postings, payment proposals, VAT corrections. Works at volume (dozens of items per day per person). Make sure the queue doesn't pile up; otherwise it becomes a blocker and the team approves everything just to clear it. A good rule: if the queue is more than 48 hours behind, something is wrong with the threshold or the content.
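The 48-hour rule is easy to monitor. A minimal sketch, assuming "48 hours behind" means that an item has waited longer than 48 hours since submission; the item IDs, the timestamps, and the function name `overdue_items` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_QUEUE_AGE = timedelta(hours=48)  # rule of thumb from the text


def overdue_items(queue, now):
    """Return the IDs of items that have waited longer than MAX_QUEUE_AGE."""
    return [item_id for item_id, submitted_at in queue
            if now - submitted_at > MAX_QUEUE_AGE]


# Hypothetical queue snapshot.
now = datetime(2025, 3, 7, 9, 0, tzinfo=timezone.utc)
queue = [
    ("JE-101", now - timedelta(hours=3)),
    ("PAY-17", now - timedelta(hours=50)),
    ("VAT-04", now - timedelta(hours=72)),
]
print(overdue_items(queue, now))  # ['PAY-17', 'VAT-04']
```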

Inline review in the source app. Draft postings in Exact, draft invoices in the invoicing system, draft mails in the Drafts folder of Outlook. The reviewer works where they were already working. The strongest pattern for finance because it disrupts no habits.

Threshold review. The AI acts autonomously up to a threshold; above it, a human comes in. Examples: bank transactions below €500 matched autonomously, HITL above. Purchase invoices below €1,000 processed autonomously, AP review above. Postings in period N autonomous; back-postings to closed periods always HITL. Combines speed with risk coverage and connects to the authorization matrices most finance teams already have.

Threshold review is usually the most effective in finance — because it translates directly to existing policy, and because it puts humans specifically on what matters instead of on everything.
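A threshold rule of this kind fits in one small routing function. The sketch below uses the example amounts from the text; everything else is an assumption: the names `THRESHOLDS` and `route`, sending an amount exactly on the threshold to a human, comparing on absolute value so credits are treated like debits, and failing safe to HITL for unknown item types.

```python
from decimal import Decimal

# Example thresholds from the text; real values come from the authorization matrix.
THRESHOLDS = {
    "bank_match": Decimal("500"),
    "purchase_invoice": Decimal("1000"),
}


def route(kind: str, amount: Decimal, closed_period: bool = False) -> str:
    """Return 'autonomous' or 'hitl' for a single item."""
    if closed_period:
        return "hitl"  # back-postings to closed periods always go to a human
    limit = THRESHOLDS.get(kind)
    if limit is None:
        return "hitl"  # unknown item type: fail safe (assumption of this sketch)
    # Strictly below the limit runs autonomously; the limit itself goes to review.
    return "autonomous" if abs(amount) < limit else "hitl"


print(route("bank_match", Decimal("120.50")))                            # autonomous
print(route("purchase_invoice", Decimal("2400")))                        # hitl
print(route("bank_match", Decimal("120.50"), closed_period=True))        # hitl
```

Using `Decimal` rather than floats avoids rounding surprises right at the threshold.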

Four-eyes and HITL — the same thing or not?

Four-eyes is a specific form of HITL: the second pair of eyes is a different person from the first. Not every HITL is four-eyes — if the AI makes a draft and the same employee approves it, that is HITL but not four-eyes.

For finance the distinction matters. The CISO handbook and most authorization matrices prescribe four-eyes for payments above a threshold, for postings after period close, and for sensitive changes (changing IBANs, writing off debtors). An AI doesn't replace the first pair of eyes — that role shifts from "data entry" to "draft review" — and therefore cannot replace the second pair either. Four-eyes stays four human eyes.

In practice: AI makes the draft, employee A reviews and approves (first HITL), employee B does the second review (four-eyes). The time gain sits not in eliminating a role but in the fact that the draft work is ready in a fraction of the time and both human reviews can focus on judgment rather than typing.
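The distinction can be enforced in code with a simple check. A minimal sketch, assuming every approval is recorded with the ID of whoever gave it; the function name and the IDs are hypothetical.

```python
def four_eyes_satisfied(approvals, human_ids):
    """True when at least two different humans approved; AI agents never count."""
    human_approvers = {a for a in approvals if a in human_ids}
    return len(human_approvers) >= 2


humans = {"employee-a", "employee-b"}

print(four_eyes_satisfied(["ai-drafter", "employee-a"], humans))                # False
print(four_eyes_satisfied(["employee-a", "employee-a"], humans))                # False
print(four_eyes_satisfied(["ai-drafter", "employee-a", "employee-b"], humans))  # True
```

The first case is HITL without four-eyes; only the third satisfies both.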

Common mistakes

Too much HITL. Every step requires approval. Result: time gains evaporate, reviewers click approve mechanically, errors slip through anyway.
Too little HITL. AI sends mails autonomously on behalf of finance. One bad hallucination ("your balance is €87,000" when it's €8,700) and the customer impact outweighs the entire time gain.
HITL without context. The reviewer only sees the AI's output, not the input and reasoning trail. Without context, they can't judge whether the output is right. Always include input and reasoning in the approval view.
HITL without feedback loop. The reviewer corrects the draft, but the correction goes nowhere. The model learns nothing; the next draft is just as wrong. Keep corrections as training and monitoring data.
Review fatigue. If 98% of drafts pass through without changes, nobody is really checking anymore. Time to make that step autonomous or raise the threshold.
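Both the feedback loop and the fatigue signal fall out of the same data: pairs of what the AI drafted and what the reviewer finally approved. The sketch below assumes drafts and final versions are stored as text; the names `unchanged_rate` and `FATIGUE_SIGNAL` and the sample entries are hypothetical.

```python
FATIGUE_SIGNAL = 0.98  # the 98% figure from the text


def unchanged_rate(reviews):
    """Share of drafts approved without any edit; reviews is a list of (draft, final) pairs."""
    if not reviews:
        return 0.0
    unchanged = sum(1 for draft, final in reviews if draft == final)
    return unchanged / len(reviews)


# Hypothetical month: 99 drafts approved as-is, one corrected by the reviewer.
reviews = [("Dr 6000 / Cr 1600 120.00", "Dr 6000 / Cr 1600 120.00")] * 99
reviews.append(("Balance due: 87,000", "Balance due: 8,700"))

rate = unchanged_rate(reviews)
# Corrections are kept, not discarded: they are the monitoring and training data.
corrections = [(draft, final) for draft, final in reviews if draft != final]

print(rate >= FATIGUE_SIGNAL)  # True
print(len(corrections))        # 1
```

A high unchanged rate is a prompt to look at the step, not a verdict: the one correction in this example is exactly the kind of error the review exists to catch.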

Audit grade — why HITL also does compliance work

The EU AI Act requires "appropriate human oversight" for high-risk systems, a requirement that applies directly to finance applications above Tier 2. In practice: document per AI step who reviews it, based on what information, and with what proof of approval. An approval inbox with an audit log covers this by default; an inbox review without a log does not. Build this into the design from the start.

For external auditors something comparable applies from an internal-control perspective. The question "who entered this journal entry" is, in an AI age, really two questions: "which AI drafted it from what input" and "which human approved it at what time." Retain both, with the same discipline as the books themselves, and an audit won't turn this into a finding.
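One way to keep both answers together is a single record per decision. This is a sketch, not a prescribed format: the `ApprovalRecord` fields, the example values, and the choice of JSON lines are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApprovalRecord:
    item_id: str
    ai_step: str      # which agent or model step produced the draft
    input_ref: str    # pointer to the input the draft was based on
    reviewer: str     # the human who decided
    decision: str     # "approved", "corrected", or "rejected"
    decided_at: str   # ISO 8601 timestamp in UTC


def log_line(record: ApprovalRecord) -> str:
    """Serialize one record as a JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical record for one approved journal entry.
record = ApprovalRecord(
    item_id="JE-2025-0142",
    ai_step="draft-journal-entry",
    input_ref="bank-statement-2025-01-31.csv#row-88",
    reviewer="controller-a",
    decision="approved",
    decided_at=datetime(2025, 1, 31, 16, 5, tzinfo=timezone.utc).isoformat(),
)
print(log_line(record))
```

The record is frozen on purpose: a decision trail that can be edited afterwards answers neither of the two questions reliably.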

The hard test, per step

Two questions to answer for every step in your workflow:

If this step goes wrong and nobody notices, what is the damage? Small → autonomous. Big → HITL. In finance, "big" comes into view faster: a wrong tag is small, a wrongly posted amount is big.
If a human always has to look, does the automation still save time? No → redesign, or accept that this isn't a workflow candidate. That's not capitulation — some work doesn't deserve AI because the review is as expensive as the work itself.

Where both answers come out right, the human sits in the right spot: not everywhere, not nowhere, but exactly on the hinge where their judgment counts.
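The two questions combine into a small decision function. A rough sketch: reducing damage to "small" or "big" and comparing review time against manual time in minutes are simplifications introduced here, and the name `place_human` is hypothetical.

```python
def place_human(damage: str, manual_minutes: float, review_minutes: float) -> str:
    """Apply the two questions to one workflow step."""
    if damage not in ("small", "big"):
        raise ValueError("damage must be 'small' or 'big'")
    if damage == "small":
        return "autonomous"   # question 1: an unnoticed error does little harm
    if review_minutes < manual_minutes:
        return "hitl"         # question 2: reviewing is still cheaper than doing
    return "redesign"         # the review costs as much as the work itself


print(place_human("small", manual_minutes=2, review_minutes=1))   # autonomous
print(place_human("big", manual_minutes=15, review_minutes=3))    # hitl
print(place_human("big", manual_minutes=10, review_minutes=10))   # redesign
```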

Saldus in practice

Saldus has an approval inbox built in by default for write actions: every AI draft of a journal entry, payment proposal, or external communication lands in a queue, gets approved or rejected by a human, and the entire decision trail — input, AI step, reviewer, timestamp, final decision — is logged automatically. Thresholds per agent are configurable (for example: automatic match up to €500, HITL above). It doesn't free the team from designing the workflow itself, but it provides the review infrastructure where self-built tools often go wrong.