1. Why Reconstruction Is More Than a Swap-Back
When enterprise AI teams design a workflow that sends tokenised documents to an external LLM, most of the attention goes to the preparation step: detection, transformation, what crosses the boundary. The response side gets less thought. The assumption is that once the model returns its output, the workflow just needs to swap the tokens back and the result is ready.
That assumption is approximately right and operationally wrong. The reconstruction step is technically straightforward — look up tokens in a mapping, substitute back — and architecturally critical. It is where the difference between a pilot that demos well and a workflow that runs in production gets decided. Teams that treat reconstruction as an afterthought consistently find their AI projects stalling at the same point: the model works, the integration works, but the output is not in a form anyone can actually use without manual cleanup, and the productivity gain that justified the project evaporates.
In the simplest case, reconstruction is symmetric with tokenisation. The input had Marlene Schmidt replaced with CUST-7F2A; the output references CUST-7F2A; the reconstruction step swaps CUST-7F2A back to Marlene Schmidt and the workflow continues.
If every workflow looked like this, reconstruction would be a trivial concern. Workflows don't look like this for a few reasons.
- The model's response is generative, not just substitutive. The LLM doesn't just copy tokens from input to output; it produces new text that reasons over the tokens. The output references tokens in new sentences, in new combinations, sometimes paraphrased, sometimes summarised, sometimes synthesised across multiple input tokens. A reconstruction layer has to handle tokens appearing in contexts the input never had.
- The response can contain tokens the input didn't. A summarisation task that reasons over five tickets might produce a sentence like "Three of the affected customers (CUST-7F2A, CUST-3B91, CUST-9D2C) share the same firmware version." That construction is new. The reconstruction layer has to find each token, look it up, and substitute back into a sentence the model wrote.
- The response sometimes contains malformed references to tokens. Models occasionally lose track of token formatting, especially across long outputs. A token that went in as CUST-7F2A might come back as CUST 7F2A, CUST7F2A, or simply "the customer designated 7F2A." A reconstruction layer that only does exact string matching fails in these cases, and the output to the user contains visible token fragments that should have been resolved.
- The response may include the model's commentary or hedging. "Based on the information about customer CUST-7F2A, the most likely root cause is..." The reconstruction has to handle the token appearing in a clause the model added on its own, with the same correctness as a token that appears in a direct extraction.
What looks like swap-back, then, is actually a small but real text-processing problem: robust token recognition across diverse output forms, with the substitution happening in a way that preserves the grammatical coherence of the model's response.
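A minimal sketch of tolerant token recognition, assuming tokens follow the CUST-XXXX shape used in the examples above (a prefix plus four hex characters). The mapping contents, the second customer name, and the `reconstruct` helper are hypothetical. The pattern accepts a hyphen, space, underscore, or nothing between prefix and identifier, and normalises to the canonical form before the lookup.

```python
import re

# Hypothetical mapping; in a real deployment it lives inside the enterprise boundary.
MAPPING = {
    "CUST-7F2A": "Marlene Schmidt",
    "CUST-3B91": "Jonas Weber",  # invented example value
}

# Tolerates formatting drift: "CUST-7F2A", "CUST 7F2A", "CUST_7F2A", "CUST7F2A".
TOKEN_RE = re.compile(r"\b(CUST)[-\s_]?([0-9A-F]{4})\b", re.IGNORECASE)


def reconstruct(text: str, mapping: dict[str, str]) -> tuple[str, list[str]]:
    """Substitute recognised tokens; also return tokens that had no mapping entry."""
    unresolved: list[str] = []

    def substitute(match: re.Match) -> str:
        # Normalise whatever the model wrote back to the canonical token form.
        canonical = f"{match.group(1).upper()}-{match.group(2).upper()}"
        if canonical in mapping:
            return mapping[canonical]
        unresolved.append(canonical)
        return match.group(0)  # left as-is; a separate policy decides what to do

    return TOKEN_RE.sub(substitute, text), unresolved
```

A pattern like this covers separator and casing drift only. A paraphrase such as "the customer designated 7F2A" would need a further, workflow-specific rule, and any loosening of the pattern raises the risk of matching ordinary text.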
2. Where Reconstruction Has to Happen
The location of reconstruction is non-negotiable: it has to happen inside the enterprise environment, before the output reaches the user or any downstream system.
The reason is the same reason the mapping has to stay in the enterprise environment. Reconstruction requires reading the mapping. If reconstruction happens outside the enterprise — on a vendor's infrastructure, in a third-country region, on any system the enterprise doesn't fully control — then the mapping has to be made available to that location, which collapses the protection the tokenisation provided in the first place.
This is the most common architectural mistake in deployments of this pattern: teams set up tokenisation inside the environment, send to the external LLM, and then run reconstruction in a cloud service or middleware that happens to be convenient. The convenience is real. The protection is gone. The mapping that was supposed to stay under the enterprise's exclusive control has been replicated to a location where the original promises no longer hold.
The correct architecture is that reconstruction is colocated with the source systems and the mapping — on-premise, in the enterprise's own VPC, in whatever EU-region infrastructure the workflow runs in. The output of the LLM comes back tokenised, traverses to the reconstruction layer inside the boundary, and emerges from that layer as business-ready content. The external journey of the data ends at the reconstruction step.
When every other part of the architecture respects the boundary (encapsulation inside, mapping inside, audit logs inside) but reconstruction is the one piece that wandered outside, the architecture's guarantees are only as strong as whatever the reconstruction location can guarantee.
3. How Reconstruction Integrates With the Workflow
Reconstruction is not a standalone step the user invokes. It is infrastructure that has to integrate into wherever the AI's output is delivered. Three integration patterns cover most enterprise deployments.
3.1 Inline Reconstruction in the Response Path
The AI integration layer — whatever middleware sits between the workflow and the LLM endpoint — performs reconstruction before returning the response to the calling system. The calling system never sees tokens; it receives the business-ready output. This is the cleanest pattern and works for synchronous request-response workflows: contract review, summarisation, classification.
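As a rough illustration of the inline pattern, the sketch below wraps an LLM call so the caller only ever receives reconstructed text. The `call_llm` callable, the `make_inline_handler` name, and the mapping contents are assumptions for the example. It uses exact-match substitution for brevity, which a production layer would replace with more tolerant recognition.

```python
import re
from typing import Callable


def make_inline_handler(
    call_llm: Callable[[str], str],  # hypothetical client: tokenised prompt in, text out
    mapping: dict[str, str],
) -> Callable[[str], str]:
    """Return a handler that reconstructs the response before the caller sees it."""
    # Longest tokens first, so a token that prefixes another cannot shadow it.
    tokens = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in tokens)) if tokens else None

    def handle(tokenised_prompt: str) -> str:
        tokenised_response = call_llm(tokenised_prompt)
        if pattern is None:
            return tokenised_response
        # Single pass over the response: substituted values are never re-scanned.
        return pattern.sub(lambda m: mapping[m.group(0)], tokenised_response)

    return handle
```

The calling system receives only the return value of `handle`, so tokens stay inside the integration layer.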
3.2 Streaming Reconstruction
For LLM responses that stream token-by-token (in the NLP sense, not the data-protection sense — note the unfortunate vocabulary collision), reconstruction has to work on the stream, recognising data-protection tokens as they appear and substituting in real time. This is harder than batch reconstruction because the data-protection token may be partially streamed at any given moment, and the reconstruction layer has to buffer enough to recognise it. Workflows that use streaming UIs (chat-style interfaces, live summarisation feeds) need this pattern; workflows that wait for the full response don't.
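One way to handle the buffering is to hold back a short tail of the stream that could still turn out to be the start of a data-protection token. The sketch below assumes an exact CUST-XXXX token shape with a fixed length of nine characters. The function name and mapping are hypothetical, and drift-tolerant matching would need a longer hold-back window.

```python
import re
from typing import Iterable, Iterator

TOKEN_RE = re.compile(r"CUST-[0-9A-F]{4}")  # assumed token shape
MAX_TOKEN_LEN = 9  # len("CUST-7F2A")


def reconstruct_stream(chunks: Iterable[str], mapping: dict[str, str]) -> Iterator[str]:
    """Yield reconstructed text while never emitting a partially streamed token."""

    def substitute(text: str) -> str:
        return TOKEN_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)

    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Hold back a tail long enough to contain any incomplete token.
        cut = max(0, len(buffer) - (MAX_TOKEN_LEN - 1))
        # Do not cut through a token that is already complete in the buffer.
        for match in TOKEN_RE.finditer(buffer):
            if match.start() < cut < match.end():
                cut = match.end()
        if cut:
            yield substitute(buffer[:cut])
            buffer = buffer[cut:]
    if buffer:
        yield substitute(buffer)  # flush whatever remains when the stream ends
```

The cost of this approach is a small, bounded delay: the user interface always lags the model by up to eight characters.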
3.3 Event-Driven Reconstruction
For workflows where the AI's output triggers downstream actions — creating a ticket in the operations platform, writing a report into the document management system, updating a record in the CRM — reconstruction has to happen at the boundary between the AI integration and the downstream system. The tokenised response can be processed for routing, classification, or triage in the integration layer; reconstruction happens just before the data is written into the system the user will see.
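A small sketch of that ordering: triage runs on the tokenised text, and reconstruction happens only at the point of the write. The `write_ticket` callable, the queue names, and the keyword-based routing rule are invented for illustration, and the token shape is assumed to be CUST-XXXX.

```python
import re
from typing import Callable

TOKEN_RE = re.compile(r"\bCUST-[0-9A-F]{4}\b")  # assumed token shape


def reconstruct(text: str, mapping: dict[str, str]) -> str:
    return TOKEN_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def on_llm_response(
    tokenised_summary: str,
    mapping: dict[str, str],
    write_ticket: Callable[[str, str], None],  # hypothetical downstream writer
) -> str:
    # Routing needs no original values, so it works on the tokenised text.
    queue = "firmware" if "firmware" in tokenised_summary.lower() else "general"
    # Reconstruction is the last step before the data lands in a user-facing system.
    write_ticket(queue, reconstruct(tokenised_summary, mapping))
    return queue
```

Keeping the routing logic on tokenised text means fewer components ever handle the original values.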
The architecture has to make explicit which of these patterns applies to which workflow. A misalignment — running streaming reconstruction on a workflow that should be event-driven, or inline reconstruction on a workflow that should stream — produces user-visible defects that look like AI quality problems but are actually integration problems.
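One lightweight way to make the choice explicit is a declared registry that fails loudly for any workflow without an entry. The workflow names and the `pattern_for` helper below are hypothetical.

```python
from enum import Enum


class Pattern(Enum):
    INLINE = "inline"
    STREAMING = "streaming"
    EVENT_DRIVEN = "event_driven"


# Hypothetical workflow names, one declared pattern each.
WORKFLOW_PATTERNS = {
    "contract_review": Pattern.INLINE,
    "chat_assistant": Pattern.STREAMING,
    "ticket_creation": Pattern.EVENT_DRIVEN,
}


def pattern_for(workflow: str) -> Pattern:
    """Look up the declared pattern; refuse to guess for undeclared workflows."""
    try:
        return WORKFLOW_PATTERNS[workflow]
    except KeyError:
        raise LookupError(
            f"No reconstruction pattern declared for workflow {workflow!r}"
        ) from None
```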
4. When the Model Produces Tokens That Don't Exist
A specific failure mode worth treating carefully: the model occasionally hallucinates tokens. It produces a string that looks like a token in the system's format but doesn't correspond to anything in the mapping.
This happens for predictable reasons. The model has seen CUST-7F2A and CUST-3B91 in the input and produces CUST-5D44 in the output, drawing on the pattern. Or the model summarises and invents a token-shaped placeholder for an entity it inferred. Or, more rarely, the model echoes a token format from its training data that happens to collide with the enterprise's token format.
The reconstruction layer cannot silently substitute a hallucinated token, because there is nothing to substitute it with. It also cannot leave the hallucinated token in the output, because the user will see a fragment that looks like a system identifier. There are three reasonable responses.
- Flag the hallucinated token in the output and surface it to the user as an explicit gap — for example, "[Reference to an entity the model produced but the system cannot resolve.]" This preserves transparency at the cost of some output cleanliness.
- Drop the hallucinated reference and rewrite the surrounding sentence. This produces cleaner output but requires the reconstruction layer to do non-trivial text manipulation, and may obscure that the model produced something not grounded in the input.
- Reject the response and re-prompt the model, with a system instruction that constrains it to use only the tokens that appeared in the input. This produces the highest-quality output but adds latency and cost.
Different workflows want different responses. A summarisation for internal review may prefer the first option (flag and surface). A document going to a customer may prefer the third (re-prompt). The choice should be configurable at the workflow level, not hard-coded into the reconstruction layer.
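The three responses can be sketched as a per-workflow policy. Everything named here (`Policy`, `RepromptRequired`, `resolve`, the workflow keys) is an assumption for illustration, as is the CUST-XXXX token shape. The drop option is implemented crudely by removing the whole sentence that contains the unresolved token, which stands in for the non-trivial rewriting a real layer would need.

```python
import re
from enum import Enum

TOKEN_RE = re.compile(r"\bCUST-[0-9A-F]{4}\b")  # assumed token shape
GAP_NOTE = "[Reference to an entity the model produced but the system cannot resolve.]"


class Policy(Enum):
    FLAG = "flag"
    DROP = "drop"
    REPROMPT = "reprompt"


class RepromptRequired(Exception):
    """Tells the caller to retry with a stricter system instruction."""


# Hypothetical workflow-level configuration.
WORKFLOW_POLICY = {
    "internal_review_summary": Policy.FLAG,
    "customer_document": Policy.REPROMPT,
}


def resolve(text: str, mapping: dict[str, str], policy: Policy) -> str:
    unknown = {t for t in TOKEN_RE.findall(text) if t not in mapping}
    if unknown and policy is Policy.REPROMPT:
        raise RepromptRequired(sorted(unknown))
    if unknown and policy is Policy.DROP:
        # Crude stand-in for sentence rewriting: drop affected sentences entirely.
        sentences = re.split(r"(?<=[.!?])\s+", text)
        text = " ".join(s for s in sentences if not any(t in s for t in unknown))
    # Under FLAG, unresolved tokens become an explicit gap marker.
    return TOKEN_RE.sub(lambda m: mapping.get(m.group(0), GAP_NOTE), text)
```

Raising an exception for the re-prompt case keeps the retry loop, and its latency and cost budget, in the caller rather than inside the reconstruction layer.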
5. Audit and Traceability
Reconstruction is the moment where the original sensitive values re-enter the workflow. From an audit perspective, it is one of the most important moments in the architecture, because it is where the access controls on the original data are exercised.
A well-designed reconstruction layer logs every operation: which token was looked up, when, in service of which workflow, by which integration. The log doesn't need to contain the original values — that would defeat the point of access controls — but it has to contain enough metadata to answer the question "who triggered the reconstruction of which token, and where did the result go."
This matters operationally for two reasons. First, it makes the architecture auditable: an internal review of the workflow can verify that reconstruction is happening only for legitimate workflows and that the integration is behaving as designed. Second, it makes incident response possible: if a reconstruction integration is misbehaving, the log shows what happened and what was exposed.
The audit also matters for the failure case where reconstruction emits to a downstream system that shouldn't have received the original values. If a reconstruction integration accidentally writes business-ready output to a logging system that wasn't supposed to see customer names, the audit trail is what tells the team what was exposed and to whom. Without the log, the team is guessing.
Reconstruction logs should be retained separately from the workflow logs, with different access controls, and under the same boundary constraints as the mapping itself. They are, in effect, an audit trail of the most sensitive operation in the architecture.
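A sketch of what one audit record might carry, assuming a JSON-lines log. The field names and the `audit_record` helper are illustrative rather than a prescribed schema. The record holds the token and the routing metadata but never the original value.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReconstructionEvent:
    token: str          # the token that was looked up, never the original value
    workflow: str       # which workflow the lookup served
    integration: str    # which integration performed it
    triggered_by: str   # user or service identity that triggered it
    destination: str    # where the reconstructed output was delivered
    timestamp: str      # UTC, ISO 8601


def audit_record(
    token: str, workflow: str, integration: str, triggered_by: str, destination: str
) -> str:
    """Serialise one reconstruction event as a single JSON line."""
    event = ReconstructionEvent(
        token=token,
        workflow=workflow,
        integration=integration,
        triggered_by=triggered_by,
        destination=destination,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(event), sort_keys=True)
```

Because the record contains only tokens and metadata, reading the audit log does not by itself reveal customer data, although the log still belongs inside the same boundary as the mapping.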
6. The Operational Mistakes Most Teams Make
Across deployments of this pattern, a small set of mistakes shows up repeatedly. They are worth naming explicitly.
- Building reconstruction as a manual cleanup step. The most common mistake. The team gets tokenisation working, sees the AI output coming back tokenised, and adds a manual "now find and replace the tokens" step to the user's workflow. Users skip the step. Or they do it inconsistently. Or they paste tokenised output into a system that wasn't supposed to see it, and the cleanup never happens. Reconstruction has to be infrastructure, automatic and invisible. If it requires a human action, it will fail intermittently in ways that are hard to detect.
- Running reconstruction in the wrong location. Discussed above. The convenience of running reconstruction in a vendor cloud or middleware service is real; the protection cost is also real. The architecture promises the original values stay in the enterprise environment, and reconstruction has to honour that promise.
- Treating reconstruction as a static substitution. Real reconstruction has to handle malformed tokens, hallucinated tokens, tokens in unexpected contexts, and streaming responses. A naive implementation that does exact string match-and-replace will work in the demo and fail in production where the model's actual output is messier than the demo cases.
- Not logging reconstruction. Reconstruction without an audit trail is reconstruction the team cannot defend. The first time someone asks "did the AI ever see this customer's name, and if so, where did the result go," the team without reconstruction logs cannot answer.
- Coupling reconstruction tightly to a specific LLM provider. Reconstruction logic that assumes ChatGPT's response format will break when the workflow switches to Claude or Gemini, even though the underlying tokenisation didn't change. The reconstruction layer should be provider-agnostic, treating the model's response as text-to-process rather than a known structure.
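On the last point, one possible shape for a provider-agnostic layer is a thin extractor per provider in front of a shared core that only ever sees plain text. The payload shapes below are illustrative stand-ins, not any vendor's documented response format. The provider keys, the function names, and the CUST-XXXX token shape are likewise assumptions.

```python
import re
from typing import Callable

TOKEN_RE = re.compile(r"\bCUST-[0-9A-F]{4}\b")  # assumed token shape


# Illustrative payload shapes only; real providers define their own.
def text_from_provider_a(payload: dict) -> str:
    return payload["choices"][0]["message"]["content"]


def text_from_provider_b(payload: dict) -> str:
    return "".join(part["text"] for part in payload["content"])


EXTRACTORS: dict[str, Callable[[dict], str]] = {
    "provider_a": text_from_provider_a,
    "provider_b": text_from_provider_b,
}


def reconstruct_response(provider: str, payload: dict, mapping: dict[str, str]) -> str:
    # Provider-specific code ends here; the core below never sees the payload.
    text = EXTRACTORS[provider](payload)
    return TOKEN_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Switching endpoints then means adding one extractor, while the recognition and substitution logic stays untouched.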
7. What Good Reconstruction Looks Like
A reconstruction layer that works in production has a small set of properties.
- Runs inside the enterprise environment, colocated with the mapping
- Invoked automatically at the integration boundary, never as a manual step
- Handles streaming, batch, and event-driven workflows through different invocation patterns but a shared core
- Recognises tokens robustly across the variations real model output produces — formatting drift, partial references, paraphrases
- Distinguishes between legitimate tokens and hallucinated token-shaped strings, and handles each according to a configurable policy
- Logs every operation in a separate audit trail under the enterprise's exclusive control
- Provider-agnostic, so the workflow can swap LLM endpoints without rewriting the reconstruction layer
When these properties hold, reconstruction becomes invisible infrastructure. The user submits a document, the workflow runs, the result comes back with real values in real structure, and the user never sees a token. The architecture's promise — that sensitive data stayed inside the boundary while the AI did useful work — holds across both halves of the workflow.
When these properties don't hold, reconstruction is the place the workflow breaks. The tokenisation can be perfect, the model can be excellent, the boundaries can be meticulous — and the user still ends up with output they can't use, or with sensitive data accidentally appearing in a downstream system that wasn't supposed to receive it. The last mile is where the architecture either delivers on its promise or quietly fails to.
8. Where This Fits in the Broader Pattern
Reconstruction is the last of four stages in the broader pattern that lets external LLMs operate on data that cannot leave the enterprise environment in raw form: detection, transformation (tokenisation), external processing, and reconstruction. The four stages compose, so the architecture is only as strong as the weakest of them.
For the architecture as a whole, and the design decisions that the other three stages carry, see the pillar overview on running external LLMs on sensitive enterprise data. For why removal-based approaches (masking, redaction, PII guardrails) break on operational data — and why this pattern was needed in the first place — see the article on why AI workflows stall at tables, tickets, and operational documents. For the tokenisation patterns on the input side that this article's reconstruction reverses, see the article on tokenisation for LLM inputs.
Key Takeaways
- Reconstruction is technically straightforward and architecturally critical — it's where pilots that demo well diverge from workflows that run in production
- It's not a simple swap-back: LLM output is generative, contains new token combinations, suffers formatting drift, and sometimes hallucinates token-shaped strings
- Location is non-negotiable — reconstruction has to run inside the enterprise environment, colocated with the mapping; external reconstruction collapses the protection
- Three integration patterns cover most workflows: inline, streaming, and event-driven — and misaligning the pattern produces defects that look like model problems
- Hallucinated tokens need an explicit, configurable policy: flag and surface, drop and rewrite, or reject and re-prompt
- Audit is essential — reconstruction is the moment original values re-enter the workflow, and logs are how the team defends or investigates that moment
- Five recurring mistakes: manual cleanup, wrong location, static substitution, no logging, provider lock-in
- Good reconstruction is invisible infrastructure; bad reconstruction is where the architecture quietly fails on the last mile
Frequently Asked Questions
Why isn't reconstruction just a simple swap-back?
The LLM doesn't just copy tokens from input to output — it generates new text that reasons over them. Tokens appear in contexts the input never had, in combinations the model invented, sometimes with formatting drift (CUST-7F2A coming back as CUST 7F2A or "the customer designated 7F2A"). A naive exact-match swap fails on these cases, leaving token fragments visible in the output. Real reconstruction is robust token recognition across diverse generative output forms, with substitution that preserves grammatical coherence.
Where does reconstruction have to run?
Inside the enterprise environment, colocated with the mapping and the source systems. Reconstruction requires reading the mapping; if reconstruction runs outside the enterprise — on a vendor's infrastructure, in a third-country region, or on middleware the enterprise doesn't fully control — the mapping has to be replicated to that location, which collapses the protection the tokenisation provided. This is the most common architectural mistake in deployments of this pattern.
What are the three integration patterns for reconstruction?
- Inline reconstruction: the AI integration layer performs reconstruction before returning the response. Works for synchronous request-response workflows like contract review.
- Streaming reconstruction: works on token-by-token streams, buffering enough to recognise data-protection tokens as they appear. Needed for chat-style UIs and live summarisation.
- Event-driven reconstruction: happens at the boundary between AI integration and a downstream system like a CRM, ticketing platform, or document store; reconstruction occurs just before the data is written into the system the user will see.
What should happen when the model hallucinates a token?
Three reasonable responses, and the choice should be configurable per workflow.
- Flag the hallucinated token and surface it as an explicit gap. This preserves transparency at the cost of cleanliness.
- Drop the hallucinated reference and rewrite the surrounding sentence. This gives cleaner output but obscures that the model produced something not grounded in the input.
- Reject the response and re-prompt the model with a constraint to use only input tokens. This gives the highest quality but adds latency and cost.
A summarisation for internal review may prefer flagging; a document going to a customer may prefer re-prompting.
What does a reconstruction audit log need to contain?
Enough metadata to answer "who triggered the reconstruction of which token, when, in service of which workflow, and where did the result go." It does not need to contain the original values themselves — that would defeat the access controls — but it has to make the operation traceable. Logs should be retained separately from the workflow logs, with different access controls, and under the same boundary constraints as the mapping itself. They are, in effect, an audit trail of the most sensitive operation in the architecture.
What are the most common reconstruction mistakes?
Five recur.
- Manual cleanup: building reconstruction as a step users have to perform; they will skip it.
- Wrong location: running reconstruction in a vendor cloud because it's convenient; it collapses the protection.
- Static substitution: treating reconstruction as exact string match-and-replace; it fails on the messy real output from production models.
- No logging: the team cannot defend the architecture or do incident response.
- Provider lock-in: coupling reconstruction logic to a specific LLM's response format; it breaks when the workflow switches vendor.