Running External LLMs on Data Your Company Can't Send Externally

Most enterprise AI workflows stall when external LLMs require data the company can't expose. A look at the architectural patterns that move past the stall — and what trade-offs each one carries.

AI Architecture · ~10 min read · May 2026

TL;DR

External LLMs produce measurably better output than what most enterprises can run internally — but regulated companies can't send their actual operational data to them. The three standard responses — send and accept the risk, mask and redact, or run an on-premise model — each break at a predictable point. A different approach changes what crosses the boundary rather than whether to cross it: sensitive elements are replaced with structure-preserving tokens inside the enterprise environment, the external LLM works on the tokenised version, and the response is reconstructed internally. The original data never leaves; the frontier-model capability is preserved. This isn't universal — it fits analytical workflows where sensitive elements are identifiable in advance, not personalisation or verification tasks that require the literal identifier. Four design properties define a sound implementation: transformation inside the enterprise environment, exclusive enterprise control of the mapping, a sensitivity definition that evolves with the business, and reconstruction that runs as infrastructure rather than as a manual step.

1. The Quiet Contradiction at the Centre of Enterprise AI

There is a quiet contradiction at the centre of most enterprise AI strategies in 2026.

The external large language models (ChatGPT, Claude, Gemini, and the others) produce measurably better output than anything most enterprises can run internally. They handle longer context windows than most internally hosted models can. They get better every few months without the enterprise paying for retraining. They are, on most dimensions that matter to a business, the obvious choice.

And yet, in a striking number of regulated enterprises, those models cannot legally or contractually touch the data the business actually runs on. Customer records cannot be sent to a US-hosted endpoint. Operational logs are bound by sector-specific data location requirements. Documents contain identifiers that some contract somewhere says will never leave a defined boundary. The data the AI would be most useful on is the data the AI cannot be allowed to see.

The result is a pattern that any CIO at a European bank, insurer, telecom, or hospital recognises. The pilots run on synthetic or anonymised samples. The benchmarks look good. The board is told AI is coming. Then production hits the data the pilot didn't touch, and the workflow stalls. Some teams give up. Some build internal LLMs that disappoint. Some quietly route data through unapproved channels, a practice now commonly called shadow AI.

This article walks through the three standard approaches that enterprise AI teams reach for first, where each one breaks, and a different approach — one that has appeared in production deployments across telecom, healthcare, finance, and defence — that changes the question entirely.

2. Why This Problem Is Harder Than It Looks

The instinct, when first encountering the problem, is to reach for one of the existing tools: a data masking library, a PII detection API, a private deployment of an open-source model. Each tool solves part of the problem, and each one fails at a different point in the workflow.

The reason is that enterprise data isn't structured the way the tools assume. A consumer privacy tool assumes the sensitive part of a record is a name, an email, a phone number — fields that can be detected and replaced. But an enterprise document is a service ticket with twelve cross-referenced fields, a free-text description containing customer phrases, an attached log fragment, and references to internal asset IDs. The sensitive information lives in the structure as much as in the fields. Drop the structure and the AI has nothing useful to work on.

The other reason is that enterprise data location requirements are often less about what kind of data is involved and more about where it goes. A document may be perfectly legal to process internally and impossible to send to a third country, because of customer commitments, sector-specific data location rules, or the company's own data posture. The constraint is geographic and contractual, not just categorical.

This is the actual situation enterprise AI teams are trying to solve for: structured operational data that is useful to AI precisely because of its detail, bound by constraints that prevent it from leaving a defined environment, where the most capable models live on the other side of that boundary.

3. The Three Standard Approaches — and Where Each One Breaks

Most enterprise AI conversations end up at one of three patterns. Each is reasonable in isolation. None of them, on its own, gets to production on the workflows that matter.

3.1 Pattern 1 — Send the Data, Accept the Risk

The simplest pattern is to send the data to the external LLM and rely on the vendor's contractual commitments: Data Processing Agreements, Standard Contractual Clauses for international transfers, regional endpoints. This is what most public LLM use looks like in practice.

The pattern works for workflows where the data isn't sensitive in the first place: marketing copy, public documents, generic internal queries. It stops working the moment the workflow touches customer records, operational systems, or anything bound by contractual data location commitments. The vendor's promise is contractual, not architectural, and for the workflows where it matters most, a contractual promise alone is usually where enterprise legal review draws the line.

3.2 Pattern 2 — Mask and Redact Before Sending

The next instinct is to remove the sensitive parts before anything leaves. Names, IDs, and identifying fields get masked or redacted before the document goes to the LLM, so the model sees a sanitised version.

This works for documents where the sensitive part is cleanly separable — a contract where you mask the parties, a CV where you mask the name. It breaks on operational data for two reasons. The first is that masking destroys the structure the AI needs: a table where customer names have been replaced with [REDACTED] is no longer a table the AI can reason about. The second is that operational data is full of identifiers that simple masking doesn't see — ticket numbers, asset IDs, internal codes, network identifiers, free-text references — and any one of those can be sensitive depending on context.

The fundamental issue is that masking optimises for what's removed rather than what remains usable. For workflows where the AI needs to understand the relationships in the data, removal-based approaches break the workflow even when they succeed at hiding the sensitive parts.
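To make the structural point concrete, here is a minimal Python sketch, assuming the names have already been detected. The ticket rows and names are invented for illustration. Masking collapses both customers into one marker; tokenisation gives each distinct person a stable placeholder, so a model can still see that T-101 and T-103 concern the same customer.

```python
# Hypothetical service-ticket rows; names and ticket IDs are invented.
ROWS = [
    "T-101 | Alice Moreau | router reboot loop",
    "T-102 | Jonas Weber | billing dispute",
    "T-103 | Alice Moreau | router reboot loop (repeat)",
]
# Assume the detection step has already flagged these values as sensitive.
NAMES = ["Alice Moreau", "Jonas Weber"]


def mask(rows, names):
    """Removal-based masking: every name becomes the same redaction marker."""
    out = []
    for row in rows:
        for name in names:
            row = row.replace(name, "[REDACTED]")
        out.append(row)
    return out


def tokenise(rows, names):
    """Structure-preserving tokenisation: each distinct name gets its own stable token."""
    mapping = {name: f"<PERSON_{i}>" for i, name in enumerate(names, start=1)}
    out = []
    for row in rows:
        for name, token in mapping.items():
            row = row.replace(name, token)
        out.append(row)
    return out, mapping


def distinct_customers(rows):
    """Count distinct values in the customer column (the second field)."""
    return len({row.split("|")[1].strip() for row in rows})


masked = mask(ROWS, NAMES)
tokenised, mapping = tokenise(ROWS, NAMES)
print(distinct_customers(masked))     # 1: the two customers are indistinguishable
print(distinct_customers(tokenised))  # 2: the link between T-101 and T-103 survives
```

Both functions hide the same names; only the tokenised version keeps the relationship the analysis depends on.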

3.3 Pattern 3 — Run a Model Entirely On-Premise

The third pattern is to give up on external LLMs and run an open-source model on internal infrastructure. The data never leaves. The contractual and data-location questions resolve themselves.

This works, but it carries costs that aren't always visible at the start. Internal models, even good ones, tend to lag the frontier external models by twelve to eighteen months on the dimensions enterprises actually care about — reasoning over complex documents, handling unfamiliar formats, working with long context. Operational cost is real: GPU infrastructure, model serving stack, evaluation pipeline, a team that knows how to maintain all of it. And there's the brittleness: every six months the external models leap forward, and the gap between what the internal model can do and what the business now expects gets uncomfortable.

For some workflows — especially those where data location requirements are absolute and the workflow is bounded enough that a smaller model suffices — this pattern is the right answer. For most others, it solves the data problem at the cost of the AI gains.

3.4 What These Three Patterns Have in Common

All three patterns share an assumption: the question is whether to send the data to the external LLM, and if not, what to send instead. They take the boundary as fixed and ask what fits through it.

A different approach asks a different question.

Pattern	Works when	Breaks when	Core trade-off
Send the data, accept the risk	Data isn't sensitive — marketing copy, public docs, generic queries	Workflow touches customer records, operational systems, or contracted data	Contractual protection, not architectural
Mask and redact before sending	Sensitive part is cleanly separable — contract parties, CV name	Operational data where structure carries meaning; non-obvious identifiers	Optimises for removal, not for what remains usable
Run a model entirely on-premise	Workflow bounded enough for a smaller model; data location is absolute	Workflow needs frontier reasoning, long context, unfamiliar formats	Solves data problem at the cost of AI gains

4. A Different Approach — Change What Crosses the Boundary

This approach starts from a different premise. Instead of asking "what can we send to the external LLM?", it asks "what does the external LLM need in order to be useful, and can we send that instead of the original data?"

The answer, for a large class of enterprise workflows, turns out to be: the LLM needs the structure, the relationships, the question being asked, and the form of the answer expected. It does not need the literal customer name, the actual account number, the real asset identifier. It needs a placeholder that behaves the same way the real value would in the context of the task.

If the original sensitive elements are replaced with structured tokens — placeholders that preserve format, type, and relationships, but that have no meaning outside the originating environment — then what crosses the boundary is no longer the original data. It's a transformation of the data that retains everything the AI needs and removes everything the boundary was meant to keep in.

The mapping between tokens and original values stays inside the environment. The AI processes the tokenised version and returns a tokenised response. Inside the environment, the tokens are mapped back to the original values, and the response becomes a business-ready output containing real customer names, real figures, real references.
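A minimal sketch of that round trip in Python, assuming detection has already produced a dictionary of sensitive values and their types. `TokenVault`, `tokenise`, and `reconstruct` are hypothetical names, the `<TYPE_N>` token format is an illustrative choice, and the model reply is hard-coded rather than coming from a real API.

```python
from dataclasses import dataclass, field


@dataclass
class TokenVault:
    """Token <-> value mapping. In a real deployment this would live in
    enterprise-controlled storage; here it is an in-memory sketch."""
    forward: dict = field(default_factory=dict)   # original value -> token
    reverse: dict = field(default_factory=dict)   # token -> original value
    counters: dict = field(default_factory=dict)  # entity type -> last index used

    def token_for(self, value: str, kind: str) -> str:
        # The same value always maps to the same token, so references stay consistent.
        if value not in self.forward:
            n = self.counters.get(kind, 0) + 1
            self.counters[kind] = n
            token = f"<{kind}_{n}>"
            self.forward[value] = token
            self.reverse[token] = value
        return self.forward[value]


def tokenise(text: str, entities: dict, vault: TokenVault) -> str:
    """Replace each detected value (mapped to its entity type) with a token."""
    # Longest values first, so a value that contains a shorter one is replaced whole.
    for value in sorted(entities, key=len, reverse=True):
        text = text.replace(value, vault.token_for(value, entities[value]))
    return text


def reconstruct(text: str, vault: TokenVault) -> str:
    """Map tokens in a model response back to the original values."""
    for token, value in vault.reverse.items():
        text = text.replace(token, value)
    return text


vault = TokenVault()
document = "Ticket for Alice Moreau, account 4417-2290: Alice Moreau reports outages."
entities = {"Alice Moreau": "PERSON", "4417-2290": "ACCOUNT"}

outbound = tokenise(document, entities, vault)
print(outbound)
# Ticket for <PERSON_1>, account <ACCOUNT_1>: <PERSON_1> reports outages.

# Hard-coded stand-in for the external model's tokenised reply.
reply = "<PERSON_1> (account <ACCOUNT_1>) has a recurring outage."
print(reconstruct(reply, vault))
# Alice Moreau (account 4417-2290) has a recurring outage.
```

Only `outbound` and `reply` would ever cross the boundary; `vault` never does.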

This isn't masking, because the tokens preserve structure and format. It isn't synthetic data, because the workflow runs on real production data. It isn't on-premise deployment, because the heavy lifting still happens on the frontier external models. It's a transformation layer that sits between the enterprise environment and the external AI, and it changes what the external AI sees without changing what the enterprise environment knows.

Different communities use different names for parts of this approach. In data-protection practice the substitution step is usually called tokenisation — replacing sensitive values with placeholders that can be mapped back. When tokenisation is combined with structure preservation, format-preserving stand-ins, and optional statistical protections, the resulting data-preparation layer is sometimes referred to as an encapsulation layer — a broader architecture that contains tokenisation as its core mechanism. The terminology varies; the architectural idea is consistent: the boundary doesn't move, and the AI capability doesn't shrink. What changes is the form of the data that crosses.

Figure 1 · The original data and the mapping never leave the enterprise. Only tokenised data and tokenised responses cross the boundary.

The architecture in one sentence. Original data and the token-to-value mapping stay inside the enterprise environment; only tokenised data and tokenised responses cross the boundary to the external LLM.

5. How the Approach Works in Practice

The architecture decomposes into four stages, each with independent design decisions.

5.1 Detection

Before anything is transformed, the system has to identify what counts as sensitive in this context. The honest version of detection is harder than it sounds: the sensitive elements in operational data aren't just names and IDs but the specific markers that this enterprise has decided matter — project codes, deal terms, internal asset references, sector-specific identifiers. Generic PII detection finds maybe forty percent of what matters. The other sixty percent has to be defined by the enterprise itself, in a way that can adapt as the business changes. Detection at production quality has to handle both structured fields and unstructured free text, with semantic awareness rather than pattern matching.
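The gap between generic and enterprise-specific detection can be sketched with two rule sets. This is a simplification: it uses regular expressions only, whereas production detection, as noted above, needs semantic awareness for free text. The `PRJ-` and `NE-` formats and the rule names are hypothetical examples of enterprise-defined markers.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    kind: str
    value: str
    start: int
    end: int


# Deliberately simplified patterns of the kind a generic PII detector covers.
GENERIC_RULES = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "PHONE": r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}",
}

# Hypothetical enterprise-defined markers that no generic detector would know.
ENTERPRISE_RULES = {
    "PROJECT": r"\bPRJ-\d{4}\b",        # internal project codes
    "ASSET": r"\bNE-[A-Z]{3}-\d{3}\b",  # network element identifiers
}


def detect(text: str, rules: dict) -> list:
    """Return every match of every rule, ordered by position in the text."""
    findings = [
        Finding(kind, m.group(), m.start(), m.end())
        for kind, pattern in rules.items()
        for m in re.finditer(pattern, text)
    ]
    return sorted(findings, key=lambda f: f.start)


SAMPLE = "Escalated by ops@example.com: NE-LON-042 down again, see PRJ-2291."
print([f.kind for f in detect(SAMPLE, GENERIC_RULES)])
# ['EMAIL']
print([f.kind for f in detect(SAMPLE, {**GENERIC_RULES, **ENTERPRISE_RULES})])
# ['EMAIL', 'ASSET', 'PROJECT']
```

The generic rules alone find the email address and miss the two identifiers that matter most operationally.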

5.2 Transformation

Once sensitive elements are identified, they need to be replaced with placeholders that preserve the structural role the original played. A customer name in a free-text field becomes a token that the AI can recognise as a name and reference consistently. An account number stays an account number, just one without semantic meaning outside the system. The transformation has to handle tables, cross-references, hierarchies, and document structure without breaking any of them. Done well, the result reads like a coherent document with anonymous-but-realistic stand-ins. Done poorly, the result is full of [REDACTED] markers that no longer make sense.
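One way to produce stand-ins that keep their structural role is to copy the shape of the original value: digits become other digits, letters become letters of the same case, and separators stay in place. The sketch below assumes a lookup table is acceptable (its reverse map is exactly the mapping that must stay inside the enterprise). It is a random surrogate generator, not format-preserving encryption such as NIST FF1, and the class name is hypothetical.

```python
import random
import string


class FormatPreservingStandIns:
    """Issues stand-ins with the same shape as the original: digits become
    digits, letters become letters of the same case, other characters stay."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)  # not cryptographic; acceptable for a sketch
        self._by_value = {}     # original -> stand-in (consistency)
        self._by_stand_in = {}  # stand-in -> original (the mapping to keep internal)

    def _shape_copy(self, value: str) -> str:
        out = []
        for ch in value:
            if ch.isdigit():
                out.append(self._rng.choice(string.digits))
            elif ch.isupper():
                out.append(self._rng.choice(string.ascii_uppercase))
            elif ch.islower():
                out.append(self._rng.choice(string.ascii_lowercase))
            else:
                out.append(ch)  # separators such as '-', ' ' or '/' keep their position
        return "".join(out)

    def stand_in(self, value: str) -> str:
        if value in self._by_value:
            return self._by_value[value]
        if not any(ch.isalnum() for ch in value):
            raise ValueError("value has no letters or digits to replace")
        for _ in range(1000):  # bounded: very short values have few possible shapes
            candidate = self._shape_copy(value)
            if candidate != value and candidate not in self._by_stand_in:
                self._by_value[value] = candidate
                self._by_stand_in[candidate] = value
                return candidate
        raise RuntimeError("could not find an unused stand-in")

    def original_of(self, stand_in: str) -> str:
        return self._by_stand_in[stand_in]


fp = FormatPreservingStandIns(seed=7)
account = fp.stand_in("4417-2290")  # another nine-character ####-#### value
surname = fp.stand_in("Moreau")     # a six-letter capitalised word
assert fp.stand_in("4417-2290") == account  # consistent within the session
assert fp.original_of(account) == "4417-2290"
```

Realistic stand-ins read more naturally than `<PERSON_1>`-style tokens, but a realistic-looking value can also collide with a real one elsewhere in the data, which is one reason to prefer visibly synthetic tokens in free text.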

Some implementations layer additional protection on top of tokenisation: adding statistical noise, enforcing k-anonymity across batches, or applying differential-privacy techniques to specific attributes. This reduces the residual risk that a sophisticated correlator could re-identify entities from the tokenised data alone. Whether you need this layer depends on the threat model. For most enterprise workflows, structure-preserving tokenisation alone is sufficient if the mapping is well controlled. For workflows where even a potentially re-identifiable tokenised copy must not leave the EU or another defined boundary, the additional protection is worth the complexity.
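Enforcing k-anonymity starts with measuring it. A minimal sketch, assuming each record in a batch is a dictionary and the quasi-identifier columns (here a hypothetical region and age band) are known in advance: the batch is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The enforcement step itself (generalising or suppressing offending records) is left out.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_of(records, quasi_identifiers):
    """Largest k for which the batch is k-anonymous (0 for an empty batch)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def violations(records, quasi_identifiers, k):
    """Combinations shared by fewer than k records: candidates for
    generalisation or suppression before the batch leaves."""
    return {combo: n for combo, n in group_sizes(records, quasi_identifiers).items() if n < k}


# Hypothetical tokenised batch: the customer tokens are safe, but region plus
# age band could still single someone out.
batch = [
    {"customer": "<PERSON_1>", "region": "North", "age_band": "30-39"},
    {"customer": "<PERSON_2>", "region": "North", "age_band": "30-39"},
    {"customer": "<PERSON_3>", "region": "South", "age_band": "40-49"},
]
QI = ["region", "age_band"]
print(k_of(batch, QI))           # 1
print(violations(batch, QI, 2))  # {('South', '40-49'): 1}
```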

5.3 External Processing

The tokenised document goes to the external LLM through whatever API or integration the enterprise uses. From the LLM's perspective, this is just a request — it has no way to know the document has been transformed, and no need to. The LLM does its work — summarisation, extraction, classification, reasoning — and returns a tokenised response containing the same tokens that went in.
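Because the model cannot tell whether a document was transformed, the enterprise side is the only place a leak can be caught. A defensive check before the request leaves is cheap to add. The sketch below assumes the set of original sensitive values for the document is available from the detection step; `client` is a hypothetical callable standing in for whichever provider SDK is in use, and the plain substring check is deliberately naive.

```python
from typing import Callable, Iterable


class EgressViolation(Exception):
    """Raised when an outbound payload still contains an original sensitive value."""


def check_outbound(payload: str, sensitive_values: Iterable[str]) -> str:
    leaked = [v for v in sensitive_values if v and v in payload]
    if leaked:
        # Report only the count: the values themselves should not end up in logs.
        raise EgressViolation(f"{len(leaked)} sensitive value(s) still present in payload")
    return payload


def send_to_llm(payload: str, sensitive_values: Iterable[str],
                client: Callable[[str], str]) -> str:
    """`client` is a hypothetical callable wrapping the provider SDK."""
    return client(check_outbound(payload, sensitive_values))


def echo_client(prompt: str) -> str:
    # Stand-in for the external model.
    return "ack: " + prompt


print(send_to_llm("Summarise ticket for <PERSON_1>", {"Alice Moreau"}, echo_client))
# ack: Summarise ticket for <PERSON_1>
try:
    send_to_llm("Summarise ticket for Alice Moreau", {"Alice Moreau"}, echo_client)
except EgressViolation as err:
    print("blocked:", err)
# blocked: 1 sensitive value(s) still present in payload
```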

5.4 Reconstruction

Inside the enterprise environment, the tokenised response is mapped back to original values. The placeholders for customer names get the real customer names back. Placeholder account numbers become real account numbers. The structure and reasoning the AI produced are preserved; only the placeholders are replaced. The result is business-ready output that can flow straight back into the originating workflow.

The reconstruction step is the part most teams underestimate. It is the part that determines whether the AI output is actually usable in production or whether someone has to manually rebuild it. A good reconstruction layer is invisible: the user submits a document, the AI returns an analysis, and the analysis comes back with real values. The transformation and reconstruction happen as infrastructure, not as user-facing steps.
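A sketch of the reconstruction step, assuming tokens follow a `<TYPE_N>` pattern and the reverse mapping is available in memory. Doing the substitution in a single regular-expression pass means restored text is never rescanned, so a restored value cannot be mistaken for a token and replaced again. Tokens the mapping does not recognise (for example, one the model invented) are reported instead of silently passed through. The function name is hypothetical.

```python
import re

# Assumed token shape: <TYPE_N>, e.g. <PERSON_1> or <ACCOUNT_12>.
TOKEN_PATTERN = re.compile(r"<[A-Z]+_\d+>")


def reconstruct(response: str, reverse_mapping: dict):
    """Replace every known token with its original value in a single pass.

    Returns the reconstructed text and a list of tokens the mapping did not
    recognise, which are left in place for a reviewer to see.
    """
    unknown = []

    def swap(match: re.Match) -> str:
        token = match.group(0)
        if token in reverse_mapping:
            return reverse_mapping[token]
        unknown.append(token)
        return token

    return TOKEN_PATTERN.sub(swap, response), unknown


mapping = {"<PERSON_1>": "Alice Moreau", "<ACCOUNT_1>": "4417-2290"}
text, unknown = reconstruct("<PERSON_1> owns <ACCOUNT_1>; see also <PERSON_2>.", mapping)
print(text)     # Alice Moreau owns 4417-2290; see also <PERSON_2>.
print(unknown)  # ['<PERSON_2>']
```

Surfacing unknown tokens matters in production: a silently passed-through placeholder is exactly the kind of output someone would otherwise have to fix by hand.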

6. Which Workflows This Approach Actually Solves

This approach isn't universal. It works for a specific class of workflows, and being honest about which class matters when deciding whether to adopt it.

The workflows that fit are ones where the AI's task is structural or analytical, and where the sensitive elements are identifiable in advance. Examples include:

Summarising contracts
Drafting incident reports from operational logs
Extracting risk clauses from due-diligence documents
Generating clinical notes from structured patient records
Analysing root causes from network alarm sequences
Classifying claims from insurance filings

In each, the AI is reasoning over structure and content, and the customer-identifying parts are means to an end, not the end itself.

The workflows that don't fit are ones where the AI must operate on the literal sensitive content as part of its task. Personalised content generation that addresses the customer directly. Verification workflows that must check against the actual identifier. Investigative searches that require the original strings.

A useful rule of thumb: if the AI's output could be expressed as "do this analytical thing to this kind of data and tell me what you find," the approach usually works. If the AI's output requires "act on this specific customer/case/identifier," it usually doesn't.

7. What You Have to Decide Before Deploying

Adopting this approach isn't a checkbox decision. It carries architectural choices that are easier to make at the start than to revisit later.

7.1 Where the Transformation Runs

The transformation layer has to run inside the enterprise environment — on-premise, in the company's own cloud VPC, on dedicated infrastructure. The constraint is that the transformation happens before the data reaches the external network, which means the layer is colocated with the source systems, not with the AI endpoint.

7.2 Who Controls the Mapping

The mapping between tokens and original values is the most sensitive component of the architecture. It is, in effect, the key that re-identifies the data. Standard practice — and good practice — is that the mapping is held exclusively by the enterprise, in storage the external LLM provider has no access to. This is a non-negotiable design property, not a configuration option. If your vendor's architecture allows the mapping to leave the enterprise environment, the approach's protection collapses.
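One way to reinforce exclusive control is to derive the tokens themselves from a key the enterprise holds, for example with HMAC-SHA256 from Python's standard library. This is an illustration, not a prescribed design: the class name is hypothetical, the 12-hex-character truncation is an arbitrary choice, and the reverse lookup still has to live in enterprise storage. The benefit is that someone who sees the tokens but not the key cannot check guesses such as whether a given token stands for a particular customer.

```python
import hashlib
import hmac
import secrets


class KeyedTokenVault:
    """Derives tokens with HMAC-SHA256 under a key held only by the enterprise,
    and keeps the token -> value lookup alongside it."""

    def __init__(self, key: bytes):
        self._key = key
        self._reverse = {}  # token -> original value; never sent to the provider

    def tokenise(self, value: str, kind: str) -> str:
        digest = hmac.new(self._key, f"{kind}:{value}".encode("utf-8"),
                          hashlib.sha256).hexdigest()
        token = f"<{kind}_{digest[:12]}>"  # 48-bit truncation: an arbitrary choice
        self._reverse[token] = value
        return token

    def lookup(self, token: str) -> str:
        return self._reverse[token]


# Assumption: in production the key would come from an enterprise-managed
# KMS or HSM; a random key is generated here only to keep the sketch runnable.
vault = KeyedTokenVault(secrets.token_bytes(32))
t = vault.tokenise("Alice Moreau", "PERSON")
assert vault.tokenise("Alice Moreau", "PERSON") == t  # stable under the same key
assert vault.lookup(t) == "Alice Moreau"
```

Keyed tokens stay stable across documents under the same key, which keeps cross-document references consistent; rotating the key changes every token, which is worth planning for.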

7.3 How Sensitivity Is Defined

Generic PII categories — names, emails, phone numbers — are the start, not the end. The enterprise has to define what counts as sensitive in its own context: internal project codes, customer-segment identifiers, sector-specific references. The definition has to be versioned, because what counts as sensitive changes — yesterday it was financial customer data; today it's the new M&A code name; tomorrow it's the asset references for the regulated unit. A static definition becomes stale fast.
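Versioning can be as simple as keeping every published definition with an effective date and resolving the one in force when a document is processed. In this sketch the class names, version labels, the customer-ID pattern, and the "Project Heron" deal code name are all hypothetical; the rules reuse a simple regex format purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class SensitivityDefinition:
    version: str
    effective_from: date
    rules: dict = field(default_factory=dict)  # entity kind -> pattern (illustrative)


class DefinitionRegistry:
    """Keeps every published definition so past transformations can be audited."""

    def __init__(self):
        self._versions = []

    def publish(self, definition: SensitivityDefinition) -> None:
        self._versions.append(definition)
        self._versions.sort(key=lambda d: d.effective_from)

    def active(self, on: date) -> SensitivityDefinition:
        """Latest definition whose effective date is on or before `on`."""
        in_force = [d for d in self._versions if d.effective_from <= on]
        if not in_force:
            raise LookupError(f"no sensitivity definition in force on {on}")
        return in_force[-1]


registry = DefinitionRegistry()
base_rules = {"CUSTOMER_ID": r"\bC\d{8}\b"}
registry.publish(SensitivityDefinition("v1", date(2026, 1, 1), base_rules))
# A new deal code name becomes sensitive; v2 extends v1 rather than editing it in place.
registry.publish(SensitivityDefinition(
    "v2", date(2026, 3, 15), {**base_rules, "DEAL": r"\bProject Heron\b"}))

print(registry.active(date(2026, 2, 1)).version)  # v1
print(registry.active(date(2026, 4, 1)).version)  # v2
```

Recording the version used alongside each transformed document makes it possible to check later which definition was applied.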

7.4 How the Workflow Handles Output

Reconstruction has to happen inside the enterprise environment, integrated into whatever delivery channel the workflow uses — the ticketing system, the document management platform, the analyst's review interface. If reconstruction is a separate manual step, users will skip it, and the architecture's value evaporates.
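The point is that tokenisation, the model call, reconstruction, and delivery happen as one unit. A minimal sketch, assuming the sensitive values have already been detected; `run_workflow` is a hypothetical name, and `llm` and `deliver` are hypothetical callables standing in for the provider client and the ticketing or document system. Users only ever see the reconstructed result.

```python
from typing import Callable, Dict


def run_workflow(document: str,
                 entities: Dict[str, str],
                 llm: Callable[[str], str],
                 deliver: Callable[[str], None]) -> None:
    """Tokenise, call the model, reconstruct, and deliver as one step.

    `entities` maps each detected sensitive value to its type.
    """
    mapping = {}  # token -> original value; stays inside this function
    outbound = document
    for i, value in enumerate(sorted(entities, key=len, reverse=True), start=1):
        token = f"<{entities[value]}_{i}>"
        mapping[token] = value
        outbound = outbound.replace(value, token)

    response = llm(outbound)  # the only call that crosses the boundary

    for token, value in mapping.items():
        response = response.replace(token, value)
    deliver(response)  # the destination system receives reconstructed output only


seen_by_model = []
delivered = []


def fake_llm(prompt: str) -> str:
    seen_by_model.append(prompt)
    return f"Summary: {prompt}"


run_workflow("Alice Moreau reports an outage on 4417-2290.",
             {"Alice Moreau": "PERSON", "4417-2290": "ACCOUNT"},
             fake_llm, delivered.append)
print(seen_by_model[0])  # <PERSON_1> reports an outage on <ACCOUNT_2>.
print(delivered[0])      # Summary: Alice Moreau reports an outage on 4417-2290.
```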

7.5 What Happens When the External Endpoint Changes

External LLMs are not stable infrastructure. Models get deprecated, vendors change pricing, new options appear. The approach works best when the transformation layer is provider-agnostic — when swapping ChatGPT for Claude or for a new vendor is a configuration change, not an architectural rewrite.
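A common way to keep the layer provider-agnostic is to code against a small interface and choose the implementation from configuration. In this sketch `CompletionProvider`, `EchoProvider`, the registry, and the `llm_provider` config key are all hypothetical; a real adapter would wrap the vendor's own SDK, which is not shown here.

```python
from typing import Callable, Dict, Protocol


class CompletionProvider(Protocol):
    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Test double; a real adapter would wrap a vendor SDK behind `complete`."""

    def __init__(self, label: str):
        self.label = label

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


# Hypothetical registry: adding a vendor means adding one adapter entry here.
PROVIDERS: Dict[str, Callable[[], CompletionProvider]] = {
    "vendor_a": lambda: EchoProvider("vendor_a"),
    "vendor_b": lambda: EchoProvider("vendor_b"),
}


def provider_from_config(config: dict) -> CompletionProvider:
    name = config["llm_provider"]
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider {name!r}")
    return PROVIDERS[name]()


# Switching vendors is a configuration change; the tokenised prompt and
# everything around it stay the same.
for cfg in ({"llm_provider": "vendor_a"}, {"llm_provider": "vendor_b"}):
    print(provider_from_config(cfg).complete("Summarise <PERSON_1>"))
# [vendor_a] Summarise <PERSON_1>
# [vendor_b] Summarise <PERSON_1>
```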

8. The Limits of This Approach, Honestly

The approach resolves a real problem but it doesn't resolve every problem. The limitations are worth stating clearly.

It doesn't help when the AI's actual job requires the original sensitive data — verification, search, personalisation tasks where the literal identifier is the point.
It adds latency. Detection, transformation, and reconstruction each take time. For most enterprise workflows this is invisible — the overhead is fractions of a second in a workflow that takes seconds anyway. For latency-critical applications, the overhead may matter.
It requires sustained investment in the detection and definition layer. Sensitivity isn't static; markers evolve with the business; the definition has to evolve too. A team has to own that, and the team has to be wired into how the business actually changes. Buying the technology without owning the definition leaves the architecture in slow decay.
It doesn't substitute for organisational decisions about what data should be processed in the first place. Some workflows shouldn't be sent to external models at all, regardless of transformation — the data is too sensitive, the workflow is too critical, the failure mode is too costly. The approach is for workflows where the answer is "this would be useful with the right architecture," not for workflows where the answer is "no, never."
It requires that the enterprise actually deploy the layer inside its own environment. Vendors that offer this approach as an external SaaS, where the transformation happens on the vendor's infrastructure, have turned it back into a data-transfer problem: the original data has to leave the enterprise before it is tokenised. The whole point is that the transformation runs where the data already lives.

9. Where This Leaves the AI Strategy

For most regulated EU enterprises in 2026, the path to production AI runs through some version of this approach. The economics of external LLMs are too good to ignore; the constraints on data location are too real to override; the existing tools — masking, on-premise deployment — solve parts of the problem but not all of it.

This approach is not a finished category. It has multiple implementations from multiple vendors, with different design choices around detection, transformation strength, reconstruction handling, and deployment topology. Choosing among them comes down to questions specific to the enterprise's environment: what existing systems the layer has to integrate with, what the sensitivity definition looks like, what deployment posture the security team has already committed to, what the workflow mix between external and on-premise models is going to be.

What's consistent across the implementations is a set of architectural commitments that define the approach:

The original data stays inside the enterprise boundary
The AI capability is preserved
The transformation runs at a layer the enterprise controls
The mapping that enables reconstruction stays under the enterprise's exclusive control

When those four commitments hold, the workflows that have been stalled in pilot start moving toward production.

The contradiction that opens this article doesn't fully resolve — there will always be workflows where the constraint and the capability cannot be reconciled. But for the broad middle of enterprise AI work, this approach is the architectural answer that lets the AI strategy and the data strategy stop being in conflict.

Key takeaways

Regulated enterprises face a contradiction: the most capable AI models are external; the most useful data can't leave the boundary
The three standard responses — accept the risk, mask and redact, or run on-premise — each break at a predictable point
A different approach changes what crosses the boundary rather than whether to cross it — using structure-preserving tokenisation
Four stages: detection → transformation → external processing → reconstruction
Fits analytical workflows where sensitive elements are identifiable in advance; does not fit personalisation or verification tasks
Four non-negotiable design properties: transformation inside the enterprise, exclusive control of the mapping, evolving sensitivity definition, reconstruction as infrastructure
Limits: added latency, ongoing investment in detection, no substitute for organisational decisions about what should be processed at all
Provider-agnostic by design — swapping ChatGPT, Claude, or Gemini is a configuration change, not an architectural rewrite

Frequently Asked Questions

How is this approach different from data masking?

Masking optimises for what's removed — names, IDs, and identifying fields are replaced with redaction markers. That works for documents where the sensitive part is cleanly separable, but it breaks on operational data because the structure the AI needs to reason about gets destroyed in the process. The approach described here uses structure-preserving tokenisation: sensitive elements are replaced with placeholders that retain format, type, and relationships, so the AI sees a coherent document rather than a fragmented one. The mapping back to original values stays inside the enterprise.

Does this work for any external LLM, or only specific ones?

The approach is provider-agnostic by design. From the external LLM's perspective, it receives a normal request — it has no way to know the document has been transformed, and no need to. This means swapping ChatGPT for Claude, Gemini, or a new vendor is a configuration change rather than an architectural rewrite. The transformation layer is the constant; the external model is the variable.

What happens to the mapping between tokens and original values?

The mapping is the most sensitive component of the architecture — in effect, the key that re-identifies the data. Standard and good practice is that the mapping is held exclusively by the enterprise, in storage the external LLM provider has no access to. This is a non-negotiable design property. If a vendor's architecture allows the mapping to leave the enterprise environment, the approach's protection collapses.

Where does the transformation layer have to run?

Inside the enterprise environment: on-premise, in the company's own cloud VPC, or on dedicated infrastructure that the enterprise controls. The constraint is that the transformation happens before the data reaches the external network. Vendors that offer this approach as an external SaaS, where transformation happens on the vendor's infrastructure, have turned it back into a data-transfer problem, because the original data has to leave the enterprise before it is tokenised. The whole point is that the transformation runs where the data already lives.

What workflows does this approach not solve?

Workflows where the AI must operate on the literal sensitive content as part of its task. Personalised content generation that addresses the customer directly. Verification workflows that must check against the actual identifier. Investigative searches that require the original strings. A useful rule of thumb: if the AI's output could be expressed as "do this analytical thing to this kind of data and tell me what you find," the approach usually works. If the output requires "act on this specific customer/case/identifier," it usually doesn't.

How does this differ from running an on-premise open-source model?

On-premise deployment solves the data problem by replacing the AI capability with something internal — the data never leaves, but the model is whatever the enterprise can host. That works when the workflow is bounded enough for a smaller model to suffice. The transformation-layer approach keeps the frontier external models in the loop and changes what crosses the boundary instead. The heavy lifting still happens on the most capable models; only the form of the data is different.

Have a deployment question?

Bring your industry, your regulatory profile, and your data. We respond within one business day.

Request a Live Demo

Email : contact@cubig.ai

CUBIG LTD (United Kingdom)

Company Number: NI735459
Address: 21 Arthur Street, Belfast, Antrim, United Kingdom, BT1 4GA

CUBIG CORP (Republic of Korea)

Business Registration Number : 133-81-45679

E-Commerce Registration : 2023-Seoul-Seocho-2822

Address: 4F, NAVER 1784, 95, Jeongjail-ro, Bundang-gu, Seongnam-si, Gyeonggi-do, Republic of Korea

Product

Resources

Company

Legal

Consent Preferences
