Why AI governance needs more than a token effort
Enterprises rushing to deploy large language models are repeating one of the most expensive mistakes of the cloud era, according to Chris Neilon, managing partner at Lightouch Consulting. Without a coherent token strategy, what looks like modest AI experimentation can rapidly become a runaway cost centre. Here, the AI expert spells out what every leader needs to know and do to adapt.
Before worrying about cost, it helps to understand the unit you are being charged for. Large language models do not read words; they read tokens. A token is roughly three-quarters of a word, or about four characters of English text. The sentence you are reading right now contains roughly 15 tokens. Every interaction with an LLM is measured and billed in tokens: every question asked, every document uploaded, every answer generated.
Critically, tokens are consumed in two directions. Input tokens cover everything sent to the model: the instruction, the background context, the conversation history, any documents or data provided. Output tokens cover the response the model generates. Both are metered. Both add up.
At the scale of a handful of test users this is trivial. At enterprise scale, covering thousands of employees, automated workflows and customer-facing applications, it is material.
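The metering described above can be turned into a rough estimate. The sketch below is illustrative only: it uses the four-characters-per-token rule of thumb from the text rather than a real tokenizer, and the function names and per-million-token prices are hypothetical placeholders, not any provider's actual rates.

```python
import math

# Rule of thumb from the text: about four characters of English per token.
# Real tokenizers vary by model and language, so treat this as an estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a piece of English text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,   # hypothetical price, supplied by the caller
    output_price_per_million: float,  # hypothetical price, supplied by the caller
) -> float:
    """Cost of one call: input and output tokens are metered separately."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Example with made-up prices: a 2,000-character prompt and a 400-token answer.
prompt_tokens = estimate_tokens("x" * 2000)
call_cost = estimate_cost(prompt_tokens, 400, 2.0, 8.0)
```

Multiplying the per-call figure by daily call volume gives a first-order view of how a trivial pilot cost becomes a material one.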
A familiar warning
If you were present for the enterprise cloud adoption wave of the 2010s, this dynamic will feel uncomfortably familiar. Teams spun up servers on demand without governance. Nobody asked how much compute a given workload truly needed. Development environments ran around the clock. Storage grew unchecked. The elasticity that made cloud attractive also made overspend dangerously easy, with the bill arriving at the end of the month.
Token consumption follows the same pattern, but with some additional traps unique to LLMs. The most significant is the way conversation history accumulates in the context window. A cloud server simply idles when not in use; an LLM conversation does not. Each new message re-sends the entire conversation history to the model, so every exchange carries the full weight of everything said before it. A 30-minute employee support session may start with a modest token count, but by the end it is many multiples larger. Scale that across thousands of daily interactions and the arithmetic becomes uncomfortable very quickly.
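The arithmetic can be made concrete with a small sketch. It assumes, purely for illustration, that every user message and every model reply is the same size, that the full history is re-sent on each call, and that no provider-side caching discount applies; the function name and parameters are hypothetical.

```python
def cumulative_input_tokens(
    turns: int,
    tokens_per_message: int,
    system_prompt_tokens: int = 0,
) -> int:
    """Total input tokens billed over a conversation that re-sends its full history.

    Simplifying assumption: each user message and each reply has the same size.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message            # the new user message joins the history
        total += system_prompt_tokens + history  # the whole history is sent as input
        history += tokens_per_message            # the model's reply joins the history
    return total


# 20 turns of 100-token messages: the user typed only 2,000 tokens in total,
# but the billed input grows with the square of the number of turns.
typed_by_user = 20 * 100
billed_as_input = cumulative_input_tokens(turns=20, tokens_per_message=100)
```

Under these assumptions the billed input comes to the number of turns squared times the message size, which is why long, open-ended sessions cost far more than their visible content suggests.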
Add to this: verbose system prompts repeated on every call; retrieval-augmented generation (RAG) pipelines that dump entire documents into context rather than targeted excerpts; automated agents running chains of LLM calls for a single task; and frontier models priced at a significant premium. Without governance, each of these is a slow leak that becomes a flood.
Five principles every organisation should establish
The good news: token costs are highly controllable. The organisations that manage them well share a common set of operating principles.
Measure before you manage
You cannot govern what you cannot see. Establish token logging and attribution from day one, broken down by team, application and use case. Treat token spend as a first-class metric alongside compute, storage and API costs.
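As a minimal sketch of what attribution might look like, the example below tags each call with a team, application and use case and rolls the totals up along any of those dimensions. The class and field names are hypothetical; a real deployment would more likely write these records to an existing observability or billing system than hold them in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One LLM call, tagged with who made it and why (hypothetical schema)."""
    team: str
    application: str
    use_case: str
    input_tokens: int
    output_tokens: int


class UsageLedger:
    """In-memory log of token usage, for illustration only."""

    def __init__(self) -> None:
        self._records: list[UsageRecord] = []

    def record(self, rec: UsageRecord) -> None:
        self._records.append(rec)

    def totals_by(self, dimension: str) -> dict[str, int]:
        """Sum input plus output tokens per team, application or use_case."""
        totals: dict[str, int] = defaultdict(int)
        for rec in self._records:
            totals[getattr(rec, dimension)] += rec.input_tokens + rec.output_tokens
        return dict(totals)


ledger = UsageLedger()
ledger.record(UsageRecord("support", "helpdesk-bot", "ticket-triage", 1200, 300))
ledger.record(UsageRecord("support", "helpdesk-bot", "reply-draft", 2500, 800))
ledger.record(UsageRecord("finance", "report-tool", "summary", 4000, 600))
by_team = ledger.totals_by("team")
```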
Match model to task
Frontier models are powerful, but they are also the most expensive. Not every task warrants them. Summarising a short document, classifying a support ticket, or generating a routine first draft are tasks a smaller, faster, cheaper model can handle perfectly well. Build a model-routing strategy that deploys the right capability at the right cost point.
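A routing rule can start out very simple. The sketch below sends the routine tasks named above to a cheaper tier unless the input is unusually large, and everything else to the frontier tier. The tier names, task labels and the 4,000-token threshold are all hypothetical choices made for illustration, not recommendations.

```python
# Hypothetical tier names and task labels, chosen only for illustration.
SMALL_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"
ROUTINE_TASKS = {
    "summarise_short_document",
    "classify_ticket",
    "routine_first_draft",
}


def choose_model(
    task_type: str,
    input_tokens: int,
    small_model_max_tokens: int = 4000,  # assumed threshold, tune per deployment
) -> str:
    """Pick the cheapest tier judged adequate for the task."""
    if task_type in ROUTINE_TASKS and input_tokens <= small_model_max_tokens:
        return SMALL_MODEL
    return FRONTIER_MODEL
```

In practice the routing table would be reviewed against quality measurements, since a cheaper model that produces unusable output saves nothing.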
Treat context as a managed resource
Context is not free. Every token that enters the context window costs money. Establish standards for what should and should not be included. This means precise retrieval over bulk document injection, and disciplined session design over open-ended, ever-growing conversation threads.
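One common form of disciplined session design is to cap how much history is re-sent. The sketch below keeps only the most recent messages that fit within a token allowance. It is one possible policy among several (summarising older turns is another), and it reuses the rough four-characters-per-token estimate rather than a real tokenizer; all names are hypothetical.

```python
import math


def rough_token_count(text: str) -> int:
    """Estimate tokens using the four-characters-per-token rule of thumb."""
    return math.ceil(len(text) / 4)


def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined estimate fits within max_tokens."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first that does not fit.
    for message in reversed(messages):
        cost = rough_token_count(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```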
Standardise prompt architecture
System prompts, which are the instructions that shape model behaviour, are sent with every call. An unreviewed accumulation of guidance, caveats and examples can inflate a system prompt to thousands of tokens, all of which are charged on every single interaction. Treat prompts as production assets: version-controlled, audited, and regularly trimmed.
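Because the system prompt is charged on every call, its cost is a straightforward multiplication. The sketch below illustrates it with made-up volumes and a hypothetical input price; the function name and every figure are assumptions, not data from any real deployment.

```python
def system_prompt_overhead(
    prompt_tokens: int,
    calls_per_day: int,
    days: int,
    input_price_per_million: float,  # hypothetical price, supplied by the caller
) -> float:
    """Cost attributable to the system prompt alone over a period."""
    total_tokens = prompt_tokens * calls_per_day * days
    return total_tokens * input_price_per_million / 1_000_000


# Made-up example: a 2,000-token prompt on 10,000 calls a day for 30 days,
# compared with the same prompt trimmed to 500 tokens.
before = system_prompt_overhead(2000, 10_000, 30, 2.0)
after = system_prompt_overhead(500, 10_000, 30, 2.0)
saving = before - after
```

The point of the comparison is that trimming a prompt once reduces the charge on every subsequent interaction.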
Set budgets and build accountability
Token budgets should exist at the enterprise, team, and application level. Where possible, hard limits and automated alerts should be built into your AI infrastructure. Accountability without visibility is ineffective; visibility without accountability is just interesting data.
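A hard limit with an early-warning alert might be sketched as follows. The class name, the 80% alert threshold and the three status strings are hypothetical; a production version would also need period resets, persistence and a way to notify the budget owner.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Token allowance for one team or application (illustrative only)."""
    limit: int
    alert_fraction: float = 0.8  # assumed warning threshold
    used: int = 0

    def charge(self, tokens: int) -> str:
        """Record usage and return 'ok', 'alert' or 'blocked'."""
        if self.used + tokens > self.limit:
            return "blocked"  # hard limit: the request is refused and not counted
        self.used += tokens
        if self.used >= self.limit * self.alert_fraction:
            return "alert"    # soft limit: allowed, but the owner should be notified
        return "ok"


budget = TokenBudget(limit=1000)
first = budget.charge(500)
second = budget.charge(400)
third = budget.charge(200)
```

The same structure can be instantiated at enterprise, team and application level, with each request charged against all three.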
The strategic imperative
The organisations that capture the most value from AI will not simply be those that adopt it earliest. They will be the ones that adopt it most intelligently, extracting genuine productivity and insight while managing cost with the same discipline they bring to any other significant infrastructure investment.
Tokens are the unit of value exchange with large language models. Right now, most enterprises are spending them without counting them. The window to establish governance before costs become entrenched is open, but it will not stay open indefinitely.
