Beyond the buzzwords: Differentiating AI token-based pricing from AI tokenisation in enterprise practice

Enterprise GenAI programmes are being slowed down by a surprisingly small word: token.

In boardrooms and procurement meetings, “tokens” are often discussed as if they are one thing. In practice, there are two separate concepts:

  • AI tokenisation: how text is broken into tokens so a model can process it
  • AI token-based pricing: how vendors charge you for the tokens you send and receive

Mix them up, and you get flawed business cases, weak vendor negotiations, and cost overruns that look “unexpected” only because the wrong thing was measured.

This article explains the difference in clear, executive-friendly terms, and shows how CISOs, CTOs and CFOs can govern both without turning the organisation into a prompt-engineering shop.

The word token has two meanings (and they are not interchangeable)

Here is the simplest way to hold the distinction:

  • Tokenisation is a technical behaviour of the model interface: it turns your text into tokens.
  • Token-based pricing is a commercial model: it turns token counts into spend.

They interact, but they are not the same problem. Treating them as the same is like confusing kilowatt-hours (usage) with your electricity tariff (pricing).

Tokenisation: how your text becomes billable units

Tokenisation is the process by which LLMs break text into smaller units (tokens). Tokens are not “words”. They are often word fragments, punctuation, or common character sequences.

A useful rule of thumb for English text is that 1 token is roughly 4 characters (it varies by language and content type). (platform.openai.com)
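As a rough illustration of that rule of thumb, the sketch below estimates a token count from character length. It assumes about 4 characters per token for English text; the function name `estimate_tokens` is hypothetical, and the tokeniser for your chosen model will give different counts, so treat the result as a planning estimate only.

```python
import math

# Rule-of-thumb ratio for English text; an approximation, not a real tokeniser.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate based only on character count."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    prompt = "Summarise the open incidents for the finance department."
    print(estimate_tokens(prompt))
```

For forecasting, an estimate like this is only a starting point; measured token counts from production traffic should replace it as soon as they are available.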

That variability is exactly why tokenisation matters for executives: two prompts that look similar in a slide deck can produce meaningfully different token counts in production.

What increases token counts in real enterprise prompts

Tokenisation is driven by what you actually send to the model, not what your user typed into a chat box. Common token multipliers include:

  • Long system prompts (policy, tone, formatting rules) repeated on every request
  • Chat history (especially if you keep entire conversation threads)
  • RAG content (documents, knowledge base passages, ticket history, logs)
  • Verbose output formats (long JSON, “explain your reasoning”, multi-step write-ups)
  • Large tool/function definitions (agentic workflows, function calling, schemas)

And importantly, even if you can use a larger context window, you often shouldn’t. Azure OpenAI model options include very large context windows, and Microsoft notes that overly large tool/function definitions can cause failures even before you hit stated context limits. (learn.microsoft.com)

Tokenisation is not just a cost driver. It is also a data exposure driver: every additional token you send is more business context leaving your application boundary and entering an AI processing flow (even when hosted in your cloud tenant).

Token-based pricing: how vendors convert usage into spend

Token-based pricing is how providers charge you, typically by metering:

  • Input tokens (what you send)
  • Output tokens (what the model returns)

This is explicit in mainstream LLM pricing pages and cloud services. For example, OpenAI’s API pricing is published per token for both input and output (with separate rates). (openai.com)
Azure OpenAI pricing similarly describes pay-as-you-go billing for input and output tokens on standard (on-demand) deployments. (azure.microsoft.com)

Other platforms have additional wrinkles. Amazon Bedrock, for example, breaks out multiple token categories (including caching-related token types) in its cost reporting approach. (docs.aws.amazon.com)

The executive mistake: assuming “we bought tokens” means “we bought outcomes”

A vendor quote may say “$X per million tokens” (or per 1,000 tokens). That sounds like a stable unit cost. It isn’t—unless you also control tokenisation inputs:

  • If your prompt structure expands by 3x, your spend expands by ~3x.
  • If your output length creeps up, your spend expands—often faster than expected, because output rates can be higher than input rates. (openai.com)

This is why the CFO’s unit economics and the CTO’s architecture choices are inseparable—but still distinct governance topics.

How confusing tokenisation and pricing breaks GenAI business cases

When organisations blur the two meanings of “token”, three predictable failures show up.

Flawed ROI models that ignore prompt reality

Business cases are often built on “average prompt size” guesses that reflect what a user types, not what the system sends.

Example: a “simple” analyst assistant might actually ship:

  • 800 tokens of system instructions (policy + formatting)
  • 500 tokens of chat history
  • 1,500 tokens of retrieved knowledge
  • 200 tokens of user query

That is 3,000 input tokens before the model answers a single question.

If the business case assumes “200 tokens per request”, it will be wrong by an order of magnitude—without anyone acting in bad faith.
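The gap between the assumed and the actual prompt size can be made explicit with a few lines of arithmetic. This sketch uses the illustrative token counts from the analyst assistant example; the variable names are hypothetical.

```python
# Illustrative token counts from the analyst assistant example.
prompt_components = {
    "system_instructions": 800,
    "chat_history": 500,
    "retrieved_knowledge": 1_500,
    "user_query": 200,
}

# What the system actually sends per request.
actual_input_tokens = sum(prompt_components.values())

# What the business case assumed (the user's typed query only).
assumed_input_tokens = 200

underestimate_factor = actual_input_tokens / assumed_input_tokens

print(actual_input_tokens)   # 3000
print(underestimate_factor)  # 15.0
```

A 15x gap on the input side flows straight through to the cost forecast, because spend scales with the tokens sent rather than the tokens typed.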

Poor vendor negotiations (because the wrong levers are discussed)

Procurement teams sometimes push hard on token price while ignoring the largest cost lever: how many tokens you will consume.

Better negotiation questions are:

  • Are input and output priced differently?
  • Is there discounted pricing for repeated context (caching)?
  • Are there controls to cap or shape output length?
  • Can we get reporting by app, user, department, and workload type?

These are commercial questions driven by the realities of tokenisation.

Unexpected cost overruns that are actually design overruns

In production, overruns often come from entirely “reasonable” changes:

  • Adding more background context “to improve accuracy”
  • Keeping longer chat history “to improve user experience”
  • Switching to a larger context window “to avoid truncation”
  • Generating more verbose outputs “to satisfy auditability”

Each change increases tokenisation volume. Token-based pricing then converts that volume into spend.

Tokenisation drives consumption: practical examples (Azure OpenAI and other LLM services)

The goal here is not to turn executives into prompt engineers. It is to show why prompt structure is a financial control surface.

Example A: A customer support copilot (cost creep via context and verbosity)

Assume a model priced like this (illustrative, using published OpenAI API rates):

  • Input: $0.75 per 1M tokens
  • Output: $4.50 per 1M tokens (openai.com)

Now compare two designs:

  • Design 1 (disciplined context): 2,500 input tokens + 400 output tokens
    Approx cost per request:
    • Input: 2,500 / 1,000,000 × 0.75 ≈ $0.0019
    • Output: 400 / 1,000,000 × 4.50 ≈ $0.0018
    • Total ≈ $0.0037 per request
  • Design 2 (bloated context + wordy answers): 10,000 input tokens + 1,200 output tokens
    Approx cost per request:
    • Input ≈ $0.0075
    • Output ≈ $0.0054
    • Total ≈ $0.0129 per request

That is ~3.5x the cost per interaction, purely from tokenisation choices. The token-based pricing didn’t change; your design did.
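The comparison can be reproduced with a short calculation. This is a minimal sketch using the illustrative rates and token counts from Example A; the function name `cost_per_request` is hypothetical, and real rates should come from your provider's current price list.

```python
# Illustrative rates from Example A, in dollars per 1M tokens.
INPUT_RATE_PER_MTOK = 0.75
OUTPUT_RATE_PER_MTOK = 4.50


def cost_per_request(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = INPUT_RATE_PER_MTOK,
    output_rate: float = OUTPUT_RATE_PER_MTOK,
) -> float:
    """Convert token counts into spend for a single request."""
    input_cost = input_tokens / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * output_rate
    return input_cost + output_cost


design_1 = cost_per_request(2_500, 400)      # disciplined context
design_2 = cost_per_request(10_000, 1_200)   # bloated context, wordy answers
ratio = design_2 / design_1

print(f"Design 1: ${design_1:.4f}")
print(f"Design 2: ${design_2:.4f}")
print(f"Ratio: {ratio:.1f}x")
```

Multiplying either figure by expected monthly request volume turns the per-request difference into a budget line.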

Example B: SOC-style summarisation (cost and security move together)

Security teams often want GenAI to summarise incidents:

  • Alerts
  • User/device timeline
  • Related tickets
  • Relevant threat intel
  • Recommended response steps

If you send full raw logs, you may unintentionally ship highly sensitive data (usernames, hostnames, internal URLs, query strings, patient IDs, etc.) and balloon token counts.

A safer and cheaper pattern is typically:

  • Retrieve only the top few relevant artefacts (RAG discipline)
  • Redact or minimise sensitive fields
  • Set explicit output constraints (length, format)
  • Log token usage per incident for audit and cost allocation
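The redaction step in that pattern might look like the sketch below. It assumes log records arrive as flat dictionaries and that the sensitive field names are known in advance; `SENSITIVE_FIELDS` and `minimise_record` are hypothetical names, and real logs usually also need pattern-based redaction for values embedded in free text.

```python
# Hypothetical field names; adjust to your own log schema.
SENSITIVE_FIELDS = frozenset({"username", "hostname", "url", "patient_id"})


def minimise_record(record: dict, sensitive_fields=SENSITIVE_FIELDS) -> dict:
    """Return a copy of the record with sensitive values replaced by a placeholder."""
    return {
        key: "[REDACTED]" if key in sensitive_fields else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    alert = {
        "alert": "Impossible travel",
        "username": "a.user",
        "hostname": "fin-laptop-01",
        "severity": "high",
    }
    print(minimise_record(alert))
```

Because the function returns a copy, the original record stays intact for the incident file while only the minimised version enters the prompt.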

The key point for CISOs: a larger context does not automatically make the output “more secure” or “more accurate”. It means more data movement, more exposure, and more spend.

Require every AI-enabled workflow to declare an explicit “context budget” and “output budget” (in tokens) at design time—just like you would require RTO/RPO targets or data retention rules.
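One way to make such budgets enforceable is to declare them as data and check each request against them. The sketch below is an assumption about how that could be structured, not a prescribed implementation; `TokenBudget` and `budget_violations` are hypothetical names and the budget values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBudget:
    """Budgets declared at design time for one AI-enabled workflow."""
    workflow: str
    max_context_tokens: int
    max_output_tokens: int


def budget_violations(budget: TokenBudget, context_tokens: int, output_tokens: int) -> list:
    """Return a human-readable list of budget breaches (empty if within budget)."""
    violations = []
    if context_tokens > budget.max_context_tokens:
        violations.append(
            f"{budget.workflow}: context {context_tokens} exceeds budget {budget.max_context_tokens}"
        )
    if output_tokens > budget.max_output_tokens:
        violations.append(
            f"{budget.workflow}: output {output_tokens} exceeds budget {budget.max_output_tokens}"
        )
    return violations


if __name__ == "__main__":
    soc_summary = TokenBudget("soc-incident-summary", 3_000, 500)  # placeholder values
    print(budget_violations(soc_summary, context_tokens=10_000, output_tokens=400))
```

Whether a breach blocks the request or only raises an alert is a policy decision; the check itself is the same either way.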

Pricing translates consumption into spend: what CISOs, CTOs and CFOs should govern separately

CTO: architecture and product controls (tokenisation governance)

The CTO’s remit is to ensure engineering choices don’t create uncapped token growth:

  • Standardise prompt patterns (what must be in the system prompt, what must not)
  • Control chat history length and summarise history instead of replaying it
  • Put guardrails on context window selection (bigger is not always better)
  • Use retrieval discipline: fewer, higher-quality chunks instead of dumping documents
  • Cap output length and enforce structured answers where it reduces verbosity
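As one illustration of controlling chat history length, the sketch below keeps only the most recent messages that fit within a token budget. It relies on the rough 4-characters-per-token estimate rather than a real tokeniser, and `trim_history` is a hypothetical name; summarising older turns, as recommended above, would be an additional step.

```python
def trim_history(messages: list, max_tokens: int, chars_per_token: int = 4) -> list:
    """Keep the newest messages whose estimated token total fits within max_tokens."""
    kept = []
    used = 0
    # Walk from newest to oldest so recent context is preserved first.
    for message in reversed(messages):
        estimated = -(-len(message) // chars_per_token)  # ceiling division
        if used + estimated > max_tokens:
            break
        kept.append(message)
        used += estimated
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["first question " * 20, "first answer " * 20, "latest question"]
    print(trim_history(history, max_tokens=50))
```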

Also, insist on measurement. Many APIs return usage in the response (prompt tokens, completion tokens, total). (help.openai.com)
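A minimal sketch of capturing that usage is shown below. It assumes a response already parsed into a dictionary and shaped like OpenAI's Chat Completions JSON, where a `usage` object carries `prompt_tokens`, `completion_tokens` and `total_tokens`; other providers name these fields differently, and `extract_usage` is a hypothetical helper.

```python
def extract_usage(response: dict) -> dict:
    """Pull token counts out of a parsed API response, defaulting to zero if absent."""
    usage = response.get("usage") or {}
    input_tokens = int(usage.get("prompt_tokens", 0))
    output_tokens = int(usage.get("completion_tokens", 0))
    total_tokens = int(usage.get("total_tokens", input_tokens + output_tokens))
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
    }


if __name__ == "__main__":
    # Example payload with made-up numbers.
    sample = {"usage": {"prompt_tokens": 2500, "completion_tokens": 400, "total_tokens": 2900}}
    print(extract_usage(sample))
```

Storing these counts alongside the application, user and workload identifiers gives finance and engineering the same source of truth.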

CFO: unit economics, forecasting and commercial controls (pricing governance)

The CFO’s remit is to make costs predictable and attributable:

  • Separate input vs output in reporting (they can price differently) (openai.com)
  • Forecast by workload type (support chat, document summarisation, agentic automation)
  • Allocate costs to cost centres (department, product, customer, region)
  • Negotiate pricing constructs that match reality (discounts, commitments, caching models)
  • Treat “token spend” like any other variable consumption line: governed, budgeted, monitored
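Cost allocation with input and output kept separate can be sketched as a simple aggregation over usage records. The record fields (`cost_centre`, `input_tokens`, `output_tokens`), the function name `allocate_costs` and the rates are all illustrative assumptions; the rates reuse the example figures from earlier in this article.

```python
from collections import defaultdict

# Illustrative rates in dollars per 1M tokens; replace with contracted rates.
INPUT_RATE_PER_MTOK = 0.75
OUTPUT_RATE_PER_MTOK = 4.50


def allocate_costs(usage_records: list) -> dict:
    """Sum input and output spend separately for each cost centre."""
    totals = defaultdict(lambda: {"input_cost": 0.0, "output_cost": 0.0})
    for record in usage_records:
        centre = totals[record["cost_centre"]]
        centre["input_cost"] += record["input_tokens"] / 1_000_000 * INPUT_RATE_PER_MTOK
        centre["output_cost"] += record["output_tokens"] / 1_000_000 * OUTPUT_RATE_PER_MTOK
    return dict(totals)


if __name__ == "__main__":
    records = [
        {"cost_centre": "support", "input_tokens": 2_500, "output_tokens": 400},
        {"cost_centre": "soc", "input_tokens": 10_000, "output_tokens": 1_200},
    ]
    for centre, costs in allocate_costs(records).items():
        print(centre, costs)
```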

CISO: data minimisation, auditability and third-party risk (security governance)

The CISO’s remit is to ensure tokenisation doesn’t become uncontrolled data disclosure:

  • Classify what data is allowed in prompts (and what is prohibited)
  • Apply redaction/minimisation before retrieved content enters prompts
  • Define logging rules (what prompts/responses may be stored, for how long, and where)
  • Ensure identity and access controls for who can run which AI workflows
  • Assess provider-specific cost and routing behaviours (for example, cross-region routing can affect cost structures on some platforms) (docs.aws.amazon.com)

In other words, tokenisation is a data-handling pathway, not just a billing meter.

The executive checklist: avoid token confusion in your next GenAI initiative

Use this as a practical governance baseline.

  • Define two metrics in every business case:
    • Tokenisation metrics: average input tokens, average output tokens, peak context size
    • Pricing metrics: £/MTok (or £/1K tokens) for input and output, plus any discounts/tiers
  • Instrument usage at the API boundary (store token counts, model, app, user, incident/customer ID). (help.openai.com)
  • Set hard guardrails:
    • Maximum context budget per request
    • Maximum output budget per request
    • Default “short answer” modes for high-volume workflows
  • Make security explicit:
    • Data allowed in prompts
    • Data residency requirements
    • Retention and audit rules for prompts and outputs
  • Procure like a grown-up:
    • Negotiate on total unit economics, not headline token price
    • Ask what is billed (input, output, cached input, tool calls, etc.) (docs.aws.amazon.com)

Why this matters in real operations: from GenAI pilots to SOC automation

As GenAI moves from experimentation to operational workflows (IT, customer service, security operations), the organisations that win will be the ones that can answer two questions confidently:

  1. What data are we sending, and is it appropriate? (tokenisation as a data pathway)
  2. What will it cost at scale, and can we cap it? (token-based pricing as a financial model)

At SecQube, we see the same pattern in security operations: AI can massively reduce triage time and workload, but only if the organisation treats context design and commercial metering as separate, governed disciplines—not as buzzwords.

If you want more practical content on operationalising AI safely in Microsoft-aligned security environments, do some research, and stick to Microsoft as closely as you can. I can’t believe I just said that.

Written By:
Cymon Skinner