Functions
count
fn (text: Str, model: Str): Int
Count tokens in text for model. When model resolves to a
known BPE encoding, defers to the host tokenizer; otherwise falls
back to the chars/4 heuristic.
Pure (no IO, no allocation beyond the encode buffer); safe to call inside hot loops.
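A minimal Python sketch of count's two-tier dispatch, assuming a resolver and a BPE counter are passed in (encoding_for_model and bpe_count here are hypothetical stand-ins for the host tokenizer, and the heuristic's rounding behavior is an assumption):

```python
def count_text_heuristic(text: str) -> int:
    # chars/4 fallback; ceiling division is an assumption so short
    # non-empty strings still count as at least one token.
    return (len(text) + 3) // 4

def count(text: str, model: str, encoding_for_model, bpe_count) -> int:
    encoding = encoding_for_model(model)
    if encoding is not None:
        return bpe_count(text, encoding)  # defer to the host BPE tokenizer
    return count_text_heuristic(text)     # no known encoding: estimate
```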
count-message
fn (m: Map, model: Str): Int
Token count for a single chat-message-shaped map ({role, content, tool-calls?, tool-call-id?}), inclusive of a fixed per-message
overhead. Uses the BPE tokenizer when an encoding is known for
model; falls back to the heuristic otherwise.
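The shape of the per-message accounting can be sketched as follows. The overhead constant (4 tokens) and the handling of tool-calls are assumptions for illustration; the real function's overhead is fixed but unspecified here:

```python
PER_MESSAGE_OVERHEAD = 4  # assumed value, not specified by these docs

def count_message(m: dict, count_text) -> int:
    # count_text is any (text) -> int counter: BPE-backed when an
    # encoding is known for the model, the chars/4 heuristic otherwise.
    tokens = PER_MESSAGE_OVERHEAD
    for key in ("role", "content", "tool-call-id"):
        value = m.get(key)
        if isinstance(value, str):
            tokens += count_text(value)
    for call in m.get("tool-calls") or []:
        tokens += count_text(str(call))  # assumed: serialized form counted
    return tokens
```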
count-messages
fn (messages: Vec, model: Str): Int
Token count for a vec of chat messages. Matches the
(messages, model) -> Int contract expected by
::ai::chat/ChatOptions.count-tokens-fn, so this can be passed
directly:
opts ChatOptions({
chat-fn: ::openai/chat-with-tools,
model: "gpt-4o-mini",
count-tokens-fn: ::ai::tokenizer/count-messages
})
For OpenAI-family models the result is within a few tokens of the
provider-reported prompt_tokens; for non-OpenAI models it falls
through to the chars/4 heuristic and is therefore an estimate
only.
count-text-heuristic
fn (text: Str): Int
Character-based token estimator (chars/4) used when no BPE encoding is known for the model. Tracks BPE tokenizers to within ~10% on English prose; code, JSON, and non-Latin text skew higher.
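The heuristic itself is a one-liner. Whether the division floors, ceils, or rounds is an assumption (ceiling here, so any non-empty string counts as at least one token):

```python
def count_text_heuristic(text: str) -> int:
    # Roughly one token per four characters of English prose.
    return (len(text) + 3) // 4
```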
count-with-encoding
fn (text: Str, encoding: Str): Int
Lower-level counterpart to count that takes the encoding name
directly (e.g. "o200k_base"). Errors when encoding is unknown
to the host runtime — use ::hot::internal::tokenizer/encodings
to enumerate what is supported.
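The validate-then-count contract can be sketched as follows, with the set of known encodings and the bpe_count callable passed in as hypothetical stand-ins for the host runtime:

```python
def count_with_encoding(text: str, encoding: str, known: set, bpe_count) -> int:
    # Unlike count, there is no heuristic fallback at this level:
    # an unknown encoding is an error.
    if encoding not in known:
        raise ValueError(f"unknown encoding: {encoding}")
    return bpe_count(text, encoding)
```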
encoding-for-model
fn (model: Str): Str?
Resolve model to a BPE encoding name. Matches by longest known
family prefix and returns null for non-OpenAI models so callers
can fall back to a heuristic or a provider-native counter.
Recognized families (April 2026):
| Encoding | Models |
|---|---|
| o200k_harmony | gpt-oss-* |
| o200k_base | gpt-5*, gpt-4.1*, gpt-4o*, o1/o3/o4*, codex-*, chatgpt-4o-* |
| cl100k_base | gpt-4, gpt-4-32k*, gpt-3.5-turbo*, text-embedding-ada-002, text-embedding-3-* |
| p50k_base | text-davinci-002, text-davinci-003, code-* |
| p50k_edit | text-davinci-edit-001, code-davinci-edit-001 |
| r50k_base | davinci, curie, babbage, ada, gpt2 |
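Longest-prefix resolution means a more specific family always beats a shorter one. A Python sketch built from the table above (the exact prefix set and glob handling are assumptions; the real resolver lives in the host runtime):

```python
# Family prefix -> encoding, transcribed from the table above.
FAMILY_PREFIXES = {
    "gpt-oss": "o200k_harmony",
    "gpt-5": "o200k_base", "gpt-4.1": "o200k_base", "gpt-4o": "o200k_base",
    "o1": "o200k_base", "o3": "o200k_base", "o4": "o200k_base",
    "codex-": "o200k_base", "chatgpt-4o-": "o200k_base",
    "gpt-4": "cl100k_base", "gpt-3.5-turbo": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base", "text-embedding-3-": "cl100k_base",
    "text-davinci-002": "p50k_base", "text-davinci-003": "p50k_base",
    "code-": "p50k_base",
    "text-davinci-edit-001": "p50k_edit", "code-davinci-edit-001": "p50k_edit",
    "davinci": "r50k_base", "curie": "r50k_base",
    "babbage": "r50k_base", "ada": "r50k_base", "gpt2": "r50k_base",
}

def encoding_for_model(model: str):
    # Longest matching prefix wins, so "gpt-4o-mini" resolves via
    # "gpt-4o" (o200k_base) rather than the shorter "gpt-4" (cl100k_base).
    best = None
    for prefix, enc in FAMILY_PREFIXES.items():
        if model.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, enc)
    return best[1] if best else None
```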
is-supported-model
fn (model: Str): Bool
True when model resolves to an encoding this runtime knows how
to tokenize with. Cheap probe; useful when deciding whether to
install count-messages as count-tokens-fn or fall back to a
provider-native counter.
ns
Alias of ::ai::tokenizer/
Accurate token counting for chat models, backed by tiktoken-rs
via ::hot::internal::tokenizer. Designed to plug into
::ai::chat/ChatOptions.count-tokens-fn so pre-call budget checks
can refuse to send a request that would overflow the model's
context window.
Two-tier API:
- Text-level — count(text, model) returns the BPE token count for
arbitrary text, falling back to the shared character-based heuristic
when no encoding is known for the model.
- Chat-level — count-message(msg, model) and count-messages(messages, model)
mirror the (messages, model) -> Int shape that ::ai::chat expects, with
a per-message overhead added so the estimate matches what the provider
will actually bill.
Coverage today is limited to OpenAI-family encodings
(o200k_*, cl100k_base, p50k_*, r50k_base/gpt2).
Anthropic, Gemini, and other providers fall through to the
heuristic; their adapters typically supply a provider-native
count-tokens-fn that calls the vendor API instead.
Example
::tokenizer ::ai::tokenizer
opts ChatOptions({
chat-fn: ::openai/chat-with-tools,
model: "gpt-4o-mini",
count-tokens-fn: ::tokenizer/count-messages,
max-context-tokens: 120_000,
warn-context-pct: 0.85
})
supported-encodings
fn (): Vec
The BPE encoding names this runtime can tokenize with. Thin
pass-through over ::hot::internal::tokenizer/encodings.