Issue #34: Prompt Format Languages for AI Engineering

18 min read | June 27, 2026

Most teams reach for JSON whenever they need to put structured content inside a prompt. It is what the output parsers expect, so it becomes the default for inputs too. Context gets JSON-wrapped. Instructions get JSON-wrapped. The format choice is never revisited because it works well enough, and nobody measures the cost.

That is the wrong shape for production AI engineering. The model is not a JSON parser. It reads the whole context as a token sequence, and the format you choose changes how tokens are grouped, what the model treats as instruction versus data, and how reliably it reproduces the structure in its output. Format is not a serialization concern. It is a prompt design decision with measurable effects on reliability, token cost, and parse correctness downstream.

This issue covers five formats that appear inside LLM prompts in production systems: JSON, XML, YAML, TOML, and TOON. Four are general-purpose serialization formats repurposed for prompts. One — TOON — was designed specifically for LLM token efficiency. For each one, we look at what the model does with it, where it helps, where it creates risk, and when to prefer it. The goal is a concrete decision model you can apply when building the prompt, not after a parse failure appears in production.

Why Format Is Part of the Prompt Contract

A prompt is a contract between the system and the model. The instructions define what the model should do. The format defines how the model should read the content those instructions reference.

Format matters for three reasons that apply to every large language model in production:

The model has seen each format a different number of times during training. JSON and XML are extremely common. YAML is common in devops contexts. TOML is significantly rarer. TOON is newer still and mostly encountered through in-context examples. Higher training prevalence means more reliable interpretation and output generation without explicit guidance.
Format carries semantic signal. A JSON object signals structured, machine-readable data. XML tags signal labeled regions of content. YAML signals configuration. The model has implicit expectations about what lives inside each format, and those expectations shape how it attends to and responds to the content.
Format affects token count. JSON's brackets, quotes, and commas add tokens. XML's tag pairs add tokens. The same information expressed in different formats can differ meaningfully in token cost at scale.
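A quick way to see the syntax overhead is to render one record several ways and compare sizes. This is a minimal sketch: character count is only a rough proxy, because real token counts depend on the model's tokenizer, which it does not use. The record is illustrative.

```python
import json

record = {"id": 1, "name": "Alice", "role": "admin", "active": True}

def as_xml(rec: dict) -> str:
    # One element per field; values here are simple scalars, so nothing needs escaping.
    return "\n".join(
        f"<{k}>{str(v).lower() if isinstance(v, bool) else v}</{k}>" for k, v in rec.items()
    )

renderings = {
    "json_pretty": json.dumps(record, indent=2),
    "json_compact": json.dumps(record, separators=(",", ":")),
    "xml": as_xml(record),
}

for name, text in renderings.items():
    # Characters stand in for tokens here; use the target model's tokenizer for real numbers.
    print(f"{name:13} {len(text):4d} chars")
```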

None of those effects are configurable. They are properties of the model's training distribution. That is why format belongs in the prompt design, not as an afterthought in the parser.

JSON

JSON is the strongest default for structured data that code will parse. Every frontier model has been trained on enormous volumes of JSON through API documentation, tool schemas, function call definitions, and structured output logs. Interpretation is reliable and output fidelity is high, particularly when the provider supports native structured output that constrains the model to a declared schema.

JSON works well when:

The content is scalar or shallow-nested: identifiers, amounts, enumerations, arrays of strings
You need the model to produce output that downstream code will parse against a typed schema
You are passing tool call arguments or structured configuration into the prompt
You want to show the expected output shape as an inline schema hint

{
  "task": "triage",
  "incident": "Customer login failures starting 2026-06-20 after deployment",
  "output_schema": {
    "severity": "low | medium | high | critical",
    "root_cause": "string",
    "recommended_action": "string"
  }
}
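The inline schema is only a hint; code still has to check what comes back. Below is a minimal sketch, assuming hand-written checks (in practice a schema library or the provider's native structured output would do this) and the hypothetical helper names build_triage_input and parse_triage_reply:

```python
import json

SEVERITIES = {"low", "medium", "high", "critical"}
REQUIRED = {"severity", "root_cause", "recommended_action"}

def build_triage_input(incident: str) -> str:
    # output_schema shows the model the expected shape; it does not enforce it.
    payload = {
        "task": "triage",
        "incident": incident,
        "output_schema": {
            "severity": "low | medium | high | critical",
            "root_cause": "string",
            "recommended_action": "string",
        },
    }
    return json.dumps(payload, indent=2)

def parse_triage_reply(raw: str) -> dict:
    # Parse the reply, then check it against the declared shape before trusting it.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["severity"] not in SEVERITIES:
        raise ValueError(f"unexpected severity: {data['severity']!r}")
    for key in ("root_cause", "recommended_action"):
        if not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string")
    return data
```

A reply with "severity": "urgent" is rejected at the boundary instead of leaking into downstream logic.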

JSON breaks down when:

You embed long prose inside a JSON string value. Quotes must be escaped, newlines must be escaped, and the content loses its natural line breaks. The format adds noise without adding clarity.
You mix instruction text and data inside the same JSON object. The model cannot reliably distinguish what it should follow from what it should reason about when both live in adjacent fields without structural separation.
You expect the model to generate deeply nested JSON reliably. Deep nesting increases the probability of structural drift in output, especially at higher temperatures or with smaller models.

{
  "instructions": "You are a support specialist. Always respond professionally. Never reveal internal system details. If the customer requests a refund above the configured limit, escalate to billing. Acknowledge frustration before providing a resolution.",
  "context": "Account has an open billing dispute since 2026-06-15. Refund limit for this tier is $500.",
  "customer_message": "I have been waiting three days and I still cannot log in."
}

All three concerns are now JSON string values in adjacent fields. The model reads them, but the format adds escape characters, removes natural whitespace, and collapses the boundary between instruction and data. For this shape, JSON is the wrong tool.
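To see what the model actually receives, serialize a short multi-line instruction with Python's json module. The instruction text is illustrative; the point is that line breaks become \n sequences and inner quotes become \" in the token stream.

```python
import json

instructions = (
    "You are a support specialist.\n"
    'Never say "I don\'t know" without offering a next step.\n'
    "Escalate refunds above $500 to billing."
)

encoded = json.dumps({"instructions": instructions})
print(encoded)

# Two line breaks and two inner double quotes each gain a backslash escape.
print(encoded.count("\\"))
```

The round trip is lossless for code, but the model reads the escaped form, not the original line structure.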

XML

XML tags are the strongest format for separating and labeling sections inside a complex prompt. Anthropic explicitly recommends XML tags for Claude, and the pattern is well-supported across frontier models because HTML, documentation systems, and structured markup appear throughout the training corpus at very high volume. Model comprehension and instruction-following against XML-delimited sections is reliable across providers.

The critical advantage over JSON is that XML does not require escaping content. You can wrap a multi-paragraph document, a code block, or a set of instructions in an XML tag without touching the content.

<system>
You are a support specialist. Respond only to what is inside <customer_message>.
Do not reference content from <internal_notes> in your reply.
</system>

<internal_notes>
This account has an open billing dispute since 2026-06-15.
The refund limit for this tier is $500.
Escalation contact: billing-team@internal.example
</internal_notes>

<customer_message>
I cannot access my dashboard and I have not received a reply in three days.
</customer_message>

Instructions, internal context, and the customer message each occupy a clearly labeled region. The model knows which section to respond to and which to treat as internal. That structural signal is not available when everything is a JSON string value.
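Assembling sectioned prompts in code keeps the tag names consistent across prompt versions. This is a minimal sketch with hypothetical helpers section and build_prompt. One assumption goes beyond the text above: untrusted content such as a customer message could contain its own closing tag, so the sketch neutralizes that one string rather than escaping the whole body.

```python
def section(tag: str, body: str) -> str:
    # Replace any literal closing tag inside the body so untrusted text
    # cannot end its own section early. Everything else stays unmodified.
    closing = f"</{tag}>"
    safe_body = body.replace(closing, f"&lt;/{tag}&gt;")
    return f"<{tag}>\n{safe_body.strip()}\n</{tag}>"

def build_prompt(sections: list[tuple[str, str]]) -> str:
    # Sections are emitted in order, separated by a blank line.
    return "\n\n".join(section(tag, body) for tag, body in sections)

prompt = build_prompt([
    ("system", "You are a support specialist. Respond only to what is inside <customer_message>.\n"
               "Do not reference content from <internal_notes> in your reply."),
    ("internal_notes", "This account has an open billing dispute since 2026-06-15.\n"
                       "The refund limit for this tier is $500."),
    ("customer_message", "I cannot access my dashboard and I have not received a reply in three days."),
])
print(prompt)
```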

XML works well when:

You need to separate instruction space from data space inside a single prompt
The content includes long prose, multi-line text, or embedded code that would require heavy escaping in JSON
You want the model to attend differently to different sections of the context
You are constructing few-shot example blocks where each example needs explicit input and output boundaries

XML becomes a poor choice when:

The output needs to be parsed by code. XML output from a model is less reliable than JSON for structured data, and XML parsers are more sensitive to malformed markup.
You rely on attribute syntax. Models are less consistent with XML attributes than with element content. Prefer element content when precision matters.
You nest too deeply. Many levels of nested XML tags create the same readability and attention problems as deeply nested JSON.
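One concrete way strict XML parsing breaks on model output: a single unescaped ampersand. The reply strings below are illustrative, not captured model output. XML requires & to be escaped; a JSON string does not, so the same content parses in one format and fails in the other.

```python
import json
import xml.etree.ElementTree as ET

# A plausible model reply that forgot to escape an ampersand.
xml_reply = "<triage><severity>high</severity><root_cause>AT&T gateway timeout</root_cause></triage>"
json_reply = '{"severity": "high", "root_cause": "AT&T gateway timeout"}'

def try_parse_xml(text: str):
    # Returns a tag-to-text dict on success, or an error description on failure.
    try:
        root = ET.fromstring(text)
        return {child.tag: child.text for child in root}
    except ET.ParseError as err:
        return f"ParseError: {err}"

print(try_parse_xml(xml_reply))  # the strict parser rejects the unescaped '&'
print(json.loads(json_reply))    # '&' needs no escaping inside a JSON string
```

JSON has its own failure modes (an unescaped inner quote, for example), which is why native structured output matters for JSON too.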

YAML

YAML appears frequently in devops tooling, CI/CD pipelines, Kubernetes configuration, and application settings files. Models understand it well enough for those familiar contexts, and its compact syntax is appealing for configuration-like data in prompts where human readability matters to the engineer writing the prompt.

YAML works acceptably when:

The data is flat or one level deep with simple identifier keys
You are including human-authored configuration data and readability for the engineer matters more than parse safety
The content implies a configuration or devops context, which aligns with where YAML appears in training data

task: triage
severity_options: [low, medium, high, critical]
incident: Customer login failures starting 2026-06-20 after deployment
timeout_seconds: 30

YAML creates risk when:

You ask the model to generate YAML output. YAML's whitespace sensitivity means a model producing YAML may generate subtly malformed indentation that parses differently than intended or fails to parse entirely. Always prefer JSON when code will consume the output.
The structure is more than two levels deep. Deep nesting amplifies the indentation problem and removes the readability advantage.
You use block scalars for multiline strings. The | and > block scalar indicators are understood inconsistently across model families. Treat multiline YAML strings as unreliable in model-generated output.
You mix YAML with other format sections in the same prompt. Context-switching between formats creates ambiguity about which structural rules apply where.

instructions:
- Always respond professionally
- Never reveal internal system details
- If refund request:
    amount_limit: 500
    escalate_if_above: true
    escalation_target: billing

This reads clearly to the engineer writing it, but the nested "If refund request" block introduces structural ambiguity in a prompt context: the list mixes plain-string rules with one item that is a mapping, and the condition is tied to its parameters only by indentation. The model will likely interpret it, but "likely" is not a contract. If the nested condition matters for system behavior, encode it in JSON with an explicit schema or in XML with labeled sections.
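Here is what the JSON alternative can look like. This is a minimal sketch, assuming a hypothetical refund_policy block and hand-written type checks standing in for a schema library. Every field is named and typed explicitly, so nothing depends on indentation.

```python
import json

# Hypothetical schema for the refund rule from the YAML example above.
REFUND_POLICY_SCHEMA = {
    "amount_limit": (int, float),
    "escalate_if_above": bool,
    "escalation_target": str,
}

def validate_policy(policy: dict) -> dict:
    for field, expected in REFUND_POLICY_SCHEMA.items():
        if field not in policy:
            raise ValueError(f"missing field: {field}")
        value = policy[field]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if expected == (int, float) and isinstance(value, bool):
            raise ValueError(f"{field} must be a number, not a boolean")
        if not isinstance(value, expected):
            raise ValueError(f"{field} has the wrong type")
    return policy

refund_policy = validate_policy({
    "amount_limit": 500,
    "escalate_if_above": True,
    "escalation_target": "billing",
})
policy_json = json.dumps({"refund_policy": refund_policy}, indent=2)
print(policy_json)
```

The validated JSON can then sit inside a labeled XML section of the prompt, keeping the rule machine-checkable and structurally separate from the prose instructions.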

TOML

TOML is designed for human-readable configuration. It avoids YAML's indentation rules, uses explicit typed values, and reads cleanly for flat key-value data. The trade-off is significant: TOML appears far less frequently in model training data than JSON, XML, or YAML, and that lower prevalence directly affects interpretation reliability and output fidelity.

TOML is acceptable when:

You are embedding flat configuration data and the engineer writing the prompt benefits from TOML's legibility
The data uses simple named sections with no deep nesting and no arrays of tables
The model is only asked to read and interpret the configuration, not to produce TOML output

[model]
id = "qwen3:8b"
timeout_seconds = 30
max_retries = 3

[policy]
refund_limit_usd = 500.0
autonomous_refund = true
require_approval_above_usd = 500.0

TOML fails when:

The structure has more than one level of nesting. TOML's array-of-tables syntax is rarely seen in model training data and is the most likely source of misinterpretation.
You ask the model to generate TOML output. The lower training exposure means output reliability falls significantly below JSON and XML. Do not use TOML as an output format in production systems.
You need the model to reason precisely about TOML type distinctions such as the difference between an integer, a float, and a quoted string. These distinctions are not reliably preserved in model-generated output.

TOML is the most defensible as a human-readable input format and the least defensible as a model output format. If you are choosing TOML, you are choosing it for the engineer writing the prompt, not for the model reading it.

TOON

TOON (Token-Oriented Object Notation) is the only format on this list designed specifically for LLM prompting. It encodes the same data model as JSON but uses YAML-like indentation for nested objects and a CSV-style header-and-rows layout for uniform arrays, with minimized quoting throughout. The design goal is explicit: reduce token count for structured data without sacrificing the schema clarity models need to interpret that data reliably.

The format has a distinct sweet spot: uniform arrays of objects where the same fields repeat across many rows — employee records, event logs, product catalogs, support case sets. Compare the same data in each format:

[
  {
    "id": 1,
    "name": "Alice",
    "role": "admin",
    "active": true
  },
  {
    "id": 2,
    "name": "Bob",
    "role": "user",
    "active": true
  },
  {
    "id": 3,
    "name": "Charlie",
    "role": "user",
    "active": false
  }
]

[3]{id,name,role,active}:
  1,Alice,admin,true
  2,Bob,user,true
  3,Charlie,user,false

The [3] declares the row count upfront, giving the model a schema anchor before it reads the data. The {id,name,role,active} header declares field names once. Each row is comma-separated values. Field names never repeat, which is where the token savings accumulate at scale. The TOON project's own benchmarks across four model families report roughly 30 to 60 percent fewer tokens than formatted JSON for uniform array data, with comparable or slightly better interpretation accuracy. Treat those figures as a starting point and re-measure with your own data and tokenizer.
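The tabular layout is simple enough to sketch. This is a minimal, hypothetical encoder, not the reference TOON library: it handles only a root-level list of flat objects with identical keys, indents each row by two spaces as the specification's examples do, and refuses any value that would need quoting instead of implementing the specification's quoting rules.

```python
def encode_tabular(rows: list[dict]) -> str:
    """Encode a uniform list of flat objects as a root-level TOON tabular array."""
    if not rows:
        raise ValueError("need at least one row")
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("rows are not uniform; the tabular form does not apply")

    def cell(value) -> str:
        # Check bool before anything numeric: True is also an int in Python.
        if isinstance(value, bool):
            return "true" if value else "false"
        if value is None:
            return "null"
        text = str(value)
        if text == "" or text != text.strip() or any(ch in text for ch in ',":\n'):
            raise ValueError(f"value needs quoting, which this sketch omits: {text!r}")
        return text

    header = f"[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(cell(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])

users = [
    {"id": 1, "name": "Alice", "role": "admin", "active": True},
    {"id": 2, "name": "Bob", "role": "user", "active": True},
    {"id": 3, "name": "Charlie", "role": "user", "active": False},
]
print(encode_tabular(users))
```

For production use, prefer an existing TOON library over a hand-rolled encoder; the sketch only shows where the savings come from: one header line, then values only.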

TOON also handles flat and nested objects, plus arrays of primitives, using familiar indentation syntax:

id: 1
name: Ada
address:
  city: London
  country: UK
tags[3]: admin,billing,support
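The object form can be sketched the same way. This is a minimal, hypothetical encoder covering only what the example above uses: nested objects become indented blocks, and arrays of primitives become a single line with a length marker. It assumes scalar values never need quoting and rejects arrays of objects, which would use the tabular form instead.

```python
def _scalar(value) -> str:
    # Check bool first: True is also an int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "null"
    return str(value)

def encode_object(obj: dict, indent: int = 0) -> str:
    """Encode nested dicts and primitive lists in TOON-style indentation syntax."""
    pad = "  " * indent
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            nested = encode_object(value, indent + 1)
            if nested:
                lines.append(nested)
        elif isinstance(value, list):
            if any(isinstance(v, (dict, list)) for v in value):
                raise ValueError(f"{key}: only primitive arrays are handled here")
            items = ",".join(_scalar(v) for v in value)
            lines.append(f"{pad}{key}[{len(value)}]:" + (f" {items}" if items else ""))
        else:
            lines.append(f"{pad}{key}: {_scalar(value)}")
    return "\n".join(lines)

ada = {
    "id": 1,
    "name": "Ada",
    "address": {"city": "London", "country": "UK"},
    "tags": ["admin", "billing", "support"],
}
print(encode_object(ada))
```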

TOON works well when:

You are passing a large uniform dataset into a prompt and token cost is a primary constraint
The data is repetitive in structure — same fields across every object — where JSON's per-object field repetition becomes expensive at volume
The model is asked to interpret the dataset as input while the output contract stays JSON

TOON creates risk when:

You ask the model to generate TOON output. If the model produces a wrong row count or a misaligned delimiter, the TOON decoder fails. For output that code must parse reliably, JSON with native structured output remains the safer contract.
The data is non-uniform. Token savings collapse when objects have different field sets across rows, because the tabular header-and-rows layout no longer applies and each item has to carry its own field names again.
You rely on implicit training knowledge. Frontier models follow TOON syntax reliably when given a clear in-context example, but interpretation is more prompt-dependent than with the older formats. Always provide a format example, not just a format name.
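The declared row count and field header are also what make malformed TOON fail loudly. This is a minimal, hypothetical strict decoder for the unquoted tabular subset (no quoted cells, no nesting, every cell returned as a string). It shows the checks that reject a miscounted or misaligned block, which is exactly what happens when a model generates TOON imperfectly.

```python
import re

def decode_tabular(text: str) -> list[dict]:
    """Decode a root-level TOON tabular array, failing on structural drift."""
    lines = text.strip("\n").splitlines()
    if not lines:
        raise ValueError("empty input")
    header = lines[0].strip()
    match = re.fullmatch(r"\[(\d+)\]\{([^}]*)\}:", header)
    if not match:
        raise ValueError(f"not a tabular header: {header!r}")
    declared = int(match.group(1))
    fields = match.group(2).split(",")
    rows = [line.strip() for line in lines[1:] if line.strip()]
    if len(rows) != declared:
        raise ValueError(f"header declares {declared} rows, found {len(rows)}")
    records = []
    for number, row in enumerate(rows, start=1):
        cells = row.split(",")
        if len(cells) != len(fields):
            raise ValueError(f"row {number}: expected {len(fields)} cells, got {len(cells)}")
        records.append(dict(zip(fields, cells)))
    return records

print(decode_tabular("[2]{id,name}:\n  1,Alice\n  2,Bob"))
```

Strictness is useful on input you control, but it is also the reason the output contract stays JSON: a single extra or missing row from the model is a hard parse failure.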

The practical pattern for TOON is hybrid: keep JSON for your application's data exchange and output contracts, but convert structured datasets to TOON when assembling the prompt context. TOON libraries are available for a growing set of language ecosystems and handle the JSON-to-TOON conversion, so there is no need to encode by hand.

The Same Context in Each Format

The clearest way to see where each format earns its place is to express the same prompt content in each one. Consider a support triage scenario with three concerns: what the model should do, internal context it should not reveal, and the customer message it should respond to.

In JSON, all three concerns become string values in adjacent fields:

{
  "role": "support_specialist",
  "rules": [
    "Respond professionally at all times",
    "Do not reveal internal notes in your reply",
    "Escalate refunds above $500 to billing"
  ],
  "internal_context": "Open billing dispute since 2026-06-15. Refund limit: $500.",
  "customer_message": "I cannot access my dashboard and I have not received a reply in three days."
}

In XML, each concern occupies a labeled region:

<role>support_specialist</role>

<rules>
Respond professionally at all times.
Do not reveal internal notes in your reply.
Escalate refunds above $500 to billing.
</rules>

<internal_context>
Open billing dispute since 2026-06-15.
Refund limit: $500.
</internal_context>

<customer_message>
I cannot access my dashboard and I have not received a reply in three days.
</customer_message>

In YAML, the structure is compact but section boundaries are weaker:

role: support_specialist
rules:
- Respond professionally at all times
- Do not reveal internal notes in your reply
- Escalate refunds above $500 to billing
internal_context: "Open billing dispute since 2026-06-15. Refund limit: $500."
customer_message: "I cannot access my dashboard and I have not received a reply in three days."

The XML version gives the model the clearest separation between what it should follow, what it should keep internal, and what it should respond to. The JSON version works but collapses instruction, context, and user input into structurally equivalent fields. The YAML version is the most compact but provides the weakest boundary signal and forces prose content into quoted strings where natural line breaks are lost.

TOON does not apply to this prompt shape. Its advantage is in encoding large uniform datasets compactly, not in wrapping prose or separating sections. When the input is a list of records with repeated structure — not a mixed instruction-and-context prompt — TOON is where the token budget argument becomes concrete. For the prompt shape above, the right choice is between JSON and XML.

Choosing the Right Format

The decision follows four axes: direction of flow, content type, whether code will parse the output, and required structural reliability.

Structured output that code will parse → JSON. Use native structured output when the provider supports it. This gives the strongest parse guarantee and removes structural drift from the model's generation path.
Separating sections inside the prompt → XML tags. Use them to label instructions, context, examples, and user input as distinct named regions.
Tool call arguments and function call schemas → JSON. This is the format the tool-calling infrastructure expects and the format models have been trained to produce in that context.
Prose-heavy content with explicit access rules → XML tags around plain text, not JSON string escaping. The label carries the access rule; the content stays unmodified.
Few-shot examples with clear input and output boundaries → XML tags around each example block.
Flat configuration data going into the prompt → YAML or TOML for input only. Neither is appropriate for output that code will consume.
Deep nested structures in model output → JSON only. Never use YAML or TOML for nested output a downstream parser must handle reliably.
Large uniform datasets passed into the prompt → TOON for the input encoding, JSON for the output contract. Convert at the boundary using a TOON library; do not ask the model to generate TOON.

The general rule: JSON for output contracts, XML for structural separation inside prompts, YAML and TOML only as human-readable input formats for flat data, and TOON for uniform array inputs where token cost is a real constraint.
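The rules above are compact enough to encode as a lookup that a prompt-building layer can call. This is a minimal sketch: the content-kind labels and the choose_format function are hypothetical names, and the mapping simply restates the list above.

```python
# Input-side rules from the decision list; output always stays JSON.
INPUT_FORMATS = {
    "sections": "XML",             # labeled instruction, context, and user-input regions
    "prose": "XML",                # long text wrapped in tags, not escaped into strings
    "few_shot": "XML",             # one tagged block per example
    "tool_args": "JSON",           # what tool-calling infrastructure expects
    "structured": "JSON",          # shallow structured data
    "flat_config": "YAML or TOML", # human-readable input only
    "uniform_records": "TOON",     # convert at the boundary with a TOON library
}

def choose_format(direction: str, content: str) -> str:
    if direction == "output":
        # Every structured output contract stays JSON, nested or not.
        return "JSON"
    if direction != "input":
        raise ValueError("direction must be 'input' or 'output'")
    try:
        return INPUT_FORMATS[content]
    except KeyError:
        raise ValueError(f"unknown content kind: {content!r}") from None

print(choose_format("input", "uniform_records"))
print(choose_format("output", "uniform_records"))
```

Keeping the rule in one function means a format change shows up as a code diff, which fits the versioning discipline below.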

Format Discipline in Practice

Format discipline is not just about which format you choose. It is about how consistently you apply it across a prompt system.

Use one format per concern. If output is JSON, output is always JSON across all prompt variants in that system. If context sections use XML tags, every prompt in the system uses XML tags for context. Mixed-format systems accumulate ambiguity faster than single-format ones.
Name fields consistently across prompt versions. If the field is customer_message in version one, it should still be customer_message in version two. Renaming fields without a versioning signal breaks eval baselines silently.
Show the expected shape, not just the format name. Saying "respond in JSON" is weaker than showing a concrete expected structure inline. The model calibrates its output shape against the example you provide, not the instruction you give.
Separate instruction space from data space explicitly. JSON cannot enforce this boundary by structure alone. XML handles it natively. If a prompt mixes instructions and data, use XML tags to create the boundary even when the data payload itself is JSON.
Treat format changes as prompt version changes. Refactoring from JSON-wrapped instructions to XML-sectioned instructions changes model behavior. Track it, test it against the existing eval baseline, and version it the same way you would version a prompt template.
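Field renames are easy to catch mechanically before they reach an eval run. This is a minimal sketch, assuming prompt templates use str.format-style placeholders (a hypothetical convention; templates containing literal braces would need escaping). It relies on string.Formatter().parse from the standard library to list placeholder names and diff them between versions.

```python
from string import Formatter

def template_fields(template: str) -> set[str]:
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text.
    return {name for _, name, _, _ in Formatter().parse(template) if name}

def field_drift(old: str, new: str) -> dict[str, set[str]]:
    before, after = template_fields(old), template_fields(new)
    return {"removed": before - after, "added": after - before}

v1 = "<customer_message>\n{customer_message}\n</customer_message>"
v2 = "<customer_message>\n{message}\n</customer_message>"

drift = field_drift(v1, v2)
if drift["removed"] or drift["added"]:
    # A rename is a prompt version change: bump the version and rerun the eval baseline.
    print(f"field drift: {drift}")
```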

Why This Matters for System Design

Format is a contract surface. Every downstream component that consumes model output, feeds data into a prompt, or validates correctness depends on that contract being stable and explicit. When it drifts, the failure is usually quiet: output that almost matches the expected shape, parsers that return unexpected values at edge cases, and eval baselines that silently degrade as the prompt evolves.

JSON, XML, YAML, TOML, and TOON are not interchangeable. Each carries different signals, has a different training prevalence, and creates different risks in the wrong position. JSON is not always the right input format simply because it is the right output format. XML is not verbose overhead; it is the cleanest structural tool for separating concerns inside a prompt. YAML and TOML have legitimate roles as human-readable input formats but cannot serve as output contracts for anything a production parser must consume reliably. TOON is purpose-built for the one case where JSON's verbosity becomes a real cost: uniform arrays of records passed into the model at volume.

Treating format as a cosmetic choice is how production AI systems accumulate silent reliability debt. The model manages well enough until it does not, and by then the format contract has drifted across enough prompt versions that the failure mode is hard to isolate.

Final Notes

The model will usually manage. That is not the right standard for a production system.

Format belongs in the prompt design phase, documented with the same precision as the instruction text and the expected output schema. JSON for output contracts. XML for structural boundaries inside the prompt. YAML and TOML for flat input data only. TOON for uniform array inputs when token cost is a measurable constraint, with JSON kept as the output contract. A prompt system that follows those rules is easier to test, easier to version, and easier to debug when production behavior drifts.

See you in the next issue.

Stay curious.
