ZARA Fixtures Dashboard
This document explains how the ZARA Fixtures system works end‑to‑end, how the ZARA Fixtures Dashboard is built, and how to add or modify fixtures safely. It is intended for the systems‑info library and assumes basic familiarity with ZARA, Hecate, and Docusaurus.
1. What the ZARA Fixtures system is
ZARA Fixtures are deterministic, repeatable test cases used to validate that the ZARA LLM:
- Produces schema‑valid JSON output
- Does not hallucinate beyond provided context
- Correctly resolves tables, bundles, and columns
- Produces actionable, high‑quality engineering instructions
Each fixture simulates a real documentation slice and runs ZARA against it under controlled conditions.
The results are:
- Persisted as NDJSON history
- Aggregated into JSON snapshots
- Visualized in the ZARA Fixtures Dashboard
2. High‑level architecture
MDX slice + hints
↓
run‑fixtures.ts
↓
ZARA LLM (strict JSON schema)
↓
Metrics + flags + derived scores
↓
history.ndjson → history.json
↓
ZARA Fixtures Dashboard (Docusaurus)
Key properties:
- No golden outputs (no expected.json required)
- Failures are signal, not regressions
- All metrics are computed automatically
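The `history.ndjson → history.json` step in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the actual code in run-fixtures.ts: the record fields (`fixtureId`, `timestamp`, `status`) and the grouped-by-fixture output shape are assumptions made for the example.

```typescript
// Hypothetical record shape; the real fields written by run-fixtures.ts may differ.
interface HistoryRecord {
  fixtureId: string;
  timestamp: string; // ISO 8601 string (assumed)
  status: "PASS" | "WARN" | "FAIL";
}

// Append one record as a single NDJSON line (one JSON object per line).
function appendNdjson(existing: string, record: HistoryRecord): string {
  const needsNewline = existing !== "" && !existing.endsWith("\n");
  return existing + (needsNewline ? "\n" : "") + JSON.stringify(record) + "\n";
}

// Parse NDJSON text, skip blank lines, and group records by fixture.
function aggregateHistory(ndjson: string): Record<string, HistoryRecord[]> {
  const byFixture: Record<string, HistoryRecord[]> = {};
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    const record = JSON.parse(line) as HistoryRecord;
    const bucket = byFixture[record.fixtureId] ?? [];
    bucket.push(record);
    byFixture[record.fixtureId] = bucket;
  }
  return byFixture;
}
```

Because the raw history is append-only, the aggregate can always be rebuilt from scratch by re-reading every line.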
3. Folder structure
All ZARA fixture logic lives under:
scripts/hecate/zara/
Important subfolders:
fixtures/
  01-table-contract/
    input.json
    excerpt.mdx
  02-ambiguity-handling/
  03-code-and-admonitions/
fixtures/results/
  history.ndjson         # append‑only raw history
  history.json           # aggregated for dashboard
run-fixtures.ts          # main runner
llm-generate.ts          # ZARA LLM call
zaraOutputSchema.json    # strict output schema
Unused or future fixtures should live in:
scripts/hecate/zara/unused-fixtures/
They are intentionally not executed.
4. What a fixture is
A fixture represents one test scenario and consists of:
Required files
input.json
Defines metadata and hints used by ZARA:
{
  "fixtureId": "01-table-contract",
  "name": "Table contract resolution",
  "specId": "input-hub-general",
  "needType": "Task",
  "headingText": "Signal Registry",
  "tableHints": ["Signals table"],
  "tableBundles": [ ... ]
}
This file is machine‑only and must always be valid JSON.
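A quick pre-run check for input.json might look like the sketch below. The field names are taken from the example above; treating every one of them as required is an assumption, and the shape of `tableBundles` entries is not specified here, so they are left untyped. The function name `validateFixtureInput` is hypothetical.

```typescript
// Field names come from the example input.json. The entries of tableBundles
// are not described in this document, so they stay as unknown[].
interface FixtureInput {
  fixtureId: string;
  name: string;
  specId: string;
  needType: string;
  headingText: string;
  tableHints: string[];
  tableBundles: unknown[];
}

interface ValidationResult {
  input: FixtureInput | null; // null when any problem was found
  problems: string[];
}

// Assumption: every field shown in the example is required.
function validateFixtureInput(raw: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    return { input: null, problems: [`invalid JSON: ${(err as Error).message}`] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { input: null, problems: ["top-level value must be a JSON object"] };
  }
  const obj = parsed as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of ["fixtureId", "name", "specId", "needType", "headingText"]) {
    if (typeof obj[key] !== "string") problems.push(`${key} must be a string`);
  }
  const hints = obj["tableHints"];
  if (!Array.isArray(hints) || !hints.every((h) => typeof h === "string")) {
    problems.push("tableHints must be an array of strings");
  }
  if (!Array.isArray(obj["tableBundles"])) {
    problems.push("tableBundles must be an array");
  }
  return problems.length === 0
    ? { input: obj as unknown as FixtureInput, problems }
    : { input: null, problems };
}
```

Returning a list of problems rather than throwing on the first one makes it easier to fix a new fixture in a single pass.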
excerpt.mdx
A verbatim slice of documentation that ZARA will see.
Rules:
- Must be valid MDX
- Should match real docs structure
- May include tables, code blocks, admonitions
- No frontmatter (this is not a real page)
Example:
### Object
| Field | Type | Required |
|-------|------|----------|
| id | UUID | yes |
> ⚠️ The `id` must be stable across updates.
This is the single most important input to ZARA.
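The "no frontmatter" rule is easy to check mechanically. The sketch below is a heuristic under the assumption that frontmatter means a leading block delimited by `---` lines; it does not validate MDX, which would need the MDX compiler. The function name `hasFrontmatter` is hypothetical.

```typescript
// Heuristic: frontmatter is a block at the very start of the file that opens
// and closes with a line containing only '---'. An optional BOM is tolerated.
const FRONTMATTER_PATTERN = /^\uFEFF?---[ \t]*\r?\n(?:[\s\S]*?\r?\n)?---[ \t]*(?:\r?\n|$)/;

function hasFrontmatter(mdx: string): boolean {
  return FRONTMATTER_PATTERN.test(mdx);
}
```

An excerpt that legitimately opens with a `---` thematic break and contains another one later would be flagged as well, so treat a positive result as a prompt to look, not as proof.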
5. How excerpts should be written
Golden rules
- Copy from real documentation whenever possible
- Keep it small but complete
- Include ambiguity when testing ambiguity handling
- Include real‑world messiness (notes, warnings, partial tables)
What to test with excerpts
| Goal | Include |
|---|---|
| Table resolution | Multiple tables + references |
| Ambiguity | Vague wording, missing constraints |
| Hallucination | Things ZARA must not invent |
| Code handling | Code blocks + prose around them |
6. Running fixtures
Local run
export OPENAI_API_KEY=sk-...
npm run zara:fixtures
Outputs:
- fixtures/results/history.ndjson
- Per‑fixture debug artifacts (last‑result.json, last‑output.json)
Nightly run (CI)
- Runs via GitHub Actions
- Appends to history
- Publishes aggregated JSON
- Fails CI depending on ZARA_FIXTURES_FAIL_ON
Environment controls:
| Setting | Effect |
|---|---|
| ZARA_FIXTURES_FAIL_ON=none | Never fail |
| ZARA_FIXTURES_FAIL_ON=warn | Fail on WARN or FAIL |
| ZARA_FIXTURES_FAIL_ON=fail (default) | Fail only on FAIL |
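The table above translates into a small decision function. This is a sketch of the policy only, not the runner's actual code: the function names are hypothetical, and falling back to `fail` for unrecognized values and ignoring case are assumptions.

```typescript
type FixtureStatus = "PASS" | "WARN" | "FAIL";
type FailOn = "none" | "warn" | "fail";

// Reads the policy from an env-like map. Missing or unrecognized values fall
// back to "fail", the documented default (the fallback itself is an assumption).
function readFailOn(env: Record<string, string | undefined>): FailOn {
  const value = (env["ZARA_FIXTURES_FAIL_ON"] ?? "").toLowerCase();
  return value === "none" || value === "warn" ? value : "fail";
}

// Decides whether the CI job should fail for a set of fixture statuses.
function shouldFailCi(statuses: FixtureStatus[], failOn: FailOn): boolean {
  if (failOn === "none") return false;
  if (failOn === "warn") return statuses.some((s) => s !== "PASS");
  return statuses.some((s) => s === "FAIL");
}
```

In a Node script, the result could be used to set the process exit code, for example a non-zero code when `shouldFailCi(statuses, readFailOn(process.env))` is true.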
7. Metrics explained
Each fixture run produces:
Metrics
- schemaValid: JSON schema compliance
- hallucinations: conservative count
- clarificationsNeeded: number of clarifying questions
- actionableBullets: usable instructions
- tablesDetected / resolved
- columnsDetected / resolved
Derived
- bundleResolutionPct
- columnResolutionPct
- qualityScore (0‑100)
Flags
- hasHallucination
- schemaViolation
Status
| Status | Meaning |
|---|---|
| PASS | Clean run |
| WARN | Acceptable but degraded |
| FAIL | Schema or hallucination failure |
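To make the relationship between metrics, flags, and status concrete, here is one plausible sketch. The exact formulas are not given in this document, so everything here is an assumption: the percentage is taken as resolved over detected, an empty denominator is treated as 100, and what counts as "degraded" for WARN is left to the caller. The `qualityScore` formula is deliberately not reproduced.

```typescript
// Assumed formula: resolved / detected, as a rounded percentage.
// When nothing was detected there was nothing to resolve, so return 100.
function resolutionPct(detected: number, resolved: number): number {
  if (detected <= 0) return 100;
  return Math.round((resolved / detected) * 100);
}

interface RunFlags {
  hasHallucination: boolean;
  schemaViolation: boolean;
}

// FAIL follows the status table (schema or hallucination failure).
// The WARN criterion is not specified here, so it is passed in as `degraded`.
function deriveStatus(flags: RunFlags, degraded: boolean): "PASS" | "WARN" | "FAIL" {
  if (flags.hasHallucination || flags.schemaViolation) return "FAIL";
  return degraded ? "WARN" : "PASS";
}
```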
8. The ZARA Fixtures Dashboard
The dashboard lives in:
docusaurus/src/components/ZaraFixturesDashboard.tsx
It visualizes:
Left column (66.66%)
- Quality & Resolution line chart
- Clarifications bar chart
- Shared time‑range focus (Brush + slider)
Right columns (16.67% each)
- Schema Validity heatmap (year view)
- Hallucinations heatmap (year view)
Bottom
- Latest snapshot (monospace, system‑style)
All charts are derived directly from history.json.
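As an illustration of how a year-view heatmap could be derived from history data, the sketch below buckets runs by calendar day. It is not taken from ZaraFixturesDashboard.tsx: the record fields (`timestamp`, `schemaValid`) and the assumption that timestamps are ISO 8601 strings in UTC are made up for the example.

```typescript
// Hypothetical slice of a history record.
interface RunRecord {
  timestamp: string; // assumed ISO 8601 in UTC, e.g. "2024-01-02T03:04:05Z"
  schemaValid: boolean;
}

interface DayCell {
  total: number; // runs on that day
  valid: number; // runs with schemaValid === true
}

// Groups runs by their YYYY-MM-DD prefix, producing one heatmap cell per day.
function schemaValidityByDay(records: RunRecord[]): Record<string, DayCell> {
  const cells: Record<string, DayCell> = {};
  for (const record of records) {
    const day = record.timestamp.slice(0, 10);
    const cell = cells[day] ?? { total: 0, valid: 0 };
    cell.total += 1;
    if (record.schemaValid) cell.valid += 1;
    cells[day] = cell;
  }
  return cells;
}
```

A cell's color could then be based on `valid / total`; the same bucketing would work for a hallucination count.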
9. Adding a new fixture (checklist)
- Create new folder under fixtures/NN-name/
- Add input.json
- Add excerpt.mdx
- Run npm run zara:fixtures
- Verify dashboard renders
- Commit fixture folder
⚠️ Never modify history.ndjson by hand.
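The first three checklist items can be verified before running anything. The sketch below is a hypothetical helper, not part of the runner: it assumes `NN` means two digits followed by a lowercase kebab-case name, as in the existing folders, and it takes the file names as a plain list so it stays independent of the file system.

```typescript
const REQUIRED_FILES = ["input.json", "excerpt.mdx"];

// Assumption: "NN-name" is two digits plus a kebab-case name,
// e.g. "01-table-contract".
const FOLDER_PATTERN = /^\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns a list of problems; an empty list means the folder looks complete.
function checkFixtureFolder(folderName: string, fileNames: string[]): string[] {
  const problems: string[] = [];
  if (!FOLDER_PATTERN.test(folderName)) {
    problems.push(`folder name "${folderName}" does not match NN-name`);
  }
  for (const required of REQUIRED_FILES) {
    if (fileNames.indexOf(required) === -1) {
      problems.push(`missing ${required}`);
    }
  }
  return problems;
}
```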
10. Common failure modes
| Problem | Cause |
|---|---|
| Unexpected token # | Markdown leaked into JSON output |
| Schema violation | Output too long / missing field |
| Hallucination | ZARA referenced unseen concept |
| Dashboard empty | history.json missing or invalid |
All failures are signals, not bugs.
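The `Unexpected token #` case can be turned into a clearer signal by inspecting the raw model output before parsing it. This is a minimal sketch with a hypothetical function name; the leading-character check is a heuristic and is not how llm-generate.ts is known to work.

```typescript
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; reason: string };

// Parses raw model output as JSON, reporting Markdown leakage explicitly
// instead of surfacing a bare "Unexpected token" error.
function parseZaraOutput(raw: string): ParseResult {
  const text = raw.trim();
  const first = text.charAt(0);
  // Heuristic: JSON never starts with '#' (heading) or '`' (code fence).
  if (first === "#" || first === "`") {
    return { ok: false, reason: "Markdown leaked into JSON output" };
  }
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, reason: `invalid JSON: ${(err as Error).message}` };
  }
}
```

Recording the `reason` alongside the run keeps the failure observable in history rather than crashing the runner.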
11. Design philosophy (important)
- Fixtures are observational, not assertive
- We measure behavior, not exact text
- ZARA is allowed to evolve
- Dashboards show trends, not pass/fail gates
This makes the system robust, future‑proof, and model‑agnostic.
12. When to add expected.json (rare)
Only add expected.json if:
- You need regression locking for a critical behavior
- You accept higher maintenance cost
By default: do not use expected.json.