ZAYAZ Automation System – Operator Runbook
1. Purpose
This document explains how scheduled and manual automations in the ZAYAZ system work, how they are triggered, and how operators should reason about failures, overrides, and safety.
The automation system is designed to be:
- Centralized (single scheduler, single dispatcher)
- Observable (events, dashboard, warnings)
- Safe by default (manual-only jobs are never scheduled)
- UI-driven (enable/disable + run-now without touching YAML)
2. Architecture Overview
GitHub (cron every 10 min)
|
v
POST /api/automation/tick
|
v
System API (Fly.io)
- reads automation catalog
- reads overrides store
- evaluates cron + enable flags
- dispatches jobs
|
v
GitHub Actions
automation.yml (dispatcher)
|
v
npm run <script>
3. Key Components
| Component | Responsibility |
|---|---|
| package.json | Source of truth for what scripts exist |
| scripts/system/automation.overrides.json | Human-facing catalog metadata + default schedules |
| docusaurus/static/system/automation.json | Generated public automation catalog |
| System API (/api/automation/*) | Scheduling, overrides, dispatch, logging |
| /data/automation.overrides.json (Fly volume) | Runtime state (enabled, cron overrides, lastRunAt) |
| automation.yml | Single dispatcher workflow |
| automation-tick.yml | Calls scheduler every 10 minutes |
4. Automation Types
4.1. Scheduled Automations
These have:
- enabledDefault: true
- non-empty cron array
They run automatically via /tick.
Examples:
- Jira syncs & indices
- Docs hygiene
- Hecate audits
- Search release
- ZARA fixtures
4.2. Manual-Only Automations
These have:
- enabledDefault: false
- "cron": []
They never run automatically but are visible and runnable from the UI.
Examples:
- jira:backfill-links
- zara:jira
- zara:jira:create
These are usually:
- corrective
- side-effect-heavy
- migration-oriented
- potentially destructive
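The distinction between sections 4.1 and 4.2 can be sketched as a small classifier. This is only an illustration: the runbook names the fields enabledDefault and cron, but the id field, the docs:hygiene identifier and its schedule are hypothetical, and treating every other combination as manual-only is an assumed, conservative reading rather than documented behavior.

```typescript
// Hypothetical shape of one catalog entry. Only `enabledDefault` and `cron`
// are named in this runbook; `id` is an assumption for illustration.
interface CatalogEntry {
  id: string;
  enabledDefault: boolean;
  cron: string[]; // an empty array means "no schedule"
}

type AutomationKind = "scheduled" | "manual-only";

// Scheduled only when enabled by default AND at least one cron expression exists.
// Anything else is treated as manual-only (assumed conservative fallback).
function classify(entry: CatalogEntry): AutomationKind {
  return entry.enabledDefault && entry.cron.length > 0 ? "scheduled" : "manual-only";
}

const examples: CatalogEntry[] = [
  // Hypothetical id and schedule, used only to show a scheduled entry.
  { id: "docs:hygiene", enabledDefault: true, cron: ["0 3 * * *"] },
  // Manual-only entry as described in section 4.2.
  { id: "jira:backfill-links", enabledDefault: false, cron: [] },
];

const kinds: AutomationKind[] = examples.map(classify);
```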
4.3. One-Off / Developer Scripts (Not in Automation System)
These are intentionally excluded and explained in the System Settings UI.
Examples:
- dev servers
- setup scripts
- composite helpers
- scripts already wrapped by higher-level automations
5. Scheduling Model (Important)
- All schedules live in data, not YAML.
- GitHub has exactly one cron: every 10 minutes.
- /api/automation/tick:
  - evaluates cron expressions
  - de-dupes using lastRunAt
  - respects enable/disable flags
  - dispatches only due jobs
This avoids:
- duplicated schedules
- YAML drift
- hard-to-debug cron behavior
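To make the tick decision concrete, here is a minimal sketch of how a "due" check could combine cron evaluation, the enable flag, and lastRunAt de-duplication. It is not the System API's actual code. The JobState shape, the 10-minute lookback, UTC evaluation, and the simplified cron subset (only "*", "*/n", single values, ranges, and comma lists, with all five fields required to match) are assumptions.

```typescript
// Hypothetical per-job state; field names mirror the runbook where it names them.
interface JobState {
  enabled: boolean;
  cron: string[];
  lastRunAt: string | null; // ISO timestamp of the last dispatch, if any
}

const MINUTE_MS = 60000;

// [min, max] per field: minute, hour, day-of-month, month, day-of-week.
const FIELD_BOUNDS: ReadonlyArray<readonly [number, number]> = [
  [0, 59], [0, 23], [1, 31], [1, 12], [0, 6],
];

// Matches one cron field against a value (simplified subset, see lead-in).
function fieldMatches(field: string, value: number, min: number, max: number): boolean {
  return field.split(",").some((part) => {
    const pieces = part.split("/");
    const range = pieces[0] ?? "*";
    const step = pieces.length > 1 ? Number(pieces[1]) : 1;
    let lo = min;
    let hi = max;
    if (range !== "*") {
      const ends = range.split("-").map(Number);
      lo = ends[0] ?? min;
      hi = ends.length > 1 ? (ends[1] ?? max) : lo;
    }
    if (!(step >= 1) || Math.floor(step) !== step || isNaN(lo) || isNaN(hi)) return false;
    return value >= lo && value <= hi && (value - lo) % step === 0;
  });
}

// True when the 5-field expression matches the given instant, evaluated in UTC.
function cronMatches(expr: string, at: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return false;
  const values = [
    at.getUTCMinutes(), at.getUTCHours(), at.getUTCDate(),
    at.getUTCMonth() + 1, at.getUTCDay(),
  ];
  return fields.every((field, i) => {
    const bounds = FIELD_BOUNDS[i];
    const value = values[i];
    return bounds !== undefined && value !== undefined &&
      fieldMatches(field, value, bounds[0], bounds[1]);
  });
}

// A job is due when it is enabled and some cron minute falls inside the
// lookback window without already being covered by lastRunAt.
function isDue(job: JobState, now: Date, lookbackMinutes: number = 10): boolean {
  if (!job.enabled || job.cron.length === 0) return false;
  const nowMinute = Math.floor(now.getTime() / MINUTE_MS) * MINUTE_MS;
  const lastRun = job.lastRunAt === null ? -Infinity : Date.parse(job.lastRunAt);
  for (let i = 0; i < lookbackMinutes; i++) {
    const t = nowMinute - i * MINUTE_MS;
    if (t <= lastRun) break; // this minute was handled by an earlier run (de-dupe)
    if (job.cron.some((expr) => cronMatches(expr, new Date(t)))) return true;
  }
  return false;
}
```

The lookback window matches the 10-minute tick interval so that a cron minute between two ticks is still picked up, while lastRunAt keeps a slightly early or late tick from dispatching the same job twice.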
6. Overrides & Control
Enable / Disable
- Controlled via UI
- Stored in /data/automation.overrides.json
- Takes precedence over defaults
Cron Overrides
- Can be changed via the System API (a UI for this is planned)
- Default cron comes from catalog
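The precedence rule above (runtime overrides win, catalog defaults fill the gaps) can be sketched as a small resolver. The type names and the cron field on the override record are assumptions; the runbook only states that the runtime store holds enabled, cron overrides, and lastRunAt.

```typescript
// Defaults as they might appear in the generated catalog (shape assumed).
interface CatalogDefaults {
  enabledDefault: boolean;
  cron: string[];
}

// Runtime record as it might appear in /data/automation.overrides.json (shape assumed).
interface RuntimeOverride {
  enabled?: boolean;
  cron?: string[];
  lastRunAt?: string;
}

interface EffectiveSettings {
  enabled: boolean;
  cron: string[];
}

// An override value, when present, takes precedence over the catalog default.
function resolveEffective(defaults: CatalogDefaults, override?: RuntimeOverride): EffectiveSettings {
  return {
    enabled: override?.enabled ?? defaults.enabledDefault,
    cron: override?.cron ?? defaults.cron,
  };
}
```

For example, an automation that is enabled by default but toggled off in the UI resolves to enabled: false while keeping its default cron.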
Manual Run
- “Run now” triggers /api/automation/run/:id
- Bypasses scheduler
- Uses the same dispatcher workflow
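As a rough illustration of a manual run, the helper below builds the request an operator tool might send. Only the /api/automation/run/:id path comes from this runbook. The POST method, the Bearer-style Authorization header, the URL-encoding of the id, and the example host are all assumptions, so check them against the System API before relying on this.

```typescript
// Plain description of an HTTP request; nothing is sent by this sketch.
interface RunNowRequest {
  method: "POST"; // assumed, by analogy with POST /api/automation/tick
  url: string;
  headers: Record<string, string>;
}

// Builds the "run now" request for one automation id.
function buildRunNowRequest(baseUrl: string, automationId: string, adminToken: string): RunNowRequest {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    method: "POST",
    // Ids contain ":" (e.g. "jira:backfill-links"), so they are URL-encoded here (assumption).
    url: trimmedBase + "/api/automation/run/" + encodeURIComponent(automationId),
    // Header name and scheme are hypothetical.
    headers: { Authorization: "Bearer " + adminToken },
  };
}

// Hypothetical host and placeholder token, for illustration only.
const runNowExample = buildRunNowRequest("https://system.example.test/", "jira:backfill-links", "<admin-token>");
```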
7. Observability & Safety
Dashboard
The Automation Dashboard shows:
- Tick activity
- Dispatch success / errors
- Lock contention
- Catalog fetch issues
- Top offending automations
This replaces email-based monitoring.
Locks
- A global tick lock prevents overlapping ticks
- This in turn prevents duplicate dispatches
- Ticks that hit the lock are visible as “locked ticks” in the UI
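The locking behavior can be pictured with a small in-memory sketch. The real lock presumably lives in the System API and may work differently. The expiry (TTL), which here keeps a crashed tick from holding the lock forever, and the counter name are assumptions.

```typescript
interface TickLock {
  tryAcquire(nowMs: number): boolean; // true if this tick may proceed
  release(): void;                    // called when the tick finishes
  lockedTicks(): number;              // how many ticks were turned away
}

// Creates a single global lock that expires after ttlMs (assumed safety net).
function createTickLock(ttlMs: number): TickLock {
  let heldUntil = 0;
  let rejected = 0;
  return {
    tryAcquire(nowMs: number): boolean {
      if (nowMs < heldUntil) {
        rejected += 1; // another tick is still running: count it as a locked tick
        return false;
      }
      heldUntil = nowMs + ttlMs;
      return true;
    },
    release(): void {
      heldUntil = 0;
    },
    lockedTicks(): number {
      return rejected;
    },
  };
}
```

A tick would call tryAcquire before evaluating schedules, skip all dispatching when it returns false, and call release once dispatching is done.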
Audit Trail
- All ticks and dispatches are logged
- Available via /api/automation/events
- Used by the dashboard
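To show how a "Top offending automations" view could be derived from the audit trail, the sketch below counts failed dispatches per automation. The event shape is entirely hypothetical. The actual payload of /api/automation/events is not described in this runbook.

```typescript
// Hypothetical event record; every field name here is an assumption.
interface AutomationEvent {
  type: "tick" | "dispatch";
  automationId?: string;
  ok: boolean;
  at: string; // ISO timestamp
}

interface Offender {
  id: string;
  failures: number;
}

// Counts failed dispatches per automation and returns the worst ones first.
function topOffenders(events: AutomationEvent[], limit: number = 3): Offender[] {
  const counts: Record<string, number> = {};
  for (const event of events) {
    if (event.type !== "dispatch" || event.ok || event.automationId === undefined) continue;
    counts[event.automationId] = (counts[event.automationId] ?? 0) + 1;
  }
  return Object.keys(counts)
    .map((id) => ({ id: id, failures: counts[id] ?? 0 }))
    // Most failures first; ties are broken by id so the order is stable.
    .sort((a, b) => b.failures - a.failures || a.id.localeCompare(b.id))
    .slice(0, limit);
}
```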
8. Operational Playbooks
“Something didn’t run”
- Open System Settings → Automation Dashboard
- Check:
  - catalog errors
  - dispatch errors
  - lock contention
- Inspect “Top offenders”
- Re-run manually if needed
“Disable temporarily”
- Toggle Enabled off in UI
- Scheduler will skip it immediately
- No YAML changes required
“Run something dangerous”
- Only possible for manual-only jobs
- Explicitly labeled
- Requires admin token
- Never scheduled automatically
9. Design Principles (Why this system exists)
- No cron in YAML except the scheduler
- No hidden jobs
- No email spam
- No accidental side effects
- Everything observable
- Everything reversible
This is intentional.