Reproducible Evaluation — Capability
Purpose
Provide an immutable, fully provenanced experiment and benchmark harness so that engines (assessment, migration, …) can be evaluated objectively and compared across versions: “a scientific instrument” for capability improvement.
At a glance
| Field | Value |
|---|---|
| Slug | cap-reproducible-evaluation |
| Maturity | 🔶 early-development |
| Owning repo(s) | website-intelligence-lab |
| Layer(s) | 2 · Intelligence Platform |
| Last reviewed | 2026-07-05 |
Business value
You cannot improve what you cannot reproduce and compare. This capability makes engine/capability improvement measurable, protecting quality as the ecosystem scales.
Technical responsibilities
- Immutable, provenanced runs; quality-tiered benchmarks; a reference catalog of best-practice targets.
- A corpus of real/synthetic subject businesses; a technology-agnostic platform adapter contract.
- Not responsible for the engines themselves (external repos) or production hosting.
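The technology-agnostic platform adapter contract is easiest to picture as a small interface that the harness depends on instead of any concrete stack. The sketch below assumes a two-step contract (deploy a subject's site, then capture its observed assets). `PlatformAdapter`, `SubjectBusiness`, `ObservedAsset`, `StaticFakeAdapter` and `observe` are hypothetical names for illustration, not the lab's actual API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SubjectBusiness:
    """A real or synthetic business from the corpus (hypothetical fields)."""
    slug: str
    name: str


@dataclass(frozen=True)
class ObservedAsset:
    """One captured website artefact, e.g. a rendered page (hypothetical)."""
    url: str
    content: str


class PlatformAdapter(Protocol):
    """Hypothetical contract every platform adapter (WordPress, static, ...) satisfies."""
    platform: str

    def deploy(self, subject: SubjectBusiness) -> str:
        """Stand up the subject's site on this platform and return its base URL."""
        ...

    def capture(self, base_url: str) -> list[ObservedAsset]:
        """Capture the observed assets that engines will be evaluated against."""
        ...


class StaticFakeAdapter:
    """In-memory stand-in for tests; touches no real infrastructure."""
    platform = "static-fake"

    def deploy(self, subject: SubjectBusiness) -> str:
        return f"https://{subject.slug}.example.test"

    def capture(self, base_url: str) -> list[ObservedAsset]:
        return [ObservedAsset(url=f"{base_url}/", content="<h1>Home</h1>")]


def observe(adapter: PlatformAdapter, subject: SubjectBusiness) -> list[ObservedAsset]:
    """The harness only sees the contract, never the underlying platform."""
    return adapter.capture(adapter.deploy(subject))
```

Keeping the harness dependent on the protocol alone means a new platform can be added by writing one adapter, without touching runs or evaluation code.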
Owning repository or repositories
| Repo | Role |
|---|---|
| website-intelligence-lab | The evaluation harness (infra, corpus, runs) |
Consuming repositories
| Repo | How it consumes |
|---|---|
| inexisstudios (Website Assessment Engine) | Engines run against the lab, writing immutable runs (planned; external engines) |
Consuming ventures
- Inexis Digital (indirectly — better-evaluated engines → better client outcomes).
Inputs
- Subject businesses + observed website assets; external engine executions; external knowledge (pinned by version).
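Pinning inputs by version is what makes a run replayable. A minimal sketch, assuming each run declares its subject, engine version and external knowledge sources in a manifest that rejects floating references such as `latest` and yields a stable fingerprint. `InputManifest`, its fields and the set of "floating" labels are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InputManifest:
    """Hypothetical record of everything a run consumes, each pinned by version."""
    subject_slug: str
    engine: str
    engine_version: str
    # external knowledge source name -> pinned version (e.g. a dated snapshot)
    knowledge: dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Reject floating references, which would make a run irreproducible."""
        floating = {"", "latest", "main", "head"}
        pins = {f"engine {self.engine}": self.engine_version}
        pins.update(self.knowledge)
        for name, version in pins.items():
            if version.strip().lower() in floating:
                raise ValueError(f"{name} is not pinned: {version!r}")

    def fingerprint(self) -> str:
        """SHA-256 over a canonical (key-sorted, compact) JSON encoding."""
        self.validate()
        canonical = json.dumps(
            {
                "subject": self.subject_slug,
                "engine": [self.engine, self.engine_version],
                "knowledge": self.knowledge,
            },
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because keys are sorted before hashing, two manifests that pin the same versions in a different order produce the same fingerprint, while bumping any single version changes it.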
Outputs
- Immutable provenanced runs (Generated assets); benchmark evaluations; run-to-run diffs.
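An immutable, provenanced run and a run-to-run diff can be sketched together. This assumes a run is identified by a hash of its input fingerprint, engine version and findings (modelled here as check id to score), and that a diff reports every finding that was added, removed or changed. `Run`, `run_id` and `diff_runs` are hypothetical names, and the 16-character id length is an arbitrary choice for readability.

```python
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical immutable run: pinned inputs, engine version, findings."""
    input_fingerprint: str          # hash of the pinned inputs
    engine_version: str
    findings: Mapping[str, float]   # check id -> score

    def __post_init__(self) -> None:
        # Copy and freeze the findings so the record cannot change after creation.
        object.__setattr__(self, "findings", MappingProxyType(dict(self.findings)))

    @property
    def run_id(self) -> str:
        """Content address: identical inputs and results give an identical id."""
        payload = json.dumps(
            {"in": self.input_fingerprint, "engine": self.engine_version,
             "findings": dict(self.findings)},
            sort_keys=True, separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def diff_runs(before: Run, after: Run) -> dict[str, tuple[Optional[float], Optional[float]]]:
    """Per-finding changes; None marks a finding missing from one side."""
    changes = {}
    for key in sorted(set(before.findings) | set(after.findings)):
        old, new = before.findings.get(key), after.findings.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes
```

For example, comparing an engine at 1.4.0 with the same engine at 1.5.0 on the same pinned inputs surfaces only the checks whose scores moved, which is the measurable improvement this capability is meant to expose.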
Dependencies
- Reusable AI Skills; Docker/Caddy/WordPress/Cloudflare infra.
Capability relationships
```mermaid
graph LR
  LAB[website-intelligence-lab] -->|provides| CAP[[Reproducible Evaluation]]
  SS[Reusable AI Skills] -.underpins.-> CAP
  ENG[[Website Assessment engines]] -.evaluated by.-> CAP
  CAP -->|measured improvement| WA[[Website Assessment]]
```
Current maturity
🔶 Early-development. Phases 1–2 are complete and VPS-validated (infra); Phase 2.5 is in progress; the core loop (adapter, runs, evaluation) is pending; the crawler is not yet built, so observed assets remain capture_pending.
(Source: website-intelligence-lab digest.)
Planned evolution
Adapter → runs/provenance → evaluation loop; the Experiments domain; additional platform adapters and engines.
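The evaluation step of that loop, over quality-tiered benchmarks, might look like the sketch below. It assumes each benchmark case carries a tier label (for example human-verified "gold" versus synthetic "bronze"), that an engine is simplified to a function from subject to a single finding, and that a candidate version regresses if any tier's accuracy drops below the baseline. `BenchmarkCase`, `evaluate`, `regressions` and the tier names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BenchmarkCase:
    """One hypothetical benchmark case: a subject plus the expected finding."""
    case_id: str
    tier: str       # e.g. "gold" (human-verified) or "bronze" (synthetic)
    subject: str
    expected: str


# Deliberately simplified engine signature: subject -> single finding.
Engine = Callable[[str], str]


def evaluate(engine: Engine, cases: list[BenchmarkCase]) -> dict[str, float]:
    """Accuracy per tier, so weaker tiers cannot mask results on stronger ones."""
    totals: dict[str, list[int]] = {}
    for case in cases:
        hit = int(engine(case.subject) == case.expected)
        passed, seen = totals.get(case.tier, [0, 0])
        totals[case.tier] = [passed + hit, seen + 1]
    return {tier: passed / seen for tier, (passed, seen) in totals.items()}


def regressions(baseline: dict[str, float], candidate: dict[str, float],
                tolerance: float = 0.0) -> list[str]:
    """Tiers where the candidate scores below the baseline by more than tolerance."""
    return sorted(tier for tier in baseline
                  if candidate.get(tier, 0.0) < baseline[tier] - tolerance)
```

Reporting per tier rather than one blended score keeps a gain on cheap synthetic cases from hiding a loss on the verified ones.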
Related
- Repo: website-intelligence-lab · Platform: Intelligence Platform
- Principles: reproducibility & provenance (Principle 1)
- Registry: Capability Registry · Reuse Map