Capability · maturity: early-development · living · owner: Azwaan · reviewed: 2026-07-05

Reproducible Evaluation — Capability

Purpose

Provide an immutable, fully provenanced experiment and benchmark harness so that engines (assessment, migration, …) can be evaluated objectively and compared across versions: “a scientific instrument” for capability improvement.

At a glance

Field	Value
Slug	`cap-reproducible-evaluation`
Maturity	🔶 early-development
Owning repo(s)	`website-intelligence-lab`
Layer(s)	2 · Intelligence Platform
Last reviewed	2026-07-05

Business value

You cannot improve what you cannot reproduce and compare. This capability makes engine/capability improvement measurable, protecting quality as the ecosystem scales.

Technical responsibilities

Immutable, provenanced runs; quality-tiered benchmarks; a reference catalog of best-practice targets.
A corpus of real/synthetic subject businesses; a technology-agnostic platform adapter contract.
Not responsible for the engines themselves (external repos) or production hosting.
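
To make "immutable, provenanced runs" concrete, here is a minimal sketch in Python. It is not the lab's actual schema: the class names (`Provenance`, `Run`), every field name, and the choice of SHA-256 over a canonical JSON dump are assumptions made for illustration only.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Provenance:
    # All field names are hypothetical; the real run schema is not shown in this page.
    engine: str
    engine_version: str
    subject_id: str
    knowledge_versions: tuple  # (name, version) pairs, pinned per run


@dataclass(frozen=True)
class Run:
    provenance: Provenance
    outputs: tuple  # (asset_name, content) pairs

    def run_id(self) -> str:
        # Content-addressed id: identical inputs and outputs always hash to the same id.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Freezing the dataclasses means a recorded run cannot be edited in place, and deriving the id from the content means any change to the engine version, the pinned knowledge, or the outputs yields a different run.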

Owning repository or repositories

Repo	Role
`website-intelligence-lab`	The evaluation harness (infra, corpus, runs)

Consuming repositories

Repo	How it consumes
`inexisstudios` (Website Assessment Engine)	Engines run against the lab, writing immutable runs (planned; the engines live in external repos)

Consuming ventures

Inexis Digital (indirectly: better-evaluated engines → better client outcomes).

Inputs

Subject businesses + observed website assets; external engine executions; external knowledge (pinned by version).

Outputs

Immutable provenanced runs (Generated assets); benchmark evaluations; run-to-run diffs.
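
A run-to-run diff can be as simple as comparing two runs' generated assets key by key. The sketch below assumes each run's outputs can be flattened to a mapping of asset name to content; the function name `diff_runs` and the sample values are invented for illustration and do not come from the lab.

```python
from __future__ import annotations


def diff_runs(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report which asset names were added, removed, or changed between two runs."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "changed": sorted(k for k in old_keys & new_keys if old[k] != new[k]),
    }


# Made-up example data, not real benchmark output.
run_v1 = {"title": "Home", "score": "0.61"}
run_v2 = {"title": "Home", "score": "0.74", "schema": "present"}
print(diff_runs(run_v1, run_v2))
# {'added': ['schema'], 'removed': [], 'changed': ['score']}
```

Because runs are immutable, a diff like this stays valid for as long as both runs are retained.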

Dependencies

Reusable AI Skills; Docker/Caddy/WordPress/Cloudflare infra.

Capability relationships

graph LR
    LAB[website-intelligence-lab] -->|provides| CAP[[Reproducible Evaluation]]
    SS[Reusable AI Skills] -.underpins.-> CAP
    ENG[[Website Assessment engines]] -.evaluated by.-> CAP
    CAP -->|measured improvement| WA[[Website Assessment]]

Current maturity

🔶 Early-development. Phases 1–2 (infra) are complete and VPS-validated; Phase 2.5 is in progress; the core loop (adapter, runs, evaluation) is pending; the crawler is not built, so observed assets remain capture_pending. (Source: website-intelligence-lab digest.)

Planned evolution

Adapter → runs/provenance → evaluation loop; the Experiments domain; additional platform adapters and engines.
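
One possible shape for the adapter → run → evaluation loop is sketched below. It is an assumption-heavy illustration: `PlatformAdapter`, its `provision`/`teardown` methods, `InMemoryAdapter`, and the single pass/fail threshold are all hypothetical, and the planned quality-tiered benchmarks would be richer than one number.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


class PlatformAdapter(Protocol):
    """Hypothetical technology-agnostic contract for hosting a subject's site."""

    def provision(self, subject_id: str) -> str: ...  # returns a base URL

    def teardown(self, subject_id: str) -> None: ...


class InMemoryAdapter:
    """Stand-in for the sketch; a real adapter might wrap Docker or WordPress."""

    def __init__(self) -> None:
        self.active: set[str] = set()

    def provision(self, subject_id: str) -> str:
        self.active.add(subject_id)
        return f"http://{subject_id}.invalid"

    def teardown(self, subject_id: str) -> None:
        self.active.discard(subject_id)


@dataclass(frozen=True)
class Evaluation:
    subject_id: str
    engine_version: str
    score: float
    passed: bool


def evaluate(
    adapter: PlatformAdapter,
    engine: Callable[[str], float],  # hypothetical: takes a URL, returns a score
    engine_version: str,
    subject_id: str,
    threshold: float,
) -> Evaluation:
    url = adapter.provision(subject_id)
    try:
        score = engine(url)
    finally:
        # Always release the environment, even if the engine raises.
        adapter.teardown(subject_id)
    return Evaluation(subject_id, engine_version, score, score >= threshold)
```

Typing the adapter as a `Protocol` keeps the loop independent of any one platform: any object with matching `provision` and `teardown` methods can be passed in without inheriting from a base class.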

Repo: website-intelligence-lab · Platform: Intelligence Platform
Principles: reproducibility & provenance (Principle 1)
Registry: Capability Registry · Reuse Map