Capability · maturity: early-development · living · owner: Azwaan · reviewed: 2026-07-05

Reproducible Evaluation — Capability

Purpose

Provide an immutable, fully provenanced experiment and benchmark harness so that engines (assessment, migration, …) can be evaluated objectively and compared across versions: “a scientific instrument” for capability improvement.

At a glance

| Field | Value |
|---|---|
| Slug | cap-reproducible-evaluation |
| Maturity | 🔶 early-development |
| Owning repo(s) | website-intelligence-lab |
| Layer(s) | 2 · Intelligence Platform |
| Last reviewed | 2026-07-05 |

Business value

You cannot improve what you cannot reproduce and compare. This capability makes engine/capability improvement measurable, protecting quality as the ecosystem scales.

Technical responsibilities

  • Immutable, provenanced runs; quality-tiered benchmarks; a reference catalog of best-practice targets.
  • A corpus of real/synthetic subject businesses; a technology-agnostic platform adapter contract.
  • Not responsible for the engines themselves (external repos) or production hosting.
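
The first two responsibilities can be illustrated with a minimal sketch. It assumes a run is a frozen record whose provenance is a content hash over its fields, and that the adapter contract is a small structural interface. Every name here (`PlatformAdapter`, `Run`, `record_run`, `FixedAdapter`, and the field names) is hypothetical and is not taken from website-intelligence-lab.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Mapping, Protocol


class PlatformAdapter(Protocol):
    """Hypothetical adapter contract: anything that can run an engine on a subject."""

    name: str

    def execute(self, subject_id: str) -> Mapping[str, float]:
        """Run the engine against one subject business and return named scores."""
        ...


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Run:
    subject_id: str
    adapter: str
    engine_version: str
    knowledge_version: str  # external knowledge is pinned by version
    scores: tuple[tuple[str, float], ...]  # sorted (metric, value) pairs

    def digest(self) -> str:
        """Content hash over all fields; identical content gives the same digest."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_run(adapter: PlatformAdapter, subject_id: str,
               engine_version: str, knowledge_version: str) -> Run:
    """Execute the adapter once and capture the result as an immutable record."""
    scores = adapter.execute(subject_id)
    return Run(subject_id, adapter.name, engine_version, knowledge_version,
               tuple(sorted(scores.items())))


class FixedAdapter:
    """Stand-in adapter with canned scores, used only for this illustration."""

    name = "fixed"

    def execute(self, subject_id: str) -> Mapping[str, float]:
        return {"accessibility": 0.8, "performance": 0.6}


run = record_run(FixedAdapter(), "subject-001", "1.0.0", "kb-1")
print(run.digest())  # 64-character SHA-256 hex string
```

Hashing the full record means that any change to the engine version, the pinned knowledge version, or the scores yields a different digest, which is one simple way to make tampering or drift visible. The real harness may well store provenance differently.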

Owning repository or repositories

| Repo | Role |
|---|---|
| website-intelligence-lab | The evaluation harness (infra, corpus, runs) |

Consuming repositories

| Repo | How it consumes |
|---|---|
| inexisstudios (Website Assessment Engine) | Engines run against the lab, writing immutable runs (planned; external engines) |

Consuming ventures

  • Inexis Digital (indirectly: better-evaluated engines lead to better client outcomes).

Inputs

  • Subject businesses + observed website assets; external engine executions; external knowledge (pinned by version).

Outputs

  • Immutable provenanced runs (Generated assets); benchmark evaluations; run-to-run diffs.
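
A run-to-run diff could look roughly like the sketch below. It assumes each run exposes a flat mapping of metric name to numeric score, which the source does not specify; `MetricDelta`, `diff_runs`, and the metric names are hypothetical.

```python
from typing import Mapping, NamedTuple, Optional


class MetricDelta(NamedTuple):
    metric: str
    before: Optional[float]  # None when the metric is absent from the earlier run
    after: Optional[float]   # None when the metric is absent from the later run


def diff_runs(before: Mapping[str, float], after: Mapping[str, float],
              tolerance: float = 1e-9) -> list[MetricDelta]:
    """Return metrics that were added, removed, or changed by more than tolerance."""
    deltas = []
    for metric in sorted(before.keys() | after.keys()):
        old, new = before.get(metric), after.get(metric)
        if old is None or new is None or abs(new - old) > tolerance:
            deltas.append(MetricDelta(metric, old, new))
    return deltas


# Illustrative scores only; not real benchmark results.
earlier = {"accessibility": 0.8, "performance": 0.6, "seo": 0.5}
later = {"accessibility": 0.8, "performance": 0.7, "content": 0.9}

for delta in diff_runs(earlier, later):
    print(delta.metric, delta.before, delta.after)
# content None 0.9
# performance 0.6 0.7
# seo 0.5 None
```

Unchanged metrics are omitted, so the diff stays short when two engine versions behave the same on a subject.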

Dependencies

Capability relationships

graph LR
    LAB[website-intelligence-lab] -->|provides| CAP[[Reproducible Evaluation]]
    SS[Reusable AI Skills] -.underpins.-> CAP
    ENG[[Website Assessment engines]] -.evaluated by.-> CAP
    CAP -->|measured improvement| WA[[Website Assessment]]

Current maturity

🔶 Early-development. Phases 1–2 (infra) are complete and VPS-validated; Phase 2.5 is in progress; the core loop (adapter, runs, evaluation) is pending; the crawler is not built, so observed assets are still capture_pending. (Source: website-intelligence-lab digest.)

Planned evolution

Adapter → runs/provenance → evaluation loop; the Experiments domain; additional platform adapters and engines.