Skip to content

Checker Pipeline

The checker pipeline is the part of Shape that turns reviewable architecture claims into pass or fail output. It is intentionally deterministic: the same set of .shape files and changed-file inputs should always produce the same facts, the same rule decisions, and the same diagnostics.

That determinism is important because Shape is not trying to infer intent from source code at review time. The language records claims that a human can inspect, then the checker rejects claims that conflict with the model. Analyzer output and authoring helpers can assist the workflow, but the checker itself is built around explicit declarations.

Checker pipeline diagram showing parse, lower facts, run rules, and emit diagnostics with facts, rules, and provenance.

flowchart TD
A[".shape files"] --> B["Parse with Langium"]
B --> C["ShapeModule ASTs"]
C --> E["Effective model"]
E --> F["Lower declarations into facts"]
F --> G["Run semantic rules"]
G --> H{"Diagnostics?"}
H -->|"none"| I["pass"]
H -->|"one or more"| J["reject with causal trail"]

Think of the checker as five small phases rather than one large validator.

PhaseInputOutputMain job
ParseSource textShapeModule ASTsReject syntax the language cannot understand.
Lower factsEffective modelFact list and internal indexesNormalize syntax into records rules can consume.
Run rulesFacts and indexesSemantic diagnosticsReject incoherent claims and missing obligations.
Format diagnosticsDiagnostics with provenanceCLI/editor outputExplain the shortest causal path to the reviewer.

This split keeps each phase honest. The parser does not decide whether HardDelete<AuditEvent> is allowed. Rule evaluation does not re-parse source text. Diagnostic formatting does not invent new facts. The checker becomes easier to extend because each new concept must choose where it belongs in the pipeline.

Langium parses each loaded .shape file into a ShapeModule. A module contains imports and a list of top-level declarations: resources, traits, components, relations, implementations, attestations, rules, rationales, memories, and reevaluations.

At this stage the checker only knows whether the text follows the grammar. For example, this is syntactically meaningful even if later rules reject it:

module audit
resource AuditEvent : AppendOnly
component AuditStore {
owns AuditEvent
grants HardDelete<AuditEvent>
fn purgeOldEvents
source ts("src/audit/purge.ts#purgeOldEvents")
effects complete {
HardDelete<AuditEvent>
evidence ts("src/audit/purge.ts:12-16")
}
}

The parser accepts the shape of the declaration. The later semantic stages decide that an append-only resource cannot be hard-deleted.

Parser diagnostics have exit code 2 because the checker could not build a semantic model at all. Semantic diagnostics have exit code 1 because the model was understood and rejected.

Files under shape/ are the normal CI contract. The checker lowers the committed global model before evaluating rules, so rules see the architecture exactly as the repo declares it.

flowchart TD
A["global declarations"] --> C["effective model"]
C --> D["facts"]
D --> E["rules"]

A global model update can add, modify, or remove functions and declarations:

module audit
component AuditStore {
fn purgeOldEvents
source ts("src/audit/purge.ts#purgeOldEvents")
effects complete {
HardDelete<AuditEvent>
evidence ts("src/audit/purge.ts:12-16")
}
}

Coverage, memory guard, grant, and final-forbid checks all evaluate the same effective model.

The semantic checker does not want every rule to walk the raw AST. Instead it lowers declarations into small records such as resources, traits, grants, function effects, implementation paths, shape traits, context objects, and rule declarations.

For the AuditStore example above, lowering produces facts conceptually like:

resource AuditEvent
resource_trait AuditEvent AppendOnly
component AuditStore
owns AuditStore AuditEvent
grants AuditStore HardDelete<AuditEvent>
function AuditStore.purgeOldEvents
effect AuditStore.purgeOldEvents HardDelete<AuditEvent>

The real checker keeps more detail than this simplified view, including provenance for each fact. Provenance is the bridge between internal reasoning and useful diagnostics: every fact knows which source declaration created it.

Rules consume facts and internal indexes. They answer questions such as:

  • Does a function emit an effect its component does not grant?
  • Does a resource trait create a final forbid for an emitted effect?
  • Did a governed source file change without a matching Shape update or current attestation?
  • Did a function marked RefactorSensitive receive the required memory?
  • Did a guarded target change without a matching reevaluation?
  • Did a hypercycle rule find a forbidden hypercycle in the directed hypergraph, and what relations and vertex path prove it?

The important detail is that these checks are deterministic comparisons over a lowered model. The checker can be conservative because it is not guessing from code. If a function has effects unknown, the model says uncertainty remains. If a function has effects complete, the author is claiming every material effect is represented.

A good diagnostic should explain the rejection as a causal trail. For a final-forbid failure, the reviewer needs to see more than “effect rejected.” They need the function effect, target resource, trait on the resource, and specific final forbid.

flowchart TD
A["function emitted HardDelete on AuditEvent"] --> B["target resource AuditEvent"]
B --> C["AuditEvent has trait AppendOnly"]
C --> D["AppendOnly forbids final HardDelete"]
D --> E["diagnostic: final forbidden effect"]

That causal trail is the product surface. Shape is meant to make architecture review explicit, so an error should help a reviewer decide whether to change the code, change the claim, add missing context, or accept that the model is protecting a real invariant.

When extending the checker, these boundaries keep the design predictable:

  • Grammar changes belong in packages/shp-checker/src/language/shape.langium.
  • Normalization from declarations to records belongs in lowering.
  • Project coherence checks belong in semantic rule functions.
  • Human-facing explanation belongs in diagnostic formatting, explain, graph, memory, and obligations output.
  • Analyzer hints remain advisory unless a human turns them into reviewed Shape claims.

The result is a small but teachable architecture: parse the language, build the effective model, lower it into facts, evaluate coherence, then explain any rejection.