ADR-0017: Pre-commit hook implementation

Status: Accepted
Date: 2026-05-19

Context

ADR-0009 specified that a pre-commit hook should enforce the “AI hard rules” from CONTEXT.md so that drift becomes structurally impossible — failed hook = failed commit = AI must fix and retry. After 2 successful pilot batches, the patterns to enforce have stabilized enough to encode them.

Decision

Implement scripts/pre-commit-hook.py — a standard-library-only Python script (no external dependencies) installed as a git pre-commit hook via symlink: .git/hooks/pre-commit → ../../scripts/pre-commit-hook.py.

The hook reads staged .md files (git diff --cached --name-only --diff-filter=ACM) and applies type-specific checks based on the type: field in frontmatter.
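
A minimal sketch of this step, assuming frontmatter is a block delimited by `---` lines at the top of the file and that `type:` holds a simple scalar. The names `staged_markdown_files` and `frontmatter_type` are illustrative and are not taken from the actual script.

```python
import re
import subprocess

# Assumed frontmatter shape: a leading block fenced by lines of three dashes.
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)
TYPE_RE = re.compile(r"^type:\s*(.+?)\s*$", re.MULTILINE)


def staged_markdown_files():
    """Return staged .md paths that were added, copied, or modified."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.endswith(".md")]


def frontmatter_type(text):
    """Return the value of the type: field, or None if there is none."""
    block = FRONTMATTER_RE.match(text)
    if block is None:
        return None
    found = TYPE_RE.search(block.group(1))
    return found.group(1).strip("\"'") if found else None
```

A dispatcher could then map each returned type to its list of check functions and skip files whose type is `None`.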

Checks implemented (numbered to match CONTEXT.md “AI hard rules”)

| # | Rule | Implementation |
|---|------|----------------|
| 1 | Tag must be in _meta/tags.md registry | Parse tag list from registry; verify every tags: entry in staged file is present |
| 2 | Atomic body ≤ 300 words | Strip markdown formatting + wikilinks + footnote refs, count space-separated tokens |
| 3 | Threads have ## Counter-argument AND ## Response sections | Substring presence check |
| 4 | Counter has cited critic OR “source not located” disclaimer | Regex for wikilinks [[…]], inline (Author Year) citation, or the literal disclaimer string |
| 5 | Resource wikilinks (containing /) resolve to existing files | Path lookup under resources/ |
| 7 | Person/Glossary ## Referenced by matches actual vault wikilink graph | Scan INDEX_FOLDERS for files containing [[<slug>]], compare with parsed list; report missing or extra entries |
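
The text-only checks (rules 2 to 4) could look roughly like the sketch below. The regexes, the helper names, and the exact disclaimer wording are assumptions for illustration. In particular, this sketch keeps a wikilink's display text when counting words, which is one possible reading of "strip wikilinks".

```python
import re

WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")
FOOTNOTE_REF_RE = re.compile(r"\[\^[^\]]+\]")
MARKUP_RE = re.compile(r"[#*_`>~]+")
# Assumed citation shape: "(Author Year)", e.g. "(Smith 2019)".
AUTHOR_YEAR_RE = re.compile(r"\([A-Z][^()]*\b\d{4}[a-z]?\)")
DISCLAIMER = "source not located"  # assumed literal wording
MAX_WORDS = 300


def word_count(body):
    """Rule 2: count whitespace-separated tokens after stripping markup."""
    text = FOOTNOTE_REF_RE.sub(" ", body)
    # Keep the alias if present, else the link target (an assumption).
    text = WIKILINK_RE.sub(lambda m: m.group(2) or m.group(1), text)
    text = MARKUP_RE.sub(" ", text)
    return len(text.split())


def has_thread_sections(text):
    """Rule 3: both section headings must be present."""
    return "## Counter-argument" in text and "## Response" in text


def counter_is_sourced(section):
    """Rule 4: a wikilink, an inline citation, or the disclaimer."""
    return bool(
        WIKILINK_RE.search(section)
        or AUTHOR_YEAR_RE.search(section)
        or DISCLAIMER in section.lower()
    )
```

Rule 2 would then fail a file when `word_count(body) > MAX_WORDS`.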

Checks NOT implemented

  • Rule 6 (no commit outside active staging batch) — workflow discipline, not commit-time check. Hook has no concept of “active batch.”
  • Frontmatter schema validation (required fields per type) — could add later; current cost/benefit suggests waiting until a missing-field bug actually causes harm.
  • Citation grammar shape per class (rule 5 strong form) — current implementation only checks that resource paths resolve; it does not verify that the anchor format matches the per-class grammar in CONTEXT.md (e.g., #3:16 for Bible vs #section-heading for BR). Defer until a real grammar drift bug appears.

Failure mode

On any error: the hook prints each failure to stderr with a prefix, prints the total error count, and exits 1, which makes git abort the commit. The AI must fix the reported problems and re-stage.

On success: prints ✓ pre-commit checks passed (N markdown file(s)), exits 0.
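
The reporting contract above can be sketched as a single function that returns the exit code. The prefix string and the wording of the summary line are hypothetical, since this ADR does not specify them. The `out` and `err` parameters exist only to make the sketch testable.

```python
import sys

ERROR_PREFIX = "pre-commit:"  # hypothetical; the actual prefix is not given here


def report(errors, n_files, out=None, err=None):
    """Print results and return the exit code (0 = pass, 1 = fail)."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    if errors:
        for message in errors:
            print(f"{ERROR_PREFIX} {message}", file=err)
        print(f"{len(errors)} error(s)", file=err)
        return 1  # a non-zero exit makes git abort the commit
    print(f"✓ pre-commit checks passed ({n_files} markdown file(s))", file=out)
    return 0
```

The script's entry point would end with something like `sys.exit(report(errors, len(files)))`.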

Install

chmod +x scripts/pre-commit-hook.py
ln -sf ../../scripts/pre-commit-hook.py .git/hooks/pre-commit

(Symlink makes the hook portable: clone the repo, run the install command once, and the hook is live.)

Manual invocation

python3 scripts/pre-commit-hook.py          # validates staged set (what's about to commit)
python3 scripts/pre-commit-hook.py --all    # validates every tracked .md file in the vault

--all is useful for retroactive sweeps — after schema changes, after manual edits, or periodically as a sanity check that nothing has drifted between commits.
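
One way the two modes might select their file set, as a sketch: the function names are illustrative, and using `git ls-files` for the `--all` case is an assumption about how "every tracked .md file" is enumerated.

```python
import argparse
import subprocess


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Validate vault markdown files.")
    parser.add_argument(
        "--all", action="store_true",
        help="check every tracked .md file instead of the staged set",
    )
    return parser.parse_args(argv)


def git_command(check_all):
    """Return the git command that lists the files to validate."""
    if check_all:
        return ["git", "ls-files", "*.md"]
    return ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"]


def files_to_check(check_all):
    """Run the listing command and keep only .md paths."""
    out = subprocess.run(
        git_command(check_all), capture_output=True, text=True, check=True
    ).stdout
    return [path for path in out.splitlines() if path.endswith(".md")]
```

Both commands quote paths that contain unusual characters by default; a real implementation may want `-z` and NUL-splitting if the vault has such filenames.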

Alternatives considered

  • pre-commit framework (the popular Python hook manager): rejected — adds a dependency and a .pre-commit-config.yaml. Single Python file with stdlib only is simpler and matches ADR-0009’s “small Python script (~50 lines)” intent. (Final implementation is ~180 lines; the larger size buys precise error messages and the full rule 7 graph check.)
  • PyYAML for frontmatter parsing: rejected — we control the schema; a small regex parser handles our subset deterministically and adds zero dependencies.
  • Subprocess + grep for the wikilink graph (rule 7): rejected in favor of pure-Python pathlib (Path.glob) + compiled regex; portable across shells, no piping fragility.
  • Strict citation-grammar enforcement (rule 5 full form): deferred — current resolves-to-file check catches the most common failure mode (broken or typo’d citations). Stronger grammar matching is a future enhancement.
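
For comparison with the grep alternative, the pure-Python graph scan for rule 7 might look like the sketch below. The folder names in `INDEX_FOLDERS` are placeholders, the function names are illustrative, and matching aliased or anchored links (`[[slug|alias]]`, `[[slug#anchor]]`) is an assumption beyond the plain `[[<slug>]]` form.

```python
import re
from pathlib import Path

INDEX_FOLDERS = ("atomics", "threads")  # placeholder names, not the real config


def actual_referrers(vault_root, slug, folders=INDEX_FOLDERS):
    """Return the stems of notes that contain a wikilink to `slug`."""
    link_re = re.compile(r"\[\[" + re.escape(slug) + r"(?:[|#][^\]]*)?\]\]")
    found = set()
    for folder in folders:
        base = Path(vault_root) / folder
        if not base.is_dir():
            continue
        for path in base.glob("**/*.md"):
            if link_re.search(path.read_text(encoding="utf-8")):
                found.add(path.stem)
    return found


def referenced_by_drift(listed, actual):
    """Return (missing, extra) relative to the ## Referenced by list."""
    listed, actual = set(listed), set(actual)
    return sorted(actual - listed), sorted(listed - actual)
```

An entry in `missing` is a note that links to the slug but is not listed; an entry in `extra` is listed but no longer links to it. Either would be reported as a failure.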

Consequences

  • (+) AI cannot land a commit with drift in any of the enforced rules. The previously-manual “verify all ## Referenced by sections match” step is now automatic on every commit.
  • (+) Future batches stop needing the manual grep-verification step at finalization (saves a few minutes per batch, eliminates a class of human-error mistakes).
  • (+) New contributors / future Claude sessions inherit the rules mechanically — the hook is the single source of truth for what “drift” means operationally.
  • (−) Test/dev workflows that intentionally stage incomplete work get blocked. Workaround: git commit --no-verify exists, but per global instructions it should only be used with explicit user permission. That restriction is acceptable, since the hook exists precisely to block the kind of commit a bypass would let through.
  • (−) ~180 lines of script to maintain. Mitigated by stdlib-only design and comprehensive comments.