Batch ws-notes — REVIEW
Content-cleanup variant per ADR-0009. Phase A of the WS-II bulk-processing handoff — produce the full back-matter notes file so that the splitter’s --notes-file argument resolves real footnote text (not TBD placeholders) for every chapter going forward, and the pilot’s staging/ws-notes-ch1/ partial can be retired.
What changed
- New (1 file): `staging/ws-notes/resources/WorldScripture/ws-notes.md`. Contains 707 footnote definitions covering Preface (2), Invocation (6), and Chapters 1–22, in standard `[^N]:` markdown footnote syntax, grouped under `## Preface`, `## Invocation`, and `## Chapter N` H2 headings. Replaces the Chapter 1-only partial at `staging/ws-notes-ch1/`.
- New script: `scripts/clean_ws_notes.py`, a deterministic stage-2.5 cleaner (Marker raw `notes.md` → grouped + ordered footnote file). Will be reused if WS-II notes are re-extracted, or if a sibling anthology’s notes file lands in the same shape.
- New (1 file): `staging/ws-notes/flag-log.txt`, containing every parser flag emitted during the run, mirrored from stderr.
Per-section note counts
| Section | Count | Notes |
|---|---|---|
| Preface | 2 | |
| Invocation | 6 | |
| Chapter 1 | 39 | Matches the partial staging/ws-notes-ch1/ already in flight. |
| Chapter 2 | 23 | |
| Chapter 3 | 15 | |
| Chapter 4 | 16 | |
| Chapter 5 | 37 | |
| Chapter 6 | 34 | Notes 29–34 misordered into Ch7 area by Marker (two-column print bleed); script reassigned by numeric continuation. |
| Chapter 7 | 48 | Notes 47–48 misordered into Ch8 area; reassigned. |
| Chapter 8 | 40 | |
| Chapter 9 | 48 | |
| Chapter 10 | 27 | |
| Chapter 11 | 21 | |
| Chapter 12 | 28 | |
| Chapter 13 | 33 | |
| Chapter 14 | 27 | Note 25 was emitted by Marker as a `#### 25. **James 1.22-24**` heading instead of a list item; script stripped the heading prefix and recorded it normally. |
| Chapter 15 | 23 | |
| Chapter 16 | 66 | |
| Chapter 17 | 7 | |
| Chapter 18 | 26 | |
| Chapter 19 | 28 | Notes 22–28 misordered into Ch20 area; reassigned. |
| Chapter 20 | 63 | |
| Chapter 21 | 26 | |
| Chapter 22 | 24 | |
| Total | 707 | |
Cleanup decisions (applied deterministically by scripts/clean_ws_notes.py)
- Footnote syntax. Each bullet `- N. **citation:** text` rewritten as `[^N]: **citation:** text`.
- Chapter headings. Marker preserved only Preface, Invocation, Ch4, and Ch19. All other chapter starts are detected via numeric restart (`- 1.` after a higher number) and emitted as `## Chapter N`.
- Misorder reassignment. A note whose number is not `prior_in_section + 1` is treated as misordered if exactly one prior chapter’s tail ended at `number − 1`; when multiple chapters end at the same number, the highest chapter number wins (most-recently-ended chapter is overwhelmingly the source of a column bleed). All seven reassigned spans flagged.
- Continuation paragraphs. Lines without a `- N.` prefix (either bare paragraphs after a page break, or `-` bullets without a number) are merged into the preceding note’s body. Crucially, the anchor only advances when the just-recorded note was NOT reassigned; otherwise a misordered run would shadow the raw section’s real last note (the column’s true tail). This was caught in pilot review when ch6 note 34 was originally absorbing the continuation of ch7 note 4.
- `N**.` artifact (`9**. ...`, `10**. ...` in raw): script re-emits the leading `**` so the citation’s opening bold marker is preserved.
- Word-break artifacts. Joined halves of PDF line-break hyphenations restored via a small word-fix dictionary in the script: `naturereligion → nature-religion`, `selfbegotten → self-begotten`, `bodyform → body-form`, `life-anddeath → life-and-death`, `birth-anddeath → birth-and-death`, `giveand-receive → give-and-receive`. Mid-word hyphen splits with whitespace (`nat- ural`, `treach- erous`) collapsed by regex.
- CJK spacing. `(眞 如)` → `(眞如)` (width-space inside parens collapsed).
- Frontmatter. `type: resource`, `class: WorldScripture`, title “Notes”, book/author/publisher per the existing `staging/ws-notes-ch1/` partial, `ingested: <today>`.
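The first two decisions (footnote rewrite and chapter detection by numeric restart) can be illustrated with a minimal sketch. This is not the code in `scripts/clean_ws_notes.py`: the names `NOTE_RE`, `HEADING_RE`, and `group_notes` are hypothetical, and continuation merging and misorder reassignment are deliberately left out.

```python
import re

# Hypothetical patterns; the real script's regexes are not shown in this review.
NOTE_RE = re.compile(r"^- (\d+)\. (.*)$")  # raw bullet: "- N. body"
HEADING_RE = re.compile(r"^#{1,6} (Preface|Invocation|Chapter (\d+))\s*$")


def group_notes(raw_lines):
    """Group raw note bullets into [title, [footnote lines]] sections."""
    sections = []    # each entry is [title, [footnote definition lines]]
    chapter = 0      # last chapter number seen (heading) or inferred (restart)
    last_num = None  # last note number recorded in the current section

    def open_section(title):
        nonlocal last_num
        sections.append([title, []])
        last_num = None

    for raw in raw_lines:
        line = raw.rstrip()
        heading = HEADING_RE.match(line)
        if heading:
            # A heading Marker preserved: trust it and resync the counter.
            if heading.group(2):
                chapter = int(heading.group(2))
            open_section(heading.group(1))
            continue
        note = NOTE_RE.match(line)
        if not note:
            continue  # continuation paragraphs are ignored in this sketch
        num, body = int(note.group(1)), note.group(2)
        # Numeric restart: "- 1." after a higher number opens the next chapter.
        if num == 1 and last_num is not None and last_num > 1:
            chapter += 1
            open_section(f"Chapter {chapter}")
        if not sections:
            open_section("Untitled")
        sections[-1][1].append(f"[^{num}]: {body}")
        last_num = num
    return sections
```

Calling `group_notes` on a raw file's lines returns sections in document order; each title would then be emitted as an H2 heading followed by its footnote definitions. The restart test requires the previous number to be greater than 1, matching the "after a higher number" wording above, so a one-note section followed by another `- 1.` would not be split by this sketch.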
Flagged for human review
- Marker’s `low-minded` at ch7 note 4 continuation: present in raw as already-correct `low-minded` (no break); preserved as-is. Mentioned for completeness because it sits next to the corrected `nature-religion`, `body-form`, etc. and could look inconsistent.
- Multi-paragraph notes (preface note 2, ch6 note 3, ch6 note 9, ch6 note 16, ch7 note 4 [now fixed], ch12 note 28, ch13 note 31). The script preserves paragraph breaks with a blank line + 4-space indent (the continuation grammar of Markdown footnote extensions; core CommonMark does not define footnotes). Spot-checked preface and ch6; render in Quartz preview to confirm.
- Ch6 note 16 stray italics. Marker output: `Vatican II, *Guadium et* *Spes*` (split italics). The script keeps it as-is. It should be one italic run, `*Guadium et Spes*`; flagging for manual fix.
- Ch7 note 17: the quoted Kabbalistic doctrine includes nested italics inside Hebrew transliteration that Marker may have split. Worth eyeballing.
- `naturereligion` and friends were fixed via an explicit dictionary in the script. If a future PDF re-extraction surfaces a different word break (e.g. `cosmicreligion`, `Buddhanature`), the dictionary will need extending. The script will silently miss it; only Quartz preview / pre-commit prose checks would catch it.
- The flag log’s `line N:` references point into Marker’s raw `notes.md`, which is gitignored, so they will go stale if `notes.md` is re-extracted. The script is the source of truth; re-running it after a re-extract regenerates a fresh flag log.
- `staging/ws-notes-ch1/` can be deleted at finalize time. The pilot’s `--notes-file` arg points at it now; after this batch lands at `resources/WorldScripture/ws-notes.md` the splitter should point there instead. Deferred to Phase D pilot finalize so nothing breaks mid-flight.
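For reviewers weighing the dictionary-extension risk flagged above, the word-break handling might look roughly like the sketch below. It is an illustration under assumptions, not the code in `scripts/clean_ws_notes.py`: the names `WORD_FIXES`, `SPLIT_RE`, and `fix_word_breaks` are hypothetical, and only the six dictionary entries come from this review.

```python
import re

# Entries copied from the word-fix list in this review; a new break such as
# "cosmicreligion" would need its own entry before it is repaired.
WORD_FIXES = {
    "naturereligion": "nature-religion",
    "selfbegotten": "self-begotten",
    "bodyform": "body-form",
    "life-anddeath": "life-and-death",
    "birth-anddeath": "birth-and-death",
    "giveand-receive": "give-and-receive",
}

# Assumed regex for "nat- ural"-style splits: a hyphen followed by whitespace,
# sitting between two lowercase letters. It would also join a legitimate
# suspended hyphen ("pre- and post-war"), so hits deserve a manual look.
SPLIT_RE = re.compile(r"(?<=[a-z])-\s+(?=[a-z])")


def fix_word_breaks(text):
    """Apply the explicit dictionary, then collapse hyphen-plus-space splits."""
    for broken, fixed in WORD_FIXES.items():
        # Match whole tokens only, so longer words are left untouched.
        pattern = rf"(?<![\w-]){re.escape(broken)}(?![\w-])"
        text = re.sub(pattern, fixed, text)
    return SPLIT_RE.sub("", text)
```

With this shape, an already-correct word such as `low-minded` passes through unchanged because it is not in the dictionary and has no whitespace after its hyphen, while an unknown joined word passes through silently, which is the gap noted above.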
Verification
# Total footnote-definition count: should be 707.
grep -cE '^\[\^[0-9]+\]:' staging/ws-notes/resources/WorldScripture/ws-notes.md
# All 22 chapter headings present.
grep -cE '^## Chapter [0-9]+$' staging/ws-notes/resources/WorldScripture/ws-notes.md # → 22
# Preface + Invocation headings present.
grep -cE '^## (Preface|Invocation)$' staging/ws-notes/resources/WorldScripture/ws-notes.md # → 2
# Per-chapter note counts match the table above.
for ch in $(seq 1 22); do
  count=$(awk "/^## Chapter $ch\$/,/^## Chapter $((ch+1))\$/" \
    staging/ws-notes/resources/WorldScripture/ws-notes.md | grep -cE '^\[\^')
  echo "Ch$ch: $count"
done
# Re-run the cleaner to confirm reproducibility.
uv run scripts/clean_ws_notes.py \
  --input resources-raw/WorldScripture/extracted/99-back-matter/notes.md \
  --output /tmp/ws-notes-check.md \
  --flag-log /tmp/ws-notes-flag-check.txt
diff staging/ws-notes/resources/WorldScripture/ws-notes.md /tmp/ws-notes-check.md
Finalize plan
When user approves this batch:
- `cp staging/ws-notes/REVIEW.md _meta/batch-reviews/ws-notes.md`
- `mv staging/ws-notes/resources/WorldScripture/ws-notes.md resources/WorldScripture/ws-notes.md`
- The `staging/ws-notes/flag-log.txt` is reproducible from the script + raw; do NOT commit. (It’s already only in `staging/`, which is gitignored at finalize.)
- Delete `staging/ws-notes-ch1/` once the pilot’s `--notes-file` reference has been switched to the final location (this happens during Phase D pilot finalize per the handoff).
- Subsequent Phase B chapters (2–22) call `split_ws_chapter.py` with `--notes-file resources/WorldScripture/ws-notes.md` (the final path, since notes ship before chapters).
Out of scope (deferred)
- Ch6 note 16 split italics, ch7 note 17 nested italics — flagged above for human pass.
- Cross-reference link validation — many notes reference other chapters by name (e.g. “see Chapter 7: Reversal and Restoration”). The pre-commit hook validates wikilinks but not free-text chapter references; checking would require knowing each chapter’s title (deferred until all 22 chapters are staged so titles are derivable).
- Quartz preview: confirm footnote definitions render correctly under H2 chapter headings, and that the per-chapter file’s local `## Footnotes` section can still reference them when `split_ws_chapter.py` wires up the per-sub-theme files.