This guide describes how to transcribe a PDF document (book or paper) into a hierarchical modular directory tree of markdown files. Follow each step in order.
The pipeline produces:
- Split PDFs - one per top-level group (chapter/section), extracted with qpdf
- Transcript files - page-level markdown files with YAML frontmatter, named by page number
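
As a rough illustration of these two outputs, the sketch below builds a qpdf command that extracts a page range and generates a page-numbered transcript path with a frontmatter stub. The function names, the `page_0007.md` naming pattern, and the `page`/`group` frontmatter keys are assumptions made for this example; the guide's own conventions take precedence wherever they differ.

```python
from pathlib import Path


def qpdf_split_command(source: str, first: int, last: int, out: str) -> list[str]:
    """Build a qpdf command that copies pages first..last of source into out."""
    # `qpdf --empty --pages <file> <range> -- <out>` starts from an empty PDF
    # and appends the selected page range from <file>.
    return ["qpdf", "--empty", "--pages", source, f"{first}-{last}", "--", out]


def transcript_path(group_dir: str, page: int, width: int = 4) -> Path:
    """Return a transcript path named by zero-padded page number (assumed pattern)."""
    return Path(group_dir) / f"page_{page:0{width}d}.md"


def transcript_stub(page: int, group: str) -> str:
    """Return a markdown stub with YAML frontmatter (keys are hypothetical)."""
    return f"---\npage: {page}\ngroup: {group}\n---\n\n"


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without qpdf.
    # To execute it for real: subprocess.run(cmd, check=True)
    cmd = qpdf_split_command("book.pdf", 1, 10, "ch01.pdf")
    print(" ".join(cmd))
    print(transcript_path("ch01", 7))
    print(transcript_stub(7, "ch01"), end="")
```

Zero-padding the page number keeps the transcript files in page order when a directory is listed alphabetically.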