| name | kge-extraction |
|---|---|
| description | Extract temporally-grounded knowledge graph quads from sources. Maintains a markdown-based quad store, entity index, and append-only log. Distinguishes between a source's own arguments (ARG) and attributions (HISTOGPHY). Operates in a two-phase workflow: Extract (generate raw quads) and Lint (normalize entities and reconcile graphs). |
A pattern for building temporally-grounded knowledge graphs from scholarly and documentary sources, optimised for downstream knowledge graph embedding (KGE) training.
Most knowledge graph construction from text produces triples: (subject, predicate, object). Applied to scholarly or historical sources, this collapses a critical distinction. A monograph arguing that the Treaty of Westphalia established state sovereignty is doing something categorically different from the same monograph reporting that previous historians have made this argument. The first is a knowledge claim being advanced; the second is a claim being attributed. Embedding models trained on undifferentiated triples will represent both as equivalent facts, which can introduce a fundamental error of provenance.
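One way to keep the two kinds of claim apart is to carry the claim type on every quad. The sketch below is only an illustration: the `ARG` and `HISTOGPHY` labels come from the description above, but the `Quad` fields, the `source` identifier, and the idea of folding the claim type into the predicate for KGE training are assumptions, not the pattern's actual storage format.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimType(Enum):
    ARG = "ARG"              # claim the source itself advances
    HISTOGPHY = "HISTOGPHY"  # claim the source attributes to others


@dataclass(frozen=True)
class Quad:
    # Field names are hypothetical; the real quad store may differ.
    subject: str
    predicate: str
    obj: str
    time: str          # temporal grounding (string format is an assumption)
    claim_type: ClaimType
    source: str        # hypothetical identifier of the source document


def as_training_tuple(q: Quad) -> tuple[str, str, str, str]:
    """Prefix the predicate with the claim type so that advanced and
    attributed claims become distinct relations for an embedding model."""
    return (q.subject, f"{q.claim_type.value}:{q.predicate}", q.obj, q.time)


# The same statement, once advanced by the monograph and once attributed.
advanced = Quad("Treaty of Westphalia", "established", "state sovereignty",
                "1648", ClaimType.ARG, "monograph-A")
attributed = Quad("Treaty of Westphalia", "established", "state sovereignty",
                  "1648", ClaimType.HISTOGPHY, "monograph-A")

# The two quads no longer collapse into one undifferentiated fact.
assert as_training_tuple(advanced) != as_training_tuple(attributed)
```

Prefixing the predicate is just one possible encoding; keeping the claim type as a separate column in the store and deciding how to expose it at training time would work equally well.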
| """ | |
| groq_ocr.py | |
| Processes newspaper images using Groq's vision API and extracts individual | |
| articles to a CSV. Each row represents one article with associated page metadata. | |
| Usage: | |
| python groq_ocr.py --input_dir processed_output/images --output ocr_results.csv | |
| Requirements: |
| # AUFBAU | |
| ## RECONSTRUCTION | |
| **AN AMERICAN WEEKLY PUBLISHED IN NEW YORK** | |
| **by The New World Club, Inc., 209 West 48th Street, New York 19, N. Y. Phone: CIrcle 7-4462** | |
| *Entered as second-class matter January 20, 1934, at the Post Office New York, N. Y. under Act of March 3, 1879* | |
| **Vol. XV—No. 26 | NEW YORK, N. Y., FRIDAY, JULY 1, 1949 | Price 10¢** | |
| *** | |
| > **First in "Aufbau":** |
| **Role:** You are a precise archaeological document analyst specializing in the digitization of field notebooks and excavation catalogues. | |
| **Task:** | |
| 1. Perform a spatial analysis of the document to distinguish between text blocks, artifact photographs/sketches, and marginalia. | |
| 2. Extract metadata and create a brief 2-3 sentence overview of the document's contents. | |
| 3. Transcribe the document EXACTLY as written into a valid YAML structure. | |
| 4. Extract archaeological entities into specific categories based only on explicit mentions. | |
| **Critical Rules:** | |
| - **Zero Hallucination:** Only include information directly visible in the image. If a word is illegible, mark it as `[illegible]`. |
| index | fruit | quantity | color | store |
|---|---|---|---|---|
| 0 | apple | 12 | red | loblaws |
| 1 | banana | 18 | yellow | farm boy |
| 2 | grape | 30 | purple | freshco |
| 3 | cherry | 4 | red | iga |
| 4 | watermelon | 2 | green | farm boy |
| 5 | raspberries | 23 | red | iga |
| import mesa | |
| class LetterAgent(mesa.Agent): | |
|     def __init__(self, model): | |
|         super().__init__(model) | |
|         self.letters_sent = 0 | |
|         self.letters_received = 0 | |
|     def step(self): | |
|         print(f"Hi, I am agent {self.unique_id}.") |