Markdown to Knowledge Graph Pipeline
Exploring documentation-to-graph transformation with Neo4j
An experimental pipeline that transforms structured markdown documentation into a Neo4j knowledge graph.
Status
Research & Demo
This project is an experimental knowledge graph ingestion pipeline built to explore how documentation can be transformed into a structured, queryable graph.
Overview
This demo explores a common limitation of traditional documentation:
Markdown is easy to write, but hard to reason over programmatically.
The pipeline converts hierarchically structured markdown into a Neo4j knowledge graph, making concepts, sections, and references explicit and navigable.
The emphasis is on ontology design and semantic structure, not full-text search or static documentation rendering.
Core Idea
Instead of treating documentation as flat text, the system treats it as structured knowledge:
- High-level topics become Concept nodes
- Subsections become Chunk nodes
- File references become explicit relationships
- Hierarchy is preserved as graph structure
This enables graph-native queries such as:
- “What concepts are related to this topic?”
- “Which sections reference the same assets?”
- “What documentation is impacted if this concept changes?”
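As a rough illustration of the second question, the sketch below groups sections by the files they link to, entirely in memory. The `ChunkText` shape and both function names are hypothetical and not taken from the project; it assumes file references appear as ordinary markdown links or images.

```typescript
// Hypothetical input shape: a chunk title plus its raw markdown body.
interface ChunkText {
  title: string;
  content: string;
}

// Collect link and image targets such as [spec](docs/spec.md) or ![d](img/d.png).
function extractReferences(content: string): string[] {
  const pattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    // Keep each target once per chunk.
    if (targets.indexOf(match[1]) < 0) targets.push(match[1]);
  }
  return targets;
}

// Map each referenced target to the titles of the chunks that mention it.
function chunksByReference(chunks: ChunkText[]): Record<string, string[]> {
  const index: Record<string, string[]> = {};
  for (const chunk of chunks) {
    for (const target of extractReferences(chunk.content)) {
      const titles = index[target] ?? [];
      titles.push(chunk.title);
      index[target] = titles;
    }
  }
  return index;
}
```

In a graph, the same grouping would come from following reference relationships out of each section and collecting the sections that share a target.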
Architecture Summary
The pipeline processes a directory of markdown files and builds a deterministic ontology in Neo4j.
High-level stages:
- Parse markdown structure
- Extract concepts and fragments
- Normalize content into nodes
- Create explicit relationships
- Persist graph structure
The pipeline is intentionally simple to make the data model the primary focus.
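The first two stages can be sketched as a single pass over the lines of a file. This is a minimal sketch, not the project's parser: the `Concept` and `Chunk` interfaces and `parseMarkdown` are hypothetical names, and it assumes ATX-style headings (`#`, `##`) and ignores edge cases such as `#` lines inside fenced code blocks.

```typescript
// Hypothetical shapes; the project's real types may differ.
interface Chunk {
  title: string;
  content: string;
}

interface Concept {
  title: string;
  chunks: Chunk[];
}

// Split markdown into Concepts (level-1 headings) and Chunks (level-2 headings).
function parseMarkdown(markdown: string): Concept[] {
  const concepts: Concept[] = [];
  let concept: Concept | undefined;
  let chunk: Chunk | undefined;

  for (const line of markdown.split(/\r?\n/)) {
    const h1 = /^#\s+(.+)$/.exec(line);
    const h2 = /^##\s+(.+)$/.exec(line);
    if (h1) {
      // A new level-1 heading starts a new Concept.
      concept = { title: h1[1].trim(), chunks: [] };
      concepts.push(concept);
      chunk = undefined;
    } else if (h2 && concept) {
      // A level-2 heading starts a new Chunk under the current Concept.
      chunk = { title: h2[1].trim(), content: "" };
      concept.chunks.push(chunk);
    } else if (chunk) {
      // Any other line is body text for the current Chunk.
      chunk.content += (chunk.content ? "\n" : "") + line;
    }
  }

  for (const c of concepts) {
    for (const k of c.chunks) k.content = k.content.trim();
  }
  return concepts;
}
```

Text that sits between a level-1 heading and its first level-2 heading is dropped in this sketch; a real pipeline would need to decide where that content belongs.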
Data Model
Nodes
Concept
- Represents a top-level topic
- Derived from level-1 markdown headings
- Acts as a semantic anchor
Chunk
- Represents a subsection or idea
- Derived from level-2 headings
- Contains the actual explanatory content
Relationships
(:Concept)-[:HAS]->(:Chunk)
(:Chunk)-[:PART_OF]->(:Concept)
Each link is stored in both directions, so queries can traverse from a Concept to its Chunks or from a Chunk back to its Concept without relying on implicit hierarchy.
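One way to write both relationships is a parameterized Cypher statement built around `MERGE`, which creates a node or relationship only if it does not already exist. The sketch below only builds the statement and its parameters. The property names (`title`, `content`, `concept`), the `WriteStatement` type, and `linkChunkStatement` are assumptions for illustration, not the project's actual schema or code.

```typescript
// Hypothetical container for a Cypher query and its parameters.
interface WriteStatement {
  query: string;
  params: Record<string, string>;
}

// MERGE keeps the write idempotent: re-running it does not duplicate nodes or edges.
// Property names here are assumed, not taken from the project.
const LINK_CHUNK_QUERY = [
  "MERGE (c:Concept {title: $conceptTitle})",
  "MERGE (k:Chunk {title: $chunkTitle, concept: $conceptTitle})",
  "SET k.content = $content",
  "MERGE (c)-[:HAS]->(k)",
  "MERGE (k)-[:PART_OF]->(c)",
].join("\n");

// Build the statement for one Concept/Chunk pair.
function linkChunkStatement(
  conceptTitle: string,
  chunkTitle: string,
  content: string
): WriteStatement {
  return {
    query: LINK_CHUNK_QUERY,
    params: { conceptTitle, chunkTitle, content },
  };
}
```

With the official `neo4j-driver` package, a statement like this could be passed to `session.run(statement.query, statement.params)`; the connection setup is left out here.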
Main Characteristics
Documentation → Ontology Mapping
The pipeline shows how markdown documentation can be mapped into a formal graph model with clear semantics.
Graph-Native Thinking
The system is designed around relationships first, not documents or tables.
Deterministic Ingestion
Given the same markdown structure, the pipeline produces the same graph structure every time.
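A common way to get this property is to derive node keys from the content itself rather than from random IDs or insertion order. The sketch below assumes a slug-based key built from the heading text; `slugify` and `chunkKey` are hypothetical helpers, and the project may use a different scheme.

```typescript
// Lowercase the text and collapse every run of non-alphanumerics into one hyphen.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// The same pair of headings always yields the same key, so repeated runs
// can match existing nodes instead of creating new ones.
function chunkKey(conceptTitle: string, chunkTitle: string): string {
  return `${slugify(conceptTitle)}/${slugify(chunkTitle)}`;
}

console.log(chunkKey("Data Model", "Nodes & Relationships"));
// prints "data-model/nodes-relationships"
```

Two headings that differ only in punctuation or case would collide under this scheme, so a real pipeline might add the file path or a content hash to the key.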
Foundation for Graph-RAG
The resulting graph can be extended with:
- Embeddings
- Similarity relationships
- Cross-document reasoning
- Agent-driven traversal
This pipeline acts as a base layer for more advanced knowledge systems.
Technology Stack
- Node.js / TypeScript — pipeline implementation
- Neo4j — knowledge graph storage
- Markdown — source format
No framework abstractions hide the logic; the focus is on data modeling clarity.
Explore the Code
Areas of interest:
- Markdown parsing logic
- Ontology definition
- Neo4j write patterns
- Relationship modeling decisions