
Markdown to Knowledge Graph Pipeline

Exploring documentation-to-graph transformation with Neo4j

Juan Iturbe

Status

Research & Demo

This project is an experimental knowledge graph ingestion pipeline built to explore how documentation can be transformed into a structured, queryable graph.


Overview

This demo explores a common limitation of traditional documentation:

Markdown is easy to write, but hard to reason over programmatically.

The pipeline converts hierarchically structured markdown into a Neo4j knowledge graph, making concepts, sections, and references explicit and navigable.

The emphasis is on ontology design and semantic structure, not full-text search or static documentation rendering.


Core Idea

Instead of treating documentation as flat text, the system treats it as structured knowledge:

  • High-level topics become Concept nodes
  • Subsections become Chunk nodes
  • File references become explicit relationships
  • Hierarchy is preserved as graph structure
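To make the "file references become explicit relationships" step concrete, here is a minimal sketch of pulling link targets out of a chunk's markdown. The `FileReference` shape and the `extractFileReferences` helper are hypothetical names chosen for illustration, and the regex covers only inline links and images, not reference-style links.

```typescript
// Hypothetical shape for a reference found inside a chunk's text.
interface FileReference {
  label: string;  // link text as written in the markdown
  target: string; // path or URL the link points at
}

// Collect inline markdown links and images of the form [label](target).
// An optional link title after the target is skipped.
// Reference-style links and autolinks are deliberately ignored in this sketch.
function extractFileReferences(markdown: string): FileReference[] {
  const pattern = /!?\[([^\]]*)\]\(([^)\s]+)[^)]*\)/g;
  const references: FileReference[] = [];
  let match: RegExpExecArray | null = pattern.exec(markdown);
  while (match !== null) {
    references.push({ label: match[1] ?? "", target: match[2] ?? "" });
    match = pattern.exec(markdown);
  }
  return references;
}

const sampleText =
  'See [the schema](docs/schema.md) and ![diagram](assets/graph.png "Graph").';
const fileRefs = extractFileReferences(sampleText);
```

Each extracted target could then back a relationship from the chunk to a node representing that file. The label and relationship type for such a node are not defined in this document and would need to be chosen.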

This enables graph-native queries such as:

  • “What concepts are related to this topic?”
  • “Which sections reference the same assets?”
  • “What documentation is impacted if this concept changes?”
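As an illustration, the third question maps onto a short Cypher traversal over the Concept and Chunk nodes described in the data model. This is a sketch under assumptions: the `name` and `title` properties and the `impactedChunksQuery` helper are hypothetical, not taken from the project.

```typescript
// A Cypher statement plus its parameters, kept separate so values are never
// concatenated into the query text.
interface CypherQuery {
  text: string;
  params: Record<string, string>;
}

// "What documentation is impacted if this concept changes?"
// Assumes Concept nodes carry a `name` property and Chunk nodes a `title`
// property; neither property name is confirmed by the project.
function impactedChunksQuery(conceptName: string): CypherQuery {
  return {
    text: [
      "MATCH (c:Concept {name: $conceptName})-[:HAS]->(ch:Chunk)",
      "RETURN ch.title AS title",
      "ORDER BY title",
    ].join("\n"),
    params: { conceptName },
  };
}

const impactQuery = impactedChunksQuery("Authentication");
```

With the official Neo4j JavaScript driver, `impactQuery.text` and `impactQuery.params` would be passed to `session.run(text, params)`.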

Architecture Summary

The pipeline processes a directory of markdown files and builds a deterministic ontology in Neo4j.

High-level stages:

  1. Parse markdown structure
  2. Extract concepts and chunks
  3. Normalize content into nodes
  4. Create explicit relationships
  5. Persist graph structure
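The first three stages can be sketched as a single line-by-line pass. This is a simplified illustration, not the project's parser: the `ParsedConcept` and `ParsedChunk` types and the `parseMarkdown` function are hypothetical, and fenced code blocks and setext headings are not handled.

```typescript
interface ParsedChunk {
  title: string;   // text of a level-2 heading
  content: string; // body text under that heading
}

interface ParsedConcept {
  name: string;          // text of a level-1 heading
  chunks: ParsedChunk[]; // level-2 sections under it, in document order
}

// Open a concept on "# ", open a chunk on "## ", and attach every other line
// to the most recent chunk. Deeper headings are folded into chunk content.
// Text that appears before the first chunk of a concept is dropped.
function parseMarkdown(source: string): ParsedConcept[] {
  const concepts: ParsedConcept[] = [];
  for (const line of source.split(/\r?\n/)) {
    const heading = /^(#{1,2})\s+(.*\S)\s*$/.exec(line);
    const concept = concepts[concepts.length - 1];
    if (heading !== null && heading[1] === "#") {
      concepts.push({ name: heading[2], chunks: [] });
    } else if (heading !== null && concept !== undefined) {
      concept.chunks.push({ title: heading[2], content: "" });
    } else if (concept !== undefined && concept.chunks.length > 0) {
      const chunk = concept.chunks[concept.chunks.length - 1];
      chunk.content += line + "\n";
    }
  }
  // Trim the whitespace accumulated around each chunk body.
  for (const concept of concepts) {
    for (const chunk of concept.chunks) {
      chunk.content = chunk.content.trim();
    }
  }
  return concepts;
}

const sampleDoc = [
  "# Authentication",
  "Intro text that belongs to no chunk.",
  "## Tokens",
  "Tokens expire after one hour.",
  "## Sessions",
  "Sessions are stored server-side.",
  "### Cookies",
  "Cookie notes stay inside the Sessions chunk.",
].join("\n");
const parsedDoc = parseMarkdown(sampleDoc);
```

The result is a plain in-memory tree, so the later stages (relationship creation and persistence) can be developed and tested without a running database.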

The pipeline is intentionally simple to make the data model the primary focus.


Data Model

Nodes

Concept

  • Represents a top-level topic
  • Derived from level-1 markdown headings
  • Acts as a semantic anchor

Chunk

  • Represents a subsection or idea
  • Derived from level-2 headings
  • Contains the actual explanatory content

Relationships

  • (:Concept)-[:HAS]->(:Chunk)
  • (:Chunk)-[:PART_OF]->(:Concept)

Storing the link in both directions (HAS and its inverse, PART_OF) lets a query start from either a concept or a chunk, so traversal and reasoning never depend on an implicit hierarchy.
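A minimal sketch of how this model could be written to Neo4j: one parameterized Cypher statement per chunk that merges both nodes and both relationships. The property names (`name`, `title`, `concept`, `content`) and the `chunkStatement` helper are assumptions for illustration, not the project's schema.

```typescript
interface CypherStatement {
  text: string;
  params: Record<string, string>;
}

// MERGE both nodes, then MERGE the relationship in each direction, so
// re-running the statement does not create duplicates.
// Chunks are keyed on concept name plus title (an assumption) so that two
// concepts can each have a chunk with the same heading.
function chunkStatement(
  conceptName: string,
  title: string,
  content: string
): CypherStatement {
  return {
    text: [
      "MERGE (c:Concept {name: $conceptName})",
      "MERGE (ch:Chunk {concept: $conceptName, title: $title})",
      "SET ch.content = $content",
      "MERGE (c)-[:HAS]->(ch)",
      "MERGE (ch)-[:PART_OF]->(c)",
    ].join("\n"),
    params: { conceptName, title, content },
  };
}

const tokenStatement = chunkStatement(
  "Authentication",
  "Tokens",
  "Tokens expire after one hour."
);
```

Using `MERGE` rather than `CREATE` is the design choice that matters here: it matches an existing node or relationship when one is present and creates it only when it is missing.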


Main Characteristics

Documentation → Ontology Mapping

The demo shows how loosely structured documentation can be mapped into a formal graph model with clear semantics.

Graph-Native Thinking

The system is designed around relationships first, not documents or tables.

Deterministic Ingestion

Given the same markdown structure, the pipeline produces the same graph structure every time.
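One way to get this property is to derive each node's identity from where it sits in the source, rather than from a generated ID. The sketch below is an assumption about how that could be done, not the project's confirmed approach; `slugify` and `nodeKey` are hypothetical helpers.

```typescript
// Turn a heading into a lowercase, dash-separated slug.
// ASCII-only: non-ASCII characters are treated as separators in this sketch.
function slugify(heading: string): string {
  return heading
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build a stable key from the file path and the heading path.
// Backslashes are normalized so the key does not depend on the host OS.
function nodeKey(filePath: string, headings: string[]): string {
  const normalizedPath = filePath.replace(/\\/g, "/");
  return [normalizedPath].concat(headings.map(slugify)).join("#");
}

const keyFromPosix = nodeKey("docs/auth.md", ["Authentication", "Token Refresh"]);
const keyFromWindows = nodeKey("docs\\auth.md", ["Authentication", "  Token  refresh! "]);
```

Both calls yield the same key, so merging nodes on that key would leave the graph unchanged when the same files are ingested again.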

Foundation for Graph-RAG

The resulting graph can be extended with:

  • Embeddings
  • Similarity relationships
  • Cross-document reasoning
  • Agent-driven traversal

This pipeline acts as a base layer for more advanced knowledge systems.
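As a sketch of the similarity-relationship extension, chunks that already have embeddings could be compared pairwise and linked when their cosine similarity passes a threshold. Everything here is illustrative: the `EmbeddedChunk` and `SimilarityEdge` types, the function names, and the toy vectors are assumptions, and the document does not say how embeddings would be produced.

```typescript
interface EmbeddedChunk {
  id: string;
  embedding: number[];
}

interface SimilarityEdge {
  from: string;
  to: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Compare every unordered pair once and keep those at or above the threshold.
// This is quadratic in the number of chunks, which is fine for a demo corpus.
function similarityEdges(chunks: EmbeddedChunk[], threshold: number): SimilarityEdge[] {
  const edges: SimilarityEdge[] = [];
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i + 1; j < chunks.length; j++) {
      const score = cosineSimilarity(chunks[i].embedding, chunks[j].embedding);
      if (score >= threshold) {
        edges.push({ from: chunks[i].id, to: chunks[j].id, score });
      }
    }
  }
  return edges;
}

// Toy vectors, not real embeddings.
const toyChunks: EmbeddedChunk[] = [
  { id: "tokens", embedding: [1, 0, 0] },
  { id: "sessions", embedding: [1, 1, 0] },
  { id: "billing", embedding: [0, 0, 1] },
];
const similarEdges = similarityEdges(toyChunks, 0.5);
```

Each resulting edge could be persisted as a relationship between two Chunk nodes with the score as a property. The relationship type is left open because the document does not name one.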


Technology Stack

  • Node.js / TypeScript — pipeline implementation
  • Neo4j — knowledge graph storage
  • Markdown — source format

Nothing is hidden behind framework abstractions; the focus is on data-modeling clarity.

