Apr 10, 2026 - 10 MIN READ
Developing a GraphRAG Research Chatbot with Neo4j

How I built a GraphRAG-style research assistant that combines Neo4j graph retrieval, vector search, and grounded LLM answers with citations.

Peter Mangoro

This project started with a simple question: how do we make an LLM answer from a research corpus without hallucinating? The answer was to combine graph structure, semantic retrieval, and strict evidence grounding.

Team Context

This work was built collaboratively. Team members included:

  • Peter Mangoro
  • Bekithemba Nkomo
  • Masheia Dzimba
  • Tafadzwa
  • Sharman

Project Focus

The chatbot is designed for Q&A over a PDF corpus using a GraphRAG pattern:

  • papers and chunks are stored in Neo4j
  • chunks are embedded for vector retrieval
  • graph relationships support structured filtering (gene, author, topic paths)
  • LLM answers are generated from retrieved evidence with citations
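
To make the pattern concrete, here is a minimal in-memory sketch of hybrid retrieval: a structured filter first, then a similarity ranking. It is not the project's code. The `Chunk` class, the `genes` field, the `hybrid_search` helper, and the toy two-dimensional vectors are all illustrative assumptions, and in the real system both steps would run against Neo4j rather than a Python list.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk node and its graph neighbours."""
    chunk_id: str
    paper_title: str
    text: str
    embedding: list[float]
    # Stand-in for graph edges from the chunk to gene nodes.
    genes: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(chunks, query_embedding, gene=None, k=2):
    # Structured step: keep only chunks linked to the requested gene (if any).
    candidates = [c for c in chunks if gene is None or gene in c.genes]
    # Semantic step: rank the remaining chunks by similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy data, invented purely for illustration.
CHUNKS = [
    Chunk("c1", "Paper A", "Text about GENE_A.", [1.0, 0.0], {"GENE_A"}),
    Chunk("c2", "Paper B", "Text about GENE_B.", [0.9, 0.1], {"GENE_B"}),
    Chunk("c3", "Paper A", "More text about GENE_A.", [0.0, 1.0], {"GENE_A"}),
]

for chunk in hybrid_search(CHUNKS, [1.0, 0.0], gene="GENE_A"):
    print(chunk.chunk_id, chunk.paper_title)
```

Without the gene filter, `c2` would outrank `c3` on similarity alone; with it, only chunks tied to the requested gene are ranked at all. That is the sense in which graph structure complements vector search.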

What I Built

I implemented and documented an end-to-end workflow across ingestion and serving:

  • extraction pipeline (PDF -> structured JSON)
  • graph schema and ingestion into Neo4j
  • embedding generation and vector indexing
  • retrieval routing (semantic, gene-focused, author-focused, aggregates)
  • streaming chat output with trace and progress indicators in the UI
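
The routing step in the list above can be sketched as a small intent classifier that picks one of the four retrieval paths. This is only an illustration under stated assumptions: the `Route` enum, the keyword patterns, and the `classify` function are hypothetical, and the real router may well use an LLM or different rules.

```python
import re
from enum import Enum


class Route(Enum):
    SEMANTIC = "semantic"
    GENE = "gene"
    AUTHOR = "author"
    AGGREGATE = "aggregate"


# Hypothetical keyword patterns; a real router would need a richer signal.
AGGREGATE_PATTERN = re.compile(
    r"\b(how many|count|number of|most common)\b", re.IGNORECASE
)
AUTHOR_PATTERN = re.compile(
    r"\b(author|authored|written by|papers by)\b", re.IGNORECASE
)
GENE_PATTERN = re.compile(r"\bgenes?\b", re.IGNORECASE)


def classify(question: str) -> Route:
    """Pick a retrieval route; aggregate questions are checked first."""
    if AGGREGATE_PATTERN.search(question):
        return Route.AGGREGATE
    if AUTHOR_PATTERN.search(question):
        return Route.AUTHOR
    if GENE_PATTERN.search(question):
        return Route.GENE
    # Default: plain semantic (vector) retrieval.
    return Route.SEMANTIC


for q in [
    "How many papers mention this gene?",
    "Which papers by this author discuss methods?",
    "What does the corpus say about gene regulation?",
    "Summarize the main limitations discussed.",
]:
    print(classify(q).value, "<-", q)
```

The order of the checks is a design choice: a question such as "How many papers mention this gene?" contains a gene keyword but needs a count, so the aggregate test runs before the gene test.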

I also maintained architecture and operations guides to make the system reproducible for others.

Key Findings

  • Hybrid graph + vector retrieval worked better than either strategy alone for research Q&A: vector search finds relevant passages, and graph relationships narrow them by gene, author, or topic.
  • Routing matters: classifying query intent before retrieval avoids noisy context.
  • Grounded citations let readers check each claim against its source, which improves trust in responses.
  • Operational clarity (envs, run order, schema contracts) is as important as model quality for team maintainability.
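
As an illustration of the grounding point, the sketch below numbers the retrieved evidence in the prompt and then checks that an answer only cites numbers that exist. It is a simplified example, not the project's prompt: `Evidence`, `build_grounded_prompt`, `invalid_citations`, and the instruction wording are all assumptions made for this post.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record for one retrieved chunk."""
    paper_title: str
    chunk_id: str
    text: str


def build_grounded_prompt(question: str, evidence: list[Evidence]) -> str:
    """Number each evidence item so the model can cite it as [n]."""
    numbered = [
        f"[{i}] ({item.paper_title}, chunk {item.chunk_id}) {item.text}"
        for i, item in enumerate(evidence, start=1)
    ]
    rules = (
        "Answer using only the numbered evidence below. "
        "Cite the supporting evidence after each claim, for example [1]. "
        "If the evidence does not answer the question, say so."
    )
    return "\n\n".join(
        [rules, "Evidence:\n" + "\n".join(numbered), f"Question: {question}"]
    )


def invalid_citations(answer: str, evidence_count: int) -> list[int]:
    """Return cited numbers that do not match any evidence item."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= evidence_count)


# Toy evidence, invented purely for illustration.
evidence = [
    Evidence("Paper A", "c1", "First finding."),
    Evidence("Paper B", "c2", "Second finding."),
]
print(build_grounded_prompt("What was found?", evidence))
print(invalid_citations("Claim one [1]. Claim two [3].", len(evidence)))
```

A check like `invalid_citations` only confirms that a citation points at real evidence; it does not confirm that the evidence supports the claim, so it complements human review rather than replacing it.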

Lessons Learned

The biggest lesson was that building a useful chatbot is mostly systems engineering, not just prompt writing. Data contracts, ingestion repeatability, retrieval quality, and traceability determine whether answers stay reliable over time.

I also learned the value of documenting architecture in plain language so both technical and non-technical collaborators can contribute effectively.

Skills I Gained

  • GraphRAG architecture design
  • Neo4j schema design and graph ingestion pipelines
  • Retrieval orchestration (route-based + tool-based patterns)
  • Streaming response UX with retrieval traceability
  • Technical documentation for long-lived project maintenance