Developing a GraphRAG Research Chatbot with Neo4j
How I built a GraphRAG-style research assistant that combines Neo4j graph retrieval, vector search, and grounded LLM answers with citations.
Peter Mangoro
This project started with a simple question: how do we make an LLM answer from a research corpus without hallucinating? The answer was to combine graph structure, semantic retrieval, and strict evidence grounding.
Team Context
This work was built collaboratively. Team members included:
- Peter Mangoro
- Bekithemba Nkomo
- Masheia Dzimba
- Tafadzwa
- Sharman
Project Focus
The chatbot is designed for Q&A over a PDF corpus using a GraphRAG pattern:
- papers and chunks are stored in Neo4j
- chunks are embedded for vector retrieval
- graph relationships support structured filtering (gene, author, topic paths)
- LLM answers are generated from retrieved evidence with citations
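To make the pattern concrete, here is a minimal in-memory sketch of the hybrid step: a graph-style filter narrows the candidate chunks, then vector similarity ranks them. It is an illustration, not the project's code. The `Chunk` class, the `hybrid_retrieve` function, the two-dimensional embeddings, and the sample corpus are all hypothetical stand-ins for what would live in Neo4j and its vector index.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Chunk:
    chunk_id: str
    paper: str
    text: str
    embedding: list[float]
    # Hypothetical stand-in for a chunk-to-gene relationship in the graph.
    genes: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(chunks, query_embedding, gene=None, k=2):
    # Graph step: keep only chunks linked to the requested gene, if one is given.
    candidates = [c for c in chunks if gene is None or gene in c.genes]
    # Vector step: rank the surviving chunks by similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy corpus with made-up embeddings, for illustration only.
CORPUS = [
    Chunk("c1", "paper-A", "BRCA1 and DNA repair", [1.0, 0.0], {"BRCA1"}),
    Chunk("c2", "paper-B", "TP53 in apoptosis", [0.9, 0.1], {"TP53"}),
    Chunk("c3", "paper-A", "BRCA1 expression levels", [0.0, 1.0], {"BRCA1"}),
]
```

With the gene filter set to `"BRCA1"`, chunk `c2` is excluded even though its embedding is close to the query. That is the kind of noise a pure vector search would let through.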
What I Built
I implemented and documented an end-to-end workflow across ingestion and serving:
- extraction pipeline (PDF -> structured JSON)
- graph schema and ingestion into Neo4j
- embedding generation and vector indexing
- retrieval routing (semantic, gene-focused, author-focused, aggregates)
- streaming chat outputs with trace/progress support in UI
I also maintained architecture and operations guides to make the system reproducible for others.
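The retrieval routing step listed above can be sketched as a small intent classifier that runs before any retrieval. The route names mirror the four categories in the list, but the keyword rules, the gene-symbol pattern, and the `route_query` function are assumptions made for illustration. A real router could just as well use an LLM or a trained classifier.

```python
import re


def route_query(question: str) -> str:
    """Return one of: 'aggregate', 'author', 'gene', 'semantic'."""
    q = question.lower()
    # Aggregate questions ask for counts or rankings over the whole graph.
    if re.search(r"\b(how many|count|number of|most)\b", q):
        return "aggregate"
    # Author questions mention authorship explicitly.
    if re.search(r"\b(authors?|written by|wrote|published by)\b", q):
        return "author"
    # Assumed heuristic: gene symbols look like upper-case letters plus digits
    # (for example BRCA1 or TP53). This will miss symbols without digits.
    if re.search(r"\b[A-Z]{2,}[0-9]+\b", question):
        return "gene"
    # Everything else falls back to plain semantic (vector) retrieval.
    return "semantic"
```

The order of the checks is a design choice: a question such as "How many papers mention TP53?" is treated as an aggregate first, because a count over the graph answers it better than a handful of ranked chunks.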
Key Findings
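- Hybrid graph + vector retrieval worked better for research Q&A than either strategy alone: graph relationships narrow the candidates to the right genes, authors, or topics, and vector search ranks what remains by meaning.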
- Routing matters: classifying query intent before retrieval avoids noisy context.
- Grounded citations improve trust in responses, because readers can check each claim against the evidence it cites.
- Operational clarity (envs, run order, schema contracts) is as important as model quality for team maintainability.
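One way to enforce the grounding described above is to label every retrieved chunk in the prompt, then check that the answer cites only those labels. The sketch below is an assumption about how that could look, not the project's implementation. The prompt wording, the bracketed `[chunk_id]` citation format, and both function names are hypothetical.

```python
import re


def build_grounded_prompt(question, evidence):
    """Build a prompt from (chunk_id, text) pairs, each labeled with its id."""
    lines = [f"[{cid}] {text}" for cid, text in evidence]
    return (
        "Answer using only the evidence below. "
        "Cite chunk ids in square brackets. "
        "If the evidence is insufficient, say so.\n\n"
        "Evidence:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\n"
    )


def unsupported_citations(answer, evidence):
    """Return cited ids that do not match any retrieved chunk."""
    allowed = {cid for cid, _ in evidence}
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return cited - allowed
```

A non-empty result from `unsupported_citations` signals that the model cited something it was never shown, so the answer can be rejected or regenerated before it reaches the user.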
Lessons Learned
The biggest lesson was that building a useful chatbot is mostly systems engineering, not just prompt writing. Data contracts, ingestion repeatability, retrieval quality, and traceability determine whether answers stay reliable over time.
I also learned the value of documenting architecture in plain language so both technical and non-technical collaborators can contribute effectively.
Skills I Gained
- GraphRAG architecture design
- Neo4j schema design and graph ingestion pipelines
- Retrieval orchestration (route-based + tool-based patterns)
- Streaming response UX with retrieval traceability
- Technical documentation for long-lived project maintenance