Analyzing a Healthcare Knowledge Graph with Cypher and Graph Data Science
How I explored a FAERS-style healthcare graph, moved from Cypher EDA to GDS workflows, and turned graph results into practical analytical insights.
Peter Mangoro
This assignment felt like a real analytics engagement: incomplete documentation, a complex healthcare graph, and the need to make method choices that I could justify.
Assignment Focus
I worked with a FAERS-inspired healthcare dataset restored from a Neo4j dump, then analyzed it in three passes:
- Cypher-based schema and data exploration
- Deeper analytical querying and aggregation
- GDS workflows for structural and similarity analysis
I used Python with the Neo4j driver and pandas so that each experiment stayed iterative and reproducible.
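The driver-based loop described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: `fetch_rows` and `open_session` are hypothetical helper names, and the URI, credentials, and database name are placeholders.

```python
from typing import Any, Dict, List, Optional


def fetch_rows(session: Any, cypher: str,
               params: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Run one Cypher query and return every record as a plain dict."""
    result = session.run(cypher, params or {})
    return [record.data() for record in result]


def open_session(uri: str, user: str, password: str, database: str = "neo4j"):
    """Open a driver and a session; all connection details are placeholders."""
    # Imported lazily so fetch_rows can be exercised without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    return driver, driver.session(database=database)
```

The list of dicts that `fetch_rows` returns can be passed to `pandas.DataFrame(rows)`, so each refinement of a query ends up as a table that can be sorted and filtered in the next notebook cell.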
What I Built
- A practical EDA workflow for discovering labels, relationship patterns, and property quality
- Analytical Cypher queries for non-trivial domain questions
- GDS projections and algorithm runs aligned to assignment questions (instead of running algorithms blindly)
I documented not just the outputs but also why each query or algorithm choice matched the analytical goal.
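As an illustration of the EDA step, the queries below cover label discovery, relationship patterns, and a simple property-completeness check. They are a sketch under assumptions: the dictionary keys and the `completeness_query` helper are hypothetical names, and the label and property passed to the helper stand in for whatever the restored graph actually contains.

```python
import re

# Built-in procedures and plain Cypher for a first look at an unfamiliar graph.
# The pattern query scans every relationship, so it can be slow on large graphs.
SCHEMA_QUERIES = {
    "labels": "CALL db.labels() YIELD label RETURN label ORDER BY label",
    "relationship_types": (
        "CALL db.relationshipTypes() YIELD relationshipType "
        "RETURN relationshipType ORDER BY relationshipType"
    ),
    "label_counts": (
        "MATCH (n) RETURN labels(n) AS labels, count(*) AS nodes "
        "ORDER BY nodes DESC"
    ),
    "patterns": (
        "MATCH (a)-[r]->(b) "
        "RETURN labels(a) AS source, type(r) AS rel, labels(b) AS target, "
        "count(*) AS rels ORDER BY rels DESC"
    ),
}

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def completeness_query(label: str, prop: str) -> str:
    """Build a query comparing total nodes with nodes where `prop` is set.

    Labels and property keys cannot be passed as Cypher parameters, so they
    are validated and interpolated into the query text instead.
    """
    for name in (label, prop):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # count(n.prop) skips nulls, so populated <= total.
    return (
        f"MATCH (n:`{label}`) "
        f"RETURN count(n) AS total, count(n.`{prop}`) AS populated"
    )
```

For example, `completeness_query("Drug", "name")` (hypothetical label and property) yields a query whose `populated / total` ratio shows how much of that property can be trusted before any analysis depends on it.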
Key Findings
- Discovering the schema first saved time later. Without that step, I would have built analyses on false assumptions.
- Exploring with Cypher before turning to GDS led to better algorithm choices, because I already understood the graph's semantics.
- Projection design is the real GDS work: the choice of node sets, relationship directions, and weights determines result quality.
- Iterating through the driver in Python made query refinement much faster than experimenting manually in the browser alone.
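To make the projection point concrete, here is a hedged sketch of how the three choices (node set, direction, weight) appear in a native projection call. It assumes the GDS 2.x `gds.graph.project` procedure; `projection_call` is a hypothetical helper, and the labels, relationship type, and weight property in the usage note are invented for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

VALID_ORIENTATIONS = {"NATURAL", "REVERSE", "UNDIRECTED"}


def projection_call(
    graph_name: str,
    node_labels: List[str],
    rel_type: str,
    orientation: str = "NATURAL",
    weight_property: Optional[str] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return (cypher, params) for a native GDS projection."""
    if orientation not in VALID_ORIENTATIONS:
        raise ValueError(f"unknown orientation: {orientation!r}")

    # Direction and weighting live in the relationship projection map.
    rel_config: Dict[str, Any] = {"type": rel_type, "orientation": orientation}
    if weight_property is not None:
        rel_config["properties"] = weight_property

    cypher = (
        "CALL gds.graph.project($graph_name, $node_labels, $relationships) "
        "YIELD graphName, nodeCount, relationshipCount "
        "RETURN graphName, nodeCount, relationshipCount"
    )
    params = {
        "graph_name": graph_name,
        "node_labels": node_labels,
        "relationships": {rel_type: rel_config},
    }
    return cypher, params
```

A call such as `projection_call("case-drug", ["Case", "Drug"], "TOOK", orientation="UNDIRECTED")` makes every decision visible in one place, which helps when an algorithm result has to be traced back to the projection that produced it.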
Lessons Learned
This project taught me to treat graph analytics as a staged process: understand the graph, validate assumptions, then apply algorithms with purpose. I also learned how quickly “interesting but wrong” results can appear when projections are underspecified.
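A toy example shows how an underspecified projection produces results that look plausible but answer the wrong question. The edges and node names below are invented, and `degree` is a plain-Python stand-in for a degree count, not a GDS call. It only mirrors the idea that a degree count over a natural-orientation projection follows outgoing relationships.

```python
from collections import Counter
from typing import Iterable, Tuple

# Invented data: each edge points from a case report to a drug it mentions.
EDGES = [
    ("case1", "drugA"),
    ("case2", "drugA"),
    ("case3", "drugA"),
    ("case3", "drugB"),
]


def degree(edges: Iterable[Tuple[str, str]], orientation: str = "NATURAL") -> Counter:
    """Count relationships per node under a given orientation."""
    counts: Counter = Counter()
    for source, target in edges:
        if orientation in ("NATURAL", "UNDIRECTED"):
            counts[source] += 1  # outgoing end of the stored direction
        if orientation in ("REVERSE", "UNDIRECTED"):
            counts[target] += 1  # incoming end, counted as if reversed
    return counts


natural = degree(EDGES, "NATURAL")
reverse = degree(EDGES, "REVERSE")

print(natural.most_common(1))  # [('case3', 2)]
print(reverse.most_common(1))  # [('drugA', 3)]
print(natural["drugA"])        # 0
```

With the stored direction, the most frequently mentioned drug scores zero and a case report tops the ranking. Reversing the orientation, or projecting as undirected, is what makes the ranking speak to a question about drugs.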
Skills I Gained
- Cypher EDA for unfamiliar production-like graph datasets
- GDS projection design and algorithm execution
- Python-driven graph analysis workflow with Neo4j driver + pandas
- Query/result interpretation with explicit methodological tradeoffs
Artifacts
- Notebook: P_Mangoro_C2_assn.ipynb