Applications of GraphRAG Knowledge Graphs for Technical Service Documents

Tuesday, April 29, 4:05-4:25 p.m.
Room 236
Presenter: Christian D. Powell
Modality: Traditional Talk

Abstract

The performance and utility of Large Language Models (LLMs) have been advancing at unprecedented rates as increasingly large and complex models are continually released. While LLMs have improved in their knowledge summarization and reasoning abilities, their ability to converse on novel content not seen during training remains limited. To this end, Retrieval-Augmented Generation (RAG) techniques have been developed to find and pull in additional information to augment LLM conversational abilities. This has made RAG a popular technique for developing content-specific chatbots that can converse over curated corpora of technical documents. However, the performance of RAG applications is limited by the size of the context window of the LLM being used and by the ability to retrieve content relevant to the user’s query from the corpus. Due to these limitations, RAG often fails on global questions aimed at the entirety of the corpus. Microsoft’s GraphRAG project has been shown to improve performance on these global queries by generating knowledge graph representations of the corpus, which can then be queried at various community levels to provide more global information. Additionally, having these knowledge graphs allows for analyzing and visualizing the corpus with graph theory-based methods. This project explores the application of GraphRAG and graph-theory methods to a corpus of technical printer service documents. We show that the GraphRAG and NetworkX Python 3 packages can be used to generate knowledge graphs, which can then be queried, analyzed, and visualized.
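
As a rough illustration of the graph-theory side of the abstract, the sketch below loads a handful of entity relationships into NetworkX, groups the entities into communities, and finds the edges that bridge them. Everything in it is an assumption made for illustration: the triples are invented rather than taken from the talk's corpus, the function names are hypothetical, and NetworkX's greedy modularity clustering is used as a simple stand-in, not the community detection that GraphRAG itself runs.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical (source entity, relation, target entity) triples, standing in
# for relationships that an extraction step might pull from service documents.
TRIPLES = [
    ("fuser", "part_of", "print engine"),
    ("fuser", "causes", "error 121"),
    ("error 121", "reported_by", "print engine"),
    ("pick roller", "part_of", "paper tray"),
    ("pick roller", "causes", "paper jam"),
    ("paper jam", "occurs_in", "paper tray"),
    ("print engine", "fed_by", "paper tray"),
]


def build_graph(triples):
    """Build an undirected graph with the relation stored as an edge attribute."""
    graph = nx.Graph()
    for source, relation, target in triples:
        graph.add_edge(source, target, relation=relation)
    return graph


def find_communities(graph):
    """Group entities by greedy modularity maximization (a stand-in method)."""
    groups = greedy_modularity_communities(graph)
    return sorted(sorted(group) for group in groups)


def find_bridges(graph):
    """Return edges whose removal would disconnect part of the graph."""
    return sorted(tuple(sorted(edge)) for edge in nx.bridges(graph))


graph = build_graph(TRIPLES)
communities = find_communities(graph)
bridges = find_bridges(graph)

for members in communities:
    print("community:", ", ".join(members))
for left, right in bridges:
    print("bridge:", left, "<->", right)
```

Communities like these are the units that a global query could summarize, while bridge edges point to the entities that tie otherwise separate topics together. The same graph object can be passed to `nx.draw` (with matplotlib installed) for a quick visualization.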

Bio

Christian D. Powell

Christian D. Powell is a skilled Data Scientist who has worked at Lexmark International, Inc. since June 2022. He holds a Master's in Data Science from the University of Kentucky. During his time there, he worked as a Graduate Research Assistant in a bioinformatics laboratory at the Markey Cancer Center, gaining experience in research and contributing to more than eight publications. He is currently enrolled in the Georgia Institute of Technology’s OMSCS program, specializing in Machine Learning.
