In this tutorial, we’ll show how to create a Knowledge Graph from an unstructured document using an LLM. While traditional NLP methods have been used for extracting entities and relationships, Large Language Models (LLMs) like GPT-4o-mini can make this process more accurate and context-aware. LLMs are especially useful when working with messy, unstructured data. Using Python, Mirascope, and OpenAI’s GPT-4o-mini, we’ll build a simple knowledge graph from a sample medical log.
Installing the dependencies
!pip install "mirascope[openai]" matplotlib networkx
OpenAI API Key
To get an OpenAI API key, visit https://platform.openai.com/settings/organization/api-keys and generate a new key. If you’re a new user, you may need to add billing details and make a minimum payment of $5 to activate API access.
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass('Enter OpenAI API Key: ')
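When a notebook is re-run, it can be convenient to prompt for the key only if it is not already set. The sketch below is one way to do that; the helper name ensure_api_key and its env and prompt parameters are hypothetical and exist only so the logic can be exercised without typing a real key.

```python
import os
from getpass import getpass


def ensure_api_key(env=os.environ, prompt=getpass):
    """Set OPENAI_API_KEY in env only if it is missing or empty, then return it.

    `env` and `prompt` are injectable (hypothetical parameters) so the helper
    can be tested with a plain dict and a fake prompt function.
    """
    if not env.get("OPENAI_API_KEY"):
        env["OPENAI_API_KEY"] = prompt("Enter OpenAI API Key: ")
    return env["OPENAI_API_KEY"]
```

In the notebook, calling ensure_api_key() with no arguments would take the place of the direct getpass assignment.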
Defining Graph Schema
Before we extract information, we need a structure to represent it. In this step, we define a simple schema for our Knowledge Graph using Pydantic. The schema includes:
- Node: Represents an entity with an ID, a type (such as “Doctor” or “Medication”), and optional properties.
- Edge: Represents a relationship between two nodes.
- KnowledgeGraph: A container for all nodes and edges.
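from pydantic import BaseModel, Field

# A relationship between two nodes, referenced by node id.
class Edge(BaseModel):
    source: str
    target: str
    relationship: str

# An entity such as a patient, symptom, or event.
class Node(BaseModel):
    id: str
    type: str
    properties: dict | None = None

# Container for all nodes and edges.
class KnowledgeGraph(BaseModel):
    nodes: list[Node]
    edges: list[Edge]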
Defining the Patient Log
Now that we have a schema, let’s define the unstructured data we’ll use to generate our Knowledge Graph. Below is a sample patient log, written in natural language. It contains key events, symptoms, and observations related to a patient named Mary.
patient_log = """
Mary called for help at 3:45 AM, reporting that she had fallen while going to the bathroom. This marks the second fall incident within a week. She complained of dizziness before the fall.
Earlier in the day, Mary was observed wandering the hallway and appeared confused when asked basic questions. She was unable to recall the names of her medications and asked the same question multiple times.
Mary skipped both lunch and dinner, stating she didn't feel hungry. When the nurse checked her room in the evening, Mary was lying in bed with mild bruising on her left arm and complained of hip pain.
Vital signs taken at 9:00 PM showed slightly elevated blood pressure and a low-grade fever (99.8°F). Nurse also noted increased forgetfulness and possible signs of dehydration.
This behavior is similar to previous episodes reported last month.
"""
Generating the Knowledge Graph
To transform unstructured patient logs into structured insights, we use an LLM-powered function that extracts a Knowledge Graph. Each patient entry is analyzed to identify entities (like people, symptoms, events) and their relationships (such as “reported”, “has symptom”).
The generate_kg function is decorated with @openai.call, leveraging the GPT-4o-mini model and the previously defined KnowledgeGraph schema. The prompt clearly instructs the model on how to map the log into nodes and edges.
from mirascope.core import openai, prompt_template

# response_model asks for the reply to be parsed into a KnowledgeGraph instance.
@openai.call(model="gpt-4o-mini", response_model=KnowledgeGraph)
@prompt_template(
    """
    SYSTEM:
    Extract a knowledge graph from this patient log.
    Use Nodes to represent people, symptoms, events, and observations.
    Use Edges to represent relationships like "has symptom", "reported", "noted", etc.

    The log:
    {log_text}

    Example:
    Mary said help, I've fallen.
    Node(id="Mary", type="Patient", properties={{}})
    Node(id="Fall Incident 1", type="Event", properties={{"time": "3:45 AM"}})
    Edge(source="Mary", target="Fall Incident 1", relationship="reported")
    """
)
def generate_kg(log_text: str) -> openai.OpenAIDynamicConfig:
    return {"log_text": log_text}

kg = generate_kg(patient_log)
print(kg)
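The schema checks the shape of each node and edge, but it does not check that every edge points at a node that actually exists. A quick consistency check on the extracted graph can catch that. The sketch below is a minimal version; find_dangling_edges is a hypothetical helper, and the SimpleNamespace objects are stand-ins that expose the same attributes as the KnowledgeGraph, Node, and Edge models.

```python
from types import SimpleNamespace as NS


def find_dangling_edges(kg):
    """Return edges whose source or target is not the id of any node."""
    node_ids = {node.id for node in kg.nodes}
    return [
        edge
        for edge in kg.edges
        if edge.source not in node_ids or edge.target not in node_ids
    ]


# Stand-in for a model result; "Hip Pain" is deliberately missing from nodes.
example = NS(
    nodes=[NS(id="Mary"), NS(id="Fall Incident 1")],
    edges=[
        NS(source="Mary", target="Fall Incident 1", relationship="reported"),
        NS(source="Mary", target="Hip Pain", relationship="has symptom"),
    ],
)

dangling = find_dangling_edges(example)
print([(e.source, e.target) for e in dangling])  # [('Mary', 'Hip Pain')]
```

Calling find_dangling_edges(kg) on the real result works the same way, since the function only reads the id, source, and target attributes.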
Querying the Graph
Once the KnowledgeGraph has been generated from the unstructured patient log, we can use it to answer medical or behavioral queries. We define a function run() that takes a natural language question and the structured graph, and passes them into a prompt for the LLM to interpret and respond.
@openai.call(model="gpt-4o-mini")
@prompt_template(
    """
    SYSTEM:
    Use the knowledge graph to answer the user's question.

    Graph:
    {knowledge_graph}

    USER:
    {question}
    """
)
def run(question: str, knowledge_graph: KnowledgeGraph): ...

question = "What health risks or concerns does Mary exhibit based on her recent behavior and vitals?"
print(run(question, kg))
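Because the LLM's answer is free text, it can help to list the facts the graph actually holds about an entity and compare them with what the answer claims. The sketch below does this without any model call; facts_about is a hypothetical helper, and the Edge dataclass is a stand-in that mirrors the fields of the Pydantic Edge model.

```python
from dataclasses import dataclass


@dataclass
class Edge:  # stand-in mirroring the tutorial's Edge fields
    source: str
    target: str
    relationship: str


def facts_about(edges, node_id):
    """Return readable triples for every edge that touches node_id."""
    return [
        f"{e.source} -[{e.relationship}]-> {e.target}"
        for e in edges
        if node_id in (e.source, e.target)
    ]


# Made-up edges for illustration; real ones would come from kg.edges.
edges = [
    Edge("Mary", "Fall Incident 1", "reported"),
    Edge("Mary", "Dizziness", "has symptom"),
    Edge("Nurse", "Dehydration", "noted"),
]

for line in facts_about(edges, "Mary"):
    print(line)
```

With the real graph, facts_about(kg.edges, "Mary") gives a list that can be read next to the model's response to spot claims with no supporting edge.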
Visualizing the Graph
Finally, we use render_graph(kg) to generate a clear visual representation of the knowledge graph, helping us better understand the patient’s condition and the connections between observed symptoms, behaviors, and medical concerns.
import matplotlib.pyplot as plt
import networkx as nx

def render_graph(kg: KnowledgeGraph):
    # Build a directed graph; node type and edge relationship are stored as "label".
    G = nx.DiGraph()
    for node in kg.nodes:
        G.add_node(node.id, label=node.type, **(node.properties or {}))
    for edge in kg.edges:
        G.add_edge(edge.source, edge.target, label=edge.relationship)

    # Lay out and draw nodes, edges, node ids, and relationship labels.
    plt.figure(figsize=(15, 10))
    pos = nx.spring_layout(G)
    nx.draw_networkx_nodes(G, pos, node_size=2000, node_color="lightgreen")
    nx.draw_networkx_edges(G, pos, arrowstyle="->", arrowsize=20)
    nx.draw_networkx_labels(G, pos, font_size=12, font_weight="bold")
    edge_labels = nx.get_edge_attributes(G, "label")
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_color="blue")
    plt.title("Healthcare Knowledge Graph", fontsize=15)
    plt.show()

render_graph(kg)
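The same kind of DiGraph can also be inspected programmatically, which is useful when the plot becomes crowded. The sketch below builds a tiny graph with the same attribute layout as render_graph (node type under "label", relationship under "label" on edges) and runs two simple queries; the node and edge data are made up for illustration.

```python
import networkx as nx

# Small hand-built graph using the same attribute names as render_graph.
G = nx.DiGraph()
G.add_node("Mary", label="Patient")
G.add_node("Dizziness", label="Symptom")
G.add_node("Fall Incident 1", label="Event", time="3:45 AM")
G.add_edge("Mary", "Dizziness", label="has symptom")
G.add_edge("Mary", "Fall Incident 1", label="reported")

# Out-degree counts how many relationships start at each node.
out_degrees = dict(G.out_degree())
most_connected = max(out_degrees, key=out_degrees.get)

# Node attributes set in add_node are available for filtering.
symptoms = [n for n, data in G.nodes(data=True) if data.get("label") == "Symptom"]

print(most_connected, symptoms)
```

Note that add_edge creates any endpoint that was not added as a node beforehand, so such nodes appear in the graph without a "label" attribute.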