In this tutorial, we explore how to leverage the PyBEL ecosystem to construct and analyze rich biological knowledge graphs directly within Google Colab. We begin by installing all necessary packages, including PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. We then demonstrate how to define proteins, processes, and modifications using the PyBEL DSL. From there, we guide you through the creation of an Alzheimer’s disease-related pathway, showing how to encode causal relationships, protein–protein interactions, and phosphorylation events. Alongside graph construction, we introduce advanced network analyses, including centrality measures, node classification, and subgraph extraction, as well as techniques for extracting citation and evidence data. By the end of this tutorial, you’ll have a fully annotated BEL graph ready for downstream visualization and enrichment analyses, laying a solid foundation for interactive biological knowledge exploration.
!pip install pybel pybel-tools networkx matplotlib seaborn pandas -q
import pybel
import pybel.dsl as dsl
from pybel import BELGraph
from pybel.io import to_pickle, from_pickle
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')
print("PyBEL Advanced Tutorial: Biological Expression Language Ecosystem")
print("=" * 65)
We begin by installing PyBEL and its dependencies directly in Colab, ensuring that all necessary libraries (NetworkX, Matplotlib, Seaborn, and Pandas) are available for our analysis. Once installed, we import the core modules and suppress warnings to keep our notebook clean and focused on the results.
print("\n1. Building a Biological Knowledge Graph")
print("-" * 40)
graph = BELGraph(
    name="Alzheimer's Disease Pathway",
    version="1.0.0",
    description="Example pathway showing protein interactions in AD",
    authors="PyBEL Tutorial"
)
# Define the entities (proteins and biological processes) with the PyBEL DSL
app = dsl.Protein(name="APP", namespace="HGNC")
abeta = dsl.Protein(name="Abeta", namespace="CHEBI")
tau = dsl.Protein(name="MAPT", namespace="HGNC")
gsk3b = dsl.Protein(name="GSK3B", namespace="HGNC")
inflammation = dsl.BiologicalProcess(name="inflammatory response", namespace="GO")
apoptosis = dsl.BiologicalProcess(name="apoptotic process", namespace="GO")
# Causal edges, each annotated with a citation and an evidence string
graph.add_increases(app, abeta, citation="PMID:12345678", evidence="APP cleavage produces Abeta")
graph.add_increases(abeta, inflammation, citation="PMID:87654321", evidence="Abeta triggers neuroinflammation")
# Phosphorylated tau is modeled as MAPT carrying a "Ph" protein modification
tau_phosphorylated = dsl.Protein(name="MAPT", namespace="HGNC",
                                 variants=[dsl.ProteinModification("Ph")])
graph.add_increases(gsk3b, tau_phosphorylated, citation="PMID:11111111", evidence="GSK3B phosphorylates tau")
graph.add_increases(tau_phosphorylated, apoptosis, citation="PMID:22222222", evidence="Hyperphosphorylated tau causes cell death")
graph.add_increases(inflammation, apoptosis, citation="PMID:33333333", evidence="Inflammation promotes apoptosis")
# Non-causal association between Abeta and unmodified tau
graph.add_association(abeta, tau, citation="PMID:44444444", evidence="Abeta and tau interact synergistically")
print(f"Created BEL graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")
We initialize a BELGraph with metadata for an Alzheimer’s disease pathway and define proteins and processes using the PyBEL DSL. By adding causal relationships, protein modifications, and associations, we construct a network that captures key molecular interactions.
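To make concrete what these DSL calls encode, here is a minimal sketch in plain Python (it does not use the PyBEL API) that renders a few of the statements as BEL-style strings. The `Statement` class and the `to_bel` helper are hypothetical names introduced only for illustration, and the term strings are written by hand following BEL's `p(...)` and `pmod(...)` notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical container for one BEL-style edge (illustration only)."""
    subject: str
    relation: str
    obj: str
    citation: str
    evidence: str


def to_bel(stmt: Statement) -> str:
    """Render a statement as 'subject relation object'."""
    return f"{stmt.subject} {stmt.relation} {stmt.obj}"


# Three of the tutorial's edges, with hand-written BEL terms (an assumption,
# not output produced by PyBEL).
statements = [
    Statement("p(HGNC:APP)", "increases", "p(CHEBI:Abeta)",
              "PMID:12345678", "APP cleavage produces Abeta"),
    Statement("p(HGNC:GSK3B)", "increases", "p(HGNC:MAPT, pmod(Ph))",
              "PMID:11111111", "GSK3B phosphorylates tau"),
    Statement("p(CHEBI:Abeta)", "association", "p(HGNC:MAPT)",
              "PMID:44444444", "Abeta and tau interact synergistically"),
]

bel_lines = [to_bel(s) for s in statements]
for line, stmt in zip(bel_lines, statements):
    print(f"{line}    [{stmt.citation}]")
```

Each edge is a subject, a relation, and an object, with the citation and evidence attached as annotations rather than as part of the statement itself.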
print("\n2. Advanced Network Analysis")
print("-" * 30)
degree_centrality = nx.degree_centrality(graph)
betweenness_centrality = nx.betweenness_centrality(graph)
closeness_centrality = nx.closeness_centrality(graph)
most_central = max(degree_centrality, key=degree_centrality.get)
print(f"Most connected node: {most_central}")
print(f"Degree centrality: {degree_centrality[most_central]:.3f}")
We compute degree, betweenness, and closeness centralities to quantify each node’s importance within the graph. By identifying the most connected nodes, we gain insight into potential hubs that may drive disease mechanisms.
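As a rough illustration of what degree centrality measures, the sketch below computes it by hand: each node's degree (incoming plus outgoing edges) divided by n − 1, where n is the number of nodes. The edge list is a simplified stand-in for the pathway and is an assumption; the graph PyBEL actually stores may contain additional edges (for example, for variants or two-way associations), so the numbers need not match the tutorial's output.

```python
from collections import defaultdict

# Simplified directed edge list mirroring the tutorial pathway (assumption:
# one edge per statement, with "pMAPT" standing in for phosphorylated tau).
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "pMAPT"),
    ("pMAPT", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]

# Degree = number of incoming plus outgoing edges per node.
degree = defaultdict(int)
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

# Normalize by the maximum possible number of neighbors, n - 1.
n = len(degree)
centrality = {node: deg / (n - 1) for node, deg in degree.items()}

most_central = max(centrality, key=centrality.get)
print(most_central, round(centrality[most_central], 3))
```

In this toy graph, Abeta touches three of the six edges, so it comes out as the hub with a centrality of 3/6 = 0.5.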
print("\n3. Biological Entity Classification")
print("-" * 35)
node_types = Counter()
for node in graph.nodes():
    node_types[node.function] += 1
print("Node distribution:")
for func, count in node_types.items():
    print(f"  {func}: {count}")
We classify each node by its function, such as Protein or BiologicalProcess, and tally their counts. This breakdown helps us understand the composition of our network at a glance.
print("\n4. Pathway Analysis")
print("-" * 20)
proteins = [node for node in graph.nodes() if node.function == 'Protein']
processes = [node for node in graph.nodes() if node.function == 'BiologicalProcess']
print(f"Proteins in pathway: {len(proteins)}")
print(f"Biological processes: {len(processes)}")
edge_types = Counter()
for u, v, data in graph.edges(data=True):
    edge_types[data.get('relation')] += 1
print("\nRelationship types:")
for rel, count in edge_types.items():
    print(f"  {rel}: {count}")
We separate all proteins and processes to measure the pathway’s scope and complexity. Counting the different relationship types further reveals which interactions, such as increases or associations, dominate our model.
print("\n5. Literature Evidence Analysis")
print("-" * 32)
citations = []
evidences = []
for _, _, data in graph.edges(data=True):
    if 'citation' in data:
        citations.append(data['citation'])
    if 'evidence' in data:
        evidences.append(data['evidence'])
print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(set(citations))}")
print(f"Evidence statements: {len(evidences)}")
We extract citation identifiers and evidence strings from each edge to evaluate our graph’s grounding in published research. Summarizing total and unique citations lets us assess the breadth of supporting literature.
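One possible extension is to group evidence statements by their source, which shows how many statements rest on each citation. The sketch below assumes edge annotations are available as plain dictionaries with string values; this is a simplification and not PyBEL's internal format, and the last two records are made up purely to exercise the grouping logic.

```python
from collections import defaultdict

# Hypothetical edge annotations as plain dicts (illustration only).
edge_data = [
    {"citation": "PMID:12345678", "evidence": "APP cleavage produces Abeta"},
    {"citation": "PMID:87654321", "evidence": "Abeta triggers neuroinflammation"},
    {"citation": "PMID:12345678", "evidence": "A second statement from the same source"},
    {"evidence": "Statement with no citation attached"},
]

evidence_by_citation = defaultdict(list)
uncited = []
for data in edge_data:
    citation = data.get("citation")
    if citation is None:
        # Keep track of statements that lack literature support.
        uncited.append(data["evidence"])
    else:
        evidence_by_citation[citation].append(data["evidence"])

total_citations = sum(len(items) for items in evidence_by_citation.values())
unique_citations = len(evidence_by_citation)

print(f"Total citations: {total_citations}")
print(f"Unique citations: {unique_citations}")
print(f"Uncited statements: {len(uncited)}")
```

A large gap between total and unique citations suggests the graph leans heavily on a few sources, while a non-empty `uncited` list flags edges worth revisiting.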
print("\n6. Subgraph Analysis")
print("-" * 22)
inflammation_nodes = [inflammation]
inflammation_neighbors = list(graph.predecessors(inflammation)) + list(graph.successors(inflammation))
inflammation_subgraph = graph.subgraph(inflammation_nodes + inflammation_neighbors)
print(f"Inflammation subgraph: {inflammation_subgraph.number_of_nodes()} nodes, {inflammation_subgraph.number_of_edges()} edges")
We isolate the inflammation subgraph by collecting its direct neighbors, yielding a focused view of inflammatory crosstalk. This targeted subnetwork highlights how inflammation interfaces with other disease processes.
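The neighborhood extraction can be sketched without any graph library, which makes the logic explicit: collect the node's predecessors and successors, then keep every edge whose two endpoints both fall inside that set. The edge list below is a simplified stand-in for the pathway (an assumption), so the counts may differ from what PyBEL reports.

```python
# Simplified directed edge list mirroring the tutorial pathway (assumption).
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "pMAPT"),
    ("pMAPT", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]

center = "inflammation"

# Direct neighbors: nodes pointing at the center and nodes it points to.
predecessors = {source for source, target in edges if target == center}
successors = {target for source, target in edges if source == center}
keep = {center} | predecessors | successors

# Induced subgraph: every edge with both endpoints in the kept node set.
sub_edges = [(s, t) for s, t in edges if s in keep and t in keep]

print(f"Nodes: {sorted(keep)}")
print(f"Edges: {sub_edges}")
```

Because the subgraph is induced, any edge running directly between two neighbors would also be retained, not only the edges that touch the center node.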
print("\n7. Advanced Graph Querying")
print("-" * 28)
try:
    paths = list(nx.all_simple_paths(graph, app, apoptosis, cutoff=3))
    print(f"Paths from APP to apoptosis: {len(paths)}")
    if paths:
        # all_simple_paths does not return paths sorted by length, so take the minimum explicitly
        print(f"Shortest path length: {min(len(p) for p in paths) - 1}")
except nx.NetworkXNoPath:
    print("No paths found between APP and apoptosis")
apoptosis_inducers = list(graph.predecessors(apoptosis))
print(f"Factors that increase apoptosis: {len(apoptosis_inducers)}")
We enumerate simple paths between APP and apoptosis to explore mechanistic routes and identify key intermediates. Listing all predecessors of apoptosis also shows us which factors may trigger cell death.
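For readers curious about what simple-path enumeration with a cutoff does, the following is a minimal depth-first sketch in plain Python. It is not NetworkX's implementation; `simple_paths` is a hypothetical helper, and the adjacency map is a simplified stand-in for the pathway (an assumption).

```python
def simple_paths(adjacency, source, target, cutoff):
    """Yield every path without repeated nodes that has at most `cutoff` edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= cutoff:
            continue  # extending this path would exceed the edge limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in path:  # keep the path simple
                stack.append((neighbor, path + [neighbor]))


# Simplified adjacency map mirroring the tutorial pathway (assumption).
adjacency = {
    "APP": ["Abeta"],
    "Abeta": ["inflammation", "MAPT"],
    "GSK3B": ["pMAPT"],
    "pMAPT": ["apoptosis"],
    "inflammation": ["apoptosis"],
}

paths = list(simple_paths(adjacency, "APP", "apoptosis", cutoff=3))
for path in paths:
    print(" -> ".join(path), f"({len(path) - 1} edges)")
```

With a cutoff of 3 the only route is APP, Abeta, inflammation, apoptosis; lowering the cutoff to 2 yields no paths, which shows how the cutoff bounds the mechanistic depth being explored.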
print("\n8. Data Export and Visualization")
print("-" * 35)
adj_matrix = nx.adjacency_matrix(graph)
node_labels = [str(node) for node in graph.nodes()]
plt.figure(figsize=(12, 8))
# Panel 1: network layout
plt.subplot(2, 2, 1)
pos = nx.spring_layout(graph, k=2, iterations=50)
nx.draw(graph, pos, with_labels=False, node_color="lightblue",
        node_size=1000, font_size=8, font_weight="bold")
plt.title("BEL Network Graph")
# Panel 2: degree centrality histogram
plt.subplot(2, 2, 2)
centralities = list(degree_centrality.values())
plt.hist(centralities, bins=10, alpha=0.7, color="green")
plt.title("Degree Centrality Distribution")
plt.xlabel("Centrality")
plt.ylabel("Frequency")
# Panel 3: node types
plt.subplot(2, 2, 3)
functions = list(node_types.keys())
counts = list(node_types.values())
plt.pie(counts, labels=functions, autopct="%1.1f%%", startangle=90)
plt.title("Node Type Distribution")
# Panel 4: relationship types
plt.subplot(2, 2, 4)
relations = list(edge_types.keys())
rel_counts = list(edge_types.values())
plt.bar(relations, rel_counts, color="orange", alpha=0.7)
plt.title("Relationship Types")
plt.xlabel("Relation")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
We prepare adjacency matrices and node labels for downstream use and generate a multi-panel figure showing the network structure, centrality distributions, node-type proportions, and edge-type counts. These visualizations bring our BEL graph to life, supporting a deeper biological interpretation.
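The adjacency matrix and the node labels only become useful together, because the labels tell you which row and column belong to which entity. The sketch below builds a small labeled matrix by hand from a simplified edge list (an assumption standing in for the pathway); `nx.adjacency_matrix` itself returns a SciPy sparse matrix whose rows and columns follow the order of `graph.nodes()`.

```python
# Simplified directed edge list mirroring the tutorial pathway (assumption).
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "pMAPT"),
    ("pMAPT", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
]

# Fix a node order once and reuse it for both rows and columns.
nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# matrix[i][j] counts the edges from nodes[i] to nodes[j].
matrix = [[0] * len(nodes) for _ in nodes]
for source, target in edges:
    matrix[index[source]][index[target]] += 1

for label, row in zip(nodes, matrix):
    print(f"{label:>12} {row}")
```

Keeping the `index` mapping alongside the matrix makes it straightforward to look up a specific interaction later or to hand the matrix to a numerical library without losing track of node identity.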
In this tutorial, we have demonstrated the power and flexibility of PyBEL for modeling complex biological systems. We showed how easily one can construct a curated white-box graph of Alzheimer’s disease interactions, perform network-level analyses to identify key hub nodes, and extract biologically meaningful subgraphs for focused study. We also covered essential practices for literature evidence mining and prepared data structures for compelling visualizations. As a next step, we encourage you to extend this framework to your own pathways, integrating additional omics data, running enrichment tests, or coupling the graph with machine-learning workflows.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.


