Centrality Algorithms
Identify influential nodes with PageRank, degree centrality, and betweenness centrality
What You'll Learn
- PageRank - Find globally important nodes
- Degree Centrality - Measure direct connections
- Betweenness Centrality - Identify bridge nodes
- Comparison - Combine measures for a complete picture
# Cell 1: Parameters
USERNAME = "_FILL_ME_IN_"  # Set your email before running

# Cell 2: Connect
from graph_olap import GraphOLAPClient
client = GraphOLAPClient(username=USERNAME)

# Cell 3: Provision
from notebook_setup import provision
personas, conn = provision(USERNAME)
analyst = personas["analyst"]
admin = personas["admin"]
ops = personas["ops"]
client = analyst

print(f"Connected | {conn.query_scalar('MATCH (n) RETURN count(n)')} nodes")

PageRank
Measure influence through link structure
PageRank assigns each node a score based on how many other nodes link to it and how important those linking nodes are. Originally designed for web pages, it works equally well on banking graphs: a customer who shares accounts with many highly-connected customers receives a higher PageRank score.
In our test graph, MR LAU XIAOMING and KWONG XIAO TONG each participate in 3 shared-account relationships (degree 3), so they receive the highest PageRank scores.
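To make the scoring rule concrete, here is a minimal power-iteration sketch in plain Python. It is an illustration only: the five-node graph, the node names, and the `pagerank` helper are hypothetical, the damping factor of 0.85 is the conventional default rather than a documented setting, and nothing here is meant to describe how `conn.algo.pagerank` works internally.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an undirected graph given as {node: set(neighbours)}."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by isolated nodes is spread evenly so the scores keep summing to 1.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {}
        for v in nodes:
            # Each neighbour u passes an equal share of its rank along each of its edges.
            incoming = sum(rank[u] / len(adjacency[u]) for u in adjacency[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical graph: A shares accounts with B, C and D; D also shares one with E.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

scores = pagerank(adjacency)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, round(score, 4))
```

In this toy graph the best-connected node, A, ends up with the highest score, which mirrors the degree-3 customers described above.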
# Run PageRank on Customer nodes connected by SHARES_ACCOUNT edges
result = conn.algo.pagerank(
    node_label="Customer",
    property_name="cent_pr",
    edge_type="SHARES_ACCOUNT",
)
print(f"PageRank {result.status} \u2014 {result.nodes_updated} nodes scored")

# View PageRank results ordered by score
df = conn.query_df("""
    MATCH (c:Customer)
    RETURN c.id AS name, round(c.cent_pr, 4) AS pagerank
    ORDER BY c.cent_pr DESC
""")
df

Degree Centrality
Normalised connection count
Degree centrality is the simplest centrality measure: it counts a node’s connections and normalises by the maximum possible connections (N-1). A score of 1.0 means the node is connected to every other node in the graph.
This uses the NetworkX interface (conn.networkx) because degree centrality is not one of the built-in native algorithms.
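The normalisation by N-1 can be checked by hand with a few lines of plain Python. This is a sketch on a hypothetical star graph; the `degree_centrality` helper below is written for illustration and is not the `conn.networkx` call itself.

```python
def degree_centrality(adjacency):
    """Degree divided by N-1 for a graph given as {node: set(neighbours)}."""
    n = len(adjacency)
    if n <= 1:
        # Convention chosen for this sketch: a lone node scores 0.
        return {v: 0.0 for v in adjacency}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adjacency.items()}


# Hypothetical star graph: "hub" is linked to each of four leaves.
star = {
    "hub": {"leaf1", "leaf2", "leaf3", "leaf4"},
    "leaf1": {"hub"},
    "leaf2": {"hub"},
    "leaf3": {"hub"},
    "leaf4": {"hub"},
}

centrality = degree_centrality(star)
print(centrality)
```

Here the hub reaches 4 of the 4 other nodes and scores 1.0, while each leaf reaches 1 of 4 and scores 0.25.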
# Degree centrality via NetworkX
result = conn.networkx.degree_centrality(
    node_label="Customer",
    property_name="cent_dc",
)
print(f"Degree centrality {result.status} \u2014 {result.nodes_updated} nodes scored")

df = conn.query_df("""
    MATCH (c:Customer)
    RETURN c.id AS name, round(c.cent_dc, 4) AS degree_centrality
    ORDER BY c.cent_dc DESC
""")
df

Betweenness Centrality
Find nodes that bridge shortest paths
Betweenness centrality measures how often a node lies on the shortest path between other pairs of nodes. High-betweenness nodes act as bridges or brokers: removing them lengthens, and can even sever, the routes between parts of the network.
In anti-money-laundering, a high-betweenness customer may be the single link between two otherwise separate account clusters — a pattern worth investigating.
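For readers who want to see the shortest-path counting spelled out, the sketch below implements Brandes' algorithm for an unweighted, undirected graph in plain Python. The function name, the `normalized` flag, and the five-node chain are assumptions made for this illustration; the scaling by (N-1)(N-2) is one common normalisation for undirected graphs and may differ from what the notebook's backend applies.

```python
from collections import deque


def betweenness_centrality(adjacency, normalized=True):
    """Brandes' algorithm for an unweighted, undirected graph ({node: set(neighbours)})."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []                           # nodes in order of non-decreasing distance
        preds = {v: [] for v in adjacency}   # predecessors on shortest paths
        paths = {v: 0 for v in adjacency}    # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share of paths.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    n = len(adjacency)
    # Every undirected pair was visited twice, once from each endpoint.
    scale = 0.5
    if normalized and n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))
    return {v: s * scale for v, s in score.items()}


# Hypothetical chain of customers: A - B - C - D - E.
chain = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

bc = betweenness_centrality(chain)
for name in sorted(bc, key=bc.get, reverse=True):
    print(name, round(bc[name], 4))
```

The middle node C sits on 4 of the 6 possible pairs among the other nodes, so it scores highest; the two endpoints sit on none and score 0.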
# Betweenness centrality via NetworkX
result = conn.networkx.betweenness_centrality(
    node_label="Customer",
    property_name="cent_bc",
)
print(f"Betweenness centrality {result.status} \u2014 {result.nodes_updated} nodes scored")

df = conn.query_df("""
    MATCH (c:Customer)
    RETURN c.id AS name, round(c.cent_bc, 4) AS betweenness
    ORDER BY c.cent_bc DESC
""")
df

MR LAU XIAOMING and KWONG XIAO TONG have non-zero betweenness because some shortest paths between the other three customers pass through them. The remaining customers have a betweenness of 0: they do not serve as bridges.
Comparing Centrality Measures
Side-by-side view of all three metrics
Different centrality measures answer different questions. Comparing them side by side reveals which customers are important and why:
- PageRank — Who is important because of who they are connected to?
- Degree — Who has the most connections?
- Betweenness — Who bridges otherwise separate parts of the network?
# Compare all three centrality measures in one table
df = conn.query_df("""
    MATCH (c:Customer)
    RETURN c.id AS name,
           round(c.cent_pr, 4) AS pagerank,
           round(c.cent_dc, 4) AS degree,
           round(c.cent_bc, 4) AS betweenness
    ORDER BY c.cent_pr DESC
""")
df

In this small test graph the rankings are consistent across all three measures. In larger production graphs, you will often see customers who rank high on one measure but low on another: for example, a customer with few direct connections (low degree) who nonetheless bridges two large clusters (high betweenness).
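The low-degree, high-betweenness case can be reproduced on a small synthetic graph. The sketch below is hypothetical throughout: two four-node cliques joined only through a node called `x`, with helper names invented for this example. It counts shortest paths pair by pair, which is simple to read but only practical for small graphs with at least three nodes.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adjacency, source):
    """Return (distance, shortest-path count) from source to every reachable node."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                paths[w] += paths[v]
    return dist, paths


def degree_and_betweenness(adjacency):
    """Map each node to (normalised degree, normalised betweenness); assumes N >= 3."""
    n = len(adjacency)
    info = {v: bfs_counts(adjacency, v) for v in adjacency}
    between = {v: 0.0 for v in adjacency}
    for s, t in combinations(adjacency, 2):
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        if t not in dist_s:
            continue  # s and t are in different components
        for v in adjacency:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v is on a shortest s-t path exactly when the two legs add up to d(s, t).
            if dist_s[v] + dist_t[v] == dist_s[t]:
                between[v] += paths_s[v] * paths_t[v] / paths_s[t]
    pairs = (n - 1) * (n - 2) / 2  # pairs of other nodes that v could sit between
    return {v: (len(adjacency[v]) / (n - 1), between[v] / pairs) for v in adjacency}


# Hypothetical graph: two 4-node cliques that touch only through node "x".
left = ["a1", "a2", "a3", "a4"]
right = ["b1", "b2", "b3", "b4"]
edges = list(combinations(left, 2)) + list(combinations(right, 2))
edges += [("a1", "x"), ("x", "b1")]

graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

table = degree_and_betweenness(graph)
for name, (degree, betweenness) in sorted(table.items()):
    print(f"{name:>2}  degree={degree:.4f}  betweenness={betweenness:.4f}")
```

In this construction `x` has the fewest direct connections of any node yet the highest betweenness, because every path between the two cliques passes through it.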
Key Takeaways
- PageRank measures influence through iterative link-based scoring (conn.algo.pagerank)
- Degree centrality counts normalised connections (conn.networkx.degree_centrality)
- Betweenness centrality finds bridge nodes on shortest paths (conn.networkx.betweenness_centrality)
- Combining multiple centrality measures gives a richer picture of node importance