Combining Algorithms

Layer centrality, community, and structural measures for richer insights

25 min · Advanced

Algorithms, Combined, Analysis

What You'll Learn

Multi-Algorithm Pipeline - Run PageRank, Louvain, and degree centrality in sequence
Unified Query - Combine all results into a single DataFrame
Cross-Algorithm Insight - Identify influential members within communities

Prerequisites

Complete 02 Centrality and 03 Community Detection first.

Setup

# Cell 1 — Parameters
USERNAME = "_FILL_ME_IN_"  # Set your email before running

# Cell 2 — Connect
from graph_olap import GraphOLAPClient
client = GraphOLAPClient(username=USERNAME)

# Cell 3 — Provision
from notebook_setup import provision
personas, conn = provision(USERNAME)
analyst = personas["analyst"]
admin = personas["admin"]
ops = personas["ops"]
client = analyst

print(f"Connected | {conn.query_scalar('MATCH (n) RETURN count(n)')} nodes")

Run Multiple Algorithms

Build a multi-dimensional node profile with three algorithms

No single algorithm tells the whole story. PageRank identifies globally important nodes, Louvain groups nodes into communities, and degree centrality measures how many direct connections each node has, normalised by graph size. Each algorithm writes its result to a node property, so running all three and querying those properties together reveals which customers are influential within their community, a pattern that matters for anti-money-laundering (AML) case prioritisation.

# 1. PageRank — global importance
pr = conn.algo.pagerank(
    node_label="Customer",
    property_name="pr",
    edge_type="SHARES_ACCOUNT",
)
print(f"PageRank:          {pr.status} ({pr.nodes_updated} nodes)")

# 2. Louvain — community assignment
lv = conn.algo.louvain(
    node_label="Customer",
    property_name="community",
    edge_type="SHARES_ACCOUNT",
)
print(f"Louvain:           {lv.status} ({lv.nodes_updated} nodes)")

# 3. Degree centrality — normalised connection count
dc = conn.networkx.degree_centrality(
    node_label="Customer",
    property_name="dc",
)
print(f"Degree Centrality: {dc.status}")

print("\nAll three algorithms stored results as node properties.")
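The "normalised connection count" in step 3 can be illustrated without the platform. The sketch below assumes the wrapper follows NetworkX's definition of degree centrality, where each node's degree is divided by n - 1 (the largest degree possible in a simple graph with n nodes). The edge list and node names are made up for illustration, and this tutorial does not state which edges `conn.networkx.degree_centrality` actually considers.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected simple graph.

    Nodes are discovered from the edge list, so isolated nodes
    (which would score 0.0) are not represented in this sketch.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops in this sketch
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical toy graph: HUB shares an account with three other customers.
toy_edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("A", "B")]
toy_dc = degree_centrality(toy_edges)
for node, score in sorted(toy_dc.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```

A score of 1.0 means the node is directly connected to every other node, which is why the measure is comparable across graphs of different sizes.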

Unified Results Table

Query all algorithm results in a single DataFrame

# Query all algorithm results together
df = conn.query_df("""
    MATCH (c:Customer)
    RETURN c.id AS name,
           round(c.pr, 3) AS pagerank,
           c.community AS community,
           round(c.dc, 2) AS degree
    ORDER BY c.pr DESC
""")
df

Cross-Algorithm Insights

Identify influential members within each community

The real power of combining algorithms emerges when you layer the results:

Community + PageRank → Who is the most influential person in each group?
Community + Degree → Who has the most connections within their cluster?
PageRank + Degree → Do well-connected nodes always rank highest?

In our small test graph, all customers are in one community and the hub nodes (LAU, KWONG) dominate both PageRank and degree. In production graphs with thousands of customers and multiple communities, this approach pinpoints the key individuals to review first.
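The third question, whether well-connected nodes always rank highest, can be checked by comparing each node's position under the two measures. This is a minimal sketch, assuming rows shaped like the query results in this tutorial (`name`, `pr`, `dc`). The sample values and the `rank_positions` and `rank_gap` helpers are hypothetical, not output from the test graph or part of the SDK.

```python
def rank_positions(rows, key):
    """Map each name to its 1-based position when sorted by `key`, descending.

    Ties keep their input order, so tied nodes get adjacent positions.
    """
    ordered = sorted(rows, key=lambda r: r[key], reverse=True)
    return {r["name"]: pos for pos, r in enumerate(ordered, start=1)}


def rank_gap(rows):
    """Return (name, pagerank_rank, degree_rank, gap), largest |gap| first.

    gap = degree_rank - pagerank_rank, so a positive gap means the node
    ranks better on PageRank than on degree.
    """
    by_pr = rank_positions(rows, "pr")
    by_dc = rank_positions(rows, "dc")
    gaps = [(n, by_pr[n], by_dc[n], by_dc[n] - by_pr[n]) for n in by_pr]
    return sorted(gaps, key=lambda g: abs(g[3]), reverse=True)


# Hypothetical rows; N2 has few direct links but a high PageRank.
sample_rows = [
    {"name": "N1", "pr": 0.30, "dc": 0.80},
    {"name": "N2", "pr": 0.25, "dc": 0.20},
    {"name": "N3", "pr": 0.20, "dc": 0.60},
    {"name": "N4", "pr": 0.15, "dc": 0.40},
]
gaps = rank_gap(sample_rows)
for name, pr_rank, dc_rank, gap in gaps:
    print(f"{name}: PageRank #{pr_rank}, degree #{dc_rank}, gap {gap:+d}")
```

Nodes at the top of this list are the ones where the two measures disagree most. A node with few direct connections but a high PageRank may owe its score to being linked to other important nodes.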

# Fetch every customer with its algorithm results, highest PageRank first,
# so the first row seen for each community is that community's leader
df = conn.query_df("""
    MATCH (c:Customer)
    RETURN c.community AS comm, c.id AS name,
           c.pr AS pr, c.dc AS dc
    ORDER BY c.pr DESC
""")

# Group by community in Python (avoids Cypher list function limitations)
communities = {}
for row in df.iter_rows(named=True):
    comm = row["comm"]
    if comm not in communities:
        communities[comm] = []
    communities[comm].append(row)

print("Cross-algorithm insights:")
for comm, members in sorted(communities.items()):
    leader = members[0]  # Already sorted by PR desc
    most_connected = max(members, key=lambda m: m["dc"])
    print(f"\nCommunity {comm} ({len(members)} members):")
    print(f"  Leader (highest PageRank): {leader['name']} ({leader['pr']:.3f})")
    print(f"  Most connected (highest degree): {most_connected['name']} ({most_connected['dc']:.2f})")

df
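In a graph with many communities, a review queue usually needs more than one name per group. The sketch below is one way to extend the grouping above to the top N members per community by PageRank. It assumes rows shaped like the query output (`comm`, `name`, `pr`) and uses made-up values; `top_n_per_community` is a hypothetical helper rather than part of the SDK.

```python
import heapq
from collections import defaultdict


def top_n_per_community(rows, n=2):
    """Group rows by community and keep the n highest-PageRank members of each."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["comm"]].append(row)
    return {
        comm: heapq.nlargest(n, members, key=lambda m: m["pr"])
        for comm, members in grouped.items()
    }


# Hypothetical rows spread over two communities.
sample_rows = [
    {"comm": 0, "name": "N1", "pr": 0.30},
    {"comm": 0, "name": "N2", "pr": 0.10},
    {"comm": 0, "name": "N3", "pr": 0.25},
    {"comm": 1, "name": "N4", "pr": 0.20},
    {"comm": 1, "name": "N5", "pr": 0.15},
]
shortlist = top_n_per_community(sample_rows, n=2)
for comm, members in sorted(shortlist.items()):
    names = ", ".join(f"{m['name']} ({m['pr']:.2f})" for m in members)
    print(f"Community {comm}: {names}")
```

With the tutorial's DataFrame, the rows could come from `df.iter_rows(named=True)`, as in the cell above. Unlike the `members[0]` shortcut, this version does not depend on the query's `ORDER BY`.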

Key Takeaways

Run multiple algorithms in sequence — results accumulate as node properties
A single Cypher query can fetch all algorithm outputs into one DataFrame
Combining centrality with community detection reveals leaders within clusters
In production, multi-algorithm profiles help prioritise AML investigations
Divergence between measures (e.g. high PageRank but low degree) flags unusual node roles