Algorithm Concepts
Algorithm Concepts
Centrality, community detection, and pathfinding theory
What You'll Learn
- Centrality - Measuring node importance
- Community Detection - Finding clusters
- Pathfinding - Shortest paths and traversal
- When to Use Which - Algorithm selection
# Cell 1 — ParametersUSERNAME = "_FILL_ME_IN_" # Set your email before running# Cell 2 — Connectfrom graph_olap import GraphOLAPClientclient = GraphOLAPClient(username=USERNAME)
# Cell 3 — Provisionfrom notebook_setup import provisionpersonas, conn = provision(USERNAME)analyst = personas["analyst"]admin = personas["admin"]ops = personas["ops"]client = analyst
print(f"Connected | {conn.query_scalar('MATCH (n) RETURN count(n)')} nodes")Centrality algorithms measure how important or influential a node is within a graph. The most common centrality measures are:
- PageRank — ranks nodes by the number and quality of incoming links. A node is important if it is pointed to by other important nodes. Originally designed for web page ranking, it applies to any directed graph.
- Betweenness Centrality — measures how often a node lies on the shortest path between other pairs of nodes. High-betweenness nodes are “bridges” or “brokers”.
The pattern for running algorithms with graph_olap is always:
- Run the algorithm —
conn.algo.<algorithm>(...)computes scores and writes them as properties on the nodes. - Query the results — use a standard Cypher query to read the computed properties.
This two-step approach keeps algorithm execution separate from result analysis, and lets you re-query computed scores without re-running the algorithm.
# --- List available algorithms ---algos = conn.algo.algorithms()print("Available algorithms:")for algo in algos: print(f" - {algo}")# --- Step 1: Run PageRank ---# This writes a "pr_score" property onto every matched nodeconn.algo.pagerank(node_label="Person", property_name="pr_score")print("PageRank computed -- scores written to pr_score property.")# --- Step 2: Query PageRank results ---# The pr_score property is now available for querying like any other node propertydf = conn.query_df(""" MATCH (p:Person) WHERE p.pr_score IS NOT NULL RETURN p.name AS person, round(p.pr_score, 4) AS pagerank ORDER BY p.pr_score DESC LIMIT 10""")dfCommunity detection algorithms identify clusters of densely connected nodes. They assign each node to a community (group), usually represented by an integer ID.
Common algorithms:
- Louvain — optimizes modularity to find communities. Works well on large graphs and naturally discovers hierarchical community structure.
- Label Propagation — each node adopts the most common label among its neighbours. Very fast but non-deterministic (results may vary between runs).
After running community detection, you can:
- Query which community a node belongs to.
- Count the size of each community.
- Find nodes that bridge multiple communities.
# --- Run community detection (Louvain) ---conn.algo.louvain(node_label="Person", property_name="community_id")print("Louvain community detection computed -- IDs written to community_id property.")# --- Query community sizes ---df = conn.query_df(""" MATCH (p:Person) WHERE p.community_id IS NOT NULL RETURN p.community_id AS community, count(p) AS size ORDER BY size DESC""")df# --- Query members of each community ---df = conn.query_df(""" MATCH (p:Person) WHERE p.community_id IS NOT NULL RETURN p.community_id AS community, COLLECT(p.name) AS members ORDER BY size(COLLECT(p.name)) DESC""")dfPathfinding algorithms find routes between nodes. Cypher has built-in support for shortest-path queries without needing the algorithm API:
shortestPath()— finds a single shortest path between two nodes.allShortestPaths()— finds all shortest paths of equal length.
MATCH p = shortestPath((a)-[*]-(b))WHERE a.name = 'Alice' AND b.name = 'Bob'RETURN pThe [*] means “any number of hops”. You can constrain it with [*1..5] for a
maximum depth.
For weighted shortest paths (e.g. minimizing cost or distance), you would use the algorithm API or NetworkX integration, which supports Dijkstra and other weighted algorithms.
# --- Shortest path with Cypher ---# Find the shortest path between any two connected nodesdf = conn.query_df(""" MATCH (a), (b) WHERE a <> b AND a.name IS NOT NULL AND b.name IS NOT NULL WITH a, b LIMIT 1 MATCH p = shortestPath((a)-[*..5]-(b)) RETURN a.name AS start, b.name AS end, length(p) AS hops, [n IN nodes(p) | n.name] AS path_nodes""")df# --- Variable-length paths: find all nodes within N hops ---df = conn.query_df(""" MATCH (start) WHERE start.name IS NOT NULL WITH start LIMIT 1 MATCH (start)-[*1..2]-(neighbour) RETURN DISTINCT start.name AS origin, neighbour.name AS reachable, labels(neighbour)[0] AS label LIMIT 10""")dfChoosing the right algorithm depends on the question you are asking:
| Question | Algorithm Family | Example |
|---|---|---|
| Who is most important? | Centrality | PageRank, Betweenness |
| What groups exist? | Community Detection | Louvain, Label Propagation |
| How are two nodes connected? | Pathfinding | Shortest Path, BFS |
| How similar are two nodes? | Similarity | Jaccard, Cosine |
Decision framework:
- Start with the business question — “Who are the key influencers?” maps to PageRank. “Which teams form naturally?” maps to Louvain.
- Consider the graph structure — directed vs undirected, weighted vs unweighted.
- Check scale — some algorithms (Label Propagation) scale to millions of nodes; others (Betweenness Centrality) are more expensive.
- Combine algorithms — run PageRank + Louvain together to find “the most important person in each community”.
The cell below demonstrates this combination pattern.
# --- Combining algorithms: top-ranked person per community ---# Both pr_score and community_id were computed in earlier cellsdf = conn.query_df(""" MATCH (p:Person) WHERE p.pr_score IS NOT NULL AND p.community_id IS NOT NULL WITH p.community_id AS community, p ORDER BY p.pr_score DESC WITH community, head(COLLECT(p.name)) AS top_person, head(COLLECT(round(p.pr_score, 4))) AS top_score, count(p) AS community_size RETURN community, top_person, top_score, community_size ORDER BY community_size DESC""")dfKey Takeaways
- Centrality: PageRank (influence), Betweenness (bridges)
- Community: Louvain (modularity), Label Prop (fast)
- Pathfinding: BFS (unweighted), Dijkstra (weighted)
- Choose based on question: who, what group, how connected