Aggregation
Aggregation
COUNT, SUM, AVG, COLLECT and grouping
What You'll Learn
- Counting - COUNT rows and distinct values
- Collecting - COLLECT into lists
- Grouping - Implicit GROUP BY behaviour
- Connection Degree - Count relationships per node
# Cell 1 — ParametersUSERNAME = "_FILL_ME_IN_" # Set your email before running# Cell 2 — Connectfrom graph_olap import GraphOLAPClientclient = GraphOLAPClient(username=USERNAME)
# Cell 3 — Provisionfrom notebook_setup import provisionpersonas, conn = provision(USERNAME)analyst = personas["analyst"]admin = personas["admin"]ops = personas["ops"]client = analyst
print(f"Connected | {conn.query_scalar('MATCH (n) RETURN count(n)')} nodes")COUNT Function
Counting results
COUNT(x) returns the number of non-null values. Use COUNT(DISTINCT x) to
count only unique values. Without a GROUP BY clause, the count applies to the
entire result set.
# Count all customerstotal = conn.query_scalar("MATCH (c:Customer) RETURN count(c)")print(f"Total customers: {total}")
# Count distinct sectorsdistinct_sectors = conn.query_scalar( "MATCH (c:Customer) RETURN count(DISTINCT c.bk_sectr)")print(f"Distinct sectors: {distinct_sectors}")COLLECT Function
Building lists from results
COLLECT() gathers values into a list. When combined with a non-aggregated
column, it groups automatically — one list per group.
# Collect all connections per customer into a listresult = conn.query(""" MATCH (a:Customer)-[:SHARES_ACCOUNT]->(b:Customer) RETURN a.id AS customer, collect(b.id) AS connections ORDER BY customer""")
result.show()Grouping Results
Implicit GROUP BY
Cypher has no explicit GROUP BY keyword. Instead, any non-aggregated column
in RETURN automatically becomes a grouping key. This is similar to SQL’s
implicit grouping.
RETURN c.bk_sectr AS sector, count(c) AS count-- ^^^^^^^^^^^ ^^^^^^^^-- grouping key aggregation# Group customers by sectorresult = conn.query(""" MATCH (c:Customer) RETURN c.bk_sectr AS sector, count(c) AS count""")
result.show()
# Group by ID typeresult = conn.query(""" MATCH (c:Customer) RETURN c.acct_stus AS id_type, count(c) AS count ORDER BY count DESC""")
result.show()Connection Degree
Counting relationships per node
A node’s degree is the number of relationships it has. Counting relationships per node reveals the most-connected entities in the graph.
Using an undirected pattern (c)-[r:SHARES_ACCOUNT]-() counts both incoming
and outgoing edges.
# Count connections per customer (degree)result = conn.query(""" MATCH (c:Customer)-[r:SHARES_ACCOUNT]-() RETURN c.id AS customer, count(r) AS degree ORDER BY degree DESC""")
result.show()Key Takeaways
COUNT(c)counts rows;COUNT(DISTINCT c.x)counts unique valuesCOLLECT()gathers grouped values into a list- Non-aggregated columns in
RETURNact as implicitGROUP BYkeys - Counting relationships per node reveals the most-connected entities