Skip to content

Aggregation

Tutorial

Aggregation

COUNT, SUM, AVG, COLLECT and grouping

25 min Intermediate
CypherAggregationGROUP BY

What You'll Learn

  • Counting - COUNT rows and distinct values
  • Collecting - COLLECT into lists
  • Grouping - Implicit GROUP BY behaviour
  • Connection Degree - Count relationships per node
Info:
Prerequisites
Complete 01_property_graphs before this tutorial.
# Cell 1 — Parameters
USERNAME = "_FILL_ME_IN_" # Set your email before running
# Cell 2 — Connect
from graph_olap import GraphOLAPClient
client = GraphOLAPClient(username=USERNAME)
# Cell 3 — Provision
from notebook_setup import provision
personas, conn = provision(USERNAME)
analyst = personas["analyst"]
admin = personas["admin"]
ops = personas["ops"]
client = analyst
print(f"Connected | {conn.query_scalar('MATCH (n) RETURN count(n)')} nodes")
1

COUNT Function

Counting results

COUNT(x) returns the number of non-null values. Use COUNT(DISTINCT x) to count only unique values. Without a GROUP BY clause, the count applies to the entire result set.

# Count all customers
total = conn.query_scalar("MATCH (c:Customer) RETURN count(c)")
print(f"Total customers: {total}")
# Count distinct sectors
distinct_sectors = conn.query_scalar(
"MATCH (c:Customer) RETURN count(DISTINCT c.bk_sectr)"
)
print(f"Distinct sectors: {distinct_sectors}")
2

COLLECT Function

Building lists from results

COLLECT() gathers values into a list. When combined with a non-aggregated column, it groups automatically — one list per group.

# Collect all connections per customer into a list
result = conn.query("""
MATCH (a:Customer)-[:SHARES_ACCOUNT]->(b:Customer)
RETURN a.id AS customer, collect(b.id) AS connections
ORDER BY customer
""")
result.show()
3

Grouping Results

Implicit GROUP BY

Cypher has no explicit GROUP BY keyword. Instead, any non-aggregated column in RETURN automatically becomes a grouping key. This is similar to SQL’s implicit grouping.

RETURN c.bk_sectr AS sector, count(c) AS count
-- ^^^^^^^^^^^ ^^^^^^^^
-- grouping key aggregation
# Group customers by sector
result = conn.query("""
MATCH (c:Customer)
RETURN c.bk_sectr AS sector, count(c) AS count
""")
result.show()
# Group by ID type
result = conn.query("""
MATCH (c:Customer)
RETURN c.acct_stus AS id_type, count(c) AS count
ORDER BY count DESC
""")
result.show()
4

Connection Degree

Counting relationships per node

A node’s degree is the number of relationships it has. Counting relationships per node reveals the most-connected entities in the graph.

Using an undirected pattern (c)-[r:SHARES_ACCOUNT]-() counts both incoming and outgoing edges.

# Count connections per customer (degree)
result = conn.query("""
MATCH (c:Customer)-[r:SHARES_ACCOUNT]-()
RETURN c.id AS customer, count(r) AS degree
ORDER BY degree DESC
""")
result.show()

Key Takeaways

  • COUNT(c) counts rows; COUNT(DISTINCT c.x) counts unique values
  • COLLECT() gathers grouped values into a list
  • Non-aggregated columns in RETURN act as implicit GROUP BY keys
  • Counting relationships per node reveals the most-connected entities