Subqueries
Subqueries
CALL, UNION, and subquery patterns
What You'll Learn
- CALL Subqueries - Nested query execution
- UNION - Combine result sets
- EXISTS Subqueries - Conditional pattern checks
- Correlated Subqueries - Reference outer scope
# Cell 1 — ParametersUSERNAME = "_FILL_ME_IN_" # Set your email before running# Cell 2 — Connectfrom graph_olap import GraphOLAPClientclient = GraphOLAPClient(username=USERNAME)
# Cell 3 — Provisionfrom notebook_setup import provisionpersonas, conn = provision(USERNAME)analyst = personas["analyst"]admin = personas["admin"]ops = personas["ops"]client = analyst
print(f"Connected | {conn.query_scalar('MATCH (n) RETURN count(n)')} nodes")CALL { ... } lets you run a subquery inside a larger query. The subquery executes in its own
scope, and only the variables it explicitly RETURNs are visible to the outer query.
Why use subqueries?
- Isolation — the subquery has its own variable namespace, avoiding conflicts.
- Post-processing — aggregate inside the subquery, then continue processing outside.
- Per-row execution — when placed after a
WITH, the subquery runs once per input row.
Basic syntax:
CALL { MATCH (p:Person) RETURN p.name AS name, p.age AS age}RETURN name, ageORDER BY age DESCThe outer query can only see name and age — not p itself.
Important: a CALL subquery MUST end with a RETURN clause. Everything that is not
returned is invisible to the outer scope.
# --- Basic CALL subquery ---# The subquery finds nodes and returns selected fields; the outer query just sortsdf = conn.query_df(""" CALL { MATCH (n) RETURN labels(n)[0] AS label, n.name AS name LIMIT 20 } RETURN label, name ORDER BY label, name""")df# --- Subquery with aggregation ---# The subquery counts nodes per label; the outer query filters to labels with many nodesdf = conn.query_df(""" CALL { MATCH (n) RETURN labels(n)[0] AS label, count(n) AS cnt } WITH label, cnt WHERE cnt >= 2 RETURN label, cnt ORDER BY cnt DESC""")dfUNION combines the results of two or more queries into a single result set.
UNIONremoves duplicate rows (like SQLUNION).UNION ALLkeeps all rows, including duplicates (faster — no deduplication).
Both sides of a UNION must return the same column names.
MATCH (p:Person) RETURN p.name AS nameUNIONMATCH (c:Company) RETURN c.name AS nameUNION is also frequently used inside CALL { } subqueries to combine different
query branches:
CALL { MATCH (p:Person) RETURN p.name AS name UNION ALL MATCH (c:Company) RETURN c.name AS name}RETURN nameOPTIONAL MATCH is a related concept — it works like MATCH, but returns null for
unmatched patterns instead of eliminating the row:
MATCH (p:Person)OPTIONAL MATCH (p)-[:MANAGES]->(sub:Person)RETURN p.name, sub.nameIf a person manages nobody, sub.name is null but the row for p still appears.
# --- UNION ALL: combine results from different labels ---df = conn.query_df(""" MATCH (n) WHERE 'Person' IN labels(n) RETURN n.name AS name, 'Person' AS type LIMIT 5 UNION ALL MATCH (n) WHERE 'Company' IN labels(n) RETURN n.name AS name, 'Company' AS type LIMIT 5""")df# --- OPTIONAL MATCH: keep rows even when no match is found ---# Compare MATCH (drops unmatched rows) vs OPTIONAL MATCH (keeps them with nulls)df = conn.query_df(""" MATCH (a) OPTIONAL MATCH (a)-[r]->(b) RETURN a.name AS source, type(r) AS relationship, b.name AS target LIMIT 10""")dfEXISTS { ... } is a boolean subquery that checks whether a pattern exists in the graph.
It returns true or false without actually returning the matched data.
This is useful for filtering — keep only nodes that have (or lack) a certain pattern:
MATCH (p:Person)WHERE EXISTS { MATCH (p)-[:WORKS_AT]->(:Company) }RETURN p.nameThis returns only people who work at a company.
Negation — use NOT EXISTS to find nodes that lack a pattern:
MATCH (p:Person)WHERE NOT EXISTS { MATCH (p)-[:WORKS_AT]->() }RETURN p.name AS unemployedEXISTS subqueries are more readable than the older pattern-predicate syntax and can
express more complex conditions (multi-hop patterns, aggregation checks, etc.).
# --- EXISTS: find nodes that have outgoing relationships ---df = conn.query_df(""" MATCH (n) WHERE EXISTS { MATCH (n)-[]->() } RETURN n.name AS has_outgoing, labels(n)[0] AS label LIMIT 10""")df# --- NOT EXISTS: find nodes with no outgoing relationships (leaf nodes) ---df = conn.query_df(""" MATCH (n) WHERE NOT EXISTS { MATCH (n)-[]->() } RETURN n.name AS leaf_node, labels(n)[0] AS label LIMIT 10""")dfA correlated subquery references variables from the outer query. This is what makes
CALL { } subqueries truly powerful — you can pass context from the outer query into
the subquery using WITH.
MATCH (p:Person)CALL { WITH p -- import p from the outer scope MATCH (p)-[r]->() RETURN count(r) AS rel_count}RETURN p.name, rel_countORDER BY rel_count DESCHow it works:
- The outer
MATCHproduces one row perPerson. - For each row, the
CALLsubquery runs with that specificp. - The subquery counts relationships for that person and returns the count.
- The outer query combines
p.namewithrel_count.
Key rule: inside a correlated subquery, only variables imported via WITH from the outer
scope are visible. All other outer variables are hidden.
WITH for chaining is a related technique — even outside subqueries, WITH acts as a
pipeline separator that lets you reshape results mid-query:
MATCH (n)WITH labels(n)[0] AS label, count(n) AS cntWHERE cnt > 5RETURN label, cntORDER BY cnt DESC# --- Correlated subquery: count relationships per node ---df = conn.query_df(""" MATCH (n) CALL { WITH n MATCH (n)-[r]->() RETURN count(r) AS out_degree } RETURN n.name AS node, labels(n)[0] AS label, out_degree ORDER BY out_degree DESC LIMIT 10""")df# --- WITH chaining: aggregate, filter, then return ---df = conn.query_df(""" MATCH (n)-[r]->() WITH labels(n)[0] AS label, type(r) AS rel_type, count(*) AS cnt WHERE cnt >= 2 RETURN label, rel_type, cnt ORDER BY cnt DESC LIMIT 10""")dfKey Takeaways
- CALL {} executes subqueries in isolated scope
- UNION combines results (UNION ALL keeps duplicates)
- EXISTS checks for pattern existence
- Correlated subqueries reference outer variables