Ryugraph and NetworkX integration for graph analytics
Ryugraph provides native NetworkX conversion through its get_as_networkx() method, enabling a direct workflow from Cypher queries to Python graph analytics. Ryugraph is the only embedded graph engine shipped by the Graph OLAP Platform (see packages/ryugraph-wrapper/). It offers zero-copy DataFrame integration, automatic disk spilling for large queries, and vectorized execution with strong path-finding performance.
Naming note. This platform ships Ryugraph exclusively; the wrapper (packages/ryugraph-wrapper/) depends on ryugraph. The examples below use the ryugraph import and the RYUGRAPH_* configuration prefix.
Native NetworkX conversion eliminates manual data wrangling
The Ryugraph Python API provides direct graph conversion through the QueryResult class. After executing a Cypher query that returns nodes and relationships, calling get_as_networkx(directed=True) produces a NetworkX DiGraph (or Graph for undirected) with all node and edge properties automatically transferred as attributes.

    import ryugraph
    import networkx as nx

    db = ryugraph.Database("./transaction_db")
    conn = ryugraph.Connection(db)

    # Extract subgraph via Cypher
    result = conn.execute("""
        MATCH (c:Client)-[t:TransactedWith]->(m:Merchant)
        WHERE t.is_disputed = true
        RETURN *;
    """)

    # Convert to NetworkX DiGraph
    G = result.get_as_networkx(directed=True)

    # Run any NetworkX algorithm
    pagerank = nx.pagerank(G)
    centrality = nx.betweenness_centrality(G)

Nodes in the resulting graph use composite identifiers combining the label and primary key (e.g., 'Client_35'). All properties from Ryugraph transfer directly: node properties become G.nodes[node_id] attributes, and edge properties become G.edges[u, v] attributes. The method handles heterogeneous graphs containing multiple node and relationship types, making it suitable for complex property graph schemas.
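Because node identifiers are composite, mapping algorithm results back to primary keys means stripping the label prefix. The following is a minimal sketch, assuming the Label_key shape shown above and that the caller knows which labels appear in the result. split_node_id is a hypothetical helper, not part of the Ryugraph API, and the recovered key is always a string.

```python
def split_node_id(node_id, labels):
    """Split a composite ID such as 'Client_35' into (label, primary_key).

    Matching against known labels (longest first) avoids mis-splitting when
    a label or a key itself contains an underscore.
    """
    for label in sorted(labels, key=len, reverse=True):
        prefix = label + "_"
        if node_id.startswith(prefix):
            return label, node_id[len(prefix):]
    raise ValueError(f"no known label is a prefix of {node_id!r}")


labels = {"Client", "Merchant"}
print(split_node_id("Client_35", labels))        # ('Client', '35')
print(split_node_id("Merchant_acme_7", labels))  # ('Merchant', 'acme_7')
```

Splitting on a known label rather than on the first underscore keeps keys such as acme_7 intact.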
Cypher patterns optimized for subgraph extraction
The most effective Cypher patterns for NetworkX workflows return complete graph structures rather than scalar projections. Using RETURN * or explicitly returning node and relationship variables ensures the conversion receives the full graph topology.
Variable-length paths use Kleene star syntax with bounds:

    // 1 to 3 hops with intermediate filtering
    MATCH p = (a:User)-[:Follows*1..3 (r, n | WHERE r.since < 2022 AND n.age > 30)]->(b:User)
    WHERE a.name = "Alice"
    RETURN *;

Multi-hop neighborhood extraction for ego networks:

    MATCH (center:Person {id: $target_id})-[r*1..2]-(neighbor)
    RETURN *
    LIMIT 1000;

For large subgraph exports, pagination prevents memory issues:

    MATCH (n:Person)
    RETURN n.*
    ORDER BY n.id
    SKIP 100000 LIMIT 100000;

DataFrame integration provides a powerful intermediate layer

Ryugraph’s zero-copy integration with Polars, Pandas, and PyArrow enables sophisticated ETL pipelines. Results can flow through DataFrames for transformation before NetworkX conversion, or algorithm outputs can be written back to the graph.

    import polars as pl

    # Run NetworkX algorithm
    pagerank_scores = nx.pagerank(G)

    # Transform to Polars DataFrame. Node IDs are composite ("Person_<key>"),
    # so strip the label prefix to recover the value matched on below
    # (this assumes name is the Person primary key).
    df = pl.DataFrame({
        "name": [node_id.split("_", 1)[1] for node_id in pagerank_scores],
        "pagerank": list(pagerank_scores.values()),
    })

    # Write results back to Ryugraph
    conn.execute("ALTER TABLE Person ADD IF NOT EXISTS pagerank DOUBLE DEFAULT 0.0;")
    conn.execute("""
        LOAD FROM df
        MERGE (p:Person {name: name})
        SET p.pagerank = pagerank;
    """)

The LOAD FROM syntax directly scans DataFrames and PyArrow tables without requiring file export, providing efficient round-trip workflows between graph storage and Python analytics.
Configuration for optimal performance
Ryugraph runs as an embedded database requiring no server process. The key initialization parameters control memory and concurrency:

    db = ryugraph.Database(
        database_path="./my_database",
        buffer_pool_size=4_294_967_296,  # 4 GiB explicit buffer pool
        max_num_threads=8,               # Limit thread parallelism
        compression=True,                # Enable storage compression
        lazy_init=False,                 # Immediate initialization
        read_only=False,                 # Read-write access
    )

    conn = ryugraph.Connection(db, num_threads=4)

The buffer pool defaults to ~80% of system memory; setting it explicitly prevents memory pressure on systems running other workloads. Runtime configuration via Cypher CALL statements allows dynamic tuning:

    CALL THREADS=6;           // Thread limit for connection
    CALL TIMEOUT=30000;       // 30-second query timeout
    CALL spill_to_disk=true;  // Enable disk spilling
    CALL progress_bar=true;   // Monitor long queries

In the Graph OLAP Platform the wrapper exposes the buffer pool size and thread limit as the RYUGRAPH_BUFFER_POOL_SIZE and RYUGRAPH_MAX_THREADS environment variables (see ryugraph-performance.reference.md).
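As an illustration of how those two environment variables could map onto the Database parameters above, here is a minimal sketch. It assumes both variables hold plain integers (bytes and a thread count); database_kwargs_from_env is a hypothetical helper, and the wrapper's actual parsing is described in ryugraph-performance.reference.md, not here.

```python
import os


def database_kwargs_from_env(environ=os.environ):
    """Build Database keyword arguments from RYUGRAPH_* variables.

    Unset or empty variables are omitted so the engine defaults apply.
    """
    kwargs = {}
    if environ.get("RYUGRAPH_BUFFER_POOL_SIZE"):
        kwargs["buffer_pool_size"] = int(environ["RYUGRAPH_BUFFER_POOL_SIZE"])
    if environ.get("RYUGRAPH_MAX_THREADS"):
        kwargs["max_num_threads"] = int(environ["RYUGRAPH_MAX_THREADS"])
    return kwargs


# Example values only; a real deployment would read os.environ.
example_env = {
    "RYUGRAPH_BUFFER_POOL_SIZE": str(4 * 1024**3),
    "RYUGRAPH_MAX_THREADS": "8",
}
print(database_kwargs_from_env(example_env))
# {'buffer_pool_size': 4294967296, 'max_num_threads': 8}
```

The resulting dictionary could then be passed as ryugraph.Database("./my_database", **kwargs), leaving any unset option at its default.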
Disk spilling handles larger-than-memory operations
Ryugraph automatically spills intermediate results to disk when memory pressure approaches buffer pool limits. This enables processing graphs that exceed available RAM. Benchmarks show 17 billion edges loaded in ~70 minutes with only 102GB memory available.
Disk spilling creates temporary .tmp files in the database directory, automatically cleaned after query completion. The feature is disabled for in-memory databases and read-only connections. For large relationship table ingestion, this capability is essential:
| Buffer Memory | Load Time (17B edges, 32 threads) |
|---|---|
| 420GB | 1 hour |
| 205GB | 1 hour 8 minutes |
| 102GB | 1 hour 10 minutes |
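To put the table in perspective, the load times can be converted into throughput and relative slowdown. This is simple arithmetic on the figures above, not an additional measurement:

```python
EDGES = 17_000_000_000

# (buffer memory in GB, load time in minutes), taken from the table above
runs = [(420, 60), (205, 68), (102, 70)]


def edges_per_second(minutes):
    """Average ingestion rate for a load that took the given minutes."""
    return EDGES / (minutes * 60)


baseline_minutes = runs[0][1]
for memory_gb, minutes in runs:
    slowdown = minutes / baseline_minutes - 1
    print(
        f"{memory_gb:>3} GB: {edges_per_second(minutes) / 1e6:.2f}M edges/s, "
        f"{slowdown:+.0%} load time vs. 420 GB"
    )
```

Cutting buffer memory roughly fourfold (420GB to 102GB) lengthens the load by about 17%, which is the trade-off disk spilling makes possible.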
Embedded deployment in the Graph OLAP Platform
Ryugraph operates as an embedded database by default: the library runs in-process with no network overhead, similar to SQLite or DuckDB. This architecture avoids network round trips but limits each database directory to a single read-write process.
The Graph OLAP Platform wraps this single in-process Ryugraph instance in a FastAPI service (packages/ryugraph-wrapper/) and spawns one pod per user-facing graph instance. There is no multi-client Ryugraph API server deployment shipping with the platform; all client access goes through the wrapper’s REST endpoints. See ryugraph-wrapper.design.md and system.architecture.design.md for the deployment topology.
The concurrency model has three rules: create one Database object per process, open multiple Connection objects for concurrent queries, and expect only one write transaction to execute at a time. Multiple processes can open the same database in read-only mode.
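The one-Database, many-Connections pattern can be sketched with a thread pool in which each worker thread lazily opens its own connection. This is an illustrative sketch only: run_concurrent and FakeConnection are hypothetical names, and the stand-in connection exists so the example runs without a database.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(make_connection, queries, max_workers=4):
    """Run queries in parallel, giving each worker thread its own connection."""
    local = threading.local()

    def run(query):
        if not hasattr(local, "conn"):
            local.conn = make_connection()  # one connection per worker thread
        return local.conn.execute(query)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, queries))  # results keep the input order


class FakeConnection:
    """Stand-in for a real connection so the sketch is runnable as-is."""

    def execute(self, query):
        return f"ran: {query}"


results = run_concurrent(FakeConnection, ["MATCH (a) RETURN a;", "MATCH (b) RETURN b;"])
print(results)
# ['ran: MATCH (a) RETURN a;', 'ran: MATCH (b) RETURN b;']
```

With Ryugraph, make_connection would be something like lambda: ryugraph.Connection(db), where db is the single Database object created for the process. Read queries can then overlap, while writes still serialize inside the engine.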
Memory-efficient patterns for large graph extraction
Converting large graphs to NetworkX requires careful memory management since NetworkX stores everything in Python dictionaries. Three strategies optimize this workflow:
Batched extraction processes subgraphs incrementally:

    def extract_batched(conn, batch_size=100000):
        """Yield NetworkX subgraphs of at most batch_size result rows each."""
        offset = 0
        while True:
            # SKIP/LIMIT pages are only stable if the result order is deterministic.
            result = conn.execute(f"""
                MATCH (a)-[r]->(b)
                RETURN *
                SKIP {offset} LIMIT {batch_size}
            """)
            subgraph = result.get_as_networkx()
            if subgraph.number_of_edges() == 0:
                break
            yield subgraph
            offset += batch_size

Native algorithm extension avoids NetworkX overhead entirely. Ryugraph’s algo extension implements PageRank, betweenness centrality, and community detection in C++, running significantly faster than Python equivalents. Use NetworkX only for algorithms unavailable natively.
COPY TO for bulk export extracts data to Parquet for external processing:

    COPY (MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN n.id, m.id, r.weight)
    TO 'edges.parquet';

Extension loading in the platform image

Ryugraph normally requires explicit installation from an extension server and loading per connection. When an extension server is needed, it is addressed via the in-cluster wrapper service DNS:

    // Install algo extension from the in-cluster wrapper service (illustrative)
    INSTALL algo FROM 'http://ryugraph-wrapper:8080/';
    // Load the extension (required for each connection)
    LOAD algo;

The Graph OLAP wrapper avoids this round trip entirely by pre-baking libalgo.ryu_extension into its Docker image at build time, at the path /root/.ryu/extension/{ryugraph_version}/linux_amd64/algo/. As a result, the wrapper just calls LOAD algo at startup and Ryugraph finds the binary in its local extension cache with no network call. See ADR-138 for the rationale.
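The cache location described above can be expressed as a small path builder, which is handy for a startup or image-build sanity check. This is a sketch based only on the path template quoted in this section; algo_extension_path is a hypothetical helper and "0.0.0" is a placeholder, not a real release number.

```python
from pathlib import PurePosixPath


def algo_extension_path(ryugraph_version, home="/root"):
    """Return where the pre-baked algo extension is expected inside the image."""
    return (
        PurePosixPath(home) / ".ryu" / "extension" / ryugraph_version
        / "linux_amd64" / "algo" / "libalgo.ryu_extension"
    )


# "0.0.0" is a placeholder version used purely for illustration.
print(algo_extension_path("0.0.0"))
# /root/.ryu/extension/0.0.0/linux_amd64/algo/libalgo.ryu_extension
```

Inside the container, wrapping the result in pathlib.Path and calling is_file() before LOAD algo would surface a missing binary early instead of at first query.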
Complete integration workflow example
This example demonstrates the full pipeline from database setup through NetworkX analysis and result persistence:

    import ryugraph
    import networkx as nx
    import polars as pl

    # Initialize with explicit memory configuration (2 GiB buffer pool)
    db = ryugraph.Database("./graph_analytics", buffer_pool_size=2_147_483_648)
    conn = ryugraph.Connection(db)

    # Create schema
    conn.execute("""
        CREATE NODE TABLE IF NOT EXISTS Person(id STRING PRIMARY KEY, name STRING, age INT64);
        CREATE REL TABLE IF NOT EXISTS Knows(FROM Person TO Person, weight DOUBLE);
    """)

    # Bulk load from Parquet (fastest method)
    conn.execute("COPY Person FROM 'persons.parquet';")
    conn.execute("COPY Knows FROM 'relationships.parquet';")

    # Extract subgraph for analysis
    result = conn.execute("""
        MATCH (p1:Person)-[k:Knows]->(p2:Person)
        WHERE k.weight > 0.5
        RETURN *;
    """)

    # Convert to NetworkX
    G = result.get_as_networkx(directed=True)
    print(f"Extracted {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

    # Run algorithms
    pagerank = nx.pagerank(G, weight='weight')
    communities = list(nx.community.greedy_modularity_communities(G.to_undirected()))

    # Prepare results as DataFrame. Node IDs look like "Person_<id>"; split only
    # on the first underscore so IDs that contain underscores stay intact.
    df = pl.DataFrame({
        "id": [n.split("_", 1)[1] for n in pagerank.keys()],
        "pagerank": list(pagerank.values()),
    })

    # Write back to graph
    conn.execute("ALTER TABLE Person ADD IF NOT EXISTS pagerank DOUBLE DEFAULT 0.0;")
    conn.execute("""
        LOAD FROM df
        MERGE (p:Person {id: id})
        SET p.pagerank = pagerank;
    """)

    # Verify
    top_nodes = conn.execute("""
        MATCH (p:Person)
        RETURN p.id, p.name, p.pagerank
        ORDER BY p.pagerank DESC
        LIMIT 10;
    """).get_as_pl()

    conn.close()
    db.close()

Graph OLAP Platform Integration

For implementation details of how the Graph OLAP Platform uses Ryugraph with NetworkX, see:
- ryugraph-wrapper.design.md - FastAPI wrapper implementation
- system.architecture.design.md - Pod architecture and data flows
Conclusion
Ryugraph provides production-ready NetworkX integration through native conversion methods that preserve all graph properties. The embedded architecture delivers exceptional performance for analytical workloads: COPY FROM ingests data ~18x faster than Neo4j, while disk spilling enables processing of billion-edge graphs on commodity hardware. The DataFrame integration creates flexible ETL pipelines where Polars or Pandas can transform data before or after graph analysis.
Key architectural decisions: use batch operations via COPY FROM rather than individual inserts, configure buffer pool explicitly for predictable memory usage, leverage the native algo extension before falling back to NetworkX, and rely on the FastAPI wrapper (not an external API-server image) for multi-client access. With Ryugraph maintaining active development, the platform remains a viable choice for graph analytics workflows requiring Python ecosystem integration.