A New Player in the Big Data World: PuppyGraph
A Novel Concept: Marrying Data Lakehouse with Graph Database
A startup called PuppyGraph is gaining attention in the big data world with its innovative concept: combining the data storage efficiency of the data lakehouse with the analytic capabilities of a graph database. The result is a distributed, column-oriented OLAP graph query engine that can scale horizontally into the petabyte range.
The Problem with Graph Databases
Graph databases are known for outperforming relational databases on certain types of queries across connected data. However, they have a fundamental limitation: the data must be extracted, transformed, and loaded (ETL) into the database before the graph engine can process it. That loading delay is a major obstacle for graph databases used for analytics, although it is less of an issue for OLTP workloads.
The Solution: PuppyGraph
PuppyGraph, founded in 2023 by software engineer Weimo Liu, aims to eliminate this limitation. By separating the compute and storage layers and building a vectorized, column-oriented graph query engine, PuppyGraph can deliver fast OLAP graph performance on very large datasets kept in object stores, without the delay of first loading the data into a graph database.
PuppyGraph Architecture
The company’s architecture is built around a logical graph layer running atop columnar data models. Because each attribute is stored as its own column, the query engine can read only the attributes a graph query actually needs instead of processing all the data in each record, which makes it more scalable and efficient.
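To make the column-oriented idea concrete, here is a minimal Python sketch. It is not PuppyGraph code: the `transfers` table, its column names, and the `scan` and `neighbors` helpers are all hypothetical, chosen only to show how a graph lookup can read the two edge-endpoint columns and never touch the rest.

```python
# Hypothetical columnar table: every column is stored as its own list.
transfers = {
    "src_account": ["a1", "a2", "a1", "a3"],
    "dst_account": ["a2", "a3", "a3", "a4"],
    "amount": [120.0, 75.5, 300.0, 42.0],
    "memo": ["rent", "lunch", "invoice 17", "refund"],
}


def scan(table, columns):
    """Return an iterator of row tuples built from only the requested columns."""
    selected = [table[name] for name in columns]
    return zip(*selected)


def neighbors(table, account):
    """Return the accounts that `account` sent money to.

    Only the two endpoint columns are read; `amount` and `memo`
    are never touched, which is the benefit of a columnar layout.
    """
    endpoints = scan(table, ["src_account", "dst_account"])
    return [dst for src, dst in endpoints if src == account]


print(neighbors(transfers, "a1"))  # ['a2', 'a3']
```

In a row-oriented store the same lookup would have to pull every field of every record, including the wide `memo` values, just to inspect two of them.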
How PuppyGraph Works
PuppyGraph currently supports Cypher and Gremlin, the two most popular graph query languages. The company builds on the design of Google's F1 query engine, which lets the query engine map certain attributes of the source data into a logical graph layer composed of nodes and edges. Because the mapping is column-based, a graph query reads only the mapped attributes rather than every field in each record.
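The mapping from source data to a logical graph can be pictured with a small Python sketch. The schema layout below is an assumption made for illustration and is not PuppyGraph's actual schema format; `TABLES`, `GRAPH_SCHEMA`, `vertices`, and `edges` are hypothetical names. The point is that nodes and edges are derived on demand from selected columns, and the source tables are never copied into a separate graph store.

```python
# Hypothetical source tables, represented as lists of row dicts.
TABLES = {
    "customers": [
        {"id": 1, "name": "Ada", "signup_ip": "10.0.0.1"},
        {"id": 2, "name": "Ben", "signup_ip": "10.0.0.2"},
    ],
    "payments": [
        {"payer_id": 1, "payee_id": 2, "amount": 50, "note": "gift"},
    ],
}

# Hypothetical mapping: which columns act as node ids, edge endpoints,
# and properties. Unmapped columns (signup_ip, note) are ignored.
GRAPH_SCHEMA = {
    "vertices": [
        {"label": "Customer", "table": "customers", "id": "id",
         "properties": ["name"]},
    ],
    "edges": [
        {"label": "PAID", "table": "payments", "source": "payer_id",
         "target": "payee_id", "properties": ["amount"]},
    ],
}


def vertices(schema, tables):
    """Yield (label, id, properties) for every mapped node."""
    for spec in schema["vertices"]:
        for row in tables[spec["table"]]:
            props = {name: row[name] for name in spec["properties"]}
            yield spec["label"], row[spec["id"]], props


def edges(schema, tables):
    """Yield (label, source_id, target_id, properties) for every mapped edge."""
    for spec in schema["edges"]:
        for row in tables[spec["table"]]:
            props = {name: row[name] for name in spec["properties"]}
            yield spec["label"], row[spec["source"]], row[spec["target"]], props


print(list(vertices(GRAPH_SCHEMA, TABLES)))
print(list(edges(GRAPH_SCHEMA, TABLES)))
```

Changing the graph model in this sketch means editing the mapping, not reloading the data.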
Performance and Scalability
PuppyGraph’s design enables fast OLAP graph performance on massive data. The company’s use of caching and indexing makes queries run fast, and its adoption of SIMD processing provides more parallelism. The entire product runs in a Docker container atop Kubernetes, which handles resource scheduling and provides elasticity.
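As a rough illustration of why indexing and caching help, the Python sketch below builds an index over an edge table's source column the first time it is needed and reuses it afterward. The `EdgeIndex` class is hypothetical and says nothing about how PuppyGraph implements its own indexes or caches; it only shows the general technique.

```python
from collections import defaultdict


class EdgeIndex:
    """Hypothetical index over two parallel edge columns (source, target)."""

    def __init__(self, src_column, dst_column):
        self._src = src_column
        self._dst = dst_column
        self._rows_by_src = None  # built lazily, then cached for reuse
        self.builds = 0           # counts how often the index was built

    def _index(self):
        # Build the source -> row-number index once; later calls reuse it.
        if self._rows_by_src is None:
            rows_by_src = defaultdict(list)
            for row, src in enumerate(self._src):
                rows_by_src[src].append(row)
            self._rows_by_src = rows_by_src
            self.builds += 1
        return self._rows_by_src

    def out_neighbors(self, vertex):
        """Look up a vertex's targets without rescanning the source column."""
        rows = self._index().get(vertex, [])
        return [self._dst[row] for row in rows]


index = EdgeIndex(["a", "b", "a"], ["b", "c", "c"])
print(index.out_neighbors("a"))  # ['b', 'c']
print(index.out_neighbors("b"))  # ['c']
print(index.builds)              # 1
```

Without the index, every neighbor lookup would scan the whole source column; with it, the scan happens once and each lookup afterward is a dictionary access.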
PuppyGraph in Action
After building the first PuppyGraph prototype, Liu contacted the founders of Tabular, the commercial outfit behind the Iceberg table format (since acquired by Databricks). The Iceberg founders were impressed that a three-hop query on Azure ran faster than dedicated graph databases, Liu says. "They realize, oh, there is a potential for other data models," he says.
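For readers unfamiliar with the term, a three-hop query follows edges three steps out from a starting node. The Python sketch below shows the idea on a tiny hypothetical edge list; the `k_hop` function and the data are illustrative assumptions, not the query Liu ran. In Gremlin, a roughly equivalent traversal would chain three `out()` steps and then call `dedup()`.

```python
from collections import defaultdict


def k_hop(edge_list, start, hops):
    """Return the set of vertices reachable from `start` in exactly `hops` steps.

    The whole frontier is expanded at once on each hop, rather than
    following one path at a time.
    """
    adjacency = defaultdict(set)
    for src, dst in edge_list:
        adjacency[src].add(dst)

    frontier = {start}
    for _ in range(hops):
        next_frontier = set()
        for vertex in frontier:
            # .get avoids inserting empty entries for vertices with no out-edges
            next_frontier |= adjacency.get(vertex, set())
        frontier = next_frontier
    return frontier


# Hypothetical directed edges.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c"), ("d", "e")]
print(sorted(k_hop(edges, "a", 3)))  # ['d', 'e']
```

Each extra hop can multiply the size of the frontier, which is why multi-hop queries are a common stress test for graph engines.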
PuppyGraph’s Future
PuppyGraph is a young company with paying customers, including one company involved in cryptocurrency. The company has attracted $5 million in seed funding and is targeting OLAP graph and graph analytic use cases, such as fraud detection and regulatory compliance, with its bring-your-own-cloud (BYOC) offerings. A fully managed version of PuppyGraph is in the works.
Conclusion
PuppyGraph’s approach to graph analytics has the potential to change the way connected data is analyzed at scale. By running graph queries directly on data that already sits in the data lakehouse, it avoids the load step that has held graph databases back for analytics while still delivering fast OLAP graph performance. With its horizontally scalable architecture and support for Cypher and Gremlin, PuppyGraph is positioned to make a significant impact in the big data world.
FAQs
Q: What is PuppyGraph?
A: PuppyGraph is a distributed, column-oriented OLAP graph query engine that combines the data storage efficiency of the data lakehouse with the analytic capabilities of a graph database.
Q: What are the benefits of PuppyGraph?
A: PuppyGraph removes the need to load data into a graph database before querying it, providing fast OLAP graph performance on very large datasets kept in object stores.
Q: What are the use cases for PuppyGraph?
A: PuppyGraph is designed for OLAP graph and graph analytic use cases, such as fraud detection and regulatory compliance.

