New Benchmark for Real-Time Analytics Released

Real-Time Analytics: A New Benchmark for Evaluating Performance

Real-time analytics pushes the limits of what distributed hardware and software can deliver. To measure the relative performance of real-time analytics databases, Timescale today released a real-time analytics benchmark dubbed RTABench.

The Need for a New Benchmark

Traditional column-store databases are not designed to handle the high concurrency, low latency, and real-time updates required by modern applications. Timescale’s flagship offering, TimescaleDB, is a modified version of Postgres that treats time-series data as a first-class data type. The database has been adopted in gaming and other consumer-facing applications that are exposed to fast-changing data and must deliver low-latency responses to many concurrent users.

The Problem with ClickBench

Timescale notes that ClickHouse launched ClickBench, a real-time analytics benchmark. Several dozen databases have taken the test since it launched in 2022, with the Umbra database currently holding the number one position. TimescaleDB shows five entries in the ClickBench results, where it sits in the bottom 25%.

However, Timescale was not entirely happy with ClickBench. The company says that ClickBench's approach of evaluating databases by "using a single table of clickstream data, representative of workloads like web analytics, BI, and log aggregation" does not fairly represent the full breadth of real-time analytic workloads.

Introducing RTABench

So Timescale developed its own benchmark to better address the real-world workloads that it sees real-time analytics being asked to run. What sets RTABench apart is that it exercises the behind-the-scenes data tasks that real-time analytics databases perform, such as joins, filters, and pre-aggregations.

Key Features of RTABench

  • Joins: database joins bring together tables that store disparate data, such as event data and metadata. "You need fast joins on fresh data to retrieve related records from multiple tables," the company writes in its blog post.
  • Filtering and indexing: filtering and indexing are other common database techniques to avoid the dreaded full-table scans. "Databases built for real-time applications must excel at indexing, partitioning, and fast lookups – not just bulk aggregations over large datasets," Timescale writes.
  • Pre-aggregations: pre-aggregations are another common way to speed up the inevitable queries that will come down the pike. "Existing benchmarks like ClickBench do not benchmark pre-aggregation," Timescale writes, "but many real-time applications depend on it for sub-second response times."
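
The three techniques above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The schema, table names, and rows below are hypothetical and chosen only for illustration; they are not RTABench's actual schema, and SQLite is used here only because it runs anywhere.

```python
import sqlite3

# Hypothetical toy schema; not RTABench's actual tables or columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_events (
        order_id INTEGER, event_type TEXT, event_day TEXT
    );
    -- Index so lookups by order_id can avoid a full-table scan.
    CREATE INDEX idx_events_order ON order_events (order_id);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "acme"), (2, "globex")])
conn.executemany(
    "INSERT INTO order_events VALUES (?, ?, ?)",
    [
        (1, "Created", "2025-01-01"),
        (1, "Shipped", "2025-01-02"),
        (2, "Created", "2025-01-02"),
        (2, "Shipped", "2025-01-02"),
    ],
)

# Join plus selective filter: events belonging to one customer's orders.
acme_events = conn.execute("""
    SELECT e.event_type, e.event_day
    FROM order_events AS e
    JOIN orders AS o ON o.order_id = e.order_id
    WHERE o.customer = 'acme'
    ORDER BY e.event_day
""").fetchall()

# Pre-aggregation: compute daily counts once, then serve reads from the summary.
conn.execute("""
    CREATE TABLE daily_event_counts AS
    SELECT event_day, event_type, COUNT(*) AS n
    FROM order_events
    GROUP BY event_day, event_type
""")
shipped_per_day = conn.execute("""
    SELECT event_day, n
    FROM daily_event_counts
    WHERE event_type = 'Shipped'
    ORDER BY event_day
""").fetchall()

print(acme_events)
print(shipped_per_day)
```

In this sketch the summary table is built once by hand; a production system would typically keep such a summary up to date as new events arrive, which is the part a benchmark of pre-aggregation would stress.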

How RTABench Works

To develop RTABench, Timescale started with the open source ClickBench framework and then modified it with different data and queries. It also designed RTABench to run on normalized data (that is, data spread across related tables, as it typically sits in an application database), whereas ClickBench works on denormalized data.
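
To make the normalized versus denormalized distinction concrete, here is a small sketch in plain Python. The records and field names are invented for illustration and do not come from either benchmark.

```python
# Denormalized: every event row repeats the customer attribute it needs.
denormalized_events = [
    {"order_id": 1, "event": "Shipped", "customer_name": "acme"},
    {"order_id": 2, "event": "Shipped", "customer_name": "globex"},
]

# Normalized: events reference orders, and orders reference customers.
customers = {100: {"name": "acme"}, 200: {"name": "globex"}}
orders = {1: {"customer_id": 100}, 2: {"customer_id": 200}}
events = [
    {"order_id": 1, "event": "Shipped"},
    {"order_id": 2, "event": "Shipped"},
]


def customer_name_for(event):
    """Resolve an event's customer by following two references (a join)."""
    order = orders[event["order_id"]]
    return customers[order["customer_id"]]["name"]


# Joining the normalized data at read time rebuilds the denormalized rows.
joined = [{**e, "customer_name": customer_name_for(e)} for e in events]
print(joined == denormalized_events)
```

The extra lookups in `customer_name_for` are the work a database does when it joins normalized tables at query time, which is why a benchmark on normalized data puts more weight on join performance.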

The Dataset and Queries

The database that Timescale created for the benchmark contains 171 million order events, about 1,100 customers, more than 9,250 products, and about 10 million historical orders. Timescale then created 40 queries that are designed to test how the database handles common tasks, such as counting the number of departed shipments per day from a specific terminal, finding the last recorded status of a given order, or showing the total revenue generated by each customer in the last 30 days.
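
The three example tasks named above might look roughly like the following on a toy dataset. This is a sketch under stated assumptions: the tables, columns, terminal names, and the fixed reference date are all hypothetical, and the SQL is not taken from RTABench's 40 queries.

```python
import sqlite3

# Hypothetical schema and rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER,
        amount REAL, created_at TEXT
    );
    CREATE TABLE order_events (
        order_id INTEGER, event_type TEXT, terminal TEXT, event_time TEXT
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "acme"), (2, "globex")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (10, 1, 100.0, "2025-03-20"),
        (11, 1, 50.0, "2025-01-05"),
        (12, 2, 75.0, "2025-03-20"),
    ],
)
conn.executemany(
    "INSERT INTO order_events VALUES (?, ?, ?, ?)",
    [
        (10, "Departed", "Berlin", "2025-03-21 08:00:00"),
        (12, "Departed", "Berlin", "2025-03-21 09:30:00"),
        (11, "Departed", "Oslo", "2025-01-06 07:00:00"),
        (10, "Delivered", "Hamburg", "2025-03-22 10:00:00"),
    ],
)

# 1. Departed shipments per day from a specific terminal.
departed_per_day = conn.execute("""
    SELECT date(event_time) AS day, COUNT(*)
    FROM order_events
    WHERE event_type = 'Departed' AND terminal = ?
    GROUP BY day
    ORDER BY day
""", ("Berlin",)).fetchall()

# 2. Last recorded status of a given order.
last_status = conn.execute("""
    SELECT event_type
    FROM order_events
    WHERE order_id = ?
    ORDER BY event_time DESC
    LIMIT 1
""", (10,)).fetchone()

# 3. Revenue per customer in the 30 days before a fixed reference date.
#    A fixed date keeps the example deterministic; a live query would use 'now'.
revenue_30d = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE o.created_at >= date(?, '-30 days')
    GROUP BY c.name
    ORDER BY c.name
""", ("2025-03-31",)).fetchall()

print(departed_per_day, last_status, revenue_30d)
```

Each query is selective (one terminal, one order, one time window) rather than a bulk scan, which matches the kind of workload the article says RTABench is meant to measure.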

Conclusion

RTABench is a new benchmark that evaluates databases using query patterns that mirror real-world application workloads – something missing from existing benchmarks. Unlike ClickBench and other benchmarks, RTABench closely reflects the actual needs of real-time analytics applications, measuring key factors such as joins, selective filtering, and pre-aggregations.

FAQs

Q: What is RTABench?
A: RTABench is a new benchmark that evaluates databases using query patterns that mirror real-world application workloads.

Q: Why did Timescale develop RTABench?
A: Timescale developed RTABench to better address the real-world workloads that it sees real-time analytics being asked to run.

Q: What are the key features of RTABench?
A: The key features of RTABench include joins, filtering and indexing, and pre-aggregations.

Q: How does RTABench work?
A: RTABench takes the open source ClickBench framework and modifies it with different data and queries. It also runs on normalized data, as opposed to the denormalized data that ClickBench uses.

Q: What are the results of RTABench?
A: The results of RTABench include the performance of several databases, including TimescaleDB, ClickHouse, MongoDB, Postgres, and MySQL.
