
AWS Unveils Hosted Apache Iceberg Service

AWS Unveils New S3 Bucket Type Optimized for Apache Iceberg

AWS today unveiled a new S3 bucket type that’s optimized for storing data in Apache Iceberg, which has become the de facto standard for open table formats. AWS says the new S3 bucket type will not only automate the "undifferentiated heavy lifting" of table maintenance, but will also deliver a substantial speedup for analytics on Iceberg tables.

The Rise of Iceberg

The events of June 2024, when Databricks acquired Tabular and Snowflake launched the Polaris metadata catalog for Iceberg, are still reverberating around the big data community. Customers who previously might have been hesitant to invest in building a data lakehouse, out of fear of choosing the wrong table format, were given the green light as the industry settled on Iceberg.

Benefits of S3 Tables

AWS is building on the industry's move to Iceberg with today’s launch of Amazon S3 Tables. AWS says the new bucket type optimizes storage and querying of tabular data as Iceberg tables, where it can be consumed by multiple query engines, including AWS services like Amazon Athena, EMR, Redshift, and QuickSight, as well as open source query engines like Apache Spark. Storing data in this way gives customers benefits like row-level transaction support, queryable snapshots via time travel functionality, schema evolution, and other Iceberg capabilities.
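To make "queryable snapshots" and "schema evolution" a little more concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Iceberg's implementation and not an S3 Tables API: the `ToyTable` and `Snapshot` names are hypothetical, and real Iceberg tracks snapshots through metadata and manifest files rather than by copying rows. The idea it illustrates is that every commit produces a new immutable snapshot, so older versions stay readable after new data or new columns arrive.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of the table (hypothetical, for illustration)."""
    snapshot_id: int
    columns: tuple
    rows: tuple


@dataclass
class ToyTable:
    """Toy stand-in for a snapshot-based table; not Iceberg itself."""
    snapshots: list = field(default_factory=list)

    def _commit(self, columns, rows):
        # Every change appends a new snapshot; earlier ones are never modified.
        snap = Snapshot(len(self.snapshots) + 1, tuple(columns), tuple(rows))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def create(self, columns):
        return self._commit(columns, [])

    def append(self, new_rows):
        current = self.snapshots[-1]
        copied = tuple(dict(r) for r in new_rows)
        return self._commit(current.columns, current.rows + copied)

    def add_column(self, name):
        # Schema evolution: existing rows are untouched and read back as None.
        current = self.snapshots[-1]
        return self._commit(current.columns + (name,), current.rows)

    def scan(self, snapshot_id=None):
        # Time travel: read the latest snapshot, or any earlier one by id.
        snap = self.snapshots[-1] if snapshot_id is None else self.snapshots[snapshot_id - 1]
        return [{c: row.get(c) for c in snap.columns} for row in snap.rows]


table = ToyTable()
table.create(["id", "region"])
v2 = table.append([{"id": 1, "region": "us-east-1"}])
v3 = table.add_column("tier")
v4 = table.append([{"id": 2, "region": "eu-west-1", "tier": "gold"}])

latest = table.scan()          # both rows, with the new "tier" column
as_of_v2 = table.scan(v2)      # the table as it looked before the schema change
```

In this sketch, `latest` contains two rows with a `tier` field (`None` for the older row), while `as_of_v2` still returns the single original row with only `id` and `region`.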

Performance Boost

Parquet and Iceberg are designed for large-scale big data analytic environments, and AWS says it’s upping the performance with Amazon S3 Tables. The company claims its new Iceberg service delivers up to 3x faster query performance and up to 10x higher transactions per second (TPS) compared to plain vanilla Parquet files stored on standard S3 buckets.

Metadata Service

In addition to a managed Iceberg service, AWS took the next step and launched a metadata service to help manage the morass of data stored in Iceberg environments. The company says the new offering, dubbed S3 Metadata, "automatically generates queryable object metadata in near real-time to help accelerate data discovery and improve data understanding, eliminating the need for customers to build and maintain their own complex metadata systems."
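As a rough illustration of what "queryable object metadata" makes possible, the sketch below loads a few made-up object records into an in-memory SQLite table and filters them with SQL. Everything here is an assumption for illustration: the `object_metadata` table, its column names, the sample keys, and the `large_recent_objects` helper are hypothetical and do not reflect the actual S3 Metadata schema, and SQLite simply stands in for whichever query engine a customer would use.

```python
import sqlite3

# Hypothetical metadata table; the real S3 Metadata schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE object_metadata (
        key TEXT,
        size_bytes INTEGER,
        storage_class TEXT,
        last_modified TEXT  -- ISO 8601 UTC, so string order matches time order
    )
    """
)
conn.executemany(
    "INSERT INTO object_metadata VALUES (?, ?, ?, ?)",
    [
        ("logs/2024/11/app.parquet", 5_000_000, "STANDARD", "2024-11-30T08:00:00Z"),
        ("logs/2024/12/app.parquet", 120_000_000, "STANDARD", "2024-12-02T09:30:00Z"),
        ("images/cat.png", 250_000, "STANDARD_IA", "2024-12-03T10:00:00Z"),
        ("exports/report.parquet", 300_000_000, "STANDARD", "2024-12-03T11:15:00Z"),
    ],
)


def large_recent_objects(conn, min_size_bytes, since):
    """Return keys of objects at least min_size_bytes large, modified at or after since."""
    cursor = conn.execute(
        "SELECT key FROM object_metadata "
        "WHERE size_bytes >= ? AND last_modified >= ? "
        "ORDER BY key",
        (min_size_bytes, since),
    )
    return [row[0] for row in cursor]


matches = large_recent_objects(conn, 100_000_000, "2024-12-01T00:00:00Z")
```

The point of the sketch is the workflow: discovery becomes a SQL filter over metadata rather than a listing and inspection of every object in a bucket.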

Customer Adoption

One of the AWS customers planning to use S3 Tables is Genesys, a provider of AI orchestration tools. The company says using S3 Tables will enable it to offer a materialized view layer for its diverse data analysis needs.

Conclusion

S3 Tables are generally available now. S3 Metadata is available as a preview. For more information on S3 Tables, read this AWS blog. For more information on S3 Metadata, read this AWS blog.

FAQs

Q: What is the new S3 bucket type optimized for?
A: The new S3 bucket type is optimized for storing data in Apache Iceberg.

Q: What are the benefits of using S3 Tables?
A: S3 Tables provide benefits like row-level transaction support, queryable snapshots via time travel functionality, schema evolution, and other Iceberg capabilities.

Q: How does S3 Metadata work?
A: S3 Metadata automatically generates queryable object metadata in near real-time to help accelerate data discovery and improve data understanding.

Q: Is S3 Metadata available now?
A: S3 Metadata is available as a preview.

Q: Can I use S3 Tables with my existing query engines?
A: Yes, S3 Tables can be consumed by multiple query engines, including AWS services like Amazon Athena, EMR, Redshift, and QuickSight, as well as open source query engines like Apache Spark.
