IBM to Buy DataStax for Database, GenAI Capabilities

IBM to Acquire DataStax, Strengthening Its Capabilities in Unstructured Data Management and Generative AI

IBM today announced its intent to acquire DataStax, the longtime backer of the Apache Cassandra database that has recently broadened its reach into streaming data and generative AI. IBM cited DataStax’s capability to manage unstructured data as well as its vector database, which is used for developing RAG solutions.

The Early Days of Cassandra

Apache Cassandra was originally developed at Facebook in 2008 to serve the social network’s need for a highly scalable, fault-tolerant database to store big data generated by users on its website. Facebook was a big user and creator in the nascent big data ecosystem, building its social media empire atop non-relational technology like Apache Hadoop and HBase, another NoSQL data store, as well as Apache Hive, which it created to make Hadoop look like a relational database.

DataStax’s Journey

Cassandra, which is technically a wide-column store that favors data availability and reliability (at the expense of data consistency), became a top-level project at the Apache Software Foundation in 2010. That is the same year that Jonathan Ellis and Matt Pfeil co-founded a company in Austin, Texas called Riptano, which they quickly renamed DataStax.

At first, DataStax followed the typical commercial open-source business model, offering an enterprise version of Apache Cassandra called DataStax Enterprise (DSE). The company, which had moved to Santa Clara, California by 2014, attracted customers from the Fortune 500, such as FedEx, Capital One, and Verizon. It had raised $106 million in venture capital at an $830 million valuation, and was on pace for an IPO in the 2015 or 2016 timeframe.

Astra DB and K8ssandra

That IPO never happened, as MongoDB dominated the NoSQL space and went public in 2017. In May 2020, DataStax launched Astra DB, a fully managed version of Cassandra running in the cloud atop Kubernetes, giving customers the scalability and availability benefits of the NoSQL database without the management responsibilities (like many distributed systems, Cassandra can be difficult to manage). Later that year, it released K8ssandra, an open source version of the database that runs on Kubernetes.

Expansion into Streaming Data and Generative AI

Soon, the company started branching beyond NoSQL databases. In 2021, it launched Astra Streaming, an event streaming platform based on Apache Pulsar, a publish and subscribe (pub-sub) data platform that competes with Apache Kafka. In 2023, DataStax bought Kaskada, an AI startup that helped to automate tedious feature engineering tasks, and made the software open source under the Luna ML brand.

DataStax further bolstered its generative AI capabilities in 2023 with the launch of a vector store in Astra DB. Vector stores emerged as critical tools for building retrieval-augmented generation (RAG) pipelines to bolster the accuracy of large language model (LLM) output in generative AI applications. Then in 2024, DataStax further fleshed out its RAG story when it nabbed Langflow, which developed an open source framework for building RAG pipelines.

IBM’s Acquisition of DataStax

All of the accumulated capabilities that DataStax built and bought obviously caught the eye of IBM. Big Blue, which has been rallying its business to some degree on the back of its Watsonx AI offerings, cited open source projects like Apache Cassandra, Apache Pulsar, Langflow, and OpenSearch (a fork of Elasticsearch and Kibana) in its press release announcing the acquisition.

IBM is particularly enamored of how DataStax has built its unstructured data management capabilities under a single product. While it didn’t mention DataStax’s Hyper-Converged Data Platform (HCDP) by name, it seems clear that IBM is banking on harnessing the tech to help customers turn unstructured data into winning AI applications.

Conclusion

The acquisition is expected to close in the second quarter, and terms of the deal were not disclosed. DataStax was valued at $1.6 billion during its most recent funding round, in June 2022. The company has raised $342.6 million over several rounds. It has hundreds of paying customers, according to IBM.

FAQs

Q: What is DataStax?
A: DataStax is a company that offers a suite of products and services based on the Apache Cassandra database and other open-source technologies.

Q: Why is IBM acquiring DataStax?
A: IBM is acquiring DataStax to strengthen its capabilities in unstructured data management and generative AI.

Q: What are the key products and services offered by DataStax?
A: DataStax offers a range of products and services, including Astra DB, K8ssandra, Astra Streaming, and Luna ML.

Q: What is the expected closing date of the acquisition?
A: The acquisition is expected to close in the second quarter.
