Campspot Scores Data Pipeline Win with Apache Airflow and Astronomer

Searching for the Perfect Campsite Can be a Hit-and-Miss Affair

Searching for the perfect campsite can be a hit-and-miss affair, as one seeks the right combination of a view, sufficient parking, and proximity to neighbors and services, among other factors. When it came to selecting a tool to manage its big data pipeline, however, the online reservation company Campspot didn’t have to look any further than Apache Airflow and, eventually, the hosted Airflow service from Astronomer.

The Data-Driven World of Campspot

If you’ve got a hankering for some camping, then Campspot is a good place to start. Founded in 2015 in Grand Rapids, Michigan, the software-as-a-service (SaaS) company lets customers make reservations at more than 2,700 private campgrounds, RV resorts, cabins, and “glamping” locations in the United States and Canada. All told, Campspot manages the reservations for more than 230,000 campsites across North America, which has helped earn the company the nickname “the Expedia of campgrounds.”

Data Management Challenges

While campers might measure their overall satisfaction by the number of s’mores consumed per day, Campspot’s partners, the campground owners, need just a bit more data. For instance, every day they need to know which of their campsites are reserved, how many are reserved in total, and how that compares to previous time periods.
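
To make those daily questions concrete, here is a minimal Python sketch of such a rollup. The article does not describe Campspot's schema or code, so the `Reservation` record, the `reserved_sites` and `daily_summary` helpers, and the sample data are all hypothetical, and the week-over-week comparison is just one possible choice of "previous time period."

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Reservation:
    # Hypothetical record shape; not Campspot's actual schema.
    site_id: str
    check_in: date   # first reserved night
    check_out: date  # departure day, not counted as a reserved night


def reserved_sites(reservations, day):
    """Return the set of site IDs reserved for the night of `day`."""
    return {r.site_id for r in reservations if r.check_in <= day < r.check_out}


def daily_summary(reservations, day, compare_days_back=7):
    """Summarize one day and compare it with an earlier day."""
    current = reserved_sites(reservations, day)
    earlier = reserved_sites(reservations, day - timedelta(days=compare_days_back))
    return {
        "date": day.isoformat(),
        "reserved_sites": sorted(current),
        "total_reserved": len(current),
        "previous_total": len(earlier),
        "change": len(current) - len(earlier),
    }


# Made-up sample data for illustration only.
sample = [
    Reservation("A1", date(2024, 7, 1), date(2024, 7, 10)),
    Reservation("B2", date(2024, 7, 8), date(2024, 7, 9)),
    Reservation("C3", date(2024, 7, 7), date(2024, 7, 12)),
]
summary = daily_summary(sample, date(2024, 7, 8))
print(summary)
```

In a real warehouse this logic would more likely live in SQL; the Python version only shows the shape of the calculation.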

The Data Platform Team

The responsibility of keeping the campground owners’ data appetite properly sated falls to John Marriott, manager of Campspot’s data platform team. According to Marriott, the company runs a nightly batch job that takes the latest data from the homegrown reservation management system and rolls it up into its data warehouse. This data is then bundled up into PDF or CSV reports that are either emailed to Campspot partners or made available for viewing on a Web-based dashboard. The company also offers a “signals” product to its partners that compares their existing reservations to an anonymized set of competitors in their space.
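
As an illustration of the reporting step, the sketch below renders already rolled-up rows as a CSV report using only Python's standard library. The `render_csv_report` function, the column names, and the campground names are assumptions made for this example; the article does not say how Campspot actually generates its reports.

```python
import csv
import io


def render_csv_report(rows, columns):
    """Render a list of dict rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rolled-up rows, as a nightly job might leave them in a warehouse table.
nightly_rows = [
    {"date": "2024-07-08", "campground": "Pine Hollow", "total_reserved": 42},
    {"date": "2024-07-08", "campground": "Lakeside RV", "total_reserved": 17},
]
report = render_csv_report(nightly_rows, ["date", "campground", "total_reserved"])
print(report)
```

The resulting text could then be attached to an email or served from a dashboard, the two delivery paths the article mentions.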

The Data Pipeline Conundrum

Prior to 2022, managing all of these data transformation jobs was mostly a manual affair. It was up to individual engineers to decide how a data pipeline should be constructed to enable data to flow from the reservation system, which runs on a mix of Postgres, MySQL, and DynamoDB databases, into its data warehouse, which runs on a combination of Snowflake and Postgres.

The Solution: Apache Airflow and Astronomer

Marriott and his team realized they needed to get a handle on these data pipeline jobs. They had heard of tools that can automate the execution of thousands of data pipelines. They perceived that Apache Airflow was the early leader in this space, and after investigating Airflow, they adopted it in 2022.

Ease of Use

Apache Airflow’s ease of use was a major selling point for Campspot. While Airflow offers a few different ways to work with the product, including GUIs, Campspot’s developers are code-first types, and they gravitated to Airflow’s command-line and programmatic interfaces. Similarly, they also liked how Airflow and its Python-based batch jobs easily fit into their existing DevOps workflows.
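
For readers unfamiliar with the code-first approach, the following sketch shows the general idea of declaring a pipeline in Python as tasks with dependencies and running them in dependency order. It deliberately uses only the standard library's `graphlib` rather than Airflow's own API, and the task names (`extract`, `roll_up`, `report`) and data are hypothetical, so treat it as a toy illustration of the concept, not as Campspot's pipeline or as Airflow code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract():
    # Stand-in for reading rows from a source system.
    return [{"site": "A1", "nights": 2}, {"site": "B2", "nights": 3}]


def roll_up(rows):
    # Stand-in for an aggregation step.
    return {"total_nights": sum(r["nights"] for r in rows)}


def report(summary):
    # Stand-in for producing a report from the aggregate.
    return f"total_nights={summary['total_nights']}"


# Each task name maps to (function, list of upstream task names).
tasks = {
    "extract": (extract, []),
    "roll_up": (roll_up, ["extract"]),
    "report": (report, ["roll_up"]),
}


def run_pipeline(tasks):
    """Run tasks in dependency order, passing upstream results as arguments."""
    graph = {name: set(upstream) for name, (_, upstream) in tasks.items()}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        func, upstream = tasks[name]
        results[name] = func(*(results[u] for u in upstream))
    return order, results


order, results = run_pipeline(tasks)
print(order)
print(results["report"])
```

An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this basic dependency-ordering idea, and because the pipeline is ordinary Python, it can be version-controlled and reviewed like any other code.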

The Switch to Astronomer

As an AWS shop, Campspot decided to take advantage of AWS’s Amazon Managed Workflows for Apache Airflow (MWAA) offering out of the gate. While AWS’s managed Airflow environment was better than what they had in place before, Campspot found that MWAA wasn’t as easy to manage as they had initially hoped.

Conclusion

Campspot’s move to Apache Airflow, and later to Astronomer’s Astro environment, has streamlined its data pipeline, allowing the team to focus on what matters most: delivering high-quality data to its partners. With Airflow, Campspot has been able to automate the execution of thousands of data pipelines, reducing the complexity and increasing the efficiency of its data management process.

Frequently Asked Questions

Q: What is the main challenge in managing data pipelines?
A: For Campspot, the main challenge was that pipeline work was mostly manual before 2022: individual engineers decided how each pipeline should be built to move data from the reservation system into the data warehouse.

Q: What is Apache Airflow?
A: Apache Airflow is an open-source platform for programmatically defining, scheduling, and monitoring workflows.

Q: What is Astronomer’s Astro environment?
A: Astronomer’s Astro environment is a hosted Airflow service that provides a managed platform for running and monitoring Airflow workflows.

Q: How does Campspot use Apache Airflow?
A: Campspot uses Apache Airflow to automate the data pipeline jobs that move data from its reservation management system into its data warehouse, where it feeds the reports and dashboards provided to campground owners.

Q: What are the benefits of using Apache Airflow?
A: The benefits include automated execution of data pipelines, less complexity and greater efficiency in data management, and a scalable, flexible platform for running large-scale pipelines.
