How loveholidays used a Cloud Native event-driven system to cut hotel content publishing time from 5 days to less than 1 minute

Mark Duce · Published in loveholidays tech · Oct 21, 2021 · 7 min read


At loveholidays, hotel content editors used to have to wait up to 5 days for their content changes to appear on the website. Now it happens in less than 1 minute. Here’s how we achieved it.

Descriptive content for hotels is a big conversion driver. When our copywriters change the content, we want to get their changes live as fast as possible.

Hotel content at loveholidays

At loveholidays we work with hundreds of thousands of hotels around the world, with content coming from many different sources. These sources can be direct relationships with hotels, or API connections to bedbanks. On a daily basis, we’re crunching through nearly half a trillion hotel and flight combinations to allow customers to find their dream holiday.

On the Supply team, we’re responsible for providing all the data required to sell holidays, as well as the integrations to actually book the components.

One part of this is to curate our hotel content. We import content from supplier APIs, FTP locations and from our own content team via a CMS. Whenever this content changes or we get new content, we want the website to reflect those changes as fast as possible so that we’re offering our customers the best experience with the most accurate information.

Where we were

Our old system revolved around a SQL database that existed only in production, with a sequence of Linux cron jobs on a VM that eventually chained together to publish hotel content to our website. The system was difficult to understand: it involved pieces of Java code, executed by Bash scripts, reading from the database, with little in the way of centralised logging and no visibility when parts of it failed. The lack of test coverage meant it was difficult to change and very brittle.

By far the biggest problem was the length of time for changes to be reflected in production. This was up to 5 days because the system essentially threw away everything and started from scratch each week. This long running time meant we’d get regular requests into the team to check where the content update was in the process and we were often asked if we could manually speed it up. This also led to a long feedback cycle, so a small error in code would take a couple of days to show in production, at which point the whole process would need to be started again.

We wanted our new system to have a development environment, be cheap to run, be easy to change and to have good test coverage.

However, the most important aim was for content changes to go live within 10 minutes.

Our new architecture

In order to satisfy these 5 criteria, we’re relying on these key technologies:

- Google Cloud Storage (GCS)

- Kubernetes to run Java Spring Boot apps

- Google PubSub

- Google Firestore

Here’s our high-level architecture:

Although the flow initially looks complex, it follows a consistent way of working. Events flow through the system using PubSub, and the outputs are then written to GCS. This means each application has a clear responsibility, and it also makes the system easy to debug, as the inputs and outputs can be found in GCS.

Step 1: Hotel Converter

We get content updates from our suppliers, and we also have a team of internal copywriters.

The process starts when content arrives from a supplier or from our content team. The content is written to GCS and we then publish an event to trigger the downstream systems.

This event is put on a PubSub topic. The first application in the process is called the hotel converter. Its job is to subscribe to the topic, take the content, standardise it to our object model and then save it to GCS. Suppliers give us XML, JSON and CSVs, delivered via lots of different methods, so this step helps to simplify everything down the line, as we’re now working with a simple object model and consistent data.
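For illustration, here’s a minimal sketch of what the subscribing side of the hotel converter could look like with the Google Cloud Pub/Sub Java client. The project and subscription names are hypothetical, and the conversion itself is left as comments:

```java
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

public class HotelConverterSubscriber {

  public static void main(String[] args) {
    // Hypothetical project and subscription names
    ProjectSubscriptionName subscription =
        ProjectSubscriptionName.of("my-project", "raw-hotel-content-sub");

    // Called once per "new raw content" event
    MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
      String rawContentPath = message.getData().toStringUtf8();
      // 1. Read the raw supplier content (XML / JSON / CSV) from GCS at rawContentPath
      // 2. Convert it into the standard hotel object model
      // 3. Write the standardised content back to GCS
      // 4. Publish a "standardised content ready" event (see below)
      consumer.ack();
    };

    Subscriber subscriber = Subscriber.newBuilder(subscription, receiver).build();
    subscriber.startAsync().awaitRunning();
    subscriber.awaitTerminated(); // block so the app keeps consuming messages
  }
}
```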

We work with hundreds of thousands of hotels and many suppliers, so writing content to GCS every time they send it to us can become expensive. We therefore take a hash of the content that they have sent us, and store the hash in Google Firestore. Each time we get content, we retrieve the existing hash from Firestore (if it’s there) and if it has changed or it didn’t exist then we write to GCS. Otherwise we skip the write to GCS. This technique was a big cost saver for us. More on that soon in a different post…
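The hash check itself is only a few lines of code. Here’s a rough sketch of the idea using the Firestore and Cloud Storage Java clients; the bucket and collection names are hypothetical and error handling is omitted:

```java
import com.google.cloud.firestore.DocumentSnapshot;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.Map;

public class ChangeDetectingWriter {

  private final Firestore firestore = FirestoreOptions.getDefaultInstance().getService();
  private final Storage storage = StorageOptions.getDefaultInstance().getService();

  /** Writes the content to GCS only if its hash differs from the last one we stored. */
  public void writeIfChanged(String hotelId, String content) throws Exception {
    // Hash the incoming content (SHA-256, hex encoded; HexFormat needs Java 17+)
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    String newHash = HexFormat.of()
        .formatHex(digest.digest(content.getBytes(StandardCharsets.UTF_8)));

    // Look up the previously stored hash, if any (hypothetical collection name)
    DocumentSnapshot existing =
        firestore.collection("hotel-content-hashes").document(hotelId).get().get();
    String oldHash = existing.exists() ? existing.getString("hash") : null;

    if (newHash.equals(oldHash)) {
      return; // content unchanged: skip the GCS write entirely
    }

    // Content is new or changed: write it to GCS and record the new hash
    BlobId blobId = BlobId.of("hotel-content", hotelId + ".json"); // hypothetical bucket
    storage.create(BlobInfo.newBuilder(blobId).build(),
        content.getBytes(StandardCharsets.UTF_8));
    firestore.collection("hotel-content-hashes").document(hotelId)
        .set(Map.of("hash", newHash)).get();
  }
}
```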

We use Horizontal Pod Autoscaling (HPA) in Kubernetes to scale up and down the number of instances that we’re running. This means that when there are lots of updates our system automatically scales up, but when there are not many updates coming through our system scales down.

The final job of the hotel converter is to send an event to a topic that lets other applications know that we have new standardised content ready to go in GCS.
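Publishing that event is a small amount of code with the Pub/Sub client. The topic name and payload below are illustrative:

```java
import com.google.api.core.ApiFuture;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

public class StandardisedContentPublisher {

  public static void main(String[] args) throws Exception {
    // Hypothetical topic for "standardised content is ready in GCS" events
    Publisher publisher =
        Publisher.newBuilder(TopicName.of("my-project", "standardised-hotel-content")).build();

    PubsubMessage message = PubsubMessage.newBuilder()
        .setData(ByteString.copyFromUtf8("{\"hotelId\":\"12345\"}"))
        .build();

    ApiFuture<String> messageId = publisher.publish(message);
    System.out.println("Published message " + messageId.get()); // blocks until acknowledged

    publisher.shutdown(); // flush and release resources
  }
}
```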

Content Generator

Our content generator subscribes to the aforementioned topic and takes each of the standardised pieces of content that we have in GCS for each of the suppliers. It then applies some hierarchical rules to decide which content we’re going to use, merges the content together and produces what we call our definitive content. This is the content that will eventually go to the website.
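The hierarchical rules boil down to choosing, for each field, the value from the highest-priority source that has one. Here’s a simplified sketch; the source names, priority order and flat field map are illustrative rather than our real model:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DefinitiveContentMerger {

  // Illustrative priority order: earlier entries win over later ones
  private static final List<String> SOURCE_PRIORITY =
      List.of("internal-cms", "supplier-a", "supplier-b");

  /** Merges per-source content (source -> field -> value) into one definitive map. */
  public static Map<String, String> merge(Map<String, Map<String, String>> bySource) {
    Map<String, String> definitive = new HashMap<>();
    for (String source : SOURCE_PRIORITY) {
      Map<String, String> fields = bySource.getOrDefault(source, Map.of());
      // putIfAbsent keeps the value from the highest-priority source seen so far
      fields.forEach(definitive::putIfAbsent);
    }
    return definitive;
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> bySource = Map.of(
        "internal-cms", Map.of("description", "Hand-written copy"),
        "supplier-b", Map.of("description", "Supplier copy", "address", "1 Beach Road"));
    // description comes from internal-cms, address falls through to supplier-b
    System.out.println(merge(bySource));
  }
}
```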

As with the hotel converter, the app runs in Kubernetes and scales up and down on demand. It also uses the same technique as the hotel converter to ensure that we’re only saving content to GCS when it has actually changed.

Finally the content generator puts an event on a topic to let other applications know that we’ve got new definitive content in our bucket.

Content Repository

Our final application in the stack is our centralised repository for content, aptly named Content Repository. It’s designed to be one interface for all the users of hotel content around loveholidays.

This application subscribes to the definitive content update topic, and pushes the changes live to the website by updating the static files that we hold to serve the website.
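Conceptually this is just another subscriber that reacts to the definitive content event and refreshes the static file for that hotel. Here’s a sketch of the file update using a GCS server-side copy, with hypothetical bucket and object names:

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class StaticContentUpdater {

  private final Storage storage = StorageOptions.getDefaultInstance().getService();

  /** Copies the latest definitive content for a hotel into the bucket served to the website. */
  public void refreshStaticFile(String hotelId) {
    BlobId source = BlobId.of("definitive-hotel-content", hotelId + ".json"); // hypothetical
    BlobId target = BlobId.of("website-static-content", hotelId + ".json");   // hypothetical

    // Server-side copy within GCS: no need to download and re-upload the content
    storage.copy(Storage.CopyRequest.newBuilder()
            .setSource(source)
            .setTarget(target)
            .build())
        .getResult(); // blocks until the copy has completed
  }
}
```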

Benefits of new architecture

We’re now reaping the benefits of our new architecture.

  • Content changes go live in about 30 seconds instead of 5 days.
  • Since going live, we’ve had no requests to the team to check where content is in the pipeline. This has left us free to work on other priorities.
  • We now have a solid process in development and production, so we can test changes in development and have confidence they’ll work when they go to production. And when they do go live the feedback time is vastly reduced.
  • Our code is clean, well unit tested and covered by SonarQube static code analysis.
  • The flow of events makes it easy for a new recruit to pick up a single piece of the architecture, understand what it does and make changes to it.
  • The process is highly scalable. As we’re no longer relying on a production-only relational database and everything is cloud based, we can easily scale up the number of instances to make this process work at any scale.
  • We’ve reduced our dependency on manually provisioned virtual machines.
  • We managed to improve our reporting with little effort. We just added an extra subscription to one of our topics so that we can write the data to BigQuery. This allowed us to build reporting about how good our content is. This reporting helps our content team prioritise the hotels that need their content updating the most.
  • We’re planning on taking a similar architectural approach with some future initiatives, particularly around hotel images. This gives us a good repeatable pattern that can then be easily understood by new recruits.
  • We’ve adopted a zero-errors approach to this system. Every single error gets sent to our Slack alerts channel. We then understand where the error is coming from and fix it, so it’s a really stable process.

Downsides of new architecture

Whilst having single-responsibility apps makes it very simple to change any one application, it is a little harder to trace a content change all the way through the system, as a few different applications are involved before it goes live.

We could also improve how the applications scale. Our current use of HPA means that we always have at least one instance of each app running, but if we switched to KEDA we could scale all the way down to zero instances when there is nothing to process.

Conclusion

We’re really proud of how our architecture turned out. We’ve learned a lot along the way and we’ve ended up with an architecture that’s good for the business and we’re happy with it technically. We’re looking forward to using some of the techniques we’ve employed here in some of our other projects.

If you’d like to work on this type of thing, we’re hiring! You can also check out our other articles.
