Building cache keys in CloudFront

Author: Lee Aplin

Caching means temporarily storing copies of recently accessed data. Retrieving cached data is much quicker than requesting it from the server again, which creates a faster and more enjoyable experience for website visitors.

Our websites benefit greatly from caching, and we use several different techniques to improve their performance, but over the past few years CloudFront has become our primary means of caching. When configured correctly, it’s very reliable and drastically improves both loading speeds and the efficiency with which assets are served to your website’s visitors.

CDN caching – CloudFront

The CDN (Content Delivery Network) in our case is AWS CloudFront. Using a CDN, we are able to cache website assets on Amazon’s infrastructure and serve them remotely, rather than delivering every request directly from our own servers.

The advantages of this are:

  • Fast asset loading times
  • Less data stored on our servers (because CSS, JavaScript and images can be stored in the cloud instead) – this results in faster loading times and less strain on the servers
  • Utilising Amazon’s edge locations – this means that data can be served to website users from a location that’s physically closer to them, rather than from one centralised server. This reduces latency and greatly improves load times

CloudFront can cache whole pages as well as assets. The difference between CloudFront and other transient caching methods is that we store this data in the cloud rather than in a Redis database. This gives us much finer control over what is cached and for how long. We control what is cached by building a cache key.

Building the cache key in CloudFront

While CloudFront is a very powerful means of caching pages and content on our sites, we are always careful to configure it so that it produces an effective cache key.

CloudFront offers a lot of flexibility in how you cache your content: you build a cache key by choosing which values from the request are included in it.

This is done using a cache policy. On top of the domain name and URL path, which are always part of the key, a cache policy can add request headers, query strings and cookies.
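
As a rough mental model, a cache key can be thought of as a tuple built from only those parts of the request that the policy allows. The Python sketch below is illustrative only: `CachePolicy` and `cache_key` are hypothetical names, and the code does not reproduce CloudFront’s real internal key format.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit, parse_qsl


@dataclass
class CachePolicy:
    """Hypothetical, simplified stand-in for a CloudFront cache policy:
    the names of the request values allowed into the cache key."""
    query_strings: set[str] = field(default_factory=set)
    headers: set[str] = field(default_factory=set)  # lower-case header names
    cookies: set[str] = field(default_factory=set)


def cache_key(url: str, headers: dict[str, str], cookies: dict[str, str],
              policy: CachePolicy) -> tuple:
    """Host and path are always included; query strings, headers and
    cookies are included only if the policy names them. Each group is
    sorted so the result is deterministic (a simplification of this sketch)."""
    parts = urlsplit(url)
    qs = tuple(sorted((k, v) for k, v in parse_qsl(parts.query)
                      if k in policy.query_strings))
    hdrs = tuple(sorted((k.lower(), v) for k, v in headers.items()
                        if k.lower() in policy.headers))
    cks = tuple(sorted((k, v) for k, v in cookies.items()
                       if k in policy.cookies))
    return (parts.netloc, parts.path, qs, hdrs, cks)


url = "https://test-site.com/search/?s=cheese"
print(cache_key(url, {}, {}, CachePolicy()))
# ('test-site.com', '/search/', (), (), ())
print(cache_key(url, {}, {}, CachePolicy(query_strings={"s"})))
# ('test-site.com', '/search/', (('s', 'cheese'),), (), ())
```

Two requests share a cached copy exactly when this tuple is identical, which is why the choice of included values matters so much.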

Different combinations of these values produce different cache keys, so it is important to consider exactly what will end up being cached. The following example illustrates this.

Example

Consider a search page on a website. The user journey might look like this:

  • A visitor lands on your search page
  • They type a search term into your search form
  • The site adds a query string (`?s=search-term`) to the URL of the page and returns any relevant results

Depending on the settings in your caching policy, this could result in two different scenarios.

1 – You do not include query strings in your caching policy

  • The search page will be cached when the user initially lands on it, without any search results
  • The user types a search term into the form and submits the query
  • The user is returned to the search page without any results being displayed

2 – You include query strings in your caching policy (either specifying them individually or allowing all query strings)

  • When the user submits a search query using the form, CloudFront recognises the query string in the URL (`?s=cheese`) and creates a new cache key to include it
  • The user sees the expected results and the results page is cached
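
In CloudFront itself, the choice between the two approaches in scenario 2 lives in the cache policy’s query string settings. The sketch below builds a `CachePolicyConfig` dictionary in the shape used by the CloudFront API, either allowing only the `s` parameter (`whitelist`) or all query strings (`all`). The function name, the policy name and the TTL values are assumptions chosen for illustration, not recommendations.

```python
from typing import Optional


def query_string_cache_policy(name: str,
                              allowed: Optional[list[str]] = None,
                              default_ttl: int = 86400) -> dict:
    """Return a CachePolicyConfig dict that puts query strings in the cache
    key: only the names in `allowed`, or all of them if `allowed` is None.
    Headers and cookies are left out of the key entirely."""
    if allowed is None:
        qs_config = {"QueryStringBehavior": "all"}
    else:
        qs_config = {
            "QueryStringBehavior": "whitelist",
            "QueryStrings": {"Quantity": len(allowed), "Items": list(allowed)},
        }
    return {
        "Name": name,
        "MinTTL": 0,
        "DefaultTTL": default_ttl,   # illustrative value, in seconds
        "MaxTTL": 31536000,          # illustrative value, in seconds
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            "HeadersConfig": {"HeaderBehavior": "none"},
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": qs_config,
        },
    }


config = query_string_cache_policy("search-page-policy", ["s"])
```

Assuming AWS credentials are configured, a dictionary like this could be passed to boto3’s CloudFront client as `create_cache_policy(CachePolicyConfig=config)`; that call is deliberately left out here. Naming `s` explicitly keeps unrelated parameters out of the key, while `all` is simpler but caches a separate copy for every distinct query string.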

Let’s explain the above in a bit more detail.

In scenario 1, the empty search page is cached. When the search query is submitted, the URL changes to look like this:

`https://test-site.com/search/?s=cheese`

But since we haven’t instructed CloudFront to consider query strings, it ignores the query string and builds the cache key from this:

`https://test-site.com/search/`

This page has already been cached, so whatever the user searches for, CloudFront builds the same cache key from this URL and returns the cached page. Every subsequent search therefore returns the same page, without search results. This is unexpected behaviour and a poor user experience.
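
Scenario 1 can be simulated with a toy in-memory cache whose key is just the host and path. `origin`, `path_only_key` and `fetch` are hypothetical helpers, and the sketch assumes no origin request policy is configured, so the query string is not forwarded to the origin either.

```python
from urllib.parse import urlsplit, parse_qs


def origin(url: str) -> str:
    """Hypothetical origin: renders results for ?s=..., else an empty search page."""
    term = parse_qs(urlsplit(url).query).get("s", [""])[0]
    return f"results for {term}" if term else "empty search page"


def path_only_key(url: str) -> str:
    """Scenario 1: the cache key is the host and path; the query string is ignored."""
    parts = urlsplit(url)
    return parts.netloc + parts.path


cache: dict[str, str] = {}


def fetch(url: str) -> str:
    key = path_only_key(url)
    if key not in cache:
        # Miss: the query string is not part of the key, so (assuming no
        # origin request policy) it is not forwarded to the origin either.
        cache[key] = origin("https://" + key)
    # Serve whatever is cached under this key.
    return cache[key]


print(fetch("https://test-site.com/search/"))           # empty search page
print(fetch("https://test-site.com/search/?s=cheese"))  # empty search page
```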

In scenario 2 we tell CloudFront to use query strings as part of the cache key, so it uses the whole URL including the query string. The user is returned to a results page with everything related to “cheese” and that page is cached separately. This means that the next person who searches for “cheese” will be shown the cached page, which will load quickly.
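
The same toy setup, with the `s` query string added to the key, behaves like scenario 2. `EdgeCache` and `origin` are again hypothetical; the hit and miss counters simply make it visible that a repeat search for “cheese” is served from the cache, while a new term goes to the origin.

```python
from urllib.parse import urlsplit, parse_qs


def origin(url: str) -> str:
    """Hypothetical origin: renders results for ?s=..., else an empty search page."""
    term = parse_qs(urlsplit(url).query).get("s", [""])[0]
    return f"results for {term}" if term else "empty search page"


class EdgeCache:
    """Toy edge cache whose key includes the `s` query string (scenario 2)."""

    def __init__(self) -> None:
        self.store: dict[tuple, str] = {}
        self.hits = 0
        self.misses = 0

    def key(self, url: str) -> tuple:
        parts = urlsplit(url)
        term = parse_qs(parts.query).get("s", [None])[0]
        return (parts.netloc, parts.path, term)

    def fetch(self, url: str) -> str:
        k = self.key(url)
        if k in self.store:
            self.hits += 1
        else:
            # Miss: the query string is in the key, so it reaches the origin.
            self.misses += 1
            self.store[k] = origin(url)
        return self.store[k]


edge = EdgeCache()
edge.fetch("https://test-site.com/search/?s=cheese")  # miss
edge.fetch("https://test-site.com/search/?s=cheese")  # hit
print(edge.hits, edge.misses)  # 1 1
```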

Takeaways

The above example is very basic. It doesn’t take into account cookies or HTTP headers, which add layers of complexity to your configuration. But hopefully it illustrates how the cache key is built, and how we are able to combine different values to create unique cache keys.
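
Cookies show why these extra layers need care. In this sketch, `currency` and `session_id` are hypothetical cookie names: including every cookie in the key gives each visitor with a unique session ID their own cache entry, while including only the cookie that actually changes the page keeps the number of variants small.

```python
from typing import Optional

# Hypothetical visits to the same page: a low-variance preference cookie
# alongside a per-visitor session cookie.
VISITS = [
    ("/", {"currency": "GBP", "session_id": "a1"}),
    ("/", {"currency": "GBP", "session_id": "b2"}),
    ("/", {"currency": "EUR", "session_id": "c3"}),
    ("/", {"currency": "GBP", "session_id": "d4"}),
]


def cache_key(path: str, cookies: dict[str, str],
              allowed: Optional[set[str]]) -> tuple:
    """Path plus cookies: every cookie if `allowed` is None,
    otherwise only the cookies named in `allowed`."""
    chosen = {name: value for name, value in cookies.items()
              if allowed is None or name in allowed}
    return (path, tuple(sorted(chosen.items())))


def count_variants(visits, allowed: Optional[set[str]]) -> int:
    """Number of distinct cache entries the visits would create."""
    return len({cache_key(path, cookies, allowed) for path, cookies in visits})


print(count_variants(VISITS, None))          # 4 (all cookies)
print(count_variants(VISITS, {"currency"}))  # 2 (only currency)
print(count_variants(VISITS, set()))         # 1 (no cookies)
```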

Another scenario where it would be wise to adjust the cache key is when a URL contains UTM parameters (tracking tags added to a URL to record how a user arrived at the website). UTM values vary from one link or campaign to the next, so we would want to exclude these parameters from the cache key. Otherwise CloudFront would create a separate cache key for every variation, each needing its own request to the origin, which would largely negate the benefits of caching the page at all.
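
Excluding UTM parameters can be sketched by comparing a key that includes every query string with one that drops any parameter starting with `utm_`. The URLs and the `/events/` path are hypothetical, and the prefix match is a convenience of this sketch; in a cache policy you would name parameters explicitly, for example by including only `s`.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical links to the same page from different campaigns.
URLS = [
    "https://test-site.com/events/?utm_source=newsletter&utm_campaign=may",
    "https://test-site.com/events/?utm_source=facebook&utm_campaign=may",
    "https://test-site.com/events/",
]


def key_all_query_strings(url: str) -> tuple:
    """Every query string parameter becomes part of the key."""
    parts = urlsplit(url)
    return (parts.path, tuple(sorted(parse_qsl(parts.query))))


def key_excluding_utm(url: str) -> tuple:
    """Tracking parameters (utm_*) are dropped before building the key."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")]
    return (parts.path, tuple(sorted(kept)))


print(len({key_all_query_strings(u) for u in URLS}))  # 3 cache entries
print(len({key_excluding_utm(u) for u in URLS}))      # 1 cache entry
```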

Navigating all of the cookies and HTTP headers on a site can be tricky, and there is usually an element of trial and error involved before hitting the right balance and caching just the right content. But when done right, CloudFront can provide us with huge performance benefits.


If you have any questions on the above, or would like to have a chat, just get in touch: team@substrakt.com