Object Caching Configuration User Guide

Introduction

nuu:bit Object Caching Concepts

nuu:bit Caching Rules

Introduction

Wildcard Patterns

Rule Execution Order

Rule Structure

Rule Parameters

Rule Examples

 

Introduction

This document provides basic information for users who need to configure caching rules for nuu:bit edge proxies. The nuu:bit Caching Configuration UI (part of the nuu:bit Portal system) is used to configure caching rules for a particular domain. In the Portal, open the Configuration tab, click the “CDN” tab, and then click the “Define Caching Rules (Advanced Mode)” button. The editor that opens shows the caching configuration for the domain in JSON: the default configuration if the domain has none yet, or the current configuration if one was created previously. After reading the following description of nuu:bit’s approach to specifying object caching rules, a user can modify the default (or previous) configuration in the Portal. When editing is complete, the user exits the editor and “publishes” the new or modified configuration.

nuu:bit Object Caching Concepts

By default, the nuu:bit edge proxy caches objects that simultaneously match all the conditions below:

  1. The origin marks them as cacheable by returning either an Expires header or a Cache-Control header with an s-maxage or max-age directive larger than 0. If both headers are present, the Cache-Control header takes precedence. If both s-maxage and max-age are present, s-maxage takes precedence. By default the object is cached for the duration specified by the origin:
     • If Expires is present, but Cache-Control isn’t, the cache lifetime (referred to as time to live (TTL) in this document) is the difference between the time specified by the header and the current time on the edge proxy.
     • If Cache-Control is present and its s-maxage or max-age value is larger than zero, then the TTL is that value, in seconds.
  2. The object is requested through a GET or HEAD HTTP method. Requests using other methods, such as POST, PUT and DELETE, are not cached.
  3. The browser doesn’t send any cookies with the request and the origin doesn’t set any cookies in the response. This behavior can be changed, as we will see later.
  4. The object is public, that is, it doesn’t require authentication as signalled by the presence of an Authorization header in the request.
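
The precedence rules in the first condition can be illustrated with a short Python sketch. It is not the edge proxy's actual code: the function names are hypothetical, headers are assumed to arrive as a plain dictionary with canonical capitalization, and a Cache-Control header without a usable s-maxage or max-age is assumed to yield no TTL (the text above does not say what happens in that case).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def origin_ttl(response_headers, now=None):
    """Return the TTL in seconds implied by the origin headers, or 0 if none."""
    now = now or datetime.now(timezone.utc)
    cache_control = response_headers.get("Cache-Control")
    if cache_control is not None:
        # Cache-Control takes precedence over Expires.
        directives = {}
        for part in cache_control.split(","):
            name, _, value = part.strip().partition("=")
            directives[name.lower()] = value
        # s-maxage takes precedence over max-age.
        for name in ("s-maxage", "max-age"):
            if directives.get(name, "").isdigit():
                return int(directives[name])
        return 0  # assumption: no usable directive means "no TTL"
    expires = response_headers.get("Expires")
    if expires is None:
        return 0
    try:
        expiry = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return 0  # unparseable date: treat as not cacheable
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    # TTL is the distance between the Expires time and the proxy's clock.
    return max(0, int((expiry - now).total_seconds()))


def request_is_cacheable(method, request_headers):
    """Default request-side checks: GET/HEAD, no cookies, no authorization."""
    return (method in ("GET", "HEAD")
            and "Cookie" not in request_headers
            and "Authorization" not in request_headers)
```

For example, a response carrying "Cache-Control: max-age=60, s-maxage=300" yields a TTL of 300 seconds in this sketch, because s-maxage wins over max-age.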

These restrictions make many, if not all, objects uncacheable, mostly because the browser sends the domain cookies as part of every request for a resource in that domain, regardless of type. For example, it sends them both for HTML pages and images.

The workaround for this issue requires knowledge of the website structure. The cache needs to be told:

  1. Which objects are cacheable and for how long, even if the origin doesn’t mark them as such.
  2. Which objects are dynamically generated and, among these, which ones require cookies in order to generate the resource.

The nuu:bit Configuration UI allows the overriding of caching parameters for any set of URLs belonging to a domain. Currently, the interface allows a user to define a list of caching rules in JSON format, as described in the section nuu:bit Caching Rules. In the future, a simpler, user-friendly user interface will be implemented.

Let us now look at how the cache actually works:

  1. A request for a resource is received from the browser.
  2. Check that all the conditions for cacheability of this resource, defined above, are met for the request:
       a. If at least one of them isn’t met:
            i. Download the object from the origin
            ii. Override the browser TTL if necessary (see nuu:bit Caching Rules)
            iii. Return it to the browser without caching it
            iv. DONE
  3. If the object is deemed cacheable:
       a. Compute a hash value, composed of:
            i. The domain name
            ii. The full URL, including the query string, if present. (The query string behavior can be controlled, as described below.)
            iii. The user agent capabilities. These capabilities are derived from the “User-Agent” HTTP request header (the full “User-Agent” header is not used).
            iv. The set of request cookies that are relevant for the object, as defined through caching rules (see nuu:bit Caching Rules)
       b. Check if the cache already contains an unexpired object associated with this hash value. (The “stale” object TTL default is 10 seconds.)
            i. If it does:
                 1. Override the browser TTL if necessary (see nuu:bit Caching Rules)
                 2. Return the cached object
                 3. DONE
       c. If the object is not in the cache or it is expired, download the object from the origin.
       d. Check that all the conditions for cacheability of this resource, defined above, are met for the response:
            i. If at least one of them isn’t met:
                 1. Override the browser TTL if necessary (see nuu:bit Caching Rules)
                 2. Return the object to the browser without caching it
                 3. DONE
       e. The object is cacheable:
            i. Compute the TTL of the object
            ii. Insert it into the cache and associate it with the previously computed hash value
       f. Override the browser TTL if necessary (see nuu:bit Caching Rules)
       g. Return the object to the browser
       h. DONE
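
The flow above can be modelled with a toy in-memory cache in Python. This is only a sketch under simplifying assumptions: the class name, the fetch callback and the returned status strings are hypothetical, the cache key is reduced to the URL alone, the response-side check is reduced to "TTL greater than zero", and the browser TTL override step is left out.

```python
import time


class EdgeCacheSketch:
    """Toy model of the lookup flow; not the real edge proxy."""

    def __init__(self, fetch_from_origin, clock=time.time):
        # Hypothetical callable: url -> (body, ttl_seconds).
        self.fetch = fetch_from_origin
        self.clock = clock
        self.store = {}  # cache key -> (body, expiry timestamp)

    def handle(self, method, url, request_cacheable=True):
        # Request is not cacheable: go to the origin and do not store.
        if method not in ("GET", "HEAD") or not request_cacheable:
            body, _ = self.fetch(url)
            return body, "BYPASS"

        # Simplified key; the real hash also covers the domain,
        # user agent capabilities and relevant cookies.
        key = url
        entry = self.store.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0], "HIT"  # unexpired object found

        # Missing or expired: download from the origin.
        body, ttl = self.fetch(url)
        if ttl <= 0:
            return body, "UNCACHEABLE"  # response is not cacheable

        # Cacheable response: store it under the computed key.
        self.store[key] = (body, self.clock() + ttl)
        return body, "MISS"
```

Calling handle twice for the same cacheable URL returns a "MISS" followed by a "HIT" until the stored TTL runs out, after which the object is fetched from the origin again.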

Please note that, given the manner in which the hash value is computed, there can be more than one instance of the same object in the cache at any time:

  • One instance for each user agent with different capabilities (mobile/desktop, supported HTML features, webp support etc.)
  • One instance for each combination of query parameters and their values, if present
  • One instance for each combination of cookies and their values, if cookies are allowed (see nuu:bit Caching Rules)
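
To see why several instances of one object can coexist, consider the following Python sketch of a cache key. It only mirrors the four ingredients listed earlier; the function name, the argument shapes and the use of SHA-256 are assumptions made for illustration, since this document does not specify the hash function.

```python
import hashlib


def cache_key(domain, url, ua_capabilities, relevant_cookies):
    """Combine the four documented ingredients into one hex digest."""
    parts = [
        domain.lower(),
        url,  # full URL path, including the query string if present
        ",".join(sorted(ua_capabilities)),  # e.g. {"mobile", "webp"}
        "&".join(f"{name}={value}"
                 for name, value in sorted(relevant_cookies.items())),
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


# Same object, different query string or cookie value: different keys.
key_a = cache_key("www.mydomain.com", "/img.php?size=1", {"webp"}, {"geoloc": "us"})
key_b = cache_key("www.mydomain.com", "/img.php?size=2", {"webp"}, {"geoloc": "us"})
key_c = cache_key("www.mydomain.com", "/img.php?size=1", {"webp"}, {"geoloc": "de"})
```

Any change to one ingredient produces a different key, so each variant is stored as its own cache entry.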

nuu:bit Caching Rules

Introduction

A caching rule is a set of parameters that changes the caching behavior of one or more URLs that match a pattern, specified as either a wildcard expression or a Perl-compatible regular expression. Please refer to this document for the syntax of regular expressions.

Wildcard Patterns

The supported wildcard patterns are a subset of the Unix shell glob patterns, described here. The following are supported:

  • *           -  match any number of any characters
  • [...]       -  match any character out of a set of characters
  • ?          -  match the preceding character or range zero or one times
  • {a,b,...} - match any of the patterns a or b or …
  • \ch        - escape the special character ch

An example of the use of a wildcard pattern is as follows: “/documents/**/records[0-9][0-9][0-9].{doc,docx,txt}”. More examples can be found in the Rule Examples section.

In addition, we support the globstar wildcard pattern, **, described here.
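
One way to reason about these patterns is to translate them into regular expressions. The Python sketch below does that for *, **, [...], {a,b,...} and \ch. It is an illustration, not nuu:bit's matcher: the function name is hypothetical, * is assumed not to cross a "/" (consistent with the non-recursive examples later in this document), and the ? pattern is deliberately left out and treated as a literal character.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression."""
    out, i, n = [], 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            out.append(re.escape(pattern[i + 1]))  # escaped special character
            i += 2
        elif pattern.startswith("**", i):
            out.append(".*")  # globstar: any characters, including "/"
            i += 2
        elif ch == "*":
            out.append("[^/]*")  # assumption: stays within one path segment
            i += 1
        elif ch == "[" and pattern.find("]", i + 1) != -1:
            end = pattern.find("]", i + 1)
            out.append(pattern[i:end + 1])  # character set, passed through
            i = end + 1
        elif ch == "{" and pattern.find("}", i + 1) != -1:
            end = pattern.find("}", i + 1)
            alternatives = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
        else:
            out.append(re.escape(ch))  # literal character
            i += 1
    return re.compile("".join(out))


matcher = wildcard_to_regex("/documents/**/records[0-9][0-9][0-9].{doc,docx,txt}")
matched = matcher.fullmatch("/documents/2020/q1/records123.docx") is not None
```

The sketch uses fullmatch so that the whole relative URL has to match the pattern, not just a prefix of it.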

Rule Execution Order

It is important to understand the order in which rules are executed. The caching rules are interpreted in the order they are provided. A later rule that matches an object also matched by an earlier rule overrides the earlier rule for that object. This means the rules must be ordered from “more general” to “more specific”. For example:

  1. The first rule will match all JPEG images on the site
  2. The second rule will match all JPEG images in the /album directory
  3. The third rule will match all JPEG images in the /album/special directory
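
The "last matching rule wins" behavior can be sketched in Python as follows. The sketch uses regular expression rules ("is_wildcard": false) so that it needs only the standard library; the function name and the TTL values are made up for illustration, and it assumes that a later rule replaces an earlier one as a whole.

```python
import re

# Ordered from more general to more specific, as recommended above.
RULES = [
    {"url": {"is_wildcard": False, "value": r"\.jpe?g$"},
     "edge_caching": {"new_ttl": 86400}},
    {"url": {"is_wildcard": False, "value": r"^/album/.*\.jpe?g$"},
     "edge_caching": {"new_ttl": 3600}},
    {"url": {"is_wildcard": False, "value": r"^/album/special/.*\.jpe?g$"},
     "edge_caching": {"new_ttl": 60}},
]


def effective_rule(path, rules):
    """Return the last rule in the list whose pattern matches the path."""
    chosen = None
    for rule in rules:
        if re.search(rule["url"]["value"], path):
            chosen = rule  # a later match overrides an earlier one
    return chosen


special = effective_rule("/album/special/cover.jpg", RULES)
```

Here "/album/special/cover.jpg" matches all three patterns, and the third rule is the one that applies. If the list were reversed, the general JPEG rule would come last and silently override the two more specific rules.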

Rule Structure

A rule has the following template in JSON:

 

{
    "version": 2,
    "url": {
        "is_wildcard": true,
        "value": "url_string"
    },
    "edge_caching": {
        "override_origin": true,
        "override_no_cc": false,
        "new_ttl": 300,
        "query_string_keep_or_remove_list": ["list", "of", "querystring", "parameters", "to", "keep", "or", "ignore"],
        "query_string_list_is_keep": false
    },
    "browser_caching": {
        "override_edge": true,
        "new_ttl": 400,
        "force_revalidate": true
    },
    "cookies": {
        "override": true,
        "remove_ignored_from_request": false,
        "remove_ignored_from_response": false,
        "ignore_all": false,
        "list_is_keep": true,
        "keep_or_ignore_list": [
            "str1",
            "str2"
        ]
    }
}

The default rule (**) is exposed in the Portal JSON editor.

Rule Parameters

Let us define the meaning of each parameter:

  • "version" is the version of the rule structure. It must always be set to 2.
  • "url" defines the URL or set of URLs that the rule applies to.
  • “is_wildcard”, if true, means that “value” is a wildcard expression. If false, it means that “value” is a regular expression.
  • “value” is either a wildcard expression or a regular expression defining the URL or set of URLs that the rule matches.
  • "edge_caching" defines the behavior of the edge cache in relation to the origin.
  • if “override_origin” is false, the cache obeys the TTL set by the origin and the rest of the parameters under "edge_caching" are ignored, with the exception of “override_no_cc”, which still applies if set. If “override_origin” is true, the origin-set TTL of the object is overridden.
  • “override_no_cc”, if true, allows the cache to store objects for which the origin doesn’t explicitly set a TTL, i.e. there’s no max-age or Expires information.
  • “new_ttl” sets the “shelf life”, in seconds, of the object in the edge cache. A value of 0 means “don’t cache”. At present, the following suffixes are not supported: “m”, “h”, “d”, and “w”.
  • if "query_string_keep_or_remove_list" is empty, the feature is disabled and the URL is passed unchanged.
  • To make the cache ignore the query string entirely, use the list as a keep list ("query_string_list_is_keep" set to true) containing a single, unique parameter name that never occurs in real requests. Because that parameter is never found, no query string parameters are kept.
  • if "query_string_list_is_keep" is true and "query_string_keep_or_remove_list" is not empty, then the query string parameters in the list will be kept and all others will be removed. For example, if the URL is "/index.html?foo=bar&hello=world&nuubit=awesome" and the list is ["hello", "rev"], then the URL will be rewritten to "/index.html?hello=world". This applies both to cache lookups and to origin requests, and the browser is not redirected to the new URL.
  • if "query_string_list_is_keep" is false and "query_string_keep_or_remove_list" is not empty, then the query string parameters in the list will be removed and all others will be kept. For example, if the URL is "/index.html?foo=bar&hello=world&nuubit=awesome" and the list is ["hello", "foo"], then the URL will be rewritten to "/index.html?nuubit=awesome".
  • "browser_caching" defines the behavior of the browser-side cache in relation to the edge cache. This browser caching behavior is controlled by setting the Cache-Control header in the nuu:bit response sent to the browser.
  • if “override_edge” is false, the rest of the parameters under “browser_caching” section are ignored and the browser uses either the values set by “edge_caching” (if “override_origin” is true) or by the origin itself (if “override_origin” is false). If “override_edge” is true, the rest of the parameters under “browser_caching” section define the caching behavior of the browser.
  • “new_ttl” sets the maximum time, in seconds, that the browser can store the object in its cache. A value of 0 means “don’t cache”.
  • “force_revalidate”, if true, instructs the browser to check that the object is still fresh in the edge cache every time it is requested, even if a copy already exists in the browser cache.
  • "cookies" defines the behavior of the edge cache with respect to HTTP cookies. It’s therefore possible to ignore some or all of the cookies sent by the browser as part of the request, as well as strip some or all cookies from the response.
  • if “override” is false, the rest of the parameters under cookies are ignored and the edge cache will only cache objects if there aren’t any cookies either in the request or response. If it is true, the cookies that are deemed relevant for the retrieval of the resource will become a part of the hash value of the cached object, which means there will be one instance of the object in the cache for every possible combination of cookies and cookie values.
  • “remove_ignored_from_request” and “remove_ignored_from_response” instruct the edge cache to remove some, or all, cookies from the request and response. See below for details.
  • if “ignore_all” is true, “list_is_keep” and “keep_or_ignore_list” are ignored and the edge cache will not add any cookies to the hash value.
    If “remove_ignored_from_request” is true, all cookies are removed from the request before it is sent to the origin.
    If “remove_ignored_from_response” is true, all cookies are removed from the response before it is returned to the browser.
  • “list_is_keep” and “keep_or_ignore_list” define the list of cookies that are relevant for this object:
  • if “list_is_keep” is true, “keep_or_ignore_list” will define the “keep list”, which is the subset of cookies in the browser request that are used as part of the hash value of the cached object. All other cookies are ignored by the hash.
    If “remove_ignored_from_request” is true, these are also the only cookies that are sent to the origin as part of the request.
    If “remove_ignored_from_response” is true, these are also the only cookies that are returned to the browser as part of the response.
  • if “list_is_keep” is false, “keep_or_ignore_list” will define the “ignore list”, which is the subset of cookies in the browser request that are NOT used as part of the hash value of the cached object. All other cookies are used by the hash.
    If “remove_ignored_from_request” is true, these are also the only cookies that are removed from the request before it is sent to the origin.
    If “remove_ignored_from_response” is true, these are also the only cookies that are removed from the response before it is returned to the browser.
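
The keep/remove semantics of the query string list and of the cookie list can be illustrated with two small Python functions. They are a sketch of the rules as described above, not the edge proxy's implementation; the function names and argument shapes are hypothetical, and the cookie function assumes that “override” is true.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def filter_query_string(url, names, list_is_keep):
    """Apply query_string_keep_or_remove_list to a relative URL."""
    if not names:
        return url  # empty list: feature disabled, URL passed unchanged
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Keep mode keeps listed parameters; remove mode keeps unlisted ones.
    kept = [(k, v) for k, v in pairs if (k in names) == list_is_keep]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def cookies_for_hash(request_cookies, ignore_all, list_is_keep, names):
    """Return the cookies that become part of the hash value."""
    if ignore_all:
        return {}  # the lists are ignored, no cookie enters the hash
    return {k: v for k, v in request_cookies.items()
            if (k in names) == list_is_keep}


url = "/index.html?foo=bar&hello=world&nuubit=awesome"
keep_result = filter_query_string(url, ["hello", "rev"], True)
remove_result = filter_query_string(url, ["hello", "foo"], False)
```

With the URL from the examples above, keep_result is "/index.html?hello=world" and remove_result is "/index.html?nuubit=awesome".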

It’s now time to look at some examples.

Rule Examples

Let’s start by defining the URLs that we want our rules to apply to. The easiest way is to use wildcards. Regular expressions are a subject in themselves and will not be covered here. This means that the examples, apart from the final one, have the “is_wildcard” flag set to true.

 

Examples

Purge all PNG files under /images, non-recursive (so e.g. files under /images/today/ will not be purged).

{
"version": 1,
"purges": [
 {
   "url": {
     "is_wildcard": true,
     "expression": "/images/*.png"
   }
 }
]
}

 


 

Purge everything, recursively, for current domain

{
"version": 1,
"purges": [
 {
   "url": {
     "is_wildcard": true,
     "expression": "/**/*"
   }
 }
]
}

 

Empty the whole cache, for all domains

{
"version": 1,
"purges": [
 {
   "url": {
     "is_wildcard": true,
     "expression": "**/*"
   }
 }
]
}

 

One important consideration is that the URL that is matched against the expression does not contain the domain name. In other words, it is relative. So, for example, the absolute URL “http://www.mydomain.com/images/flower.jpg” will become “/images/flower.jpg”.

The most generic wildcard expression, which matches all objects on the site, is “**”. The globstar pattern (**) matches any character, any number of times, recursively, so it will match “/index.html” as well as “/images/album/today/001.jpg” or any other URL.

Let’s now match all JPEG images in /images, but not in its subdirectories. The expression for this is “/images/*.jpg”.

If we also want to match the JPEG images in the subdirectories, the expression becomes “/images/**.jpg”.

JPEG images can have either the .jpg or the .jpeg extension. We also want to match PNG files. To match all three, the expression becomes “/images/**.{jpg,jpeg,png}”.

It might be that our images have a specific filename format. Let’s assume it is “img” followed by three digits, then the extension (like “img123.png”). The expression becomes “/images/**/img[0-9][0-9][0-9].{jpg,jpeg,png}”.

Knowing how to define the URL(s), we can now create custom caching rules.

Let’s assume we know we can cache all images for roughly one year (31560000 seconds, about 365 days). Our first rule will look like this:

 

{
    "version": 2,
    "url": {
        "is_wildcard": true,
        "value": "/images/**"
    },
    "edge_caching": {
        "override_origin": true,
        "override_no_cc": true,
        "new_ttl": 31560000
    },
    "browser_caching": {
        "override_edge": false,
        "new_ttl": 0,
        "force_revalidate": false
    },
    "cookies": {
        "override": false,
        "remove_ignored_from_request": false,
        "remove_ignored_from_response": false,
        "ignore_all": false,
        "list_is_keep": false,
        "keep_or_ignore_list": []
    }
}

 

Notice that, apart from “override_origin” and “override_no_cc”, all other override… flags are false. This means we only modify the edge cache TTL for images, setting it to one year regardless of what the origin says (even if the origin sets no TTL at all). Because “override_edge” is false, the browser uses the value set by “edge_caching”, as described under Rule Parameters.

However, our site also has a directory containing fresh images that are added each day in “/images/today/”. We want to cache only those images for one day (86400 seconds). We have to add another rule to the one defined above:

 

{
    "version": 2,
    "url": {
        "is_wildcard": true,
        "value": "/images/**"
    },
    "edge_caching": {
        "override_origin": true,
        "override_no_cc": true,
        "new_ttl": 31560000
    },
    "browser_caching": {
        "override_edge": false,
        "new_ttl": 0,
        "force_revalidate": false
    },
    "cookies": {
        "override": false,
        "remove_ignored_from_request": false,
        "remove_ignored_from_response": false,
        "ignore_all": false,
        "list_is_keep": false,
        "keep_or_ignore_list": []
    }
},
{
    "version": 2,
    "url": {
        "is_wildcard": true,
        "value": "/images/today/**"
    },
    "edge_caching": {
        "override_origin": true,
        "override_no_cc": true,
        "new_ttl": 86400
    },
    "browser_caching": {
        "override_edge": false,
        "new_ttl": 0,
        "force_revalidate": false
    },
    "cookies": {
        "override": false,
        "remove_ignored_from_request": false,
        "remove_ignored_from_response": false,
        "ignore_all": false,
        "list_is_keep": false,
        "keep_or_ignore_list": []
    }
}

 

With these two rules in place, we will cache all images for one year, except the ones in “/images/today”, which we cache for one day.

Let’s now assume there’s a directory containing big background images, which don’t change very often. Moreover, these large images are used by the browser very often, so we would like it to cache them for a long time. The trouble is we don’t know when they change, only that it’s not more often than once a day.

The solution is to let the edge cache keep these images for one day, but allow the browser to keep them for one year, on the condition that it checks they haven’t changed every time they are loaded. It’s much faster to check whether a resource has changed than to download it completely, especially since the browser only has to validate against the copy stored in the edge cache, which is up to date because the images don’t change more than once per day.

The corresponding rule is the following. The sections that are of no interest (with override… flags set to false) are omitted for clarity:

 

    "url": {
        "is_wildcard": true,
        "value": "/images/backgrounds/**"
    },
    "edge_caching": {
        "override_origin": true,
        "override_no_cc": true,
        "new_ttl": 86400
    },
    "browser_caching": {
        "override_edge": true,
        "new_ttl": 31560000,
        "force_revalidate": true
    },
    ...

 

Finally, let’s assume we have a special URL, “/images/geobackground.php”, that returns a background image based on the geographical location of the user. The location is stored in a browser cookie called “geoloc”. We’d like to store the background in the edge cache for each possible value of the “geoloc” cookie, which means we have to add the contents of the cookie to the hash value of “/images/geobackground.php” in the edge cache. Again, the relevant parts of the rule follow:

 

    "url": {
        "is_wildcard": true,
        "value": "/images/geobackground.php"
    },
    "cookies": {
        "override": true,
        "remove_ignored_from_request": false,
        "remove_ignored_from_response": false,
        "ignore_all": false,
        "list_is_keep": true,
        "keep_or_ignore_list": [
            "geoloc"
        ]
    }

 

If all these rules are present, in the order they have been presented, the cache behavior will be:

  • Store one instance of the “/images/geobackground.php” image for every value of the “geoloc” cookie, for one year (because of the first rule, applying to all images) and also for one year in the browser cache.
  • Store all background images in “/images/backgrounds/” for one day in the edge cache, but for one year in the browser cache, forcing a revalidation at every access.
  • Store all of today’s images in “/images/today/” for one day in the edge cache and also for one day in the browser cache.
  • Store all other images for one year in both caches.

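As you can see, a more specific rule overrides the settings of any more general rule preceding it. For comparison, the following rule uses a regular expression instead of a wildcard. It caches common static assets (images, JavaScript, CSS and fonts) in the edge cache for one week (604800 seconds), ignores all cookies, and strips them from both the request and the response: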

 

{
    "version": 2,
    "url": {
        "value": "\\.(jpg|jpeg|png|gif|webp|js|css|woff)(\\?.*)?$",
        "is_wildcard": false
    },
    "edge_caching": {
        "new_ttl": 604800,
        "override_origin": true,
        "override_no_cc": true
    },
    "browser_caching": {
        "new_ttl": 0,
        "override_edge": false,
        "force_revalidate": false
    },
    "cookies": {
        "remove_ignored_from_request": true,
        "ignore_all": true,
        "remove_ignored_from_response": true,
        "override": true,
        "list_is_keep": false,
        "keep_or_ignore_list": []
    }
}

 

 

Summary

 

nuu:bit offers flexible caching rules. Please contact nuu:bit support if you have any questions about this document.
