The lastmodified2 plugin
In a previous post I discussed the general problem of validating and caching dynamic content. In order to implement the strategy outlined in that post I decided to create a new version of the lastmodified plugin originally created by Bob Schumaker. The lastmodified plugin was a good base to build on; however it didn't do exactly what I wanted to do, and hence I couldn't resist trying to improve on it.
The strategy revisited
As you may recall, in my previous post I outlined an overall strategy for how to support validating and caching dynamic content. Here's a recap of that strategy, with additional detail added on the subject of validation:
When sending responses to requests, add a Content-length header to identify the total number of bytes in the response.
When sending responses to requests, add an ETag header to identify the "version number" (entity tag) for this particular version of the page, and/or a Last-modified header to identify the date/time the page was last modified. These are computed as follows, depending on whether weak or strong validation is used:
For weak validation, the Last-modified header should reflect the date/time modified of the most recently updated "semantically significant" component of the page. (For example, for Blosxom we consider entries to be semantically significant, but not flavour templates.) The ETag header can then supply a weak entity tag derived directly from this date/time.
For strong validation, the ETag header should change if even a single bit on a page changes; for example, it could be derived from the MD5 or SHA-1 digest of the page. A Last-modified header value could then be determined by consulting a cached copy of the ETag and Last-modified values for the URI: if the cached ETag matches the new one then the Last-modified value can be taken from the cache; otherwise it can be arbitrarily assigned a date/time in the recent past.
When sending responses to requests, also add Expires headers to the response to provide a "use by" date/time to clients doing caching.
When processing requests, look for the If-none-match and If-modified-since headers. If one or both are present, return the full page in the response only if necessary: if the version of the page currently available is different from the version requested in the If-none-match header, or if the page has been modified since the date in the If-modified-since header.
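The request-processing step above boils down to a small decision function. The following is a minimal Python sketch of that logic (the plugin itself is written in Perl), assuming the request headers arrive in a dictionary keyed by their canonical names and that the validators for the current page have already been computed; `should_send_304` is a hypothetical name. A validator passed as `None` means the corresponding header is not being generated, in which case a condition that depends on it cannot be satisfied and the full page is sent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def should_send_304(request_headers: Mapping[str, str],
                    etag: Optional[str],
                    last_modified: Optional[datetime]) -> bool:
    """Decide whether a conditional GET can be answered with 304 (Not Modified).

    etag and last_modified are the validators for the page as it would be
    sent now (last_modified as a timezone-aware datetime); pass None for a
    validator the server is not generating.
    """
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is None and if_modified_since is None:
        return False  # unconditional GET: always send the full page

    if if_none_match is not None:
        if etag is None:
            return False  # cannot evaluate this condition, so no 304
        # Naive split: assumes no commas inside the entity tags themselves.
        requested = [tag.strip() for tag in if_none_match.split(",")]
        if etag not in requested:
            return False  # client holds a different version of the page

    if if_modified_since is not None:
        if last_modified is None:
            return False  # cannot evaluate this condition, so no 304
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: conservatively send the page
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if last_modified > since:
            return False  # page changed after the client's copy was made

    return True  # every condition present in the request is satisfied
```

Entity tags are compared here as exact strings, which is sufficient when clients simply echo back the value they were sent.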
This section and the next describe in more depth how I implemented the above strategy.
First, the plugin is designed to have its behavior easily modifiable
using configurable variables, as is done with other Blosxom
plugins. In particular, it is possible to specify whether the plugin
should do strong or weak validation (the $strong boolean variable) and
whether it should generate an ETag header, a Last-modified header,
or both (the $generate_etag and $generate_mod boolean variables). By
default the plugin is configured to be a "plug-compatible" replacement
for the lastmodified plugin, doing weak validation and generating both
ETag and Last-modified headers.
The basic plan of the lastmodified2 plugin is as follows:
start subroutine: Read in the cached information containing the previous Last-modified values for this URI.
filter subroutine: Get the information necessary for weak validation by traversing the list of entries to be displayed on the page and determining the date/time at which any of those entries was most recently modified. Use this last-modified date/time to create a weak ETag value.
skip subroutine: For weak validation, interpret any If-none-match and If-modified-since headers and determine whether or not we need to send a full response. If not, we can skip the actual story processing after setting Status to 304 (Not Modified) and generating any other headers appropriate for a 304 response.
last subroutine: For strong validation, generate an MD5 digest of the page and use it to create the ETag value. Create the Last-modified header by using the cached Last-modified value if the new ETag value matches the cached ETag value; otherwise assign a new Last-modified value in the very recent past. Then interpret any If-none-match and If-modified-since headers in the request and determine whether or not we need to send a full response. In either case send the appropriate headers.
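The filter step for weak validation can be sketched in Python as follows, assuming the entries' modification times are available as POSIX timestamps. The function name and the entity tag format (the timestamp in quotes, prefixed with `W/`) are hypothetical choices for illustration; the plugin's actual tag text may differ.

```python
from email.utils import formatdate


def weak_validators(entry_mtimes):
    """Compute (Last-modified, ETag) header values for weak validation.

    entry_mtimes: POSIX timestamps (seconds) of the entries shown on the
    page; at least one entry is assumed.
    """
    latest = int(max(entry_mtimes))  # most recently modified entry
    last_modified = formatdate(latest, usegmt=True)  # HTTP-date ending in "GMT"
    # Derive the entity tag from the same timestamp; "W/" marks it as weak.
    etag = 'W/"%d"' % latest
    return last_modified, etag
```

Because both values come from the same timestamp, the If-none-match and If-modified-since checks can be made before any story processing happens.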
Note that there is also a
story subroutine in the lastmodified2
plugin, but its purpose is restricted to setting output variables
(e.g., for use in flavour templates) for compatibility with the
lastmodified plugin. It does not affect the actual caching and validation behavior.
Like the lastmodified plugin, this version of the plugin looks for and
acts upon the
If-modified-since header itself, instead of letting
the underlying web server deal with it. Note that the 1.3 and 2.0
versions of Apache in common use today have a feature whereby the
underlying web server will handle
If-modified-since checks as long
as the CGI script simply sets the
Last-modified header; this can be
used to easily implement simple validation. (Previous versions of this
plugin relied on this feature.)
The plugin also looks for and acts upon the If-none-match
header. (Apache does not do this for CGI scripts, so the plugin has no
choice but to do it itself.) Note that for weak validation
($generate_etag set to 1 but $strong set to 0) we generate entity
tag values using the date/time modified of the most recent entry, so
both the If-modified-since check and the If-none-match check can
be done as soon as we compute the Last-modified value, which is done
in the filter subroutine. This allows the plugin to save processing
time by skipping story processing (i.e., using the skip subroutine)
when it does not need to return a full response to a conditional GET
request.
For strong validation using ETag ($generate_etag and $strong
both set to 1) the
ETag value is computed as an MD5 digest of the
entire page as it will be returned to the user, in order to
distinguish changes that affect even a single bit of the page. We
can't skip story processing in this case since we need the complete
output (including the results of interpolating variables) in order to
compute the correct MD5 digest.
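A strong entity tag of this kind might be computed as below. This Python sketch hashes the page after encoding it to bytes (UTF-8 is an assumption, as is the function name), so any change to the final output, including interpolated variables, yields a different tag.

```python
import hashlib


def strong_etag(page_output, encoding="utf-8"):
    """Strong entity tag: MD5 digest of the exact bytes that will be sent."""
    digest = hashlib.md5(page_output.encode(encoding)).hexdigest()
    return '"%s"' % digest  # no "W/" prefix, so clients treat it as strong
```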
For strong validation using Last-modified ($generate_mod and
$strong both set to 1) we also compute an MD5 digest of the entire
page as it will be returned to the user, and we compare that value
against a cached MD5 digest computed for the page on previous
requests. If they match then we know that no changes have occurred
since the previous requests, and we set the
Last-modified value to
the value cached with the MD5 digest. Otherwise we know that some
change has occurred since the time of the previous requests, but do
not know exactly when that change occurred; we therefore arbitrarily
set the Last-modified value to a time just prior to the time of the
current request. Note that we can't skip story processing in this case
either, since again we need the complete output (including the results
of interpolating variables) in order to compute the correct MD5 digest.
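The Last-modified bookkeeping for strong validation can be sketched as follows. The in-memory dictionary stands in for the plugin's on-disk validator cache and the function name is hypothetical; the 5-second backdate matches the figure given later in this document for the case where no cached value exists.

```python
import time
from email.utils import formatdate


def strong_last_modified(uri, etag, cache, now=None, backdate=5):
    """Return a Last-modified value for strong validation.

    cache maps each URI to (etag, last_modified_timestamp) and is updated
    in place; it stands in for the plugin's validator cache file.
    """
    if now is None:
        now = time.time()
    cached = cache.get(uri)
    if cached is not None and cached[0] == etag:
        stamp = cached[1]  # page unchanged: reuse the earlier timestamp
    else:
        # Page changed at some unknown time: pick a moment just before now.
        stamp = int(now) - backdate
        cache[uri] = (etag, stamp)
    return formatdate(stamp, usegmt=True)
```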
Note that for weak validation ($strong set to 0) the Last-modified
header does not necessarily provide the date/time at which the
actual (bit for bit) contents of the page last changed; instead it
provides the date/time at which the meaning of the page last
changed, i.e., when the contents of at least one entry on the page
were changed. It is possible for other elements on the page such as
headers, footers, or comments to change without changing the meaning
of the page in this sense, so in this case the Last-modified value
is only a "weak validator" as defined by section 13.3.3 of the HTTP
1.1 specification. When $strong is set to 0 the entity tag provided
in the ETag header is derived from the Last-modified value and
hence is also only a weak validator, and we explicitly mark it as such
by prefixing it with "W/", as described in section 3.11 of the HTTP
1.1 specification.
The net effect is that with weak validation if browsers, web caches,
and news aggregators caching the page send a conditional GET request
(i.e., with an If-none-match or
If-modified-since header) to
check the current status of the page, they will be given a brand new
copy of the page only if there have been "semantically significant"
changes to the page (in the words of the HTTP 1.1 specification). With
strong validation they will get a new copy of the page if the page has
changed in any way, no matter how slight.
Generation of the Cache-control and Expires headers is relatively
straightforward: We use the value of $freshness_time directly with the
max-age directive of the Cache-control header, and add it to
the current date/time to create a date/time in the future for the
Expires header. If $freshness_time is set to 0 then we instead use the
no-cache directive with the Cache-control header and set the
Expires header to a date in the past.
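A Python sketch of this header generation, assuming the freshness time is given in seconds and that HTTP-dates are formatted in GMT; the function name is hypothetical. The one-minute offset for the no-caching case follows the choice explained later in this document.

```python
import time
from email.utils import formatdate


def caching_headers(freshness_time, now=None):
    """Build Cache-control and Expires headers from a freshness time in seconds."""
    if now is None:
        now = time.time()
    if freshness_time > 0:
        return {
            "Cache-control": "max-age=%d" % freshness_time,
            "Expires": formatdate(now + freshness_time, usegmt=True),
        }
    # A freshness time of 0 forbids caching; Expires is set one minute
    # before "now" so that it is already in the past.
    return {
        "Cache-control": "no-cache",
        "Expires": formatdate(now - 60, usegmt=True),
    }
```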
Generation of the
Content-length header is also straightforward: We
simply use the length of
$blosxom::output. (This assumes of course
that no other plugin will subsequently be changing that output.) Note
that for HEAD requests Apache will not actually send the output but
will send the
Content-length header if set; in that case the
Content-length value reflects the length of the output that would
have been sent for a GET request, in compliance with section 14.13 of
the HTTP 1.1 specification.
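The byte count matters when the output contains non-ASCII characters, since Content-length measures bytes rather than characters. This Python sketch (UTF-8 encoding assumed, function name hypothetical) also mimics the HEAD behavior described above: the header still reports the GET body length, but no body is returned. In the plugin's setting it is the web server, not the script, that drops the body.

```python
def response_headers_and_body(method, page_output, encoding="utf-8"):
    """Return (headers, body bytes) with a byte-accurate Content-length."""
    body = page_output.encode(encoding)
    headers = {"Content-length": str(len(body))}
    # For HEAD the header still describes the body a GET would have returned.
    return headers, (b"" if method == "HEAD" else body)
```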
Note that the plugin does not generate the
Content-length headers for a 304 (Not Modified) response, in
accordance with section 10.3.5 of the HTTP 1.1 protocol specification.
Finally, for upward compatibility the lastmodified2 plugin supports the following features present in the original lastmodified plugin:
Optionally checking the %others hash for last-modified dates: Checking %others is one way to detect changes other than changes in the entries themselves; in particular it can be used to detect changes to flavour files used in creating the page. Unfortunately some of the entries in %others are not relevant for the page being created (e.g., flavour files for flavours other than the one currently being generated) and may cause the Last-modified time to be computed incorrectly. Also, checking %others will not detect page changes due to interpolating variables into flavour files (e.g., for comments). Finally, some plugins that replace the default Blosxom entries subroutine (including the entries_cache_meta plugin in particular) do not create the %others hash at all.
For the above reasons this feature is deprecated; you should not use it unless you need it for upward compatibility with your current lastmodified configuration. If you want to check for changes to a page outside the entries themselves then you should simply enable strong validation.
Exporting variables with the last-modified time and other times in RFC 822 and ISO 8601 formats: The lastmodified2 plugin computes these variables essentially in the same way as the lastmodified plugin; see the code and the plugin documentation below for more information.
Note that the variables exporting the most recent entry's date/time (e.g., $latest_iso8601) always refer to the date/time modified for the most recently updated entry, regardless of whether weak or strong validation is being used. The problem with having these variables reflect the Last-modified value instead is that when using strong validation we wouldn't have values for them until we had finished generating output for the page, too late for the variables to be of any use.
If you want to use the lastmodified2 plugin to replace an existing
configuration of the lastmodified plugin, change the plugin's filename
and Perl package name (i.e., in the package statement at the
beginning of the code) to "lastmodified" and set the $export_dates and
$use_others configurable variables to match
your current values. All other configurable variables can be left as
is.
This section and the succeeding ones contain more in-depth documentation of the lastmodified2 plugin to supplement the material included in the plugin itself.
The lastmodified2 plugin enables caching and validation of dynamically-generated Blosxom pages by web browsers, web proxies, news aggregators, and other clients by generating various cache-related HTTP headers in the response and supporting conditional GET requests, as described below. This can reduce excess network traffic and server load caused by requests for RSS or Atom feeds or for web pages for popular entries or categories.
The plugin generates an
ETag header to identify the particular
version of the page, as well as a
Last-modified header based on the
plugin's determination of when the contents of the page were most
recently modified. The plugin also recognizes and properly acts on an
If-none-match or If-modified-since header in a request,
enabling a client to check whether the page has changed since it last
requested the page. This reduces network traffic for the site, because
the server can skip returning a copy of the page if in fact it has not
changed.
The plugin can also optionally generate Cache-control and
Expires headers to specify how long copies of a page should be
retained by caches. This reduces server load for the site, because web
proxies and other caching clients can use a cached copy of the page
and avoid sending additional requests for the page (including
conditional GET requests) to the site's server for as long as the page
remains fresh. Alternatively you can use the Cache-control and
Expires headers to specify that pages should not be cached at all
under any circumstance. This helps ensure that users always get the
most up-to-date content, at the expense of increased server load.
Finally, the plugin also generates a Content-length header
containing the length in bytes of the content ("entity body" in HTTP
1.1 jargon). Providing a
Content-length header supports persistent
connections for clients that use the HTTP 1.0 "keep-alive" mechanism
(as documented in section 19.7.1 of RFC 2068); this can reduce the
number of connections to the site in some cases.
Note that at present this plugin can be used as a replacement for the lastmodified plugin and its default configuration is essentially equivalent to that of the lastmodified plugin, as discussed below.
Installation and configuration
To install the lastmodified2 plugin, copy the plugin file into your Blosxom plugin directory. You should not normally need to rename the plugin; however, see the discussion below.
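Configurable variables specify how the plugin handles validation
($generate_etag, $generate_mod, and $strong) and caching
($generate_cache, $generate_expires, and $freshness_time),
whether or not to generate any other recommended headers
($generate_length), and whether to implement features from the
lastmodified plugin for compatibility ($use_others and $export_dates).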
For validation the most common configurations are the following:
No validation: $generate_etag and $generate_mod both set to 0. The plugin does not generate ETag or Last-modified headers, and does not check If-none-match or If-modified-since headers in the request. Use this configuration if you plan to allow caching of responses (as discussed below) but for some reason you don't want to do validation.
Weak validation: $generate_etag and $generate_mod both set to 1, $strong set to 0. The plugin generates both ETag and Last-modified headers based on the most recent time that any entry on the page was modified; it checks for If-none-match and If-modified-since headers in the request, and sends a 304 (Not Modified) response with no output when it can do so. Use this configuration if changes to your pages are only (or at least primarily) due to changes to the entries themselves. This is the default configuration, for compatibility with the lastmodified plugin.
Strong validation: $generate_etag, $generate_mod, and $strong all set to 1. The plugin generates both ETag and Last-modified headers based on the current page's contents and our estimate as to when the contents were last modified; the plugin checks for If-none-match and If-modified-since headers in the request, and sends a 304 response when it can do so. Use this configuration if your pages contain comments or other material that is updated more frequently than the entries themselves.
Strong validation using ETag only: $generate_etag and $strong set to 1, $generate_mod set to 0. The plugin generates only an ETag header (not Last-modified) and checks only for an If-none-match header in the request (not If-modified-since). Use this configuration if you want to support strong validation but don't want the performance overhead of caching Last-modified values as previously described. Note that this configuration does not support validation for HTTP 1.0 clients or other clients that do not support validation using entity tags.
Note that if you set $strong to 1 then you
might as well also set $generate_etag to 1, since correctly
using Last-modified as a strong validator requires that we generate
and cache MD5 digests of the page in order to detect any changes, and
these digests are also what we use to generate ETag values.
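To summarize the validation configurations above, the following hypothetical Python helper maps the three flags to the kind of validation performed and the request headers that would be checked. It is only a reading aid for the list of configurations, not code from the plugin.

```python
def validation_mode(generate_etag, generate_mod, strong):
    """Summarize a validation configuration.

    Returns (description, request headers the plugin would check).
    """
    checked = []
    if generate_etag:
        checked.append("If-none-match")
    if generate_mod:
        checked.append("If-modified-since")
    if not checked:
        return "no validation", checked
    return ("strong" if strong else "weak") + " validation", checked
```

The helper does not reject the combination discussed in the previous paragraph ($strong set to 1 without $generate_etag); it simply reports it.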
For caching the most common configurations are the following:
No caching: $generate_cache and $generate_expires both set to 0. The plugin does not generate either a Cache-control header or an Expires header, and thus web proxies and other clients will typically not cache returned pages. This is the default configuration; use it if you don't care about caching.
Caching allowed: $generate_cache and $generate_expires both set to 1, and $freshness_time set to a positive integer value. The plugin generates Cache-control and Expires headers that allow caching of returned pages for up to $freshness_time seconds from the time of the request. Use this configuration if you'd like to allow caching by proxies and other clients to reduce server hits due to GET requests (whether conditional or not), and set $freshness_time to a value comparable to how often your site is updated.
(By default $freshness_time is set to 3,000 seconds, long enough to provide some benefit through caching by web proxies, especially during periods of heavy load, but short enough to ensure that news aggregators doing hourly polling will always use up-to-date copies of feeds.)
Caching prohibited: $generate_cache and $generate_expires both set to 1, and $freshness_time set to 0. The plugin generates Cache-control and Expires headers that specifically prohibit caching of returned pages. Use this configuration if you want all clients to always see the most up-to-date content.
Note that if you set
$generate_cache to 1 then you might as well set
$generate_expires to 1 and vice versa, in order to properly
support both HTTP 1.1 and HTTP 1.0 clients; there is no performance
penalty for doing so.
The other configurable variables are as follows:
$generate_length controls whether or not to generate a Content-length header. The default is to generate the header; you can disable this by setting $generate_length to 0. Note that support of HTTP 1.0 persistent connections using Content-length requires that your web server be configured to support persistent connections in the first place; for Apache this is done using the KeepAlive On directive in the Apache configuration file.
Also note that HTTP 1.1 clients can use persistent connections even if the Content-length header is not present, if (like Apache) the underlying web server supports HTTP 1.1 persistent connections for CGI scripts using the Connection header and chunked transfer coding. However we generate a Content-length header by default because it's recommended by section 14.13 of the HTTP 1.1 specification.
$use_others controls whether changes to flavour files and other non-entry files in the Blosxom data directory should also be considered semantically significant for weak validation. Note that this feature is provided only for compatibility with the lastmodified plugin and its use is deprecated; by default it is disabled.
$export_dates controls whether or not the plugin should set the following variables for use in flavour templates and other plugins:
$now_iso8601: Current date/time, in RFC 822 and ISO 8601 formats respectively. These variables can be used in any flavour template.
$latest_iso8601: Date/time modified of the most recently modified entry to be displayed on the page, in RFC 822 and ISO 8601 formats respectively. These variables can be used in any flavour template.
$others_iso8601: Date/time modified of the most recently modified non-entry file in the Blosxom data directory, in RFC 822 and ISO 8601 formats respectively. These variables can be used in any flavour template, but are set only if $use_others is set to 1.
$story_iso8601: Date/time modified of the current entry, in RFC 822 and ISO 8601 formats respectively. These variables can be used in the story and date templates.
Note that the ISO 8601 format produced is the complete date plus hours, minutes and seconds.
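The two date formats can be produced as in this Python sketch, which assumes times are reported in UTC; the function name is hypothetical. The ISO 8601 string follows the W3C "complete date plus hours, minutes and seconds" profile (YYYY-MM-DDThh:mm:ssTZD, here with `Z` as the time zone designator for UTC); the plugin may use local time or a numeric offset instead.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def export_dates(timestamp):
    """Return (RFC 822, ISO 8601) strings for a POSIX timestamp, in UTC."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    rfc822 = format_datetime(moment)  # e.g. "Thu, 01 Jan 1970 00:00:00 +0000"
    # Complete date plus hours, minutes and seconds; "Z" designates UTC.
    iso8601 = moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return rfc822, iso8601
```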
You can set the variable $debug to 1 or greater to produce additional information useful in debugging the operation of the plugin; the debug output is sent to your web server's error log.
The lastmodified2 plugin implements the start, filter, skip, story, and last subroutines, so its position in the plugin execution order matters. It needs to run after any other plugin whose filter subroutine changes the list of entries included in the response; otherwise the Last-modified date may be computed incorrectly. It needs to run after any other plugin whose skip subroutine does redirection (e.g., the canonicaluri plugin) or otherwise conditionally sets the HTTP status to any value other than 200. Finally, this plugin needs to run after any other plugin whose last subroutine changes the output for the page; otherwise the Content-length value (and the ETag and Last-modified values, if you are using strong validation) may be computed incorrectly. If you are encountering problems in any of these regards then you can force the plugin to run after other plugins by renaming it to, e.g., 99lastmodified2.
Several of the following items are not in fact bugs, but the behaviors in question may cause confusion in some cases; hence their inclusion here:
As discussed above, with weak validation the Last-modified header generated may not always reflect the date/time at which the bit-for-bit contents of the page most recently changed, and the contents of the page may change without changing the ETag value. In particular, if changes are made to flavour files used in generating the page, or comments are added to a page via variable interpolation (e.g., as done by the writeback plugin and others), then a user will not necessarily see such changes without forcing a full reload of the page (i.e., using an unconditional GET request). This should be considered a feature and not a bug; if you are not comfortable with this behavior then you should set $strong to 1 to enable strong validation.
If ETag generation is enabled and Last-modified generation disabled (or vice versa) and a request includes both an If-none-match and an If-modified-since header, the plugin will not return a 304 response under any circumstances. This is not a bug, but rather complies with section 13.3.4 of the HTTP 1.1 specification: "An HTTP/1.1 origin server, upon receiving a conditional request that includes both a Last-Modified date (e.g., in an If-Modified-Since or If-Unmodified-Since header field) and one or more entity tags (e.g., in an If-Match, If-None-Match, or If-Range header field) as cache validators, MUST NOT return a response status of 304 (Not Modified) unless doing so is consistent with all of the conditional header fields in the request."
In other words, if a conditional request contains both tests and we can't perform one of the tests (because we're not generating the header value used in the test) then we can't return a 304 regardless of the results of the other test.
If Cache-control and Expires headers are enabled then a user requesting to view a page will not necessarily see updates to that page even if the underlying entries have been changed since the last time the user viewed the page. This should be considered a feature and not a bug; if you are not comfortable with this behavior then you should not enable generation of the Cache-control and Expires headers, or you should explicitly prohibit caching by setting the freshness time to 0.
When using the Expires header to prohibit caching, for strict consistency with the HTTP 1.1 specification (section 14.21) the date/time sent with the Expires header should be equal to the date/time sent with the Date header. However we don't necessarily know what the exact Date value is (at least not for Apache, where it is generated by the server itself), and it's possible that the current date/time as measured in the plugin itself may be a little bit later than the time in the Date header, so instead we set the Expires value to be a minute before the current date/time (as measured in the plugin itself).
This should produce correct behavior for HTTP 1.0 clients relying on the Expires header, per section 10.7 of the HTTP 1.0 specification, as well as for HTTP 1.1 clients in the absence of a Cache-control header, per section 14.9.3 of the HTTP 1.1 specification.
As noted previously, if we're doing strong validation using Last-modified and we don't have a cached Last-modified value then we have to make one up; we arbitrarily set it to 5 seconds prior to the current time. Since our value for the current time may be later than that used in the Date header (as noted in the previous item), it's possible that the Last-modified value generated may be in the future relative to that sent with the Date header, especially if the CGI script takes a long time to run (e.g., because of heavy load). This violates the HTTP 1.1 specification (see section 14.29).
The probability of this happening could be lessened by setting an earlier Last-modified time; however this increases the possibility of having two updates occur within the n-second window between the ostensible Last-modified time and the current time, and there may be race conditions associated with this that could cause other problems, such as sending a Last-modified value that's earlier than one sent previously for the same URI.
With strong validation using Last-modified, it's possible that the plugin may attempt to update the cache file while another plugin invocation (resulting from a simultaneous request) attempts to read it; more seriously, two plugin invocations may both attempt to update the validator cache file simultaneously. I've tried to minimize problems relating to this by having the plugin write out cache data to a temporary file and then rename it to the real file; if the rename is an atomic operation then this should eliminate the problem of a plugin invocation trying to read from a partially written validator cache file.
As for simultaneous updates, presumably the worst that can happen is that one of the plugin invocations will fail to update the cache entry for its URI (since its changes will be overwritten by the second plugin invocation); however this simply means that the plugin won't be able to send a 304 on a subsequent conditional GET for that URI, and will then have to update the cache file again.
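The write-then-rename approach can be sketched in Python as follows; the function name and temporary-file prefix are hypothetical. Creating the temporary file with `tempfile.mkstemp` in the same directory gives it a unique name (one way to address the temporary-filename collision concern listed among the ideas for enhancement), and `os.replace` performs the rename, which is atomic on POSIX systems when both paths are on the same filesystem.

```python
import os
import tempfile


def write_cache_atomically(path, data):
    """Write data to path so that readers see either the old or new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Unique temporary file in the same directory (same filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".lastmodified2-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        os.replace(tmp_path, path)  # atomic rename over the real cache file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # don't leave stray temporary files behind
        raise
```

This removes the partially written file problem but, as noted above, not lost updates: the last writer still wins.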
As noted above, you may experience problems if you install this plugin with other plugins that set HTTP status in the skip subroutine. Blosxom stops executing skip subroutines as soon as one returns a true value, so whichever plugin is first in the execution order will get to set the final HTTP status.
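The short-circuit behavior can be modeled in a few lines of Python. The function name and the hooks used with it are hypothetical stand-ins; the point is only that the first skip hook to return a true value decides the outcome, so later hooks never run.

```python
def run_skip_chain(plugins, state):
    """Run skip hooks in order, stopping at the first that returns true.

    plugins: list of (name, hook) pairs in execution order; each hook may
    modify the shared state (e.g. set an HTTP status) before returning.
    Returns the name of the plugin that stopped the chain, or None.
    """
    for name, skip in plugins:
        if skip(state):
            return name  # later hooks are never called
    return None
```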
Here are some ideas for ways in which the lastmodified2 plugin could be enhanced and extended:
Support selective use of strong or weak validation depending on the flavour. For example, weak validation would probably work fine for RSS and Atom feeds, since they typically contain content only for entries; however strong validation may be needed for the HTML flavour of individual entry pages (and, to a lesser extent, HTML index pages) in order to pick up changes due to comments.
Note that doing this would be perfectly compatible with the HTTP 1.1 protocol specification, since different flavours correspond to different URIs; any given URI (or set of URIs) could be either strongly validated or weakly validated independent of any other URIs.
Support specifying different freshness times for different types of content, e.g., for different flavours, for individual entries vs. entry index pages, and/or for current index pages vs. archive index pages.
Try to make the filename for the validator cache temporary file more unique to minimize the possibility of name collisions by simultaneous plugin invocations. (Perhaps use Time::HiRes to get subsecond times?)
For completeness, support the case where the If-none-match header has the value '*' (which matches any entity). See section 14.26 of the HTTP 1.1 specification for the desired behavior in this case.
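A possible shape for this enhancement, sketched in Python: parse the If-None-Match list, treat `*` as matching whenever the resource currently exists, and otherwise use the weak comparison function, which section 14.26 permits for GET and HEAD requests. The helper names are hypothetical and the comma split assumes no commas inside entity tags.

```python
def parse_entity_tags(header_value):
    """Split an If-None-Match value into tags (assumes no commas inside tags)."""
    return [tag.strip() for tag in header_value.split(",") if tag.strip()]


def opaque_tag(tag):
    """Drop a leading W/ so weak and strong forms compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_matches(header_value, current_etag, entity_exists=True):
    """True if If-None-Match matches, i.e. a GET may be answered with 304."""
    tags = parse_entity_tags(header_value)
    if "*" in tags:
        return entity_exists  # "*" matches any current entity
    if current_etag is None:
        return False
    # Weak comparison: tags must be identical apart from the W/ marker.
    return any(opaque_tag(tag) == opaque_tag(current_etag) for tag in tags)
```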
For completeness, support conditional GETs using the If-unmodified-since headers within the plugin itself, in addition to If-modified-since. However note that this doesn't appear to be necessary for Apache, since it appears to correctly make these checks as long as Last-modified headers are returned by the CGI script.