zhuomingliang/srcache-nginx-module

 
 


Name

ngx_srcache - Transparent subrequest-based caching layout for arbitrary nginx locations

This module is not distributed with the Nginx source. See the installation instructions.

Status

This module is production ready.

Version

This document describes srcache-nginx-module v0.13rc2 released on 15 October 2011.

Synopsis

upstream my_memcached {
    server 10.62.136.7:11211;
    keepalive 512 single; # this requires the ngx_http_upstream_keepalive module
}

location = /memc {
    internal;

    memc_connect_timeout 100ms;
    memc_send_timeout 100ms;
    memc_read_timeout 100ms;

    set $memc_key $query_string;
    set $memc_exptime 300;

    memc_pass my_memcached;
}

location /foo {
    set $key $uri$args;
    srcache_fetch GET /memc $key;
    srcache_store PUT /memc $key;
    srcache_store_statuses 200 301 302;

    # proxy_pass/fastcgi_pass/drizzle_pass/echo/etc...
    # or even static files on the disk
}



location = /memc2 {
    internal;

    memc_connect_timeout 100ms;
    memc_send_timeout 100ms;
    memc_read_timeout 100ms;

    set_unescape_uri $memc_key $arg_key;
    set $memc_exptime $arg_exptime;

    memc_pass unix:/tmp/memcached.sock;
}

location /bar {
    set_escape_uri $key $uri$args;
    srcache_fetch GET /memc2 key=$key;
    srcache_store PUT /memc2 key=$key&exptime=$srcache_expire;

    # proxy_pass/fastcgi_pass/drizzle_pass/echo/etc...
    # or even static files on the disk
}

Description

This module provides a transparent caching layer for arbitrary nginx locations (like those that use an upstream or even serve static disk files). The caching behavior is mostly compatible with RFC 2616.

Usually, HttpMemcModule is used together with this module to provide a concrete caching storage backend. But technically, any module that provides a REST interface can serve the fetching and storage subrequests used by this module.

For main requests, the srcache_fetch directive works at the end of the access phase, so the standard access module's allow and deny directives run before ours, which is usually the desired behavior for security reasons.

Subrequest caching

For subrequests, we explicitly disallow the use of this module because it's too difficult to get right. There used to be an implementation but it was buggy and I finally gave up fixing it and abandoned it.

However, if you're using HttpLuaModule, it's easy to do subrequest caching in Lua all by yourself: first issue a subrequest to an HttpMemcModule location to do an explicit cache lookup; on a cache hit, just use the cached data returned; otherwise, fall back to the true backend and finally do a cache insertion to feed the data into the cache.

Using this module for main request caching and Lua for subrequest caching is the approach that we're taking in our business. This hybrid solution works great in production.
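The lookup, fallback, and insertion steps described above can be sketched roughly as follows. This is only an illustration, not the setup used by the author: the /api and /backend locations are hypothetical names, and /memc is assumed to be the memcached location from the Synopsis, which takes its whole query string as the key.

```nginx
# Hypothetical locations: /backend stands for whatever internal
# location produces the real content.
location /api {
    content_by_lua '
        -- the /memc location uses its whole query string as the key
        local key = ngx.escape_uri(ngx.var.uri)

        -- 1. explicit cache lookup
        local res = ngx.location.capture("/memc", { args = key })
        if res.status == 200 then
            ngx.print(res.body)
            return
        end

        -- 2. cache miss: fall back to the true backend
        res = ngx.location.capture("/backend")

        -- 3. cache insertion; HttpMemcModule maps PUT to a memcached "set"
        if res.status == 200 then
            ngx.location.capture("/memc",
                { method = ngx.HTTP_PUT, args = key, body = res.body })
        end

        ngx.status = res.status
        ngx.print(res.body)
    ';
}
```

Only the response body is cached in this sketch; response headers would need their own handling.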

Distributed Memcached Caching

Here is a simple example demonstrating a distributed memcached caching mechanism built atop this module. Suppose we have three different memcached nodes and we use a simple modulo to hash our keys.

http {
    upstream moon {
        server 10.62.136.54:11211;
        server unix:/tmp/memcached.sock backup;
    }

    upstream earth {
        server 10.62.136.55:11211;
    }

    upstream sun {
        server 10.62.136.56:11211;
    }

    upstream_list universe moon earth sun;

    server {
        memc_connect_timeout 100ms;
        memc_send_timeout 100ms;
        memc_read_timeout 100ms;

        location = /memc {
            internal;

            set $memc_key $query_string;
            set_hashed_upstream $backend universe $memc_key;
            set $memc_exptime 3600; # in seconds
            memc_pass $backend;
        }

        location / {
            set $key $uri;
            srcache_fetch GET /memc $key;
            srcache_store PUT /memc $key;

            # proxy_pass/fastcgi_pass/content_by_lua/drizzle_pass/...
        }
    }
}

Here's what is going on in the sample above:

  1. We first define three upstreams, moon, earth, and sun. These are our three memcached servers.
  2. And then we group them together as an upstream list entity named universe with the upstream_list directive provided by HttpSetMiscModule.
  3. After that, we define an internal location named /memc for talking to the memcached cluster.
  4. In this /memc location, we first set the $memc_key variable with the query string ($args), and then use the set_hashed_upstream directive to hash our $memc_key over the upstream list universe, so as to obtain a concrete upstream name to be assigned to the variable $backend.
  5. We pass this $backend variable into the memc_pass directive. The $backend variable can hold a value among moon, earth, and sun.
  6. Also, we define the memcached caching expiration time to be 3600 seconds (i.e., an hour) by overriding the $memc_exptime variable.
  7. In our main public location /, we configure the $uri variable as our cache key, and then configure srcache_fetch for cache lookups and srcache_store for cache updates. We're using two subrequests to our /memc location defined earlier in these two directives.

One can use HttpLuaModule's set_by_lua or rewrite_by_lua directives to inject custom Lua code to compute the $backend and/or $key variables in the sample above.
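As a minimal sketch of computing $key in Lua, the location below lowercases the URI before it is used as the cache key. The normalization rule is a hypothetical choice made for illustration; any Lua expression that returns a string could take its place.

```nginx
location / {
    # Hypothetical normalization: /Foo and /foo share one cache entry.
    set_by_lua $key 'return string.lower(ngx.var.uri)';

    srcache_fetch GET /memc $key;
    srcache_store PUT /memc $key;

    # proxy_pass/fastcgi_pass/content_by_lua/drizzle_pass/...
}
```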

One thing to watch out for is that memcached restricts key lengths to 250 bytes, so for keys that may be very long, one could use the set_md5 directive or its friends to pre-hash the key to a fixed-length digest before assigning it to $memc_key in the /memc location or the like.
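A possible shape for the pre-hashing step is sketched below, assuming HttpSetMiscModule is compiled in and reusing the my_memcached upstream from the Synopsis. The $digest variable name is made up for this example.

```nginx
location = /memc {
    internal;

    # set_md5 (from HttpSetMiscModule) yields a 32-character hex digest,
    # which stays well under memcached's key length limit
    set_md5 $digest $query_string;
    set $memc_key $digest;
    set $memc_exptime 3600;

    memc_pass my_memcached;
}
```

Because the hash is applied inside /memc, both the fetch and the store subrequests map the same original key to the same digest.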

Further, one can utilize the srcache_fetch_skip and srcache_store_skip directives to control what to cache and what not on a per-request basis, and Lua can also be used here in a similar way. So the possibilities are practically unlimited.

To maximize speed, we often enable a TCP (or Unix domain socket) connection pool for our memcached upstreams, as provided by HttpUpstreamKeepaliveModule. For example,

upstream moon {
    server 10.62.136.54:11211;
    server unix:/tmp/memcached.sock backup;
    keepalive 512;
}

where we define a connection pool which holds up to 512 keep-alive connections for our moon upstream (cluster).

Directives

srcache_fetch

syntax: srcache_fetch <method> <uri> <args>?

default: no

context: http, server, location, location if

phase: post access

This directive registers an access phase handler that will issue an Nginx subrequest to look up the cache.

When the subrequest returns a status code other than 200, a cache miss is signaled and the control flow will continue to the later phases, including the content phase configured by HttpProxyModule, HttpFcgiModule, and others. If the subrequest returns 200 OK, then a cache hit is signaled and this module will send the subrequest's response as the current main request's response to the client directly.

This directive will always run at the end of the access phase, such that HttpAccessModule's allow and deny will always run before this.

You can use the srcache_fetch_skip directive to disable cache look-up selectively.

srcache_fetch_skip

syntax: srcache_fetch_skip <flag>

default: srcache_fetch_skip 0

context: http, server, location, location if

phase: post access

The <flag> argument supports nginx variables. When this argument's value is not empty and not equal to 0, then the fetching process will be unconditionally skipped.

For example, to skip caching for requests that have a cookie named foo with the value bar, we can write

location / {
    set $key ...;
    set_by_lua $skip '
        if ngx.var.cookie_foo == "bar" then
            return 1
        end
        return 0
    ';

    srcache_fetch_skip $skip;
    srcache_store_skip $skip;

    srcache_fetch GET /memc $key;
    srcache_store PUT /memc $key;

    # proxy_pass/fastcgi_pass/content_by_lua/...
}

where HttpLuaModule is used to calculate the value of the $skip variable at the (earlier) rewrite phase. Similarly, the $key variable can be computed by Lua using the set_by_lua or rewrite_by_lua directive too.

srcache_store

syntax: srcache_store <method> <uri> <args>?

default: no

context: http, server, location, location if

phase: output filter

This directive registers an output filter handler that will issue an Nginx subrequest to save the response of the current main request into a cache backend. The status code of the subrequest will be ignored.

You can use the srcache_store_skip and srcache_store_max_size directives to disable caching for certain requests in case of a cache miss.

Since the v0.12rc7 release, the response status line, response headers, and response body are all put into the cache. By default, the following special response headers will not be cached:

  • Connection
  • Keep-Alive
  • Proxy-Authenticate
  • Proxy-Authorization
  • TE
  • Trailers
  • Transfer-Encoding
  • Upgrade
  • Set-Cookie

You can use the srcache_store_pass_header and/or srcache_store_hide_header directives to control what headers to cache and what not.

This directive works in an output filter.

srcache_store_max_size

syntax: srcache_store_max_size <size>

default: srcache_store_max_size 0

context: http, server, location, location if

When the response body length exceeds this size, this module will not try to store the response body into the cache using the subrequest template that is specified in srcache_store.

This is particularly useful when using a cache storage backend that has a hard upper limit on the input data. For example, for a Memcached server, the limit is usually 1 MB.

When 0 is specified (the default value), there's no limit check at all.
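For illustration, a location backed by memcached might cap stored bodies as shown below. The 1m value simply mirrors the usual memcached limit mentioned above; a somewhat lower cap may be safer in practice, since the stored item also carries the status line and headers.

```nginx
location /foo {
    set $key $uri$args;
    srcache_fetch GET /memc $key;
    srcache_store PUT /memc $key;

    # do not try to store response bodies larger than 1 MB
    srcache_store_max_size 1m;

    # proxy_pass/fastcgi_pass/drizzle_pass/echo/etc...
}
```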

srcache_store_skip

syntax: srcache_store_skip <flag>

default: srcache_store_skip 0

context: http, server, location, location if

phase: output filter

The <flag> argument supports Nginx variables. When this argument's value is not empty and not equal to 0, then the storing process will be unconditionally skipped.

Here's an example using Lua to set $nocache to avoid storing URIs that contain the string "/tmp":

set_by_lua $nocache '
    if string.match(ngx.var.uri, "/tmp") then
        return 1
    end
    return 0';

srcache_store_skip $nocache;

srcache_store_statuses

syntax: srcache_store_statuses <status1> <status2> ..

default: srcache_store_statuses 200 301 302

context: http, server, location, location if

phase: output filter

This directive controls what responses to store to the cache according to their status code.

By default, only 200, 301, and 302 responses will be stored to cache and any other responses will skip srcache_store.

You can specify arbitrary positive numbers for the response status codes that you'd like to cache, even including error codes like 404 and 503. For example:

srcache_store_statuses 200 201 301 302 404 503;

At least one argument should be given to this directive.

This directive was first introduced in the v0.13rc2 release.

srcache_header_buffer_size

syntax: srcache_header_buffer_size <size>

default: srcache_header_buffer_size 4k/8k

context: http, server, location, location if

phase: output filter

This directive controls the size of the header buffer used when serializing response headers for srcache_store. The default size is the page size, usually 4k or 8k depending on the platform.

Note that the buffer is not used to hold all the response headers at once, but just each individual header. So the buffer only needs to be big enough to hold the longest response header.

This directive was first introduced in the v0.12rc7 release.
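As a small sketch, a backend that emits one unusually long header line could be accommodated like this. The 16k figure is an arbitrary value picked for the example, not a recommendation.

```nginx
location /foo {
    set $key $uri$args;
    srcache_fetch GET /memc $key;
    srcache_store PUT /memc $key;

    # hypothetical value: room for a single header line of up to 16 KB
    srcache_header_buffer_size 16k;
}
```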

srcache_store_hide_header

syntax: srcache_store_hide_header <header>

default: no

context: http, server, location, location if

phase: output filter

By default, this module caches all the response headers except the following ones:

  • Connection
  • Keep-Alive
  • Proxy-Authenticate
  • Proxy-Authorization
  • TE
  • Trailers
  • Transfer-Encoding
  • Upgrade
  • Set-Cookie

You can hide even more response headers from srcache_store by listing their names (case-insensitive) by means of this directive. For example,

srcache_store_hide_header X-Foo;
srcache_store_hide_header Last-Modified;

Multiple occurrences of this directive are allowed in a single location.

This directive was first introduced in the v0.12rc7 release.

See also srcache_store_pass_header.

srcache_store_pass_header

syntax: srcache_store_pass_header <header>

default: no

context: http, server, location, location if

phase: output filter

By default, this module caches all the response headers except the following ones:

  • Connection
  • Keep-Alive
  • Proxy-Authenticate
  • Proxy-Authorization
  • TE
  • Trailers
  • Transfer-Encoding
  • Upgrade
  • Set-Cookie

You can force srcache_store to store one or more of these response headers by listing their names (case-insensitive) by means of this directive. For example,

srcache_store_pass_header Set-Cookie;
srcache_store_pass_header Proxy-Authenticate;

Multiple occurrences of this directive are allowed in a single location.

This directive was first introduced in the v0.12rc7 release.

See also srcache_store_hide_header.

srcache_methods

syntax: srcache_methods <method>...

default: srcache_methods GET HEAD

context: http, server, location

phase: output filter

This directive specifies HTTP request methods that are considered by either srcache_fetch or srcache_store. HTTP request methods not listed will be skipped completely from the cache.

The following HTTP methods are allowed: GET, HEAD, POST, PUT, and DELETE. The GET and HEAD methods are always implicitly included in the list regardless of their presence in this directive.

This directive was first introduced in the v0.12rc7 release.
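A sketch of opting POST requests into the cache follows. The /search location and the key layout are assumptions made for this example only.

```nginx
location /search {
    # GET and HEAD are always included; POST is added explicitly here
    srcache_methods GET HEAD POST;

    # hypothetical key: it ignores the request body, so it only suits
    # POST requests whose responses depend on the URI and arguments alone
    set $key $uri$args;
    srcache_fetch GET /memc $key;
    srcache_store PUT /memc $key;
}
```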

srcache_ignore_content_encoding

syntax: srcache_ignore_content_encoding on|off

default: srcache_ignore_content_encoding off

context: http, server, location, location if

phase: output filter

When this directive is turned off (which is the default), a non-empty Content-Encoding response header will cause srcache_store to skip storing the whole response into the cache and issue a warning into nginx's error.log file like this:

[warn] 12500#0: *1 srcache_store skipped due to response header "Content-Encoding: gzip"
            (maybe you forgot to disable compression on the backend?)

Turning on this directive will ignore the Content-Encoding response header and store the response as usual (and also without warning).

It's recommended to always disable gzip/deflate compression on your backend server by specifying the following line in your nginx.conf file:

proxy_set_header  Accept-Encoding  "";

This directive was first introduced in the v0.12rc7 release.

srcache_request_cache_control

syntax: srcache_request_cache_control on|off

default: srcache_request_cache_control off

context: http, server, location

When this directive is turned on, the request headers Cache-Control and Pragma will be honored by this module in the following ways:

  1. srcache_fetch, i.e., the cache lookup operation, will be skipped when request headers Cache-Control: no-cache and/or Pragma: no-cache are present.
  2. srcache_store, i.e., the cache store operation, will be skipped when the request header Cache-Control: no-store is specified.

Turning off this directive will disable this functionality and is considered safer for busy sites mainly relying on cache for speed.

This directive was first introduced in the v0.12rc7 release.

See also srcache_response_cache_control.
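To make the two rules above concrete, here is a minimal location with the directive enabled. The comments restate the behavior described in this section; the location name and key are placeholders.

```nginx
location /foo {
    srcache_request_cache_control on;

    set $key $uri$args;
    srcache_fetch GET /memc $key;
    srcache_store PUT /memc $key;

    # With the directive on:
    #   request sends "Cache-Control: no-cache" or "Pragma: no-cache"
    #       -> srcache_fetch is skipped
    #   request sends "Cache-Control: no-store"
    #       -> srcache_store is skipped
}
```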

srcache_response_cache_control

syntax: srcache_response_cache_control on|off

default: srcache_response_cache_control on

context: http, server, location

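When this directive is turned on, the response headers Cache-Control and Expires will be honored by this module in the following ways:

  • Cache-Control: private skips srcache_store,
  • Cache-Control: no-store skips srcache_store,
  • Cache-Control: no-cache skips srcache_store,
  • Cache-Control: max-age=0 skips srcache_store,
  • and Expires: <date-no-more-recently-than-now> skips srcache_store.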

This directive takes priority over the srcache_store_no_store, srcache_store_no_cache, and srcache_store_private directives.

This directive was first introduced in the v0.12rc7 release.

See also srcache_request_cache_control.

srcache_store_no_store

syntax: srcache_store_no_store on|off

default: srcache_store_no_store off

context: http, server, location

phase: output filter

Turning this directive on will force responses with the header Cache-Control: no-store to be stored into the cache when srcache_response_cache_control is turned on and other conditions are met. Defaults to off.

This directive was first introduced in the v0.12rc7 release.

srcache_store_no_cache

syntax: srcache_store_no_cache on|off

default: srcache_store_no_cache off

context: http, server, location

phase: output filter

Turning this directive on will force responses with the header Cache-Control: no-cache to be stored into the cache when srcache_response_cache_control is turned on and other conditions are met. Defaults to off.

This directive was first introduced in the v0.12rc7 release.

srcache_store_private

syntax: srcache_store_private on|off

default: srcache_store_private off

context: http, server, location

phase: output filter

Turning this directive on will force responses with the header Cache-Control: private to be stored into the cache when srcache_response_cache_control is turned on and other conditions are met. Defaults to off.

This directive was first introduced in the v0.12rc7 release.
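The srcache_store_no_store, srcache_store_no_cache, and srcache_store_private overrides are set per location. As a hedged sketch, the hypothetical /feed location below keeps response Cache-Control handling on but still stores responses marked no-cache.

```nginx
location /feed {
    set $key $uri$args;
    srcache_fetch GET /memc $key;
    srcache_store PUT /memc $key;

    srcache_response_cache_control on;   # the default, shown for clarity
    srcache_store_no_cache on;           # also store "Cache-Control: no-cache" responses
}
```

Whether such an override is appropriate depends on the backend; responses marked private in particular are usually user-specific.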

srcache_default_expire

syntax: srcache_default_expire <time>

default: srcache_default_expire 60s

context: http, server, location, location if

phase: output filter

This directive controls the default expiration time period that is used for the $srcache_expire variable value when neither Cache-Control: max-age=N nor Expires is specified in the response headers.

The <time> argument values are in seconds by default. But it's wise to always explicitly specify the time unit to avoid confusion. Time units supported are "s"(seconds), "ms"(milliseconds), "y"(years), "M"(months), "w"(weeks), "d"(days), "h"(hours), and "m"(minutes). For example,

srcache_default_expire 30m; # 30 minutes

This time must be less than 597 hours.

This directive was first introduced in the v0.12rc7 release.

srcache_max_expire

syntax: srcache_max_expire <time>

default: srcache_max_expire 0

context: http, server, location, location if

phase: output filter

This directive controls the maximal expiration time period that is allowed for the $srcache_expire variable value. This setting takes priority over other calculating methods.

The <time> argument values are in seconds by default. But it's wise to always explicitly specify the time unit to avoid confusion. Time units supported are "s"(seconds), "ms"(milliseconds), "y"(years), "M"(months), "w"(weeks), "d"(days), "h"(hours), and "m"(minutes). For example,

srcache_max_expire 2h;  # 2 hours

This time must be less than 597 hours.

When 0 is specified, which is the default setting, then there will be no limit at all.

This directive was first introduced in the v0.12rc7 release.

Variables

$srcache_expire

type: integer

cacheable: no

writable: no

This Nginx variable gives the recommended expiration time period (in seconds) for the current response being stored into the cache. The algorithm of computing the value is as follows:

  1. When the response header Cache-Control: max-age=N is specified, then N will be used as the expiration time,
  2. otherwise if the response header Expires is specified, then the expiration time will be obtained by subtracting the current time stamp from the time specified in the Expires header,
  3. when neither Cache-Control: max-age=N nor Expires headers are specified, use the value specified in the srcache_default_expire directive.

The final value of this variable will be the value specified by the srcache_max_expire directive if the value obtained in the algorithm above exceeds the maximal value (if any).
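A worked example may help here. It reuses the /memc2 location from the Synopsis; the 5m and 1h settings are arbitrary values chosen for illustration, and the comments show what the algorithm above would yield for three kinds of responses.

```nginx
location /bar {
    srcache_default_expire 5m;   # used when the response gives no lifetime
    srcache_max_expire 1h;       # upper bound on $srcache_expire

    # With these settings, $srcache_expire would come out as:
    #   Cache-Control: max-age=120      -> 120
    #   Cache-Control: max-age=86400    -> 3600 (capped by srcache_max_expire)
    #   neither max-age nor Expires     -> 300  (srcache_default_expire)

    set_escape_uri $key $uri$args;
    srcache_fetch GET /memc2 key=$key;
    srcache_store PUT /memc2 key=$key&exptime=$srcache_expire;
}
```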

You don't have to use this variable for the expiration time.

This variable was first introduced in the v0.12rc7 release.

Known Issues

  • On certain systems, enabling aio and/or sendfile may stop srcache_store from working. You can disable them in the locations configured by srcache_store.
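If the issue above shows up, a location using srcache_store could switch both features off locally, roughly like this (the location name and key are placeholders):

```nginx
location /foo {
    # work around the issue on affected systems
    sendfile off;
    aio off;

    set $key $uri$args;
    srcache_fetch GET /memc $key;
    srcache_store PUT /memc $key;
}
```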

Caveats

  • It's recommended to disable your backend server's gzip compression and use nginx's HttpGzipModule to do the job. In the case of HttpProxyModule, you can use the following configuration setting to disable backend gzip compression:

    proxy_set_header Accept-Encoding "";

Installation

It's recommended to install this module as well as the Nginx core and many other goodies via the ngx_openresty bundle. It's the easiest and safest way to set things up. See OpenResty's installation instructions for details.

Alternatively, you can build Nginx with this module all by yourself:

  • Grab the nginx source code from nginx.org, for example, the version 1.0.9 (see Nginx Compatibility),

  • and then download the latest version of the release tarball of this module from srcache-nginx-module file list,

  • and finally build the Nginx source with this module

      wget 'http://nginx.org/download/nginx-1.0.9.tar.gz'
      tar -xzvf nginx-1.0.9.tar.gz
      cd nginx-1.0.9/
    
      # Here we assume you would install your nginx under /opt/nginx/.
      ./configure --prefix=/opt/nginx \
           --add-module=/path/to/srcache-nginx-module
    
      make -j2
      make install
    

Compatibility

The following versions of Nginx should work with this module:

  • 1.1.x (last tested: 1.1.5)
  • 1.0.x (last tested: 1.0.9)
  • 0.9.x (last tested: 0.9.4)
  • 0.8.x (last tested: 0.8.54)
  • 0.7.x >= 0.7.46 (last tested: 0.7.68)

Earlier versions of Nginx like 0.6.x and 0.5.x will not work.

If you find that any particular version of Nginx at or above 0.7.46 does not work with this module, please consider reporting a bug.

Report Bugs

Although a lot of effort has been put into testing and code tuning, there may still be serious bugs lurking somewhere in this module. So whenever you are bitten by any quirks, please don't hesitate to report them to the author.

Source Repository

Available on github at agentzh/srcache-nginx-module.

ChangeLog

Test Suite

This module comes with a Perl-driven test suite. The test cases are declarative too, thanks to the Test::Nginx module in the Perl world.

To run it on your side:

$ PATH=/path/to/your/nginx-with-srcache-module:$PATH prove -r t

You need to terminate any Nginx processes before running the test suite if you have changed the Nginx server binary.

Because a single nginx server (by default, localhost:1984) is used across all the test scripts (.t files), it's meaningless to run the test suite in parallel by specifying -jN when invoking the prove utility.

Some parts of the test suite require the HttpRewriteModule, HttpEchoModule, HttpRdsJsonModule, and HttpDrizzleModule modules to be enabled as well when building Nginx.

TODO

  • add gzip compression and decompression support.
  • add new nginx variable $srcache_key and new directives srcache_key_ignore_args, srcache_key_filter_args, and srcache_key_sort_args.

Getting involved

You're very welcome to submit patches to the author or just ask for a commit bit to the source repository on GitHub.

Author

Zhang "agentzh" Yichun (章亦春) agentzh@gmail.com

Copyright & License

Copyright (c) 2010, 2011 Taobao Inc., Alibaba Group ( http://www.taobao.com ).

Copyright (c) 2010, 2011, Zhang "agentzh" Yichun (章亦春) agentzh@gmail.com.

This module is licensed under the terms of the BSD license.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

See Also
