How to Deal with Cookies in Varnish stack


That is a lot of questions! :-)

Q. Is this a safe approach?

On the surface, I would say so.

Generally, setting up Varnish on a news site where there is a high volume of traffic and fast-changing content can be a challenge.

A really good way to check is to build a single Varnish box, give it direct access to your cluster (not via the load balancer), and give it a temporary public IP address. That gives you a chance to test VCL changes before they go live. You will be able to test commenting, logging in (if you have it), and anything else to make sure there are no surprises.
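When testing against the temporary box it helps to confirm whether a response really came from the cache. Here is a minimal sketch in Ruby, assuming Varnish's default `X-Varnish` and `Age` response headers have been left in place (`cache_hit?` is a hypothetical helper name, not part of any library):

```ruby
# Hypothetical helper: guess from the response headers whether Varnish
# served the response from its cache.
# Assumes the default X-Varnish and Age headers have not been stripped.
def cache_hit?(headers)
  # Header names are case-insensitive, so normalise the keys first.
  h = headers.each_with_object({}) { |(k, v), acc| acc[k.to_s.downcase] = v.to_s }
  xids = h.fetch("x-varnish", "").split
  age  = h.fetch("age", "0").to_i
  # Two IDs in X-Varnish (this request plus the one that filled the cache)
  # indicate a hit; a positive Age points the same way.
  xids.length == 2 || age > 0
end
```

Feed it the headers from two consecutive requests for the same URL; the second one should report a hit if the page is cacheable.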

Q. Will Google still track properly, including repeat visitors?

Yes. The cookies are only used on the client side.

One thing you should watch is that when the backend sends a cookie (a Set-Cookie header), Varnish will not cache that response either. You will need to remove any cookies that are not required in vcl_fetch. This might be a problem if cookies are used to track user state.
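The same rule can also be applied on the Rails side, before the response ever reaches Varnish. This is an alternative to doing it in vcl_fetch, not what I use myself, and the class name `StripCookiesOnPublic` is made up for the example. A rough sketch of a Rack middleware that drops Set-Cookie only from responses marked public:

```ruby
# Hypothetical Rack middleware: drop Set-Cookie from responses that are
# explicitly marked public, so Varnish is free to cache them.
class StripCookiesOnPublic
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    cache_control = header_value(headers, "cache-control").to_s
    if cache_control.split(/\s*,\s*/).include?("public")
      # Header names are case-insensitive; remove every spelling.
      headers = headers.reject { |name, _| name.to_s.downcase == "set-cookie" }
    end
    [status, headers, body]
  end

  private

  # Case-insensitive header lookup; returns nil when the header is absent.
  def header_value(headers, wanted)
    pair = headers.find { |name, _| name.to_s.downcase == wanted }
    pair && pair[1]
  end
end
```

Private responses keep their cookies, so logged-in users are unaffected. Anything that relies on a cookie being set on a public page (the user-state case mentioned above) would break, so check that first.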

Q. Is there anything else that I need to watch for in my policies for phase1?

You will need to disable Rack::Cache in Rails and set your own headers. Be aware that should you ever remove Varnish, Rails will be running with no caching and will probably collapse!

This is what I have in my production.rb:

    # We do not use Rack::Cache but rely on Varnish instead
    config.middleware.delete Rack::Cache
    # varnish does not support etags or conditional gets
    # to the backend (which is this app) so remove them too
    config.middleware.delete Rack::ETag
    config.middleware.delete Rack::ConditionalGet

And in my application_controller I have this private method:

    def set_public_cache_control(duration)
      if current_user
        response.headers["Cache-Control"] = "max-age=0, private, must-revalidate"
      else
        expires_in duration, :public => true
        response.headers["Expires"] = CGI.rfc1123_date(Time.now + duration)
      end
    end

That is called in my other controllers so that I have very fine-grained control over how much caching is applied to various parts of the site. I use a setup method in each controller that is run as a before_filter:

    def setup
      set_public_cache_control 10.minutes
    end

(The application_controller has the filter and a blank setup method, so it can be optional in the other controllers)
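To make the "blank setup method" idea concrete, here is a plain-Ruby stand-in for the controller hierarchy. It does not use Rails at all: `process` plays the part of the before_filter, the class names are invented, and durations are plain seconds, so treat it as an illustration of the pattern only.

```ruby
# Plain-Ruby stand-in for the controller hierarchy (no Rails required).
class BaseController
  attr_reader :cache_seconds

  # Stand-in for the before_filter: always runs the setup hook first.
  def process
    setup
    self
  end

  private

  # Blank by default, so subclasses only override it when they want caching.
  def setup; end

  # Records the requested cache time instead of writing real headers.
  def set_public_cache_control(seconds)
    @cache_seconds = seconds
  end
end

class NewsController < BaseController
  private

  def setup
    set_public_cache_control 10 * 60 # 10.minutes in Rails
  end
end

class AccountController < BaseController
  # No setup override, so no cache time is ever set here.
end
```

The base class owns the filter, and each controller opts in to caching simply by defining `setup`.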

If you have a part of the site that does not require cookies you can strip them off based on URL in the VCL, and apply headers.

You can set the cache time for your static assets in your Apache config like this (assuming you are using the default asset path):

    <LocationMatch "^/assets/.*$">
        Header unset ETag
        FileETag None
        # RFC says only cache for 1 year
        ExpiresActive On
        ExpiresDefault "access plus 1 year"
        Header append Cache-Control "public"
    </LocationMatch>

    <LocationMatch "^/favicon\.(ico|png)$">
        Header unset ETag
        FileETag None
        ExpiresActive On
        ExpiresDefault "access plus 1 day"
        Header append Cache-Control "public"
    </LocationMatch>

    <LocationMatch "^/robots\.txt$">
        Header unset ETag
        FileETag None
        ExpiresActive On
        ExpiresDefault "access plus 1 hour"
        Header append Cache-Control "public"
    </LocationMatch>

Those headers will also be sent to your CDN, which will cache the assets for much longer. Watching Varnish, you'll still see requests coming in, but at a declining rate.

I would also set very short cache times on pages that don't need cookies but change quite often. In my case I set a cache time of 10 seconds for the home page. What this means for Varnish is that roughly one user request will go to the backend every 10 seconds.
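A quick back-of-the-envelope calculation shows why even a 10 second cache time matters. The sketch below assumes Varnish makes at most one backend fetch per TTL window for a given URL; the 50 requests per second figure is invented for illustration, not a measurement from my site.

```ruby
# Rough estimate of backend load for one cached URL.
# Assumes at most one backend fetch per TTL window, with everything
# else served from the cache.
def backend_requests_per_hour(ttl_seconds)
  3600 / ttl_seconds
end

# Share of requests answered from cache at a steady request rate.
def approx_hit_ratio(requests_per_second, ttl_seconds)
  total = requests_per_second * ttl_seconds
  return 0.0 if total <= 1
  (total - 1).to_f / total
end

backend_requests_per_hour(10)  # => 360
approx_hit_ratio(50, 10)       # => 0.998
```

So at 50 requests per second the backend would see about 360 home page requests an hour instead of 180,000.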

You should also consider setting Varnish to use grace mode. This allows it to serve slightly stale content from the cache in preference to exposing visitors to a slow response from the backend for items that have just expired.
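As a rough mental model of what grace adds, here is a toy Ruby function that classifies a cached object by its age. It is only an illustration: real behaviour depends on your Varnish version and on the grace settings in your VCL (for example `beresp.grace`), and the 10 second TTL and 30 second grace values in the comments are invented.

```ruby
# Toy model of how a cached object is treated over time.
# age, ttl and grace are all in seconds.
def cache_decision(age, ttl:, grace:)
  if age <= ttl
    :fresh          # served straight from the cache
  elsif age <= ttl + grace
    :stale_in_grace # may be served stale while the backend is refreshed
  else
    :expired        # has to be fetched from the backend
  end
end

cache_decision(5,  ttl: 10, grace: 30) # => :fresh
cache_decision(25, ttl: 10, grace: 30) # => :stale_in_grace
cache_decision(41, ttl: 10, grace: 30) # => :expired
```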

Q. There are plenty of archived articles that don't get updated, is it safe to cache them forever?

To do this you would need to change your app to send different headers for the articles that are archived. This assumes they won't have cookies. Based on what I do on my site, I would do it this way:

In the setup above add a conditional to change the cache time:

    def setup
      # check if it is old. This code could be anything
      if news.last_updated_at < 1.month.ago
        set_public_cache_control 1.year
      else
        set_public_cache_control 10.minutes
      end
    end

This sets a public header, so Varnish will cache it (if there are no cookies), and so will any remote caches (at ISP or corporate gateways).

The problem with this comes if you later need to remove the story or update it (say, for legal reasons), because remote caches will keep serving the old copy.

In that case you should send Varnish a private header to change the TTL for that one URL, but send a shorter public header for everyone else.

That would allow you to set Varnish to serve the content for (say) 1 year, while it sends out headers to tell clients to come back every 10 minutes.

You would also need a mechanism to purge the URL from Varnish in those cases.
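One common way to do that is to have the app send an HTTP PURGE request for the affected URL. The sketch below only builds the request with Ruby's standard `net/http`. It assumes your VCL has been written to accept PURGE from trusted addresses (the default VCL does not handle it), and the path, host name, address and port are all placeholders.

```ruby
require "net/http"

# Build (but do not send) a PURGE request for one URL path.
# Assumes the VCL accepts the PURGE method; that handling is not part of
# the default VCL and has to be added separately.
def build_purge_request(path, host)
  Net::HTTPGenericRequest.new("PURGE", false, true, path, "Host" => host)
end

req = build_purge_request("/news/123", "www.example.com")

# To send it to a Varnish instance (hypothetical address and port):
#   Net::HTTP.start("127.0.0.1", 6081) { |http| http.request(req) }
```

You could call something like this from a model callback or an admin action whenever an archived story is edited or withdrawn.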

To get you started, I have a second method in my application_controller:

    def set_private_cache_control(duration=5.seconds)
      # logged in users never have cached content so no TTL allowed
      if ! current_user
        # This header MUST be a string or the app will crash
        if duration
          response.headers["X-Varnish-TTL"] = duration.to_s
        end
      end
    end

And in my vcl_fetch I have this:

    call set_varnish_ttl_from_header;

and the VCL subroutine is this:

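    sub set_varnish_ttl_from_header {
      if (beresp.http.X-Varnish-TTL) {
        C{
          /* Uses strtol, errno and INT_MAX: if your build does not already pull in
             <stdlib.h>, <errno.h> and <limits.h>, include them in a top-level C{ }C block */
          char *x_end = 0;
          /* "\016" is length of header plus colon in octal */
          const char *x_hdr_val = VRT_GetHdr(sp, HDR_BERESP, "\016X-Varnish-TTL:");
          if (x_hdr_val) {
            long x_cache_ttl;
            errno = 0; /* reset first so a stale ERANGE is not mistaken for overflow */
            x_cache_ttl = strtol(x_hdr_val, &x_end, 0);
            if (ERANGE != errno && x_end != x_hdr_val && x_cache_ttl >= 0 && x_cache_ttl < INT_MAX) {
              VRT_l_beresp_ttl(sp, (x_cache_ttl * 1));
            }
          }
        }C
        remove beresp.http.X-Varnish-TTL;
      }
    }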

The header is removed so that it does NOT get passed on to any caches between Varnish and the client (which s-maxage would be).

The setup method would look like this:

    def setup
      # check if it is old. This code could be anything
      if news.last_updated_at < 1.month.ago
        set_public_cache_control 10.minutes
        set_private_cache_control 1.year
      else
        set_public_cache_control 10.minutes
      end
    end

Feel free to ask any supplementary questions, and I'll update this answer!