Archive for the ‘Caching Technologies’ category

Using pipe in Varnish

March 15th, 2011

Using pipe

In most cases you will not need the pipe action. However, if you want to stream objects, particularly large ones such as videos or big zip files, you can use pipe. Using pipe means Varnish stops inspecting each request and just shuffles bytes between the client and the backend. This can lead to multiple failure modes, from sending requests to the wrong backend to exposing your backend directly to clients. It also means only the first request on the connection gets the X-Forwarded-For header added.

To work around this, make sure Varnish closes the backend connection after the first request:

sub vcl_pipe {
    set bereq.http.connection = "close";
}
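
As a rough sketch of how this fits together, the snippet below sends large downloads to pipe from vcl_recv and closes the connection in vcl_pipe. The file extensions are only an assumption for illustration, and the syntax follows the Varnish 2.1-era VCL used elsewhere on this page.

```vcl
sub vcl_recv {
    # Hypothetical rule: stream large media and archive files through pipe.
    if (req.url ~ "\.(mp4|flv|zip)$") {
        return (pipe);
    }
}

sub vcl_pipe {
    # Close the backend connection after this request, so later requests
    # from the same client go through vcl_recv again.
    set bereq.http.connection = "close";
}
```

Anything that does not match the pattern is still handled by the normal lookup and fetch logic.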

Backend Declarations in Varnish

March 12th, 2011

A backend declaration creates and initializes a named backend object:
backend www {
    .host = "www.manoj.com";
    .port = "http";
}

The backend object can later be used to select a backend at request time:

if (req.http.host ~ "^(www\.)?manoj\.com$") {
    set req.backend = www;
}

The timeout parameters can be overridden in the backend declaration. The timeout parameters are .connect_timeout for the time to wait for a backend connection, .first_byte_timeout for the time to wait for the first byte from the backend and .between_bytes_timeout for time to wait between each received byte.

These can be set in the declaration like this:

backend www {
    .host = "www.manoj.com";
    .port = "http";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
}

You can limit the number of connections Varnish will open to a backend like this. It helps only when you want to cap the number of simultaneous backend connections:

backend www {
    .host = "www.manoj.com";
    .port = "http";
    .max_connections = 200;
}
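
Putting the pieces together, here is a minimal sketch with two backends and host-based selection. The second backend, its name (assets) and its host name are made up for illustration; the syntax is the Varnish 2.1-era VCL used above.

```vcl
backend www {
    .host = "www.manoj.com";
    .port = "http";
    .connect_timeout = 1s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 2s;
    .max_connections = 200;
}

# Hypothetical second backend, only here to show selection.
backend assets {
    .host = "static.manoj.com";
    .port = "http";
}

sub vcl_recv {
    # Pick the backend from the Host header; fall back to www.
    if (req.http.host ~ "^static\.manoj\.com$") {
        set req.backend = assets;
    } else {
        set req.backend = www;
    }
}
```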

No package ‘libpcre’ found

March 3rd, 2011

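I was getting a “No package ‘libpcre’ found” error while compiling Varnish, and fixed it by installing the GCC packages with the command below. The message itself comes from pkg-config not finding the PCRE development files, so on yum-based systems you may also need the pcre-devel package.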

yum install gcc* -y

How to find hosts which cause most hits in Varnish

March 2nd, 2011

If your Varnish server is under constant high load and you are wondering who is causing it, varnishlog is a helpful tool.

Start off by logging IP addresses for a while.
varnishlog -c -i RxHeader -I X-Forwarded-For > varnish.log

After you have collected enough information, use the following line to find out which IP caused the most hits on your Varnish server.
cat varnish.log | cut -d ' ' -f 12 | sort | uniq -c | sort -n

How to Leverage browser caching

March 1st, 2011

Leverage browser caching

Overview

Setting an expiry date or a maximum age in the HTTP headers for static resources instructs the browser to load previously downloaded resources from local disk rather than over the network.

Details

HTTP/S supports local caching of static resources by the browser. Some of the newest browsers (e.g. IE 7, Chrome) use a heuristic to decide how long to cache all resources that don’t have explicit caching headers. Other older browsers may require that caching headers be set before they will fetch a resource from the cache; and some may never cache any resources sent over SSL.

To take advantage of the full benefits of caching consistently across all browsers, we recommend that you configure your web server to explicitly set caching headers and apply them to all cacheable static resources, not just a small subset (such as images). Cacheable resources include JS and CSS files, image files, and other binary object files (media files, PDFs, Flash files, etc.). In general, HTML is not static, and shouldn’t be considered cacheable.

HTTP/1.1 provides the following caching response headers:

  • Expires and Cache-Control: max-age. These specify the “freshness lifetime” of a resource, that is, the time period during which the browser can use the cached resource without checking to see if a new version is available from the web server. They are “strong caching headers” that apply unconditionally; that is, once they’re set and the resource is downloaded, the browser will not issue any GET requests for the resource until the expiry date or maximum age is reached.
  • Last-Modified and ETag. These specify some characteristic about the resource that the browser checks to determine if the files are the same. In the Last-Modified header, this is always a date. In the ETag header, this can be any value that uniquely identifies a resource (file versions or content hashes are typical). Last-Modified is a “weak” caching header in that the browser applies a heuristic to determine whether to fetch the item from cache or not. (The heuristics are different among different browsers.) However, these headers allow the browser to efficiently update its cached resources by issuing conditional GET requests when the user explicitly reloads the page. Conditional GETs don’t return the full response unless the resource has changed at the server, and thus have lower latency than full GETs.

It is important to specify one of Expires or Cache-Control max-age, and one of Last-Modified or ETag, for all cacheable resources. It is redundant to specify both Expires and Cache-Control: max-age, or to specify both Last-Modified and ETag.

Example

Add the following code to your .htaccess file to set your Cache-Control and Expires headers, adjusting the Expires date to suit (for example, one year from today). The Header directive requires mod_headers. I tested this and got good performance: files with the ico, pdf, flv, jpg, jpeg, png, gif, swf, mp3 and mp4 extensions are all cached by the browser.

# Set Cache-Control and Expires headers
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|swf|mp3|mp4)$">
    Header set Cache-Control "max-age=2592000, private"
    Header set Expires "Tue, 17 Jul 2012 20:00:00 GMT"
</FilesMatch>
<FilesMatch "\.(css|css\.gz)$">
    Header set Cache-Control "max-age=604800, private"
</FilesMatch>
<FilesMatch "\.(js|js\.gz)$">
    Header set Cache-Control "max-age=604800, private"
</FilesMatch>
<FilesMatch "\.(xml|txt)$">
    Header set Cache-Control "max-age=216000, private, must-revalidate"
</FilesMatch>
<FilesMatch "\.(html|htm)$">
    Header set Cache-Control "max-age=7200, private, must-revalidate"
</FilesMatch>

In another article I recommend some methods on how to optimize and tweak high-traffic servers.

Varnish Configuration Language – VCL

November 9th, 2010

Varnish has a great configuration system. Most other systems use configuration directives, where you basically turn on and off lots of switches. Varnish uses a domain specific language called Varnish Configuration Language, or VCL for short. Varnish translates this configuration into binary code which is then executed when requests arrive.

The VCL files are divided into subroutines. The different subroutines are executed at different times. One is executed when we get the request, another when files are fetched from the backend server.

Varnish will execute these subroutines of code at different stages of its work. Because it is code, it is executed line by line, so precedence isn’t a problem. At some point you call an action in the subroutine, and then the execution of that subroutine stops.

If you don’t call an action in your subroutine and it reaches the end, Varnish will execute some built-in VCL code. You will see this VCL code commented out in default.vcl.

99% of all the changes you’ll need to make will be done in two of these subroutines: vcl_recv and vcl_fetch.

vcl_recv

vcl_recv (yes, we’re skimpy with characters, it’s Unix) is called at the beginning of a request, after the complete request has been received and parsed. Its purpose is to decide whether or not to serve the request, how to do it, and, if applicable, which backend to use.

In vcl_recv you can also alter the request. Typically you can alter the cookies and add and remove request headers.
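
As one possible illustration of altering request headers, the sketch below normalizes Accept-Encoding so that the cache does not keep a separate copy for every browser's variant of the header. It is a common recipe rather than anything specific to this setup, written in the Varnish 2.1-era syntax used on this page.

```vcl
sub vcl_recv {
    if (req.http.Accept-Encoding) {
        if (req.http.Accept-Encoding ~ "gzip") {
            # Collapse every gzip-capable variant to one value.
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # Unknown encoding: drop the header and serve uncompressed.
            unset req.http.Accept-Encoding;
        }
    }
}
```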

Note that in vcl_recv only the request object, req, is available.

vcl_fetch

vcl_fetch is called after a document has been successfully retrieved from the backend. Normal tasks here are to alter the response headers, trigger ESI processing, or try alternate backend servers in case the request failed.
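
A small sketch of what that can look like, assuming Varnish 2.1-era syntax (where pass may be returned from vcl_fetch). The X-Powered-By header is just an example of a response header you might want to hide.

```vcl
sub vcl_fetch {
    # Do not cache backend server errors; pass them through instead.
    if (beresp.status >= 500) {
        return (pass);
    }

    # Strip a response header the client does not need to see.
    unset beresp.http.X-Powered-By;
}
```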

In vcl_fetch you still have the request object, req, available. There is also a backend response, beresp, which contains the HTTP headers from the backend.

Actions

The most common actions to call are these:

pass
When you call pass the request and subsequent response will be passed to and from the backend server. It won’t be cached. pass can be called in both vcl_recv and vcl_fetch.

lookup
When you call lookup from vcl_recv you tell Varnish to deliver content from cache even if the request otherwise indicates that it should be passed. You can’t call lookup from vcl_fetch.

pipe
Pipe can be called from vcl_recv as well. Pipe short-circuits the client and backend connections, and Varnish will just sit there and shuffle bytes back and forth. Varnish will not look at the data being sent back and forth, so your logs will be incomplete. Beware that with HTTP 1.1 a client can send several requests on the same connection, so you should instruct Varnish to add a “Connection: close” header before actually calling pipe.

deliver
Deliver the cached object to the client. Usually called in vcl_fetch.

esi
ESI-process the fetched document.
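
To make two of these actions concrete, here is a minimal sketch. The /admin path and /index.html page are hypothetical, and the bare esi statement is the Varnish 2.1 form (later versions changed how ESI is switched on), so treat it as an assumption about your version.

```vcl
sub vcl_recv {
    # pass: never cache anything under a (hypothetical) admin area.
    if (req.url ~ "^/admin") {
        return (pass);
    }
}

sub vcl_fetch {
    # esi: process ESI tags on a (hypothetical) page that contains them.
    if (req.url == "/index.html") {
        esi;
    }
}
```

Neither subroutine returns in the common case, so requests that match no rule fall through to the built-in VCL.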

Requests, responses and objects

In VCL, there are three important data structures: the request coming from the client, the response coming from the backend server, and the object stored in cache.

In VCL you should know the following structures.

req
The request object. When Varnish has received the request the req object is created and populated. Most of the work you do in vcl_recv you do on or with the req object.

beresp
The backend response object. It contains the headers of the object coming from the backend. Most of the work you do in vcl_fetch you do on the beresp object.

obj
The cached object. Mostly a read only object that resides in memory. obj.ttl is writable, the rest is read only.

Operators

The following operators are available in VCL. See the examples further down for, uhm, examples.

=
Assignment operator.

==
Comparison.

~
Match. Can either be used with regular expressions or ACLs.

!
Negation.

&&
Logical and

||
Logical or
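
Here is a short sketch that uses all of these operators in one place. The ACL name, the /admin path and the 403 response are made up for illustration; the syntax is Varnish 2.1-era VCL.

```vcl
# Hypothetical ACL of trusted clients.
acl internal {
    "192.168.1.0"/24;
}

sub vcl_recv {
    # ==, ~ (regex), && and ! with an ACL match
    if (req.request == "GET" && req.url ~ "^/admin" && !client.ip ~ internal) {
        error 403 "Forbidden.";
    }

    # || and = (assignment): treat both host names as one for caching
    if (req.http.host == "manoj.com" || req.http.host == "www.manoj.com") {
        set req.http.host = "www.manoj.com";
    }
}
```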

Example 1 – manipulating headers

Let’s say we want to remove the cookie for all objects in the /images directory of our web server:

sub vcl_recv {
    if (req.url ~ "^/images") {
        unset req.http.cookie;
    }
}

Now, when the request is handed to the backend server there will be no cookie header. The interesting line is the one with the if-statement. It takes the URL from the request object and matches it against the regular expression. Note the match operator. If it matches, the Cookie: header of the request is unset (deleted).

Example 2 – manipulating beresp

Here we override the TTL of an object coming from the backend if it matches certain criteria:

sub vcl_fetch {
    /* the URL lives on the request object; beresp has no url field */
    if (req.url ~ "\.(png|gif|jpg)$") {
        unset beresp.http.set-cookie;
        set beresp.ttl = 3600s;
    }
}

Example 3 – ACLs

You create a named access control list with the acl keyword. You can match the IP address of the client against an ACL with the match operator:

# Who is allowed to purge...
acl local {
    "localhost";
    "192.168.1.0"/24; /* and everyone on the local network */
    ! "192.168.1.23"; /* except for the dialin router */
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (client.ip ~ local) {
            return(lookup);
        }
        /* refuse purge requests from everyone else */
        error 405 "Not allowed.";
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        set obj.ttl = 0s;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        error 404 "Not in cache.";
    }
}

Some more examples

backend default {
    .host = "127.0.0.1";
    .port = "7500";
}

sub vcl_recv {
    # Always try the cache for image requests.
    if (req.url ~ "gif" ||
        req.url ~ "\.jpg" ||
        req.url ~ "content_type=jpeg") {
        return (lookup);
    }
    # Bypass the cache when a gzip-capable client asks for zip=yes.
    if (req.http.Accept-Encoding) {
        if ((req.http.Accept-Encoding ~ "gzip") && (req.url ~ "zip=yes")) {
            return (pass);
        }
    }
    return (lookup);
}

sub vcl_fetch {
    if (req.url ~ "content_type=gif" ||
        req.url ~ "\.jpg" ||
        req.url ~ "content_type=jpeg") {
        return (deliver);
    }
    if (req.http.Accept-Encoding) {
        if ((req.http.Accept-Encoding ~ "gzip") && (req.url ~ "zip=no")) {
            return (deliver);
        }
    }
    if (beresp.http.Content-Length ~ "[0-9]{1,}") {
        return (pass);
    }
    return (pass);
}

# This is the rule to knock out big files (a Content-Length of eight or
# more digits). It is a fragment: place it inside vcl_fetch, before any
# return statement that would otherwise run first.
if (beresp.http.Content-Length ~
    "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]") {
    return (pass);
}

Useful Links:
http://web3us.com/drupal6/how-create-web-site-handbook/how-improve-web-performance-and-scalability-using-varnish/varnish-comma

http://manpages.ubuntu.com/manpages/hardy/man7/vcl.7.html
http://bart.motd.be/experimenting-with-varnish
http://man.cx/vcl%287%29
http://www.varnish-cache.org/docs/trunk/faq/general.html
http://tomayko.com/man/vcl.html
http://www.varnish-cache.org/trac/wiki/VCL#vcl_pipe

Varnish commands

November 9th, 2010

Some Important Varnish commands

* varnishncsa: Displays the varnishd shared memory logs in Apache / NCSA combined log format
* varnishlog: Reads and presents varnishd shared memory logs.
* varnishstat: Displays statistics from a running varnishd instance.
* varnishadm: Sends a command to the running varnishd instance.
* varnishhist: Reads varnishd shared memory logs and presents a continuously updated histogram showing the distribution of the last N requests by their processing.
* varnishtop: Reads varnishd shared memory logs and presents a continuously updated list of the most commonly occurring log entries.
* varnishreplay: Parses varnish logs and attempts to reproduce the traffic.

For further information and examples of usage, please refer to the man pages.

varnishtop

This command shows the most often-made requests to the backend:

varnishtop -b -i TxURL

It’s excellent for spotting often-requested items that are currently not being cached. The “-b” flag filters for requests made to the backend. “-i TxURL” filters for the request URL that triggered the request to the backend.

The entry at the top of the list is the URL most often requested from the backend, and a prime candidate for caching.

varnishhist

This command shows a histogram for the past 1000 requests, whether they were cache hits (denoted by a ‘|’) or misses (denoted by a ‘#’), and how long the requests took to process (further to the right, longer time). It’s good for a high-level view of how the server is doing under load.

varnishlog

varnishlog -c -o ReqStart <client-ip>

This command displays all Varnish traffic for a specific client. It’s helpful for seeing exactly what a particular page or request is doing. Replace <client-ip> with your workstation IP, load the page, and see everything Varnish does with your connection, including hit/miss/pass status. Varnishlog is really useful, but it puts out an overwhelmingly large amount of data that isn’t easily filtered. The “-o” option groups all of the entries for a specific request together (without it, all entries from all requests are displayed in FIFO order), and it accepts a tag (“ReqStart” in this example) and a regex (the IP address in this case) to filter for only requests associated with that tag and regex. It’s the only way I’ve found to filter down the firehose of log entries into something useful.

varnishstat

This command provides an overview of the stats for the current Varnish instance. It shows hit/miss/pass rates and ratios, plus lots of other gory internal details.

Watch that RAM, or “vmstat, oh how I love thee!”

Varnish can eat RAM like there’s no tomorrow. Be careful and be sure to configure its max memory to be something less than your available RAM. I forgot when I first set things up. The system worked great for a while, and then took a nosedive as the Varnish cache ate up all the available RAM and pushed the system into a swap death spiral.

Thanks
Manoj Chauhan