HTTP Conditional Get for RSS Hackers
Given the massive confusion exhibited here, I've written a nice, simple guide on how to implement HTTP's Conditional GET mechanism, with regards to producers and consumers of RSS feeds.
This article presumes you are familiar with the mechanics of an HTTP query, and understand the layout of request, response, header and body.
What is a conditional get?
My full-length RSS feed is about 24,000 bytes long. It probably gets updated on average twice a day, but given the current tools, people still download the whole thing every hour to see if it's changed yet. This is obviously a waste of bandwidth. What they really should do, is first ask whether it's changed or not, and only download it if it has.
The people who invented HTTP came up with something even better. HTTP allows you to say to a server in a single query: “If this document has changed since I last looked at it, give me the new version. If it hasn't just tell me it hasn't changed and give me nothing.” This mechanism is called “Conditional GET”, and it would reduce 90% of those significant 24,000 byte queries into really trivial 200 byte queries.
Client implementation
The mechanism for performing a conditional get has changed slightly between HTTP versions 1.0 and 1.1. Like many things that changed between 1.0 and 1.1, you really have to do both to make sure you're satisfying everybody.
When you receive the RSS file from the webserver, check the response header for two fields: Last-Modified and ETag. You don't have to care what is in these headers, you just have to store them somewhere with the RSS file.
Next time you request the RSS file, include two headers in your request.. Your If-Modified-Since header should contain the value you snagged from the Last-Modified header earlier. The If-None-Match header should contain the value you snagged from the ETag header.
If the RSS file has changed since you last requested it, the server will send you back the new RSS file in the perfectly normal way. However, if the RSS file has not changed, the server will respond with a ‘304’ response code (instead of the usual 200), where 304 means ‘Not Modified’. In the case of a 304, the response will have an empty body and the RSS file won't be sent back to you at all.
There's a temptation for clients to put their own date in the If-Modified-Since header, instead of just copying the one the server sent. This is a bad thing, what you should be sending back is exactly the same date the server sent you when you received the file. There's two reasons for this. Firstly, your computer's clock is unlikely to be exactly synchronised with the webserver, so the server could still send you files by mistake. Secondly, if the server programmer has followed this guide (see below), it'll only work if you send back exactly what you received.

