today i want to continue my little series about how to use redis as a rails cache.
i will show you how to build a caching system that does not rely on cache invalidation and is still able to constantly deliver up-to-date cached results.
if you haven’t read the articles before this one, you should at least skim them, since i’ll be referring to some parts of the setup.
here at rapidrabbit we deliver millions of json files to our customers every day. all of them come from rails applications, but none of the customers ever really hit the rails apps.
why? imagine you have a million users that get an alert asking them to open your app, all at the same time.
now imagine you just invalidated your cache for the controller they are about to hit. what would happen?
sadly…i can tell from my own experience…your rails app goes the way of the dodo and simply rolls over, waiting for it all to be over. even if your rails app is very fast, you can still hit the point where you simply can’t regenerate the cache before a few thousand more requests come through, effectively DoSing your own app.
so what do we do? simply put: we never invalidate (a.k.a. delete) the cache, we just refresh it.
today we’ll start with the ‘simpler’ form of the exercise by regenerating the cache with a cron job. don’t worry, the next article will be about how to trigger the regeneration process instead.
so before we begin i assume the following things:
- you have redis, nginx and the redis gem installed
- you have set up your nginx as described here
- your controllers are as described here
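the snippets below talk to redis through a couple of globals. in case your initializer from the earlier articles looks different, here is a minimal sketch of what i’m assuming (the connection details and the second name are placeholders, adjust them to your actual setup):

# config/initializers/redis.rb
require 'redis'

# the connection the controllers write to…
$redis = Redis.new(:host => "127.0.0.1", :port => 6379)
# …and the name the model below reads from.
# if your setup uses two separate connections, keep them separate.
$redis_cache = $redis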
by the way…if you haven’t run into the issue yet:
go check /etc/redis.conf
and set
maxmemory 3gb
to something sensible. if you forget this, it will bite you very soon.
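for completeness, the relevant part of /etc/redis.conf then looks like this. the eviction policy line is my own suggestion rather than part of the earlier setup, so treat it as an assumption and pick whatever fits your data:

maxmemory 3gb
# decides what redis does once the limit is hit;
# allkeys-lru evicts the least recently used keys first
maxmemory-policy allkeys-lru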
so first off we need to modify our controllers. if you truly want no real rails stack hits at any time, we have to remove the cache lifetime altogether. but this has some drawbacks: your cache will constantly grow, and even very rarely requested calls will be regenerated just like any other call.
so if your app allows an unbounded number of variable urls, it would be possible to bring your redis to a grinding halt by simply overloading it with data.
that’s why you should ask your admin to monitor the redis usage closely and report any problems to you.
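if you want a quick way to keep an eye on it yourself, a tiny rake task like this could be wired into your monitoring. it only uses the info and dbsize commands, everything beyond that (the name, the output format) is just a sketch:

# lib/tasks/redis_stats.rake
namespace :redis_stats do
  desc 'print redis memory usage and number of keys'
  task :report => :environment do
    info = $redis.info
    puts "used memory: #{info['used_memory_human']}"
    puts "keys in redis: #{$redis.dbsize}"
  end
end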
so here is our new app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  protect_from_forgery

  def save_cache_to_redis
    # we just use set here now
    $redis.set(request.request_uri, response.body)
    # just like before
    $redis.sadd("#{@model_name.downcase}_instances_collection", @model_id)
    $redis.sadd("#{@model_name.downcase}_#{@model_id}_urls_collection", request.request_uri)
  end
end
in your other controllers you can now just remove the cache_lifetime variables.
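just to make that concrete, one of those controllers could now look roughly like this. the after_filter and the @model_name/@model_id assignments are my guess at the setup from the earlier article, so treat this as a sketch rather than the real thing:

# app/controllers/thingies_controller.rb
class ThingiesController < ApplicationController
  # no cache_lifetime anymore, we only write the cache after rendering
  after_filter :save_cache_to_redis, :only => [:show]

  def show
    @thingy = Thingy.find(params[:id])
    @model_name = "Thingy"
    @model_id = @thingy.id
    respond_to do |format|
      format.json { render :json => @thingy }
    end
  end
end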
next we need to modify our app/models/thingy.rb
class Thingy < ActiveRecord::Base
  # all the invalidation methods vanished, instead we just want
  # a list of the urls that were cached
  def self.cached_urls
    cached_urls = []
    $redis_cache.smembers("thingy_instances_collection").each do |instance_id|
      $redis_cache.smembers("thingy_#{instance_id}_urls_collection").each do |url|
        cached_urls << url
      end
    end
    cached_urls
  end
end
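a quick check in the rails console shows what it collects (the urls obviously depend on your routes, these are just examples):

Thingy.cached_urls
# => ["/thingies/1.json", "/thingies/2.json", ...]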
so now we know the urls that have to be refreshed and we can spider them using a rake task.
but wait, how will we be able to access the rails app if redis is sitting in front of it?
we just define another nginx server with a local name and no cache in front of it.
so we have a /etc/nginx/sites-enabled/yourapp-local
server {
  listen 80;
  # this name should be entered in /etc/hosts, e.g. '127.0.0.1 your.website.local'
  server_name your.website.local;

  root /home/appuser/app/current/public;

  # so we removed the redis cache
  location / {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;

    if (!-f $request_filename) {
      # this upstream is the same as in the config for your external server
      proxy_pass http://yourunicornupstream;
      break;
    }
  }
}
now we can ask rails directly and it will overwrite the old redis cache entry whenever it is called.
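you can convince yourself of that in a rails console on the server. the path here is just an example, use any url your app actually serves:

require 'open-uri'

path = "/thingies/1.json"
# fetch through the local nginx server, which hits rails directly
open("http://your.website.local" + path).read
# the fresh response should now sit in redis under the same key
$redis.get(path)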
last but not least we need a spider rake task to crawl all your urls and give them a freshen up ;)
let's look at lib/tasks/spider.rake
require "./config/environment" require 'open-uri' namespace :spider do desc 'get all urls and spider them' task :crawl do @todo = [] @max_threads = 4 #try what works best for you #you either enter a list of classes you want to spider #or you use this. (only works in rails 3) ActiveRecord::Base.descendants.each do |k| @todo += k.cached_urls if k.respond_to?(:cached_urls) end #the actual spidering @threads = [] @max_threads.times do @threads << Thread.new do while @todo != [] url = "http://your.website.local" + @todo.pop p url #depending how fast your app is, this will become a bottleneck ;) open(url) end end end #you may have to set the timeout higher depending on your workload @threads.each { |thread| thread.join(640) } end end
tada...now all you have to do is set up a cronjob that refreshes the cache at an interval of your liking.
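a crontab entry for the app user could look like this, assuming the deploy path from the nginx config above and an hourly refresh (adjust both to your liking):

# crontab -e for appuser: refresh the cache at minute 0 of every hour
0 * * * * cd /home/appuser/app/current && RAILS_ENV=production bundle exec rake spider:crawl >> log/spider.log 2>&1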
i hope you learned something new yet again. if there are any questions, feel free to ask, i may even answer ;)
till the next time (when we do the cool triggered thingy)
have fun.