Created_at :Aug 2010

Ruby on Rails : Cronjobs

Here are some tips on setting up cronjobs in a Rails app.

My Rails applications use cronjobs frequently.  I use them to
  • refresh data from the Internet: for example, download book cover images
  • go through logs and produce summaries like 'most viewed books today', 'most favorited items', etc.
    (There are too many user_logs rows to go through and calculate stats dynamically.  That is why I use 'summary tables'.)
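
As a rough illustration of the kind of summary such a job produces, here is a minimal plain-Ruby sketch.  The LogEntry struct, its fields, and the most_viewed method are hypothetical stand-ins; in a real app the rows would come from the user_logs table and the result would be written to a summary table.

```ruby
# Hypothetical stand-in for a row of the user_logs table.
LogEntry = Struct.new(:book_id, :action)

# Count "view" entries per book and return the top `limit` as [book_id, count] pairs.
# Ties are broken by book_id so the result is deterministic.
def most_viewed(logs, limit)
  counts = Hash.new(0)
  logs.each { |log| counts[log.book_id] += 1 if log.action == "view" }
  counts.sort_by { |book_id, count| [-count, book_id] }.first(limit)
end

logs = [
  LogEntry.new(1, "view"), LogEntry.new(2, "view"),
  LogEntry.new(1, "view"), LogEntry.new(3, "favorite"),
  LogEntry.new(2, "view"), LogEntry.new(1, "view")
]

most_viewed(logs, 2).each do |book_id, count|
  puts "book #{book_id}: #{count} views"
end
```

Doing this once per cron run and storing the result means page requests only read a handful of precomputed rows.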

Running Rails code from the command line

Here is a Ruby script that accesses the 'Book' model:
Book.find(:all).each do |book|
   book.update_stats
end
For this to work, the script needs the Rails environment.  We can give any script the full Rails environment (access to models, etc.) by running it through 'script/runner', which is itself a Ruby script.


From your Rails directory:
./script/runner  myscripts/update-stats.rb

'script/runner' will read the appropriate configuration files and provide a Rails environment for the script.
By default the environment is 'development'.  To use another environment:

  ./script/runner  -e staging myscripts/update-stats.rb
  


Preventing Scripts Stepping Over Each Other

Some scripts run once a day, some run once an hour, and some run every few minutes.  The cron process will kick off the script at the specified intervals.  Let's say we run 'update-stats.rb' every 5 minutes.  What happens if one of the runs takes longer than 5 minutes (say, because the database is slow or there is just too much data to consume)?  Another instance of the script is kicked off at the next 5-minute mark, before the first one has had a chance to finish.  This can have unintended consequences, like incorrect stats.

We don't want this to happen.  We want to prevent a script from running if another instance of the same script is already running.

  • We can space the runs sufficiently apart.  For example, if a script runs once a day and takes a few hours, it is highly unlikely to run into the next run.
  • A script can set a flag in the database to say that it is running.  At the start, the script checks the flag.  If it is set, the script refuses to run.
    But this is not reliable: if the script dies in the middle (exception / out of memory), the flag stays set and no future run is possible.
  • A script can write a file to the file system ('script_A_IN_PROGRESS').  But this suffers the same fate: if the script dies unexpectedly, the file is ORPHANED, and no script will run until the file is manually removed.
So what we need is a SIMPLE, FOOL-PROOF way to indicate a script is running.

Enter FILE_LOCKS.

On most UNIX/LINUX systems, a process can LOCK a file.  This lock can be a shared lock or an exclusive lock.  An exclusive lock stays in place until the process releases it, closes the file, or terminates.  During that time, no other process can get an exclusive lock on the file.  The beauty is that even if the process dies (of an exception, memory exhaustion), the lock is released.  The operating system takes care of it.

So here is the process:
  • when a process starts up, it tries to acquire an exclusive lock on a file
  • if locking is successful, go ahead
  • if not, exit
Here is how to do this.  It is practically a ONE LINER, that is SO SIMPLE :-)


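# keep the File object in a variable: if it is garbage collected, the file is closed and the lock is released
lock_file = File.new(__FILE__)
if lock_file.flock(File::LOCK_EX | File::LOCK_NB) == false
  puts "*** can't lock file, another instance of script running?  exiting"
  exit 1
end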

# the rest of the script
...
 
Parameters explained
  • __FILE__ : the path of the script itself, so the process tries to get an exclusive lock on its own source file
  • File::LOCK_EX : obtain an exclusive lock
  • File::LOCK_NB : non-blocking.  If the lock cannot be acquired, return false right away rather than blocking
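
To see the locking behaviour without waiting for two cron runs to collide, here is a small sketch that opens the same file twice inside one process.  The helper name try_exclusive_lock and the temp file are made up for this demo, and it assumes a local filesystem on Linux or another UNIX where flock treats each separately opened handle as its own lock holder.

```ruby
require "tempfile"

# Hypothetical helper: returns the open File if an exclusive lock was obtained,
# or nil if some other handle already holds it.
# The caller must keep the returned File referenced while the lock is needed.
def try_exclusive_lock(path)
  file = File.open(path, "r")
  if file.flock(File::LOCK_EX | File::LOCK_NB)
    file
  else
    file.close
    nil
  end
end

lock_target = Tempfile.new("cron-lock-demo")

first  = try_exclusive_lock(lock_target.path)   # plays the role of the running job
second = try_exclusive_lock(lock_target.path)   # plays the role of the overlapping run
puts "first attempt got the lock:  #{!first.nil?}"
puts "second attempt got the lock: #{!second.nil?}"

# Closing the file releases the lock, just as process exit would.
first.close
third = try_exclusive_lock(lock_target.path)
puts "attempt after release got the lock: #{!third.nil?}"
third.close
```

The second attempt is refused while the first handle is open, and the lock becomes available again as soon as that handle is closed.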

Keeping Tabs on CRONTABs

The beauty of CRON jobs is, once you set them up correctly, you can forget about them.  They dutifully run at specified times and do their jobs.  It is still a good idea to produce some logging, so once in a while, you can check that the job is alive and well.

Here is some simple logging.  It prints a timestamp when the job starts and when it finishes, along with how long the run took.  We also print some '====' lines so we can separate the output of each run.

## run this with script/runner

t1 = Time.now
puts "================"
puts t1.strftime("%Y%m%d-%H%M%S") + " : " + __FILE__ + " starting..."

# keep the File object in a variable so the lock is held until the script exits
lock_file = File.new(__FILE__)
if lock_file.flock(File::LOCK_EX | File::LOCK_NB) == false
  puts "*** can't lock file, another instance of script running?  exiting"
  exit 1
end

# do the processing...
# ...

t2 = Time.now
puts t2.strftime("%Y%m%d-%H%M%S") + " : " + __FILE__ + " finished  #{t2 - t1} secs"
puts "================"



This is the template I use for my cronjobs
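
If several jobs share this pattern, the start/finish logging could be pulled into a small helper.  The sketch below is one possible shape, not part of the original template: the name run_logged and its out parameter are hypothetical.  It uses ensure so the closing line is printed even when the job raises, and it reports "failed" in that case.

```ruby
# Hypothetical helper: wraps a block with start/finish log lines and timing.
# `out` defaults to $stdout so cron's output redirection still captures everything.
def run_logged(name, out = $stdout)
  started = Time.now
  status = "failed"                      # overwritten only if the block completes
  out.puts "================"
  out.puts "#{started.strftime('%Y%m%d-%H%M%S')} : #{name} starting..."
  result = yield
  status = "finished"
  result
ensure
  ended = Time.now
  out.puts "#{ended.strftime('%Y%m%d-%H%M%S')} : #{name} #{status}  #{ended - started} secs"
  out.puts "================"
end

# Example usage with a stand-in for the real processing.
run_logged("update-stats") do
  sleep 0.1
end
```

Any exception still propagates after the closing line is written, so cron sees a non-zero exit status for a failed run.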


Wrapping them in a SHELL script

I usually like to wrap the ruby scripts in a shell script,  because it saves me some typing :-)
Here is a sample script:
#!/bin/bash

## lives in RAILS_APP/myscripts

if [ -z "$1" ]
then
    echo "usage : $0  <environment>"
    echo "eg : $0  staging"
    exit 1
fi
environment=$1

mydir=$(readlink -f "$(dirname "$0")")
## get RAILS_APP/script
scriptdir=$(readlink -f "$mydir/../script")

"$scriptdir/runner" -e "$environment" "$mydir/update-stats.rb"



See how it figures out the script directory?  This way, you can run this script from any directory and it will work correctly.  It doesn't depend on you running the script from the Rails application directory.  Make sure you have an up-to-date version of 'readlink', since the script relies on its -f option.  (Linux distros come with the right one; Mac OS X users can get it from MacPorts.)
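
Where a suitable 'readlink' is not available, the same directory resolution could be sketched in Ruby instead.  This is only an illustration of the path logic, assuming Ruby 1.9.2 or later for File.realpath; the variable names mirror the shell script, and the snippet only prints the command rather than running it.

```ruby
# Resolve the directory this file lives in, following symlinks,
# similar in spirit to: readlink -f "$(dirname "$0")"
mydir = File.realpath(File.dirname(__FILE__))

# RAILS_APP/script is a sibling of this directory.
scriptdir = File.expand_path("../script", mydir)
runner = File.join(scriptdir, "runner")

# Show what would be executed; the environment name here is just an example.
puts "#{runner} -e staging #{File.join(mydir, 'update-stats.rb')}"
```

Because everything is derived from the file's own location, the result does not depend on the current working directory.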


Setting up the CRON

Before setting up the CRON job, make sure the script runs. 
  • Make sure the shell script is executable (chmod 755 script.sh).
  • If you are using a shell script, run it to make sure it works.
  • Also run it from a directory other than the Rails application dir.

$ crontab -e
opens your editor so you can enter cronjobs.
Here are a couple of entries:

  # run every 5 minutes
  */5 * * * *   /var/www/myapp_prod/current/myscripts/update-stats.sh production >> /var/www/myapp_prod/update-stats.log 2>&1

  # runs at 1am
  0 1 * * *   /var/www/myapp_prod/current/db-backup.sh
  


Note I love the short-hand syntax (*/5).  It means every 5 minutes.  Beats writing
  0,5,10,15,20,25,30,35,40,45,50,55 * * * * /var/www/myapp_prod/current/myscripts/update-stats.sh production >> /var/www/myapp_prod/update-stats.log 2>&1
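
To double-check which minutes a step like */5 covers, the list can be generated in a line of Ruby.  The method name expand_minute_step is made up for this sketch, and it only models the minute field of a crontab entry (values 0 to 59).

```ruby
# Expand a cron-style "*/step" minute field into the explicit list of minutes.
def expand_minute_step(step)
  (0..59).step(step).to_a
end

puts expand_minute_step(5).join(",")
# prints 0,5,10,15,20,25,30,35,40,45,50,55
```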
  


Also note that I redirect the output to a log file, so I can check it later if required.  I am also sending stderr and stdout to the same file (2>&1).


That's it.


