Created at: Aug 2010
Ruby on Rails : Cronjobs
Here are some tips about setting up cronjobs in a Rails app.
My Rails applications use cronjobs frequently. I use them to:
- refresh data from the Internet: for example, download book cover images
- go through logs and produce summaries like 'most viewed books today', 'most favorited items', etc.
(There are too many user_logs to go through efficiently and calculate stats on the fly. That is why I use 'summary tables'.)
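To make the 'summary table' idea concrete, here is a minimal plain-Ruby sketch of the aggregation such a job might do: collapse raw view log entries into per-book counts and keep the top few. The log entry shape (:book_id and :action keys) and the method name top_viewed_books are hypothetical; in a Rails app the entries would come from the user_logs table and the result would be written to a summary table.

```ruby
# Hypothetical log entries: each is a Hash with :book_id and :action keys.
# Counts "view" actions per book and returns the n most viewed books
# as [book_id, count] pairs, highest count first (ties broken by book_id).
def top_viewed_books(log_entries, n)
  counts = Hash.new(0)
  log_entries.each do |entry|
    counts[entry[:book_id]] += 1 if entry[:action] == "view"
  end
  counts.sort_by { |book_id, count| [-count, book_id] }.first(n)
end

logs = [
  { book_id: 1, action: "view" },
  { book_id: 2, action: "view" },
  { book_id: 1, action: "view" },
  { book_id: 3, action: "favorite" },
  { book_id: 2, action: "view" },
  { book_id: 1, action: "view" }
]
p top_viewed_books(logs, 2) # => [[1, 3], [2, 2]]
```

The expensive pass over the logs happens once per cron interval; page views then only read the small precomputed result.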
Running Rails code from the command line
Here is a Ruby script that accesses the 'Book' model:
Book.find(:all).each do |book|
  book.update_stats
end
For this to work, the script needs a 'Rails environment'. We can give any script the full Rails environment (access to models, etc.) by running it with 'script/runner'.
From your Rails directory:
./script/runner myscripts/update-stats.rb
'script/runner' will read appropriate configuration files and provide a rails environment for the script.
By default it is 'development'. To use another environment:
./script/runner -e staging myscripts/update-stats.rb
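The environment rule here (an -e option picks the environment, otherwise it is 'development') fits in a few lines of plain Ruby. This is a simplified, hypothetical illustration written for this article, not the actual script/runner source; resolve_environment is a made-up name.

```ruby
# Simplified illustration (hypothetical, not the real script/runner code):
# "-e NAME" selects the environment, otherwise fall back to 'development'.
def resolve_environment(args)
  index = args.index("-e")
  if index && args[index + 1]
    args[index + 1]
  else
    "development"
  end
end

puts resolve_environment(["myscripts/update-stats.rb"])                 # development
puts resolve_environment(["-e", "staging", "myscripts/update-stats.rb"]) # staging
```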
Preventing Scripts Stepping Over Each Other
Some scripts run once a day, some run once an hour, and some run every few minutes. Cron kicks off each script at the specified interval. Let's say we run 'update-stats.rb' every 5 minutes. What happens if one run takes longer than 5 minutes (say the database is slow, or there is just too much data to process)? Another instance of the script is kicked off 5 minutes later, before the first one has had a chance to finish. This can have unintended consequences, like incorrect stats. We don't want this to happen. We want to prevent a script from running if another instance of the same script is already running. A few approaches:
- We can space the runs sufficiently apart. For example, if a script runs once a day and takes a few hours, it is highly unlikely to run into the next run.
- A script can set a flag in the database saying that it is running. At startup, the script checks the flag; if it is set, the script refuses to run. But this is not reliable: if the script dies in the middle (exception / out of memory), the flag stays set and no future runs are possible.
- A script can write a marker file to the file system ('script_A_IN_PROGRESS'). But this suffers the same fate: if the script dies unexpectedly, the file is ORPHANED, and no script will run until the file is manually removed.
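Here is a small plain-Ruby sketch of how the marker-file approach goes wrong. The method run_with_marker and the marker path are hypothetical. The "crash" is simulated with an exception that skips the cleanup line, which is what happens to a process killed for running out of memory: it never gets to run any cleanup code.

```ruby
require "tmpdir"

# Hypothetical marker-file guard: refuse to run if the marker exists,
# otherwise create it, do the work, and delete it afterwards.
# The delete only happens if the work finishes normally.
def run_with_marker(marker_path)
  return :refused if File.exist?(marker_path)
  File.write(marker_path, Process.pid.to_s)
  yield
  File.delete(marker_path)
  :done
end

outcomes = {}
Dir.mktmpdir do |dir|
  marker = File.join(dir, "script_A_IN_PROGRESS")

  outcomes[:first] = run_with_marker(marker) { :work }   # a normal run

  begin
    run_with_marker(marker) { raise "simulated crash" }
  rescue RuntimeError
    # the crash skipped File.delete, so the marker stays behind
  end

  outcomes[:orphaned] = File.exist?(marker)
  outcomes[:next] = run_with_marker(marker) { :work }    # the next scheduled run
end

puts outcomes[:first]     # done
puts outcomes[:orphaned]  # true
puts outcomes[:next]      # refused
```

Every later run is refused until someone deletes the marker by hand.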
Enter FILE_LOCKS.
On most UNIX/Linux systems, a process can LOCK a file. The lock can be a shared lock or an exclusive lock. An exclusive lock stays in place until the process releases it (or closes the file, or terminates); during that time, no other process can get an exclusive lock on the file. (These locks are advisory: they only stop other processes that also ask for a lock, which is exactly what our scripts do.) The beauty is that even if the process dies (of an exception, memory exhaustion), the lock is released. The operating system takes care of it.
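The claim that the operating system drops the lock when its holder dies is easy to check. In this sketch (Unix only, since it uses fork) a child process takes an exclusive lock on a temporary file that stands in for the script, then dies abruptly with exit!, so no Ruby cleanup code runs. The parent tries a non-blocking lock before and after the child dies. The pipes are only there to order the steps.

```ruby
require "tmpdir"

lock_results = {}
Dir.mktmpdir do |dir|
  lock_path = File.join(dir, "update-stats.rb")  # stand-in for the script file
  File.write(lock_path, "")

  from_child, child_out = IO.pipe  # child -> parent: "I hold the lock"
  child_in, to_child = IO.pipe     # parent -> child: "die now"

  pid = fork do
    from_child.close
    to_child.close
    holder = File.new(lock_path)
    holder.flock(File::LOCK_EX)    # take the exclusive lock
    child_out.puts "locked"
    child_out.flush
    child_in.gets                  # wait for the parent's signal
    exit!(1)                       # die abruptly: no ensure blocks, no unlock
  end

  child_out.close
  child_in.close
  from_child.gets                  # block until the child holds the lock

  probe = File.new(lock_path)
  # returns false: the child still holds the exclusive lock
  lock_results[:while_child_alive] = probe.flock(File::LOCK_EX | File::LOCK_NB)

  to_child.puts "die"
  to_child.flush
  Process.wait(pid)                # the child is gone; the OS released its lock

  # returns 0: the lock is free again
  lock_results[:after_child_died] = probe.flock(File::LOCK_EX | File::LOCK_NB)
  probe.close
end

puts lock_results[:while_child_alive].inspect  # false
puts lock_results[:after_child_died].inspect   # 0
```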
So here is the process:
- when a process starts up, it tries to acquire an exclusive lock on a file
- if locking is successful, go ahead
- if not, exit
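# keep a reference to the File object: if it is garbage collected,
# the file is closed and the lock is released
lock_file = File.new(__FILE__)
if lock_file.flock(File::LOCK_EX | File::LOCK_NB) == false
  puts "*** can't lock file, another instance of script running? exiting"
  exit 1
end
# the rest of the script ...
Parameters explained: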
- __FILE__ : the process tries to get an exclusive lock on its own script file.
- File::LOCK_EX : obtain an exclusive lock
- File::LOCK_NB : non-blocking. If the lock cannot be acquired, return false immediately rather than blocking
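You can watch LOCK_NB refuse a second instance without waiting for cron: open the same file twice in one process. flock locks belong to each open file, so the second File object is turned away just as a second copy of the script would be, and closing the first one frees the lock. This sketch uses a temporary file instead of __FILE__; try_exclusive_lock is a hypothetical helper name.

```ruby
require "tmpdir"

# Takes the lock the same way the cron script does.
# Returns the File on success (keep it referenced to hold the lock), nil otherwise.
def try_exclusive_lock(path)
  file = File.new(path)
  if file.flock(File::LOCK_EX | File::LOCK_NB) == false
    file.close
    nil
  else
    file
  end
end

outcome = {}
Dir.mktmpdir do |dir|
  path = File.join(dir, "update-stats.rb")
  File.write(path, "# stand-in for the cron script\n")

  first = try_exclusive_lock(path)   # the instance that is running
  second = try_exclusive_lock(path)  # a second instance started by cron
  outcome[:first_locked] = !first.nil?
  outcome[:second_locked] = !second.nil?

  first.close                        # first run finishes; lock released
  third = try_exclusive_lock(path)   # the next scheduled run
  outcome[:third_locked] = !third.nil?
  third.close if third
end

puts outcome[:first_locked]   # true
puts outcome[:second_locked]  # false
puts outcome[:third_locked]   # true
```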
Keeping Tabs on CRONTABs
The beauty of CRON jobs is that once you set them up correctly, you can forget about them. They dutifully run at the specified times and do their jobs. It is still a good idea to produce some logging, so that once in a while you can check that the job is alive and well. Here is some simple logging. It prints the timestamp when the job starts and finishes, and how long it took. We print some '====' lines so we can separate the output of each run.
## run this with script/runner
t1 = Time.now
puts "================"
puts t1.strftime("%Y%m%d-%H%M%S") + " : " + __FILE__ + " starting..."
# keep a reference to the File object so the lock is held until the script exits
lock_file = File.new(__FILE__)
if lock_file.flock(File::LOCK_EX | File::LOCK_NB) == false
  puts "*** can't lock file, another instance of script running? exiting"
  exit 1
end
# do the processing...
# ...
t2 = Time.now
puts t2.strftime("%Y%m%d-%H%M%S") + " : " + __FILE__ + " finished #{t2 - t1} secs"
puts "================"
This is the template I use for my cronjobs.
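If several cron scripts share this template, the boilerplate could be pulled into one method. This is only a sketch: with_cron_job is a hypothetical helper, the log lines copy the template's format, and output goes to any IO (stdout by default) so it can be tried outside cron. Unlike the template, it returns false instead of calling exit, leaving the exit status to the caller.

```ruby
require "tmpdir"
require "stringio"

# Hypothetical helper wrapping the cron template: logs the start time,
# takes an exclusive non-blocking lock on lock_path, yields, logs the finish.
# Returns false (without yielding) if another instance holds the lock.
def with_cron_job(lock_path, out = $stdout)
  t1 = Time.now
  out.puts "================"
  out.puts t1.strftime("%Y%m%d-%H%M%S") + " : " + lock_path + " starting..."
  File.open(lock_path) do |lock_file|
    if lock_file.flock(File::LOCK_EX | File::LOCK_NB) == false
      out.puts "*** can't lock file, another instance of script running? exiting"
      return false
    end
    yield
    # the lock is released when File.open's block closes lock_file
  end
  t2 = Time.now
  out.puts t2.strftime("%Y%m%d-%H%M%S") + " : " + lock_path + " finished #{t2 - t1} secs"
  out.puts "================"
  true
end

# Demo on a temporary file. In a script run with script/runner you would
# pass __FILE__ and call exit 1 when the helper returns false.
Dir.mktmpdir do |dir|
  job = File.join(dir, "update-stats.rb")
  File.write(job, "")
  with_cron_job(job) { puts "updating stats..." }
end
```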
Wrapping them in a SHELL script
I usually like to wrap the Ruby scripts in a shell script, because it saves me some typing :-)
Here is a sample script:
#!/bin/bash
## lives in RAILS_APP/myscripts
if [ -z "$1" ]
then
  echo "usage : $0 <environment>"
  echo "eg : $0 staging"
  exit 1
fi
environment=$1
mydir=$(readlink -f "$(dirname "$0")")
## get RAILS_APP/script
scriptdir=$(readlink -f "$mydir/../script")
"$scriptdir/runner" -e "$environment" "$mydir/update-stats.rb"
See how it figures out the script directory? This way, you can run the script from any directory and it will work correctly; it doesn't depend on you running it from the Rails application directory. Make sure you have an up-to-date version of 'readlink' (one that supports -f). Linux distros come with the right one; Mac OS X users can get it from MacPorts.
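For comparison, the same trick works inside a Ruby script without depending on any particular readlink. A minimal sketch: File.realpath resolves symlinks like readlink -f, and File.expand_path builds RAILS_APP/script relative to the directory the file lives in. The layout (file in RAILS_APP/myscripts, runner in RAILS_APP/script) follows the article; rails_script_dir is a hypothetical name, and in a real script you would pass __FILE__.

```ruby
require "tmpdir"
require "fileutils"

# Resolve RAILS_APP/script from a file living in RAILS_APP/myscripts,
# whatever the current directory happens to be.
def rails_script_dir(this_file)
  mydir = File.realpath(File.dirname(this_file))  # like: readlink -f "$(dirname "$0")"
  File.expand_path("../script", mydir)            # RAILS_APP/script
end

# Demo: a fake app layout in a temp dir, resolved from an unrelated cwd.
resolved = nil
expected = nil
Dir.mktmpdir do |app|
  FileUtils.mkdir_p(File.join(app, "myscripts"))
  FileUtils.mkdir_p(File.join(app, "script"))
  wrapper = File.join(app, "myscripts", "update-stats.rb")
  File.write(wrapper, "")
  expected = File.realpath(File.join(app, "script"))
  Dir.chdir("/") { resolved = rails_script_dir(wrapper) }
end

puts resolved == expected  # true
```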
Setting up the CRON
Before setting up the CRON job, make sure the script runs:
- make sure the shell script is executable (chmod 755 script.sh)
- if you are using a shell script, run it to make sure it works
- also run it from a directory other than the Rails application directory
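The checklist above can also be scripted. A minimal sketch, assuming the wrapper takes the environment as its only argument (as the sample shell script does): it checks the executable bit, then runs the wrapper from / rather than from the Rails directory using the chdir: option of Kernel#system. preflight is a hypothetical name, and the demo uses a tiny stand-in shell script, not the real wrapper.

```ruby
require "tmpdir"

# Hypothetical pre-flight check for a cron wrapper script.
# Returns a list of problems; an empty list means the checks passed.
def preflight(script_path, environment)
  problems = []
  unless File.executable?(script_path)
    problems << "#{script_path} is not executable (try: chmod 755 #{script_path})"
    return problems
  end
  # run it from "/" to catch scripts that only work from the Rails directory
  ok = system(script_path, environment, chdir: "/")
  problems << "#{script_path} #{environment} failed when run from /" unless ok
  problems
end

preflight_results = {}
Dir.mktmpdir do |dir|
  script = File.join(dir, "update-stats.sh")
  # stand-in wrapper: succeeds only when an environment argument is given
  File.write(script, "#!/bin/sh\n[ -n \"$1\" ]\n")
  File.chmod(0644, script)
  preflight_results[:not_executable] = preflight(script, "staging")
  File.chmod(0755, script)
  preflight_results[:ok] = preflight(script, "staging")
  preflight_results[:no_env] = preflight(script, "")
end

preflight_results.each do |name, problems|
  puts "#{name}: #{problems.empty? ? 'ok' : problems.join('; ')}"
end
```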
$ crontab -e
will open your editor so you can enter cronjobs.
Here are a couple of entries:
# run every 5 minutes
*/5 * * * * /var/www/myapp_prod/current/myscripts/update-stats.sh production >> /var/www/myapp_prod/update-stats.log 2>&1
# runs at 1am
0 1 * * * /var/www/myapp_prod/current/db-backup.sh
Note: I love the shorthand syntax (*/5). It means every 5 minutes. Beats writing:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /var/www/myapp_prod/current/myscripts/update-stats.sh production >> /var/www/myapp_prod/update-stats.log 2>&1
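To double-check what */5 stands for, here is a small Ruby sketch that expands a minute field into the explicit minutes it matches. It handles only the forms used here (*, */N and comma lists), not the full crontab syntax; expand_minute_field is a hypothetical name. Comparing against the expansion is a quick way to catch a mistyped entry in a hand-written list.

```ruby
# Expand a crontab minute field (0-59) into the sorted list of minutes it matches.
# Supports only "*", "*/N" and comma-separated numbers.
def expand_minute_field(field)
  case field
  when "*"
    (0..59).to_a
  when %r{\A\*/(\d+)\z}
    (0..59).step(Regexp.last_match(1).to_i).to_a
  else
    field.split(",").map { |m| Integer(m, 10) }.sort
  end
end

every_five = expand_minute_field("*/5")
puts every_five.join(",")  # 0,5,10,15,20,25,30,35,40,45,50,55
puts every_five == expand_minute_field("0,5,10,15,20,25,30,35,40,45,50,55")  # true
```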
Also note that I redirect the output to a log file, so I can check it later if required. I am also sending stderr and stdout to the same file (2>&1).
That's it.
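If you ever launch a job from another Ruby process instead of from cron, the same '>> logfile 2>&1' redirection is available through Process.spawn. In this sketch the key [:out, :err] sends both streams to one file, and mode "a" appends like >>. run_logged is a hypothetical name; the log path and commands are placeholders.

```ruby
require "tmpdir"

# Run a shell command, appending both stdout and stderr to log_path,
# like: command >> log_path 2>&1
# Returns true if the command exited successfully.
def run_logged(command, log_path)
  pid = Process.spawn(command, [:out, :err] => [log_path, "a"])
  _, status = Process.wait2(pid)
  status.success?
end

log_text = nil
failed_ok = nil
Dir.mktmpdir do |dir|
  log = File.join(dir, "update-stats.log")
  run_logged("echo first run", log)
  run_logged("echo to-stdout; echo to-stderr 1>&2", log)
  log_text = File.read(log)
  failed_ok = run_logged("echo failing; exit 3", File.join(dir, "other.log"))
end

print log_text
# first run
# to-stdout
# to-stderr
```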