Hi All
I have a question relating to lua_shared_dict.
I've implemented a mechanism to serve cache contents from Redis via Lua.
Every time a cache hit occurs I record a bunch of data (timestamp, id of the item returned, the browser's user agent, etc.). Originally I submitted this instantly to a Gearmand instance (using the OpenResty Gearman lib).
This worked relatively well, but Gearman struggled to cope with the number of connections required (at 15K req/sec it was struggling).
Using a lua_shared_dict I wanted to chunk up these 'stats records' and then submit them to Gearman once I had a batch of them (I chose a batch size of 40). The code for this is as follows:
-- build the new job record and append it to the pending batch
local jobs = ngx.shared.jobs
local new_job = ngx.encode_args( { ["impressions[]"] = string.format('%s||s=%s||i=%s||u=%s', ngx.var.request_uri, creative_id, tostring(ngx.time()), tostring(ngx.var.http_user_agent) ) } )

-- append the job to the batch string
local job_string, err = jobs:get('job_string')
if not job_string then
    -- first job in the batch: prefix with the gearman worker name
    jobs:add("job_string", string.format("worker_impression_processor#%s", new_job))
else
    jobs:set("job_string", string.format("%s&%s", job_string, new_job))
end

-- increment the batch counter, creating it if it doesn't exist yet
local newval, err = jobs:incr("job_count", 1)
if not newval and err == "not found" then
    jobs:add("job_count", 0)
    newval = jobs:incr("job_count", 1)
end

-- submit the batch once we've accumulated 40 jobs
if newval == 40 then
    local gearman = require "resty.gearman"
    local gm = gearman:new()
    gm:set_timeout(500) -- 0.5 sec
    local ok, err = gm:connect("127.0.0.1", 4730)
    if not ok then
        -- can't reach gearman, so we log and continue to serve the request
        ngx.log(ngx.ERR, "@cache: unable to connect to gearman: ", err)
    else
        -- submit the batched job to gearman as a background job
        local ok, err = gm:submit_job_bg("creative_call", jobs:get('job_string'))
        if not ok then
            ngx.log(ngx.ERR, "@cache: unable to submit job to gearman: (", ngx.var.request_uri, ") ", err)
        else
            -- put the connection into the keepalive pool (up to 100 connections, unlimited idle timeout)
            local ok, err = gm:set_keepalive(0, 100)
        end
    end
    -- reset the shared dictionary for the next batch
    jobs:flush_all()
end
(pastie - http://pastie.org/6120580)
When testing this under load (12K req/sec) I am seeing a large number of 'ngx_slab_alloc() failed: no memory in lua_shared_dict zone "jobs"' errors.
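For context, the dict itself is declared in the http block of nginx.conf in the usual way, along these lines (the size shown here is just an example, for illustration):

lua_shared_dict jobs 10m;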
The other issue I'm having, which is slightly more concerning (as I've read that the above 'error' is not itself a problem), is that I can put 12K calls into the system but I don't subsequently see 12K records (in chunks of 40) submitted to Gearman; it's always a varying number short (and I'm getting no other errors, such as Gearman connection errors, for example).
Could this be because of concurrent access to the shared dict? Is this a known limitation (and so am I not using it in the correct manner)?
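To illustrate the kind of interleaving I have in mind (a purely hypothetical sketch, assuming two requests handled by different nginx workers and the same keys as in the code above):

-- request A: jobs:get('job_string')  --> "worker_impression_processor#job1"
-- request B: jobs:get('job_string')  --> "worker_impression_processor#job1"
-- request A: jobs:set('job_string', "worker_impression_processor#job1&job2")
-- request B: jobs:set('job_string', "worker_impression_processor#job1&job3")  -- A's job2 is overwritten and lost
-- both requests then call jobs:incr('job_count', 1), so the counter still reaches 40
-- even though the batch string is now missing a record

The incr itself is atomic, but the get/set pair obviously isn't, so this is my best guess at where records could be going missing.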
Any help would be greatly appreciated, as I'm a little stumped as to the best course of action here.
Many thanks
/Matt