mkbosner
Hello there,
I'm runnig lua scripts on
nginx 1.2.6 &
ngx_openresty-1.2.6.6
with latest memcached & redis 2.6.7 installed.
lua_code_cache is enabled
Configuration [nginx.conf & sites-enabled/my.com]
worker_processes 4;
worker_rlimit_nofile 33000;
error_log /var/log/nginx/error.log debug;
pid /var/run/nginx.pid;
events {
worker_connections 32000;
}
location /cat {
default_type text/html;
content_by_lua_file /home/mydev/frontend/categories.lua;
}
-----------
Here are 2 errors i'm getting from time to time in nginx's error.log (sometimes under average concurrency ~10-20 processes accessing redis/memcached per second):
1) the 1st error is the following:
2013/04/16 23:23:25 [error] 28990#0: *134838 lua tcp socket read timed out, client: 178.192.54.186, server: my.com, request: "GET /de/cat/1.html?ref=one.com HTTP/1.1", host: "my.com", referrer: "http://referer.com/1.html
2013/04/16 23:23:25 [error] 28990#0: *134838 attempt to send data on a closed socket: u:0000000003E3C018, c:0000000000000000, ft:2 eof:0, client: 178.192.54.186, server: my.com, request: "GET /de/cat/1.html?ref=one.com HTTP/1.1", host: "my.com", referrer: "http://referer.com/1.html"
2013/04/16 23:23:25 [error] 28990#0: *134838 attempt to send data on a closed socket: u:0000000003E3C018, c:0000000000000000, ft:2 eof:0, client: 178.192.54.186, server: my.com, request: "GET /de/cat/1.html?ref=one.com HTTP/1.1", host: "my.com", referrer: "http://referer.com/1.html"
I create redis instance in content_by_lua_file and then pass it to TPS2.lua module's function as param. Routines use the instance "red" to execute get's/set's.
categories.lua:
local slt = require "resty.slt"
local cjson = require "cjson"
local redis = require "resty.redis"
local TPS = require "TPS2"
local red = redis:new()
local domain = TPS.domain
local ok, err = red:connect(TPS.redis_socket)
if not ok then
ngx.say("failed to connect: ", err)
return
end
-- getting categories list
local cats_list = TPS:get_categories(red, stl, geo)
-- some output stuff
ngx.say(cats_list)
----- [TPS2 module brief source]
TPS2.lua:
local base = _G
module('TPS2', package.seeall)
meta = { __index = _M }
domain = 'my.com'
…..
function get_categories(self, red, slt, geo)
-- cat's links, amounts and names
local clinks = {}
local cnames = {}
local ccids = {}
local camounts = {}
local cletters = {}
-- decoding and splitting categories data
all_cats, err = red:get("global_letters_"..geo)
if not all_cats then
ngx.say("can't get global_letters: ", err)
return
end
local total = 0
-- filling cLinks, cAmounts and cNames arrays with cat's data
for cat_letter, cat_data in orderedPairs(cjson.decode(all_cats)) do
…….
end
local list = {}
local cats_list = assert(slt.loadfile(string.format(cats_list_template,geo)))
…...
list.cnames = cnames
slt.render(cats_list, list) -- will call outputfn and populate result
return table.concat(list_result)
end
2.
And the second one (seems to be about memcached)
2013/04/17 11:40:06 [error] 12591#0: *3502 lua tcp socket read timed out, client: 88.208.57.189, server: my.com, request: "GET /cat?recache=1&category=154&page=0&geo=en HTTP/1.1", host: "my.com"
2013/04/17 11:40:06 [error] 12591#0: *3502 lua entry thread aborted: runtime error: /usr/local/openresty/lualib/resty/resty/slt.lua:29: bad argument #1 to 'find' (string expected, got nil)
stack traceback:
coroutine 0:
[C]: in function 'find'
/usr/local/openresty/lualib/resty/resty/slt.lua:29: in function 'precompile'
/usr/local/openresty/lualib/resty/resty/slt.lua:59: in function 'precompile'
/usr/local/openresty/lualib/resty/resty/slt.lua:89: in function </usr/local/openresty/lualib/resty/resty/slt.lua:74>
(tail call): ?
/home/mydev/frontend/categories.lua:144: in function </home/mydev/frontend/categories.lua:1>, client: 88.208.57.189, server: my.com, request: "GET /cat?recache=1&category=154&page=0&geo=en HTTP/1.1", host: my.com
I'm using memcache it to get temlates from memory so i've modified slt.lua a bit
below are the lines mentioned in error.log
slt.lua:29 :
local start1, end1 = string.find(template, start_tag, 1, true)
slt.lua:59
table.insert(result, precompile(memc:get(domain.."_"..filename), memc, domain, orig_stag, end_tag))
slt.lua:89 :
template, err = precompile(template, memc, domain, start_tag, end_tag)
if template == nil then
return nil, err
end
and here's how i use "slt" and memcached in
categories.lua:
local slt = require "resty.slt"
local memcached = require "resty.memcached"
local memc, err = memcached:new()
if not memc then
ngx.say("failed to instantiate memc: ", err)
return
end
local ok, err = memc:connect(TPS.memcache_socket)
if not ok then
ngx.say("failed to connect: ", err)
return
end
memc:set_timeout(1000) -- 1 sec
local temp_body = assert(slt.loadfile(string.format(TPS.body_template,geo),memc,TPS.domain))
so "memc" is created outside the module
and shouldn't be used by another requests..
I just pass it into module routine to execute "get" or "set" for redis or memcached in both cases.
For the (1) first error i can't even figure out where does it happen.. There's no line number in error log.
In other words problem occures for both memcached and redis connections.
Redis is sometimes under heavy load but memcached never is.
10-20 requests per seconds shouldn't produce anything similar.. Ever 10k shouldn't.
I suppose the problem is somewhere with global varibales but i can't understand what am i doing wrong..
Do you have any ideas on how to fix that?
Thanks in advance guys
This really makes me nervous for the last couple of days
agentzh
Hello!
On Thu, Apr 18, 2013 at 5:59 AM, Bosner wrote:
> 1) the 1st error is the following:
>
> 2013/04/16 23:23:25 [error] 28990#0: *134838 lua tcp socket read timed out,
> client: 178.192.54.186, server: my.com, request: "GET
> /de/cat/1.html?ref=one.com HTTP/1.1", host: "my.com", referrer:
> "http://referer.com/1.html
> 2013/04/16 23:23:25 [error] 28990#0: *134838 attempt to send data on a
> closed socket: u:0000000003E3C018, c:0000000000000000, ft:2 eof:0, client:
> 178.192.54.186, server: my.com, request: "GET /de/cat/1.html?ref=one.com
> HTTP/1.1", host: "my.com", referrer: "http://referer.com/1.html"
When socket timeout happens, the current socket object will
automatically get closed. So any attempts to use it later without
re-connecting will return an error, like "attempt to send data on a
closed socket".
Socket timeout is a quite common error especially when you're using a
relatively small timeout setting. The following document may be
useful:
https://code.google.com/p/memcached/wiki/Timeouts
Packet drop at any place (kernel, network links, or even delay in the
user space due to heavy load or other busy processes in the same
system) will trigger packet re-transmission in seconds (by default).
The best way to handle timeout is to use relatively small timeout
setting and retry the operation yourself for a limited number of
times.
It is kinda hard to quickly identify exactly what causes a particular
random timeout in production. I've had the plan to write
systemtap/dtrace-based tools to diagnose such errors in great details
in production.
> 2013/04/17 11:40:06 [error] 12591#0: *3502 lua tcp socket read timed out,
> client: 88.208.57.189, server: my.com, request: "GET
> /cat?recache=1&category=154&page=0&geo=en HTTP/1.1", host: "my.com"
> 2013/04/17 11:40:06 [error] 12591#0: *3502 lua entry thread aborted: runtime
> error: /usr/local/openresty/lualib/resty/resty/slt.lua:29: bad argument #1
> to 'find' (string expected, got nil)
>
It seems that you do not properly handle nil values returned from the
socket operations when errors like timeout happens. You should ensure
that you never pass nil values to string.find function.
>
> I suppose the problem is somewhere with global varibales but i can't
> understand what am i doing wrong..
>
I doubt though. Sharing sockets with global variables would result in
the "socket busy" error.
BTW, if you handle the memcached/redis socket errors yourself, you can
consider disabling the automatic error logging in ngx_lua's cosocket
implementation:
http://wiki.nginx.org/HttpLuaModule#lua_socket_log_errors
Best regards,
-agentzh