BLPOP always times out after exactly 60s, even though both redis.conf and the client created with new specify no timeout. Why?

fatfatson · 2015-06-17T11:42:31+00:00

fatfatson

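I had never noticed this problem until I recently added some logging. In a spawned coroutine I run a big while loop that receives Redis messages with a blocking BLPOP.

Both redis.conf and the client created with new specify no timeout: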

 daemonize yes                     
 pidfile /var/run/redis.pid        
 timeout 0                         
 tcp-keepalive 0                   
 save 900 1                        
 save 300 10                       
 save 60 10000                     
 stop-writes-on-bgsave-error yes   
 dbfilename dump.rdb               
                                   
 dir DBSTORE                       
 logfile DBLOG                     
 port PORT                         
 bind 127.0.0.1     

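local redis = require "resty.redis"

ip = "127.0.0.1"
port = g_env.db_port

-- one connection per coroutine; weak keys and values: an entry disappears
-- when either the coroutine or the connection object is garbage-collected
db_cache = {}
setmetatable(db_cache, {__mode="kv"})

get = function()
    local co = coroutine.running()
    local db = db_cache[co]
    if db then
        -- discard the cached connection if it no longer answers PING
        if db:ping() ~= "PONG" then
            db = nil
            print("[DB] connection closed, remove it")
        end
    end

    if not db then
        db = redis:new()
        db:set_timeout(0)  -- intended to mean "no timeout"
        local ok, err = db:connect(ip, port)
        if not ok then
            print("[DB] connect fail",ip,port,err)
            return
        end
        db_cache[co] = db
    end
    return db
end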

Where it is used:

recv = function( key, timeout )
    local db = redis.get()
    if not db then return false, "can't open db" end

    --timeout = timeout or 10
    print("chan.recv for", key, timeout, os.daytime())
    -- a BLPOP timeout of 0 tells Redis to block until an element arrives
    local msg, err = db:blpop( key, timeout or 0 )
    if err then
        printf("[DB] err:%s when read channel; %s", err, os.daytime())
        return false, err
    end

    if db_null(msg) then
        -- null reply: BLPOP's own (finite) timeout expired with no data
        return true, nil
    else
        print("chan.recv get", unpack(msg))
        return true, msg[2]
    end
end

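If timeout is set to an ordinary value such as 5, 10 or 20, BLPOP does return after that many seconds with no err; in the next iteration of the while loop the ping check on that db succeeds and the connection keeps being used.

But if timeout is nil, i.e. 0, I expect it to block until data arrives. In practice BLPOP returns after exactly 60s every time with err set to "timeout", and the next db:ping call fails, so the connection is closed and I have to create a new one with new.

Is this abnormal behavior?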

fatfatson

Running blpop alist 0 from redis-cli works fine: it blocks until data arrives.

agentzh

Hello!

2015-06-17 11:42 GMT+08:00 小冶:
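> If timeout is set to an ordinary value such as 5, 10 or 20, BLPOP does return after that many seconds with no err; in the next iteration of the while loop the ping check on that db succeeds and the connection keeps being used.
> But if timeout is nil, i.e. 0, I expect it to block until data arrives. In practice BLPOP returns after exactly 60s every time with err set to "timeout", and the next db:ping call fails, so the connection is closed and I have to create a new one with new.
> Is this abnormal behavior?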

This is expected behavior. The documentation of the settimeout method does not say that 0 or nil means no timeout:

https://github.com/openresty/lua-nginx-module#tcpsocksettimeout

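In fact, in the current implementation, 0 or nil means falling back to the settings of the lua_socket_connect_timeout,
lua_socket_send_timeout, and lua_socket_read_timeout
configuration directives, whose default values are exactly 60s.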

Regards,
-agentzh
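One way to act on this, sketched under assumptions (an OpenResty coroutine, an already connected resty.redis object, and the hypothetical names `socket_timeout_ms`, `recv_blocking`, `BLPOP_TIMEOUT` and `MARGIN` with illustrative values): instead of BLPOP with 0, which is cut off by the 60s default socket read timeout, issue BLPOP with a finite server-side timeout, set the socket timeout explicitly to something longer, and loop until an element arrives. Raising `lua_socket_read_timeout` in nginx.conf is another option. The null sentinel is passed in so the helper does not touch `ngx` directly; under OpenResty you would pass `ngx.null`.

```lua
-- Sketch only: assumes a connected resty.redis object `db` inside an
-- OpenResty coroutine. All names and values here are illustrative.
local BLPOP_TIMEOUT = 30  -- seconds Redis waits per BLPOP call
local MARGIN = 5          -- extra seconds of slack for the socket read

-- The socket timeout (milliseconds) must exceed the BLPOP wait, otherwise
-- the cosocket read times out before Redis sends its reply.
local function socket_timeout_ms(blpop_timeout, margin)
  return (blpop_timeout + margin) * 1000
end

-- Waits indefinitely by repeating a finite BLPOP. `null` is the value the
-- client returns for a null reply (ngx.null under OpenResty).
local function recv_blocking(db, key, null)
  db:set_timeout(socket_timeout_ms(BLPOP_TIMEOUT, MARGIN))
  while true do
    local res, err = db:blpop(key, BLPOP_TIMEOUT)
    if not res then
      return nil, err  -- a real error such as "timeout" or "closed"
    end
    if res ~= null then
      return res[2]    -- BLPOP replies with { key, value }
    end
    -- null reply: Redis-side wait expired, the connection is still in a
    -- clean state, so simply issue the next BLPOP
  end
end
```

For example, `local value, err = recv_blocking(db, "alist", ngx.null)`. Because each BLPOP returns before the socket read timeout, an idle wait alone should not produce a "timeout" error, and the connection stays usable between iterations.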

fatfatson

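Thanks for the reply.

I understand now that an argument of 0 does not mean "no timeout". But a timeout is just a timeout; why does the connection get closed as well? It means I have to create a new one with new the next time.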

agentzh

Hello!

2015-06-17 14:38 GMT+08:00 小冶:
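> I understand now that an argument of 0 does not mean "no timeout". But a timeout is just a timeout; why does the connection get closed as well? It means I have to create a new one with new the next time.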
>

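The cosocket closes the connection only on a write timeout or a connect timeout, because in those two cases the state of the connection becomes indeterminate, so continuing to use it is unsafe. And when a fatal error does occur, you don't need to call new again; just call the connect() method on the current object to reconnect.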

Regards,
-agentzh
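Following this reply, the `get()` function from the question could reconnect the cached object instead of discarding it. A minimal sketch, assuming `db` is a resty.redis object; `ensure_connected` is a hypothetical helper name:

```lua
-- Sketch only: `db` is a resty.redis object (as cached per coroutine in the
-- question's get()); `ensure_connected` is a hypothetical helper.
local function ensure_connected(db, ip, port)
  -- cheap liveness check on the existing connection
  if db:ping() == "PONG" then
    return true
  end
  -- the previous connection is unusable (ping typically returns nil plus
  -- an error such as "closed"); reconnect the same object with connect()
  -- instead of creating a fresh one with redis:new()
  local ok, err = db:connect(ip, port)
  if not ok then
    return nil, err
  end
  return true
end
```

In `get()`, calling `ensure_connected(db, ip, port)` would replace the `db = nil` branch, so the entry in `db_cache` is kept and only the underlying connection is re-established.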