upstream backend {
    server 1.1.1.1;
    balancer_by_lua_block {
        local balancer = require "ngx.balancer"
        local status, code = balancer.get_last_failure()
        ngx.log(ngx.ERR, "get_last_failure status: ", status)
        local host = "127.0.0.1"
        -- this port is unreachable
        local port = 83
        local ok, err = balancer.set_current_peer(host, port)
        if not ok then
            ngx.log(ngx.ERR, "failed to set the current peer: ", err)
            return ngx.exit(500)
        end
        ok, err = balancer.set_more_tries(2)
        if not ok then
            ngx.log(ngx.ERR, "set_more_tries failed, ", err)
            return
        end
    }
}
server {
    listen 88;
    location / {
        proxy_next_upstream_tries 5;
        proxy_pass http://backend;
    }
}
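The configuration above triggers a bug: at line 686 of ngx_http_lua_ffi_balancer_set_more_tries, count becomes negative. When it is then assigned to bp->more_tries at line 696, the int is converted to a uint, which yields an extremely large positive integer. That value is finally added to pc->tries, and the request is retried indefinitely.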
After pass 1 through ngx_http_lua_balancer_get_peer: r->upstream->peer.tries: 3 bp->total_tries: 1 bp->more_tries: 2
After ngx_http_lua_balancer_free_peer: r->upstream->peer.tries: 2
After pass 2 through ngx_http_lua_balancer_get_peer: r->upstream->peer.tries: 4 bp->total_tries: 2 bp->more_tries: 2
After ngx_http_lua_balancer_free_peer: r->upstream->peer.tries: 3
After pass 3 through ngx_http_lua_balancer_get_peer: r->upstream->peer.tries: 5 bp->total_tries: 3 bp->more_tries: 1
After ngx_http_lua_balancer_free_peer: r->upstream->peer.tries: 4
After pass 4 through ngx_http_lua_balancer_get_peer: r->upstream->peer.tries: 5 bp->total_tries: 4 bp->more_tries: 1
After ngx_http_lua_balancer_free_peer: r->upstream->peer.tries: 4
After pass 5 through ngx_http_lua_balancer_get_peer: r->upstream->peer.tries: 4 bp->total_tries: 5 bp->more_tries: 0
After ngx_http_lua_balancer_free_peer: r->upstream->peer.tries: 3
After pass 6 through ngx_http_lua_balancer_get_peer: r->upstream->peer.tries: 4 bp->total_tries: 6 bp->more_tries: 18446744073709551615 (-1)
After ngx_http_lua_balancer_free_peer: r->upstream->peer.tries: 3
Consider lines 683-689 of ngx_http_lua_ffi_balancer_set_more_tries, shown below:
max_tries = r->upstream->conf->next_upstream_tries;
if (bp->total_tries + count > max_tries) {
    count = max_tries - bp->total_tries;
    *err = "reduced tries due to limit";
} else {
    *err = NULL;
}
At this point, max_tries = 5, count = 2, and bp->total_tries = 6.
Clearly, bp->total_tries + count > max_tries holds, since 6 + 2 > 5.
So count = max_tries - bp->total_tries = 5 - 6 = -1.
This is where the problem appears.
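The arithmetic above can be reproduced in isolation. The following is a minimal C sketch, not the module's actual code: reduce_count and as_unsigned_tries are hypothetical helpers that mirror only the limit check quoted above, plain int parameters are a simplification, and uintptr_t stands in for nginx's ngx_uint_t.

```c
#include <stdint.h>

/* Hypothetical helper mirroring only the limit check quoted above;
 * it is not the real ngx_http_lua_ffi_balancer_set_more_tries. */
static int
reduce_count(int count, int max_tries, int total_tries)
{
    if (total_tries + count > max_tries) {
        /* becomes negative once total_tries exceeds max_tries */
        count = max_tries - total_tries;
    }

    return count;
}

/* Hypothetical helper: models storing the int into an unsigned field.
 * A negative value wraps around to a very large positive one. */
static uintptr_t
as_unsigned_tries(int count)
{
    return (uintptr_t) count;
}
```

With the values from the trace, reduce_count(2, 5, 6) returns -1, and as_unsigned_tries(-1) equals UINTPTR_MAX, which is 18446744073709551615 on a 64-bit build.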
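This happens because r->upstream->peer.tries is topped up on several passes, so bp->total_tries can grow past max_tries, yet the check at line 685 of ngx_http_lua_ffi_balancer_set_more_tries does not account for that case.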
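The bug can be fixed by changing the code as follows (the comparison becomes >= and count is clamped so that it never drops below 0):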
if (bp->total_tries + count >= max_tries) {
    count = max_tries - bp->total_tries;
    count = count > 0 ? count : 0;
    *err = "reduced tries due to limit";
}
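To see how the clamp behaves across the passes in the trace, here is a small C sketch under the same simplifying assumptions as the report's arithmetic. reduce_count_clamped is a hypothetical helper using plain int values; it only mirrors the proposed check and is not the module's function.

```c
/* Hypothetical helper: the limit check with the proposed clamp applied. */
static int
reduce_count_clamped(int count, int max_tries, int total_tries)
{
    if (total_tries + count >= max_tries) {
        count = max_tries - total_tries;
        /* never hand back a negative value */
        count = count > 0 ? count : 0;
    }

    return count;
}
```

With max_tries = 5 and a requested count of 2, the helper returns 2 while total_tries is at most 3, 1 when total_tries is 4, and 0 once total_tries reaches 5 or 6, so no negative value is ever converted to an unsigned integer.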
Alternatively, one could keep track of the total amount that has been added to r->upstream->peer.tries across passes and then change how count is computed accordingly.