Hi @agentzh,

This test was first carried out by DeJiang Zhu (朱德江, douj...@gmail.com); I reproduced it in full and have written up the report below.
The OpenResty version under test is 1.9.3.2rc3, with ngx_lua at the current head of the balancer-by-lua branch.
# /usr/local/openresty/nginx/sbin/nginx -V
nginx version: openresty/1.9.3.2rc3
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
built with OpenSSL 1.0.1e-fips 11 Feb 2013
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt=-O2 --add-module=../ngx_devel_kit-0.2.19 --add-module=../echo-nginx-module-0.58 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.29 --add-module=../form-input-nginx-module-0.11 --add-module=../encrypted-session-nginx-module-0.04 --add-module=../srcache-nginx-module-0.30 --add-module=../ngx_lua-0.9.19 --add-module=../ngx_lua_upstream-0.04 --add-module=../headers-more-nginx-module-0.28 --add-module=../array-var-nginx-module-0.04 --add-module=../memc-nginx-module-0.16 --add-module=../redis2-nginx-module-0.12 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.14 --add-module=../rds-csv-nginx-module-0.07 --with-ld-opt=-Wl,-rpath,/usr/local/openresty/luajit/lib --with-pcre=/root/softwares/ngx_openresty-1.9.3.2rc3/../pcre-8.37 --with-pcre-jit --with-http_ssl_module
Configuration file for the normal (working) case:
# vim nginx.conf
worker_processes 1;
error_log /data/logs/error.log notice;
pid /data/logs/nginx.pid;

events {
    worker_connections 10;
}

http {
    include mime.types;
    default_type application/octet-stream;
    access_log /data/logs/access.log;
    sendfile on;
    keepalive_timeout 600;

    upstream backend {
        server 0.0.0.0;

        balancer_by_lua_block {
            ngx.log(ngx.NOTICE, "hello from balancer")
            local b = require "ngx.balancer"
            assert(b.set_current_peer("172.16.2.227", 8090))
        }
    }

    server {
        listen 80;
        server_name localhost;

        location / {
            proxy_pass http://backend;
        }
    }
}
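As a side note on what balancer_by_lua_block takes over: it replaces the upstream block's load-balancing step entirely, so peer selection happens in Lua. A minimal sketch of picking among several peers via the documented ngx.balancer API (the peer list below is hypothetical, purely for illustration; I only tested the single-peer form above):

    upstream backend {
        server 0.0.0.0;   # dummy entry; the Lua balancer picks the real peer

        balancer_by_lua_block {
            local b = require "ngx.balancer"

            -- hypothetical peer list, for illustration only
            local peers = {
                { "172.16.2.227", 8090 },
                { "172.16.2.228", 8090 },
            }

            -- pick a peer at random to avoid keeping shared state here
            local peer = peers[math.random(#peers)]
            assert(b.set_current_peer(peer[1], peer[2]))
        }
    }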
The request succeeds; on the backend side, I print all request headers for easier debugging:
# curl "
http://localhost"
***GET / HTTP/1.0
Host: backend
Connection: close
User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Accept: */*
***
Meanwhile the error log shows:
2015/11/17 15:41:01 [notice] 26933#0: *12313 [lua] balancer_by_lua:2: hello from balancer while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", host: "localhost"
Everything looks normal so far.
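Incidentally, for debugging the failing cases below it helps to log which peer nginx actually connects to. A sketch using the standard $upstream_addr and $upstream_status variables (the format name and log path here are made up for this example):

    http {
        log_format upstream_dbg '$remote_addr -> $upstream_addr '
                                'status=$upstream_status request="$request"';
        access_log /data/logs/upstream.log upstream_dbg;
        ...
    }

With this in place, the 0.0.0.0:80 and 1.1.1.1:80 upstreams seen in the error logs below would also show up per request in the access log.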
------------- The failing cases follow ----------------
1. Adding a keepalive directive before the balancer_by_lua_block section in nginx.conf produces a warning when nginx starts:
upstream backend {
    server 0.0.0.0;
    keepalive 16;

    balancer_by_lua_block {
        ngx.log(ngx.NOTICE, "hello from balancer")
        local b = require "ngx.balancer"
        assert(b.set_current_peer("172.16.2.227", 8090))
    }
}
# /usr/local/openresty/nginx/sbin/nginx -s reload
nginx: [warn] load balancing method redefined in /usr/local/openresty/nginx/conf/nginx.conf:25
In actual use the behavior is exactly the same as the normal case above. keepalive seems to have no effect at all; it looks as if its effect is overwritten by balancer_by_lua.
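One way to verify that connections to the backend are not being reused (an assumption on my part; I did not capture this in the run above): issue several requests and watch the connections to the peer set in the Lua balancer. With keepalive working, an ESTABLISHED connection should persist and be reused, rather than a new ephemeral port appearing for every request:

    # after a few curl requests:
    netstat -tn | grep 172.16.2.227:8090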
2. Adding the keepalive directive after the balancer_by_lua_block section:
upstream backend {
    server 0.0.0.0;

    balancer_by_lua_block {
        ngx.log(ngx.NOTICE, "hello from balancer")
        local b = require "ngx.balancer"
        assert(b.set_current_peer("172.16.2.227", 8090))
    }

    keepalive 16;
}
Now requests fail with a 502 error (DeJiang Zhu reported a coredump in his run), as follows:
# curl "
http://localhost"
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>openresty/1.9.3.2rc3</center>
</body>
</html>
Log:
2015/11/17 15:48:39 [notice] 26940#0: *12315 [lua] balancer_by_lua:2: hello from balancer while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", host: "localhost"
2015/11/17 15:48:39 [notice] 26940#0: *12317 [lua] balancer_by_lua:2: hello from balancer while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.0", host: "backend"
2015/11/17 15:48:39 [notice] 26940#0: *12319 [lua] balancer_by_lua:2: hello from balancer while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.0", host: "backend"
2015/11/17 15:48:39 [notice] 26940#0: *12321 [lua] balancer_by_lua:2: hello from balancer while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.0", host: "backend"
2015/11/17 15:48:39 [alert] 26940#0: 10 worker_connections are not enough
2015/11/17 15:48:39 [error] 26940#0: *12321 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.0", upstream: "http://0.0.0.0:80/", host: "backend"
The odd parts: "hello from balancer" appears four times, on four different connections, and nginx finally complains that worker_connections are not enough (the limit of 10 was deliberately set low for this demonstration). Note that the last three requests are "GET / HTTP/1.0" with host "backend", i.e. nginx's own proxied requests arriving back at itself. The final error line also shows the upstream as http://0.0.0.0:80/ rather than the peer I set in the balancer. So it looks as if the server 0.0.0.0 entry is not completely overridden: nginx connects to 0.0.0.0:80, which is itself, and loops until the worker runs out of connections. Let's change the server directive and try again:
upstream backend {
    server 1.1.1.1;

    balancer_by_lua_block {
        ngx.log(ngx.NOTICE, "hello from balancer")
        local b = require "ngx.balancer"
        assert(b.set_current_peer("172.16.2.227", 8090))
    }

    keepalive 16;
}
This time the request fails with a 502 only after a fairly long wait:
# curl "
http://localhost"
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>openresty/1.9.3.2rc3</center>
</body>
</html>
Log:
2015/11/17 15:53:40 [notice] 26948#0: *12323 [lua] balancer_by_lua:2: hello from balancer while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", host: "localhost"
2015/11/17 15:53:55 [error] 26948#0: *12323 connect() failed (110: Connection timed out) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://1.1.1.1:80/", host: "localhost"
What surfaces in the end is a connect timeout, and the upstream is http://1.1.1.1:80/, the address from the server directive rather than the peer set in the balancer. So the "not completely overridden" guess above seems to hold.
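To pin down whether these extra balancer invocations are retries within a single request or genuinely new requests, the balancer itself could log the previous failure state. A sketch using the documented get_last_failure() call from lua-resty-core's ngx.balancer module (I have not run this against the failing configurations):

    balancer_by_lua_block {
        local b = require "ngx.balancer"

        -- nil state means the first attempt for this request;
        -- "failed" or "next" means nginx is retrying the upstream
        local state, status = b.get_last_failure()
        ngx.log(ngx.NOTICE, "balancer invoked, last failure: ",
                tostring(state), ", status: ", tostring(status))

        assert(b.set_current_peer("172.16.2.227", 8090))
    }

In the worker_connections case above, the connection IDs (*12315, *12317, ...) already differ, which points at new requests (the self-proxy loop) rather than retries.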
Conclusion:

The current implementation of balancer_by_lua conflicts with upstream keepalive.