Hi everyone. In production, nginx is hitting the problem described in the subject line. Details follow.
Operating system information:
Red Hat Enterprise Linux Server release 6.5 (Santiago)
Linux inta 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Relevant parts of nginx.conf:
worker_processes auto;
worker_rlimit_nofile 65535;
events {
    worker_connections 65535;
}
log_format main '[$req_timestamp]-[$request_time]-[$status]-[$upstream_response_time]-[$upstream_status]';
upstream backend {
    server 0.0.0.0;
    balancer_by_lua_file lua/balance.lua;
    keepalive 300;
}
location / {
    access_log logs/access.log main;
    rewrite_by_lua_file lua/rewrite.lua;
    proxy_pass http://backend;
    proxy_read_timeout 1600ms;
    proxy_send_timeout 1600ms;
    proxy_connect_timeout 500ms;
    proxy_request_buffering off;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    log_by_lua_file lua/log.lua;
}
rewrite.lua mainly reads and writes several shared memory zones.
The access.log looks like this:
log_format main '[$req_timestamp]-[$request_time]-[$status]-[$upstream_response_time]-[$upstream_status]';
The value of $req_timestamp is the time captured on entering rewrite.lua.
[2019-04-21 17:44:01.211]-[0.076]-[200]-[0.076]-[200]
[2019-04-21 17:44:01.216]-[35.435]-[504]-[35.435]-[504]
[2019-04-21 17:44:01.211]-[35.440]-[504]-[35.440]-[504]
[2019-04-21 17:44:01.181]-[35.470]-[200]-[35.470]-[200]
[2019-04-21 17:44:01.287]-[35.364]-[504]-[35.364]-[504]
[2019-04-21 17:44:01.260]-[35.391]-[504]-[35.391]-[504]
[2019-04-21 17:44:01.268]-[35.383]-[499]-[-]-[-]
[2019-04-21 17:44:01.265]-[35.386]-[504]-[35.386]
[2019-04-21 17:44:01.244]-[35.407]-[200]-[35.407]-[200]
[2019-04-21 17:44:01.160]-[35.489]-[200]-[35.489]-[200]
[2019-04-21 17:44:01.218]-[35.431]-[200]-[35.431]-[200]
[2019-04-21 17:44:01.262]-[35.390]-[504]-[35.390]-[504]
[2019-04-21 17:44:01.211]-[0.088]-[200]-[0.088]-[200]
[2019-04-21 17:44:01.214]-[35.440]-[504]-[35.440]-[504]
[2019-04-21 17:44:01.216]-[35.438]-[504]-[35.438]-[504]
[2019-04-21 17:44:01.267]-[35.387]-[504]-[35.387]-[504]
[2019-04-21 17:44:01.283]-[35.371]-[504]-[35.371]-[504]
[2019-04-21 17:44:01.299]-[35.355]-[504]-[35.355]-[504]
[2019-04-21 17:44:01.254]-[35.397]-[504]-[35.397]-[504]
(some log entries from 17:44:01 omitted)
[2019-04-21 17:44:36.657]-[0.000]-[499]-[-]-[-]
[2019-04-21 17:44:36.656]-[0.000]-[502]-[0.000]-[502]
[2019-04-21 17:44:36.657]-[0.000]-[499]-[-]-[-]
[2019-04-21 17:44:36.651]-[0.000]-[502]-[0.000]-[502]
[2019-04-21 17:44:36.657]-[0.000]-[502]-[0.000]-[502]
There are no log entries at all between 17:44:02 and 17:44:35. During that window, clients trying to reach nginx got "connection refused".
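One way to read the access log above is to add $request_time to $req_timestamp, which gives an approximate completion time for each request. The sketch below does that for a few of the lines shown. The helper name completion_time is made up for illustration, and the result is only approximate because $req_timestamp is taken inside rewrite.lua rather than at the moment nginx starts its own timer.

```python
import re
from datetime import datetime, timedelta

# Matches the first three fields of the log_format shown above:
# [$req_timestamp]-[$request_time]-[$status]-...
LINE_RE = re.compile(r"^\[([^\]]+)\]-\[([^\]]+)\]-\[(\d{3})\]")


def completion_time(line):
    """Return req_timestamp + request_time, or None if the line does not parse."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    start = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return start + timedelta(seconds=float(m.group(2)))


# A few lines copied from the access.log excerpt in this post.
sample = [
    "[2019-04-21 17:44:01.216]-[35.435]-[504]-[35.435]-[504]",
    "[2019-04-21 17:44:01.181]-[35.470]-[200]-[35.470]-[200]",
    "[2019-04-21 17:44:01.268]-[35.383]-[499]-[-]-[-]",
    "[2019-04-21 17:44:01.160]-[35.489]-[200]-[35.489]-[200]",
]

for line in sample:
    done = completion_time(line)
    print(line, "->", done.strftime("%H:%M:%S.%f")[:-3])
```

For these four lines the computed completion times all fall within about 2 ms of 17:44:36.65, even though the requests started at different moments and ended with different statuses. That would be consistent with the worker being stalled for roughly 35 seconds and then handling every pending event at once, though it does not prove it.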
The messages in the error log are as follows:
epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream ---------> cause of the 499 errors
recv() failed (104: Connection reset by peer) while sending to client --------->???
upstream prematurely closed connection while reading response header from upstream ---------> cause of the 502 errors
upstream timed out (110: Connection timed out) while reading response header from upstream ---------> cause of the 504 errors
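The program runs normally most of the time, but occasionally a stretch of several tens of seconds simply goes missing.
One theory is that the code blocks somewhere, so nginx stops calling accept() on new connections, the TCP accept queue fills up, and connections get refused.
But after going through the code, I can't find any time-consuming logic, and the only contention in rewrite.lua is over the shared memory zones.
Does anyone have any ideas?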
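The accept-queue theory can be checked independently of the Lua code. On Linux, the TcpExt section of /proc/net/netstat carries cumulative ListenOverflows and ListenDrops counters, so comparing a snapshot taken before a stall with one taken afterwards shows whether any listening socket overflowed in between. Below is a rough sketch of that comparison. The function names and the sample numbers are invented for illustration, and the counters are system-wide, so they would also count overflows on listeners other than nginx.

```python
def tcpext_counters(netstat_text):
    """Parse the TcpExt header/value line pair from /proc/net/netstat text."""
    lines = netstat_text.splitlines()
    for header, values in zip(lines, lines[1:]):
        if header.startswith("TcpExt:") and values.startswith("TcpExt:"):
            names = header.split()[1:]
            nums = [int(v) for v in values.split()[1:]]
            return dict(zip(names, nums))
    return {}


def listen_overflow_delta(before, after):
    """Difference of the two accept-queue counters between two snapshots."""
    keys = ("ListenOverflows", "ListenDrops")
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


# Made-up snapshots in the same two-line layout as /proc/net/netstat.
# On a real host, read the file instead: open("/proc/net/netstat").read()
SAMPLE_BEFORE = (
    "TcpExt: SyncookiesSent ListenOverflows ListenDrops\n"
    "TcpExt: 0 10 12\n"
    "IpExt: InOctets\n"
    "IpExt: 5\n"
)
SAMPLE_AFTER = (
    "TcpExt: SyncookiesSent ListenOverflows ListenDrops\n"
    "TcpExt: 0 15 17\n"
    "IpExt: InOctets\n"
    "IpExt: 9\n"
)

delta = listen_overflow_delta(
    tcpext_counters(SAMPLE_BEFORE), tcpext_counters(SAMPLE_AFTER)
)
print(delta)
```

Because the problem is intermittent, cumulative counters seem like a better fit than sampling the live queue length: they still show the overflow after the 35-second window has passed. If the counters do not move across a stall, the refused connections probably have some other cause.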