下面是一段以请求(request)的生命周期为主线,抓的一段我们的日志:
2017-04-16 18:19:42.252][tid 3417802][co 0x7f99c99c6f18]
[DEBUG] ngx_http_create_request: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
pool:0x7f99ca2a4ba0, ctx 0x7f99ca2a54e8
[2017-04-16 18:19:42.252][tid 3417802][co 0x7f99c99c6f18]
[DEBUG] ngx_http_lua_content_by_chunk: r:0x7f99ca2a4bf0, vm:0x41a1b378,
ctx:0x7f99ca2ba280 => co:0x40c19f40
[2017-04-16 18:19:42.252][tid 3417802][co 0x7f99c99c6f18]
[DEBUG] ngx_http_lua_content_by_chunk: r:0x7f99ca2a4bf0, vm:0x41a1b378,
ctx:0x7f99ca2ba280 => run thread
[2017-04-16 18:19:44.453][tid 3417802][co 0x7f99c99c6f18]
[DEBUG] ngx_http_lua_ngx_sleep: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, cur:0x7f99ca2ba2a8, cleanup:0x5e0afe(0x7f99ca2a4bf0),
ev:0x7f99ca2ba2f8
[2017-04-16 18:19:44.453][tid 3417802][co 0x7f99c99c6f18]
[DEBUG] ngx_http_lua_ngx_sleep: r:0x7f99ca2a4bf0, delay:546,
uri:/flv/100/109/o0018qvarf5.mp4, ev:0x7f99ca2ba2f8
[2017-04-16 18:19:44.453][tid 3417802][co 0x7f99c99c6f18]
[DEBUG] ngx_http_lua_content_by_chunk: r:0x7f99ca2a4bf0, vm:0x41a1b378,
ctx:0x7f99ca2ba280 => run back, rc:-2
[2017-04-16 18:19:44.453][tid 3417802][co 0x7f99c99c6f18]
[DEBUG] ngx_http_lua_content_by_chunk: r:0x7f99ca2a4bf0, vm:0x41a1b378,
ctx:0x7f99ca2ba280 => AGAIN, run posted
[2017-04-16 18:19:44.453][tid 3417802][co 0x7f99c99c6f18]
[DEBUG] ngx_http_close_request: r:0x7f99ca2a4bf0,
c:0x7f993a558a70, ctx:0x7f99ca2a54e8, count:2, blocked:0
[2017-04-16 18:19:45.027][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_lua_sleep_handler: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, r->cleanup:0x7f99ca2a5b80
[2017-04-16 18:19:45.027][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_lua_sleep_handler: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, cur:0x7f99ca2ba2a8, r->cleanup:0x7f99ca2a5b80
[2017-04-16 18:19:45.730][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_lua_ngx_sleep: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, cur:0x7f99ca2ba2a8, cleanup:0x5e0afe(0x7f99ca2a4bf0),
ev:0x7f99ca2ba2f8
[2017-04-16 18:19:45.730][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_lua_ngx_sleep: r:0x7f99ca2a4bf0, delay:270,
uri:/flv/100/109/o0018qvarf5.mp4, ev:0x7f99ca2ba2f8
[2017-04-16 18:19:45.730][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_lua_sleep_handler: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, cur:(nil), r->cleanup:0x7f99ca2a5b80
[2017-04-16 18:19:46.038][tid 3417802][co 0x7f99362364f0]
[DEBUG] ngx_http_lua_sleep_handler: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, r->cleanup:0x7f99ca2a5b80
[2017-04-16 18:19:46.038][tid 3417802][co 0x7f99362364f0]
[DEBUG] ngx_http_lua_sleep_handler: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, cur:0x7f99ca2ba2a8, r->cleanup:0x7f99ca2a5b80
[2017-04-16 18:19:46.411][tid 3417802][co 0x7f99362364f0]
[DEBUG] ngx_http_lua_ngx_sleep: r:0x7f99ca2a4bf0,
c:0x7f993a558a70, ctx:0x7f99ca2ba280, cur:0x7f99ca2ba2a8,
cleanup:0x5e0afe(0x7f99ca2a4bf0), ev:0x7f99ca2ba2f8
[2017-04-16 18:19:46.411][tid 3417802][co 0x7f99362364f0]
[DEBUG] ngx_http_lua_ngx_sleep: r:0x7f99ca2a4bf0, delay:588,
uri:/flv/100/109/o0018qvarf5.mp4, ev:0x7f99ca2ba2f8
[2017-04-16 18:19:46.411][tid 3417802][co 0x7f99362364f0]
[DEBUG] ngx_http_lua_sleep_handler: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, cur:(nil), r->cleanup:0x7f99ca2a5b80
[2017-04-16 18:19:47.043][tid 3417802][co 0x7f993623ada8]
[DEBUG] ngx_http_lua_sleep_handler: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, r->cleanup:0x7f99ca2a5b80
[2017-04-16 18:19:47.043][tid 3417802][co 0x7f993623ada8]
[DEBUG] ngx_http_lua_sleep_handler: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2ba280, cur:0x7f99ca2ba2a8, r->cleanup:0x7f99ca2a5b80
[2017-04-16 18:19:48.123][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_lua_finalize_request: r:0x7f99ca2a4bf0, rc:499
[2017-04-16 18:19:48.123][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_lua_finalize_request: r:0x7f99ca2a4bf0, ctx:0x7f99ca2ba280,
cur:0x7f99ca2ba2a8 => cleanup pending
[2017-04-16 18:19:48.123][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_lua_finalize_request: r:0x7f99ca2a4bf0, ctx:0x7f99ca2ba280,
c:0x7f993a558a70 => fin req
[2017-04-16 18:19:48.123][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_close_request: r:0x7f99ca2a4bf0, c:0x7f993a558a70,
ctx:0x7f99ca2a54e8, count:1, blocked:0
[2017-04-16 18:19:48.123][tid 3417802][co 0x7f9936232d00]
[DEBUG] ngx_http_free_request: r:0x7f99ca2a4bf0, pool:0x7f99ca2a4ba0,
ctx:0x7f99ca2a54e8, cleanup:(nil)
NGX流程在检测到客户端断链后,第一次尝试关闭请求,发现count = 2,放弃。
进入LUA脚本执行,ngx.sleep被再次调用,ngx_http_lua_ngx_sleep注册cleanup清理函数(ngx_http_lua_sleep_cleanup)
回到NGX流程,进入ngx_http_lua_finalize_request,发现499码,此时,请求的count = 1,开始关闭请求、关闭连接,在清理现场时,发现请求(r)的清理函数堆栈(r->cleanup)为空。
问题出在关闭请求时,发现清理函数堆栈为空
Sleep时已经主动注册过自己的清理函数到LUA协程的上下文中。进一步跟踪流程发现在ngx_http_lua_run_thread中,resume后的处理分支:case LUA_YIELD => case
NGX_HTTP_LUA_USER_CORO_NOP处,当前协程的上下文被强行清除(ctx->cur_co_ctx = NULL),而此时sleep拉起的事件在等待超时触发,此后进入NGX关断流程时,该事件残留在系统中被再次执行。
我们的一个修补方式是,在当前协程上下文被强行清除前,主动执行一次pending的清理:
case NGX_HTTP_LUA_USER_CORO_NOP:
dd("hit! it is the API yield");
ngx_http_lua_assert(lua_gettop(ctx->cur_co_ctx->co)
== 0);
ngx_http_lua_cleanup_pending_operation(ctx->cur_co_ctx);/*patch*/
ctx->cur_co_ctx = NULL;
return NGX_AGAIN;
我们的问题是:
1, 请求与连接已关闭时,为什么没有将与之相关的所有事件清理干净?这是有基于某些特别的考虑,还是确实是一个BUG?
2, 上面这样的修补方式是否正确?我们对修补点不是有十分的把握,但修补后确实没有再发现sleep事件残留。