Hello, everybody
We have our Kronometrix project based on OpenResty, and we are using the
websocket server module "lua-resty-websocket"
(require "resty.websocket.server").
In the code that handles the "/websocket" path we perform these (generic) steps:
- connecting to Redis with a timeout of 1 year
- creating the websocket server instance, like this:
    local wb, err = server:new{
        timeout = 5000,  -- in milliseconds
        max_payload_len = 6553500,
    }
- creating an infinite loop (while true do)
- in the infinite loop we are doing these operations:
- handling close event
- sending ping events
- spawning several light threads in which we do:
- creating own Redis connections
- sending/receiving some payload over the websocket (the same one from the parent thread)
- sleep for 100 ms
- when the websocket is closed, we close the Redis connection and
kill the light threads
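The steps above can be sketched roughly as follows. This is a simplified illustration, not the actual Kronometrix code: the worker function, the Redis address, the key name "some:key", and the 1000 ms Redis timeout are all placeholders, the parent Redis connection is left out, and placing the 100 ms sleep in the main loop is an assumption.

```lua
local server = require "resty.websocket.server"
local redis = require "resty.redis"

-- Hypothetical worker: opens its own Redis connection, reads one value and
-- pushes it to the client over the websocket shared with the parent thread.
local function worker(wb, key)
    local red = redis:new()
    red:set_timeout(1000) -- milliseconds; placeholder value
    local ok, err = red:connect("127.0.0.1", 6379) -- assumed address
    if not ok then
        ngx.log(ngx.ERR, "redis connect failed: ", err)
        return
    end
    local val = red:get(key)
    if val and val ~= ngx.null then
        wb:send_text(val)
    end
    red:close()
end

local wb, err = server:new{
    timeout = 5000, -- in milliseconds
    max_payload_len = 6553500,
}
if not wb then
    ngx.log(ngx.ERR, "failed to create websocket server: ", err)
    return ngx.exit(444)
end

local threads = {} -- never pruned in this sketch

while true do
    -- Waits up to `timeout` ms for a frame from the client; if nothing
    -- arrives in that window, data is nil and rerr reports a timeout.
    local data, typ, rerr = wb:recv_frame()
    if wb.fatal then
        ngx.log(ngx.ERR, "failed to receive frame: ", rerr)
        break
    end

    -- handling the close event
    if typ == "close" then
        wb:send_close(1000, "bye")
        break
    end

    -- sending a ping
    local bytes, perr = wb:send_ping()
    if not bytes then
        ngx.log(ngx.ERR, "failed to send ping: ", perr)
        break
    end

    -- spawning a light thread that uses the parent's websocket
    threads[#threads + 1] = ngx.thread.spawn(worker, wb, "some:key")

    ngx.sleep(0.1) -- 100 ms
end

-- the websocket is closed: kill whatever light threads are still running
for _, th in ipairs(threads) do
    ngx.thread.kill(th)
end
```

In this layout every call to recv_frame() blocks for at most the 5000 ms configured in server:new, so a quiet client makes the read return with a timeout error roughly every five seconds.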
We keep receiving these errors in the error log:
2017/09/08 19:48:44 [error] 68593#0: *5377 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:48:44 [error] 68592#0: *5673 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:48:49 [error] 68593#0: *5377 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:48:49 [error] 68592#0: *5673 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:48:57 [error] 68592#0: *5673 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:48:57 [error] 68593#0: *5377 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:02 [error] 68593#0: *5377 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:02 [error] 68592#0: *5673 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:08 [error] 68592#0: *5673 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:08 [error] 68593#0: *5377 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:13 [error] 68593#0: *5377 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:13 [error] 68592#0: *5673 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:19 [error] 68593#0: *5377 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:19 [error] 68592#0: *5673 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:24 [error] 68593#0: *5377 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:49:24 [error] 68592#0: *5673 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
2017/09/08 19:50:14 [error] 68593#0: *5377 lua tcp socket read timed out, client: 127.0.0.1, server: , request: "GET /websocket HTTP/1.1", host: "dev.kronometrix.com"
We added this directive to the nginx.conf file, but it didn't help:
    lua_check_client_abort on;
We think the errors might come from:
- either the way the websocket is closed on the client vs. on the
server side (maybe the ping/pong frames fail, or something else?)
- or the way we are handling the connection with Redis (maybe
because we kill the light threads without signaling it somehow?)
How can we pinpoint the source of the error? What could we try in order
to fix this error?
Thank you
Bogda.