Ian,
What you want to do sounds very reasonable but turns out to be stunningly, excruciatingly difficult in the general case. It's hard to truly replicate flow, and even harder to do it without adversely impacting the production system, unless you are willing to invest in network taps or spanning ports. I have seen many folks try to achieve similar results using a range of approaches, including adding functionality at the hardware load balancer layer, capturing logs and replaying them, and capturing pcaps and replaying those (a very different thing).
What is the underlying goal? If a production web request involves a lost TCP packet that gets retransmitted, is that part of what you are trying to replicate?
Right now, do you have a production website with nontrivial traffic that you want to divert? Is the nginx instance you're referring to one that already functions as a reverse proxy in your stack, forwarding requests to back-end systems? Does the stack include things like hardware load balancers, firewalls, an IPS, content filters, web application firewalls, or caching reverse proxies?
Peter
Thanks for your reply! I looked into tools like tcpcopy and also gor ( https://github.com/buger/gor ), but they didn't give me the level of granularity I was looking for (specifically, replaying only the requests belonging to a percentage of session IDs, rather than a percentage of requests as a whole).

Good point on running get_uri_args in body_filter_by_lua. Even though I know my response size will never be more than 256 bytes (and in most cases 0 bytes), it can never hurt to be safe.

Regarding the slim possibility that the session ID could be split across multiple chunks, I believe I'm already handling that by appending each chunk to ngx.ctx.buffered and only processing the response once ngx.arg[2] is true. Am I missing something here?

Regarding the timer issue you brought up, the current plan is to have the production system replay requests to a single server that acts as a relay: it does no actual processing and simply replays the requests to the different dev systems. But I suppose there's still a chance that the relay server could hit the issue you mentioned. I'll look deeper into how lua-resty-logger-socket does it. Hopefully it doesn't go over my head, as this is still my first Lua project :).

On Thursday, May 7, 2015 at 3:28:07 AM UTC-4, agentzh wrote:

Hello!
On Thu, May 7, 2015 at 6:50 AM, Ian Carpenter wrote:
> Hey, so I've been tasked with setting up a system that will clone requests
> from a configurable percentage of session IDs to 1 or more servers for
> testing production data on development machines.
Hmm, I hope you use lower-level tools for such purposes, like tcpcopy:
https://github.com/session-replay-tools/tcpcopy
> I'm still pretty new to Lua
> in Nginx (and Lua in general actually), so I wanted to know if I have the
> right approach. Right now I'm reading the session ID from either the query
> string args, or the response body (if the request is to generate a new
> session ID) in body_filter_by_lua
One issue that I can see in your body_filter_by_lua code is that you really need to handle session IDs split across two (or even more) data chunks.
The documentation for body_filter_by_lua already states that the body
filter will be invoked on one or more data chunks, that is, in a
streaming fashion. So one really needs to code a small state machine
that accounts for arbitrary data chunk boundaries. (Well, in your
case, the chance of splitting a session ID may be very small because
the data chunks are usually quite large, like a page size. But to
ensure it works 100% of the time, you need to take care of the extreme
conditions.)
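Since the earlier reply in this thread mentions accumulating chunks in ngx.ctx.buffered, here is a minimal sketch of that whole-body buffering approach, which sidesteps the chunk-boundary problem entirely. This is an illustration only: the location name, the upstream, and the `session_id=` pattern are assumptions, and the `*_by_lua_block` syntax requires a reasonably recent OpenResty build.

```nginx
location /new_session {
    proxy_pass http://backend;  # placeholder upstream

    body_filter_by_lua_block {
        local chunk, eof = ngx.arg[1], ngx.arg[2]

        -- accumulate chunks so a session ID split across chunk
        -- boundaries cannot be missed
        ngx.ctx.buffered = (ngx.ctx.buffered or "") .. chunk

        if eof then
            -- the whole body is now available; boundaries no longer matter
            ngx.ctx.session_id =
                string.match(ngx.ctx.buffered, "session_id=(%w+)")
        end
    }
}
```

This trades memory for simplicity, which is reasonable here only because the responses are known to be tiny (at most a few hundred bytes); for large bodies a streaming state machine would be the safer choice.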
For a similar reason, you might want to avoid calling costly
operations like ngx.req.get_uri_args() in every invocation of the
body filter handler. It only needs to be called once, in an earlier
phase such as header_filter_by_lua or even access_by_lua.
Just set a flag in the ngx.ctx table so that your body_filter_by_lua
can test the flag directly.
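For example, the flag could be set once per request in access_by_lua and tested cheaply per chunk in the body filter. A sketch only; the `session_id` query parameter name and the `scan_body` flag are made up for illustration:

```nginx
access_by_lua_block {
    -- parse the query string once per request, not once per body chunk
    local args = ngx.req.get_uri_args()
    ngx.ctx.session_id = args.session_id  -- nil if not present
    ngx.ctx.scan_body = (args.session_id == nil)
}

body_filter_by_lua_block {
    -- cheap per-chunk test; no re-parsing of the query string
    if not ngx.ctx.scan_body then
        return
    end
    -- ... buffer and scan ngx.arg[1] here ...
}
```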
>, and then dispatching the request via a
> timer in log_by_lua.
>
Be careful about time-consuming I/O operations in 0-delay timers
triggered by per-request code like log_by_lua. When these I/O
operations are slow, you may accumulate many pending 0-delay timers
very quickly, exceeding the lua_max_pending_timers limit. The best
practice is to batch up and buffer the upstream I/O requests and only
create a timer to fire them off when there is no flush already
pending. (Well, you can throw the buffered data away when the backend
is too slow and exhausts your own in-memory buffers. That is
defensive programming.)
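The batching idea could be sketched as a tiny per-worker Lua module. All names here (replay_buffer, push) are invented for illustration, and the actual cosocket send is elided:

```lua
-- replay_buffer.lua: batch up items and flush them from at most one
-- pending 0-delay timer per worker. A sketch only.
local _M = { entries = {}, flush_pending = false }

local function flush(premature)
    _M.flush_pending = false
    -- swap the buffer out even on premature exit so it cannot grow forever
    local batch = _M.entries
    _M.entries = {}
    if premature then
        return
    end
    -- open a cosocket here and send `batch` to the relay server;
    -- on error, drop the batch rather than let buffers grow unbounded
end

function _M.push(item)
    table.insert(_M.entries, item)
    if not _M.flush_pending then
        local ok, err = ngx.timer.at(0, flush)
        if ok then
            _M.flush_pending = true
        else
            ngx.log(ngx.ERR, "failed to create flush timer: ", err)
        end
    end
end

return _M
```

Because require() loads a module only once per worker process, the entries table is shared across all requests handled by that worker, so calling something like `require("replay_buffer").push(ngx.var.request_uri)` from log_by_lua keeps at most one flush timer pending at a time.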
You can have a look at CloudFlare's lua-resty-logger-socket library
for such an example:
https://github.com/cloudflare/lua-resty-logger-socket
Well, just my 2 cents :)
Good luck!
Best regards,
-agentzh