Ian,
What you want to do sounds very reasonable but turns out to be stunningly, excruciatingly difficult in the general case. It's hard to truly replicate flow, and even harder to do it without adversely impacting the production system, unless you are willing to invest in network taps or spanning ports. I have seen many folks try to achieve similar results using a range of approaches, including adding functionality at the hardware load balancer layer, capturing logs and replaying them, and capturing pcaps and replaying those (a very different thing).
What is the underlying goal? If a production web request involves a lost TCP packet that gets retransmitted, is that part of what you are trying to replicate?
Right now, do you have a production website with nontrivial traffic that you want to divert? Is the nginx instance you're referring to one that already functions as a reverse proxy in your stack, forwarding requests to back-end systems? Does the stack include things like hardware load balancers, firewalls, an IPS, content filters, web application firewalls, or caching reverse proxies?
Peter
Thanks for your reply! I looked into tools like tcpcopy and also gor ( https://github.com/buger/gor ), but they didn't give me the level of granularity I was looking for (specifically, replaying only the requests belonging to a percentage of session IDs, rather than a percentage of requests as a whole).

Good point on running get_uri_args in body_filter_by_lua. Even though I know my response size will never be more than 256 bytes (and in most cases 0 bytes), it can never hurt to be safe.

Regarding the slim possibility that the session ID could be split across multiple chunks, I believe I'm already handling that by appending each chunk to ngx.ctx.buffered and only processing the response once ngx.arg[2] is true. Am I missing something here?

Regarding the timer issue you brought up, the current plan is to have the production system replay requests to a single server that acts as a relay: it does no actual processing and simply replays the requests to the different dev systems. But I suppose there's still a chance that the relay server could hit the issue you mentioned. I'll look deeper into how lua-resty-logger-socket does it. Hopefully it doesn't go over my head, as this is still my first Lua project :).

On Thursday, May 7, 2015 at 3:28:07 AM UTC-4, agentzh wrote:

Hello!
On Thu, May 7, 2015 at 6:50 AM, Ian Carpenter wrote:
> Hey, so I've been tasked with setting up a system that will clone requests
> from a configurable percentage of session IDs to 1 or more servers for
> testing production data on development machines.
Hmm, I hope you use lower-level tools for such purposes, like tcpcopy:
https://github.com/session-replay-tools/tcpcopy
> I'm still pretty new to Lua
> in Nginx (and Lua in general actually), so I wanted to know if I have the
> right approach. Right now I'm reading the session ID from either the query
> string args, or the response body (if the request is to generate a new
> session ID) in body_filter_by_lua
One issue that I can see in your body_filter_by_lua code is that you really need to handle session IDs split across two (or even more) data chunks.
The documentation for body_filter_by_lua already states that the body
filter will be invoked on one or more data chunks, that is, in a
streaming fashion. So one really needs to code a small state machine
that accounts for arbitrary data chunk boundaries. (Well, in your
case, the chance of splitting a session ID may be very small because
the data chunks are usually quite large, like a page size. But to
ensure it works 100% of the time, you need to take care of the extreme
conditions.)
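Since the earlier reply in this thread mentions accumulating chunks in ngx.ctx.buffered, here is a minimal sketch of that whole-body buffering approach, which sidesteps the chunk-boundary problem entirely. This is an illustration only: the location name, the upstream, and the `session_id=` pattern are assumptions, and the `*_by_lua_block` syntax requires a reasonably recent OpenResty build.

```nginx
location /new_session {
    proxy_pass http://backend;  # placeholder upstream

    body_filter_by_lua_block {
        local chunk, eof = ngx.arg[1], ngx.arg[2]

        -- accumulate chunks so a session ID split across chunk
        -- boundaries cannot be missed
        ngx.ctx.buffered = (ngx.ctx.buffered or "") .. chunk

        if eof then
            -- the whole body is now available; boundaries no longer matter
            ngx.ctx.session_id =
                string.match(ngx.ctx.buffered, "session_id=(%w+)")
        end
    }
}
```

This trades memory for simplicity, which is reasonable here only because the responses are known to be tiny (at most a few hundred bytes); for large bodies a streaming state machine would be the safer choice.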
For a similar reason, you might want to avoid calling costly
operations like ngx.req.get_uri_args() in every invocation of the
body filter handler. It only needs to be called once, in an earlier
phase such as header_filter_by_lua or even access_by_lua.
Just set a flag in the ngx.ctx table so that your body_filter_by_lua
can test the flag directly.
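For example, the flag could be set once per request in access_by_lua and tested cheaply per chunk in the body filter. A sketch only; the `session_id` query parameter name and the `scan_body` flag are made up for illustration:

```nginx
access_by_lua_block {
    -- parse the query string once per request, not once per body chunk
    local args = ngx.req.get_uri_args()
    ngx.ctx.session_id = args.session_id  -- nil if not present
    ngx.ctx.scan_body = (args.session_id == nil)
}

body_filter_by_lua_block {
    -- cheap per-chunk test; no re-parsing of the query string
    if not ngx.ctx.scan_body then
        return
    end
    -- ... buffer and scan ngx.arg[1] here ...
}
```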
>, and then dispatching the request via a
> timer in log_by_lua.
>
Be careful about time-consuming I/O operations in 0-delay timers
triggered by per-request code like log_by_lua. When these I/O
operations are slow, you may accumulate many pending 0-delay timers
very quickly, exceeding the lua_max_pending_timers limit. The best
practice is to batch up and buffer the upstream I/O requests and only
create a timer to fire them off when there is no flush already
pending. (Well, you can throw the buffered data away when the backend
is too slow and exhausts your own in-memory buffers. That is
defensive programming.)
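The batching idea could be sketched as a tiny per-worker Lua module. All names here (replay_buffer, push) are invented for illustration, and the actual cosocket send is elided:

```lua
-- replay_buffer.lua: batch up items and flush them from at most one
-- pending 0-delay timer per worker. A sketch only.
local _M = { entries = {}, flush_pending = false }

local function flush(premature)
    _M.flush_pending = false
    -- swap the buffer out even on premature exit so it cannot grow forever
    local batch = _M.entries
    _M.entries = {}
    if premature then
        return
    end
    -- open a cosocket here and send `batch` to the relay server;
    -- on error, drop the batch rather than let buffers grow unbounded
end

function _M.push(item)
    table.insert(_M.entries, item)
    if not _M.flush_pending then
        local ok, err = ngx.timer.at(0, flush)
        if ok then
            _M.flush_pending = true
        else
            ngx.log(ngx.ERR, "failed to create flush timer: ", err)
        end
    end
end

return _M
```

Because require() loads a module only once per worker process, the entries table is shared across all requests handled by that worker, so calling something like `require("replay_buffer").push(ngx.var.request_uri)` from log_by_lua keeps at most one flush timer pending at a time.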
You can have a look at CloudFlare's lua-resty-logger-socket library
for such an example:
https://github.com/cloudflare/lua-resty-logger-socket
Well, just my 2 cents :)
Good luck!
Best regards,
-agentzh