Hello!
On Sun, Dec 23, 2012 at 4:04 AM, James Hurst wrote:
>
> Of course, generally I agree - cosockets are fantastic and make lots of
> sense. But subrequests are also very important in reality, because they
> allow you to reuse other parts of your configuration. I totally agree that
> if you're using subrequests just to do some arbitrary i/o, then cosockets
> should be explored, and the more client drivers we have in the ecosystem the
> better!
>
Yes, people have already been adding their new lua-resty-* libraries.
And I'm going to implement more client drivers myself too :)
> But things like pools of upstream backends, for example, come for free with
> the subrequest model. Why implement a HTTP Lua client for proxying to
> upstream servers when the proxy module + upstream gets you everything you
> need, tested and guaranteed? Often this separation of concerns is really
> desirable too. For example, my proxy cache module shouldn't need to care
> about the origin itself - it's really elegant to use subrequests because
> Nginx configuration abstracts the details.
>
I can understand this :)
> I'm interested to learn what it might take to do non-buffered processing
> with subrequests. Is it a large refactor do you think? We may be able to
> throw some resource at it given some guidance on design.
>
No, it will not be a large refactor. We just need to implement a new
set of ngx.subreq.* APIs; we cannot simply reuse the existing
ngx.location.capture* API.
Here is the basic idea for the API (details do not matter and are
subject to change):
-- ngx.subreq.spawn will not return until the response headers
-- are received or an error occurs
local sr, err = ngx.subreq.spawn("/proxy", ...)
if not sr then
    ngx.log(ngx.ERR, "failed to spawn subrequest: ", err)
    ngx.exit(500)
end

-- the sr.get_headers call does not do I/O itself,
-- it just retrieves the response headers received
-- in ngx.subreq.spawn():
local resp_headers, err = sr.get_headers()

-- now let's receive the response body chunk by chunk:
while true do
    local chunk, err = sr.receive_body_chunk(4096)
    if err == "eof" then
        -- process the last chunk
        break
    end
    if err then
        ngx.log(ngx.ERR, "failed to receive body chunk: ", err)
        ngx.exit(500)
    end
    -- process the data chunk
end
Yes, it looks very much like the TCP cosocket API.
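For comparison, here is roughly how the same receive loop looks today with the existing TCP cosocket API (the host, port, and chunk size below are just placeholders for illustration):

local sock = ngx.socket.tcp()
local ok, err = sock:connect("127.0.0.1", 8080)
if not ok then
    ngx.log(ngx.ERR, "failed to connect: ", err)
    ngx.exit(500)
end
while true do
    -- read up to 4096 bytes; on "closed", the third return
    -- value holds whatever partial data arrived before EOF
    local data, err, partial = sock:receive(4096)
    if err == "closed" then
        -- process the partial data, if any
        break
    end
    if err then
        ngx.log(ngx.ERR, "failed to receive: ", err)
        ngx.exit(500)
    end
    -- process the data
end

So the proposed ngx.subreq API would feel immediately familiar to anyone already using cosockets.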
Regarding implementation details, here's my proposal:
* return NGX_AGAIN and avoid marking the incoming chain bufs as
"consumed" in our ngx_lua "capture" output body filter. Upstream
modules like ngx_proxy that support the non-buffered output mode can
then work out of the box.
* track each subrequest object's state in each coroutine, just as we
do for the cosocket objects.
Essentially, the basic idea is to emulate a "slow client" in ngx_lua's
"capture" output filter so that the content handler defined in the
location serving the subrequest has a chance to slow down while
outputting response body data. This should work as long as the
corresponding content handler supports the non-buffered output mode.
> Also, any other suggestions on practical ways to "protect" Lua from
> accidental large subrequests? Because in reality, right now I'd be happy to
> simply handle large responses as an exception and redirect to them directly.
> I just can't see a nice way to do this without first proxying at least the
> size of the largest response we'll accept, before bailing and redirecting.
> Not great really.
>
We can allow the user to set an upper limit on the size of the
response body captured by ngx.location.capture*. For example:

local res = ngx.location.capture("/proxy", { max_body_size = 4096 })
if res.truncated then
    -- do your error handling here
end
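Until something like that exists, the best one can do with the current API is check the body size after the fact, which is exactly the problem you describe: the full response has already been buffered before we can bail out. A rough sketch of that workaround (the 4096 limit and the /proxy location are just examples):

-- NOTE: the whole response body is still read into memory first;
-- this only prevents us from *processing* an oversized response
local max_body_size = 4096  -- our own constant, not a capture option
local res = ngx.location.capture("/proxy")
if res.body and #res.body > max_body_size then
    -- too large: give up and redirect the client to the origin
    return ngx.redirect("/proxy")
end

With a real max_body_size option, the capture machinery could instead stop reading from the upstream as soon as the limit is exceeded.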
Do you like it?
You're very welcome to contribute patches for these new features :)
Best regards,
-agentzh