Sharing variable between requests within a worker.

alexeev.roman · 2016-05-12T22:25:23+00:00

alexeev.roman

Hello, openresty fans!

Could you suggest how to share a big table (containing a graph several gigabytes in size) across client requests within one worker process?

I understand that sharing data between workers is costly because it requires serialization.

So I set "worker_processes 1" and initialized the table with a large volume of data.

Now I want to give all users access to this big table without overhead.

aapo.talvensaari

On 12 May 2016 at 17:25, Roman Alexeev <alexee...@gmail.com> wrote:

Hello, openresty fans!

Could you suggest how to share a big table (containing a graph several gigabytes in size) across client requests within one worker process?

If you have enough RAM for that data, I would load it into a Lua module-level variable in init_by_lua.

aapo.talvensaari

On 12 May 2016 at 17:56, Aapo Talvensaari <aapo.ta...@gmail.com> wrote:

If you have enough RAM for that data, I would load it into a Lua module-level variable in init_by_lua.

You may hit LuaJIT memory limits, though, so the alternatives are to run PUC Lua instead or, preferably, to use FFI to gain access to more memory (or a LuaJIT patch that allows more memory).

jonathan

On Thursday, May 12, 2016 at 10:25:23 AM UTC-4, Roman Alexeev wrote:

Could you suggest how to share a big table (containing a graph several gigabytes in size) across client requests within one worker process?

Can you explain how the data will be used, i.e. the purpose, read vs. write, number of lookups, etc.? Someone may know of another approach.

alexeev.roman

On Thursday, May 12, 2016 at 10:25:23 AM UTC-4, Roman Alexeev wrote:
Could you suggest how to share a big table (containing a graph several gigabytes in size) across client requests within one worker process?

Can you explain how the data will be used, i.e. the purpose, read vs. write, number of lookups, etc.? Someone may know of another approach.

0. I will start several servers (each with a single worker), each holding an identical copy of the graph, to accept parallel requests.

1. Each server is initialized with the same dataset, and this dataset is READ-ONLY for its whole lifetime. When I need to make changes, I reinitialize the servers one by one with the new dataset.

2. Users request operations that explore the graph (which can take a few seconds). Most operations require many (thousands to millions of) table lookups and can create temporary local variables (which are not shared across client requests).
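To make point 2 concrete, here is a minimal sketch of one such operation in plain Lua, assuming the graph is held as adjacency lists in a module-level table that is only ever read. The `graph` contents and the `bfs_distances` name are hypothetical. Everything the traversal mutates (`dist`, `queue`) is local to the call, so nothing leaks between client requests. One caveat: a long CPU-bound traversal like this never yields, so while it runs the single worker cannot serve other requests.

```lua
-- Hypothetical read-only graph, loaded once (e.g. by a module required
-- from init_by_lua) and never modified afterwards. Keys are node ids,
-- values are arrays of neighbour ids.
local graph = {
  [1] = { 2, 3 },
  [2] = { 4 },
  [3] = { 4 },
  [4] = {},
}

-- One "user operation": breadth-first search from `start`, returning a
-- table that maps each reachable node to its hop count. All mutable
-- state is local to this call; the shared graph is only read.
local function bfs_distances(g, start)
  local dist = { [start] = 0 }
  local queue, head, tail = { start }, 1, 1
  while head <= tail do
    local node = queue[head]
    head = head + 1
    local nbrs = g[node] or {}
    for i = 1, #nbrs do
      local nxt = nbrs[i]
      if dist[nxt] == nil then
        dist[nxt] = dist[node] + 1
        tail = tail + 1
        queue[tail] = nxt
      end
    end
  end
  return dist
end
```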

I tried to do this in Redis, but

a. Redis is not friendly to long-running scripts

b. it is slow to send each lookup (or group of lookups) as a separate command

jonathan

I have seen some tickets and threads about this in the archives. A quick search turned up this ticket, which references a few different modules: https://github.com/openresty/lua-nginx-module/issues/96

Does this need to be done in OpenResty/Lua?

Other languages have decent read-only shared-memory implementations for threads and child processes; depending on your use case, you might have better luck with OpenResty handing this task off to another service via a private API.

me+lists.openresty

Hi Roman,

If you want to do the request processing in nginx and not in another
process over IPC like jonathan suggested, you can use shared memory
from an nginx module (in C).

I just finished doing something vaguely similar (IP address blocking)
and I'll give you a brief run-down of how I did it, should it be of any
use to you. Ultimately it uses a server-client model.

To begin with, a Golang program of ~400 LOC (with ~100 devoted to a
simple TUI) creates shared memory with `shm_open`, on which `mmap` is
called. The data is copied in, and pointers to it (relative to the mmap
address) are stored in a fixed-size header. The header contains: a
rwlock based on Golang's sync.RWMutex
(https://golang.org/src/sync/rwmutex.go?s=479:811#L9), chosen for its
simplicity; the relative pointer and length of the data; a revision
field that is incremented whenever the header changes; and some flags.
The pointers and the revision field are only ever changed inside the
write lock.

At start-up the nginx module (~400 loc) opens the shared memory with
`shm_open` and calls `mmap` to map it into nginx's address space. When a
request comes in, the worker takes out a read lock. It then checks
whether the revision field has changed; if (and only if) it has, the
shared memory has to be re-mmapped. Then it performs whatever processing it needs to do on
the data (in my case searching for remote IP address) and releases the
read lock.
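The header and reader logic described above can be sketched with LuaJIT's FFI, which is also how Tom suggests calling into the module from Lua. This is a rough illustration under several assumptions, not his code: the `shm_header_t` layout, `make_region` and `read_data` are hypothetical names, an ordinary FFI buffer stands in for the `shm_open`/`mmap` region, and the rwlock and flags are omitted entirely (a real reader must hold the read lock for the whole of `read_data` and any processing that uses its result). It shows the two ideas that carry over: pointers stored as offsets from the start of the mapping, and a revision counter that tells the reader when to re-resolve them.

```lua
-- Requires LuaJIT: the `ffi` module is not available in PUC Lua.
local ffi = require("ffi")

-- Hypothetical header layout: a revision counter plus the offset and
-- length of the data, both relative to the start of the mapping.
-- The rwlock and flags from the description are deliberately left out.
ffi.cdef[[
typedef struct {
  uint64_t revision;
  uint64_t data_off;  /* offset from the start of the mapping */
  uint64_t data_len;
} shm_header_t;
]]

local HEADER_SIZE = ffi.sizeof("shm_header_t")

-- Writer side, simulated: an ordinary FFI buffer stands in for the
-- shm_open()/mmap() region the Go program would create.
local function make_region(payload, revision)
  local region = ffi.new("uint8_t[?]", HEADER_SIZE + #payload)
  local hdr = ffi.cast("shm_header_t *", region)
  hdr.revision = revision
  hdr.data_off = HEADER_SIZE
  hdr.data_len = #payload
  ffi.copy(region + HEADER_SIZE, payload, #payload)
  return region
end

-- Reader side: per-worker cache of the last revision seen and the
-- resolved data pointer.
local cached = { revision = nil, ptr = nil, len = 0 }

local function read_data(base)
  -- (a real module acquires the read lock here)
  local hdr = ffi.cast("const shm_header_t *", base)
  local rev = tonumber(hdr.revision)
  if rev ~= cached.revision then
    -- Header changed: re-resolve the relative pointer. A real module
    -- would re-mmap here before using the new offset and length.
    cached.ptr = ffi.cast("const uint8_t *", base) + tonumber(hdr.data_off)
    cached.len = tonumber(hdr.data_len)
    cached.revision = rev
  end
  -- (search the data, then release the read lock)
  return cached.ptr, cached.len
end
```

The caller must keep the region alive (here, by holding the buffer returned from `make_region`); with a real mapping that job falls to the `mmap` lifetime instead of the garbage collector.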

There are optimisations that may or may not work that are beyond the
scope of this email but suffice to say, they aim to minimise the length
of time the write lock needs to be held.

If you need to invoke the processing from Lua, you can export a function
and call it with FFI.

Also, with this method (to keep it simple) you have exactly one server
copy writing the data and each client reads the same data. Because of
the rwlock there is no need to reload or restart nginx to update the
data either.

Kind Regards,
Tom Thorogood.


aapo.talvensaari

On Thursday, 12 May 2016 19:19:50 UTC+3, Roman Alexeev wrote:

0. I will start several servers (each with a single worker), each holding an identical copy of the graph, to accept parallel requests.
1. Each server is initialized with the same dataset, and this dataset is READ-ONLY for its whole lifetime. When I need to make changes, I reinitialize the servers one by one with the new dataset.
2. Users request operations that explore the graph (which can take a few seconds). Most operations require many (thousands to millions of) table lookups and can create temporary local variables (which are not shared across client requests).

Did you try a Lua module-level variable?

E.g. in init_worker_by_lua you do something like this:

require "data"

And in your data.lua you have something like this (you may run code to initialize it, e.g. read it from a file, deserialize it, look it up in a DB, etc.):

return {
    here = "is",
    my = "data"
}

-- this return value is cached automatically by Lua in package.loaded

Now in your normal code path (e.g. in content_by_lua) you just do:

local data = require "data"

-- I need the 'here' data:
local here = data.here

-- do whatever you want with that data
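The caching Aapo mentions can be observed in isolation with a plain-Lua sketch. Here `package.preload` stands in for a `data.lua` file on `package.path` so the example is self-contained; the `load_count` counter is purely illustrative. The loader runs at most once per Lua VM, and each nginx worker has its own Lua VM (provided lua_code_cache is on).

```lua
-- Counts how many times the (hypothetical) "data" module body runs.
local load_count = 0

-- package.preload stands in for a data.lua file on package.path.
package.preload["data"] = function()
  load_count = load_count + 1
  -- In practice: read a file, deserialize, query a DB, etc.
  return { here = "is", my = "data" }
end

local a = require "data"  -- first call runs the loader
local b = require "data"  -- later calls return package.loaded["data"]
```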

If you have a lot of string lookups etc. then check out this patch (it makes a huge difference!!!):

https://github.com/LuaJIT/LuaJIT/pull/174

Now, if you cannot fit this in memory, this is not a solution. Also, if you hit LuaJIT memory limits, you have to use FFI and allocate your own memory in C structs, etc.

Regards

Aapo

alexeev.roman

Hello, Aapo

Did you try a Lua module-level variable?

E.g. in init_worker_by_lua you do something like this:
require "data"

And in your data.lua you have something like this (you may run code to initialize it, e.g. read it from a file, deserialize it, look it up in a DB, etc.):
return {
    here = "is",
    my = "data"
}

I tried this solution (https://github.com/openresty/lua-nginx-module#data-sharing-within-an-nginx-worker). It works well, but only if lua_code_cache is on, and the data is initialized on the first user request (in content_by_lua).

I think your solution with init_worker_by_lua is better, and I will implement it.

Now, if you cannot fit this in memory, this is not a solution. Also, if you hit LuaJIT memory limits, you have to use FFI and allocate your own memory in C structs, etc.

Thank you for warning me about these memory limitations. I have read several articles with workarounds (for example http://bayesanalytic.com/access-extra-memory-from-lua-jit/) and will try using FFI malloc.
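A minimal sketch of the FFI malloc approach, assuming LuaJIT (the `ffi` module does not exist in PUC Lua). This is one common way to do it, not necessarily the exact code from the linked article, and `alloc_int32` is a hypothetical helper. Memory obtained from the C allocator lives outside LuaJIT's garbage-collected heap, so it does not count against the GC memory limit; `ffi.gc` attaches `free` as a finalizer so the block is released once the pointer becomes unreachable.

```lua
-- Requires LuaJIT.
local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

-- Allocate `n` int32 slots with the C allocator (outside the GC heap)
-- and register free() as the finalizer for the returned pointer.
local function alloc_int32(n)
  local raw = ffi.C.malloc(n * ffi.sizeof("int32_t"))
  if raw == nil then  -- a NULL cdata pointer compares equal to nil
    error("malloc failed for " .. n .. " int32 slots")
  end
  return ffi.gc(ffi.cast("int32_t *", raw), ffi.C.free)
end

-- Usage: fill a buffer that could hold, say, edge targets of the graph.
local n = 1000
local arr = alloc_int32(n)
for i = 0, n - 1 do
  arr[i] = i * 2
end
```

FFI arrays are 0-indexed and nothing checks bounds, so an out-of-range write corrupts memory instead of raising an error.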

I have come up with a way to use arrays to look up edges and attributes in my graph, and I think this will be a good, fast, and memory-friendly solution.
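Roman does not say which layout he chose. One well-known array-based layout for a read-only graph is compressed sparse row (CSR): all edge targets sit in one flat array, and an offsets array records where each node's edges begin. The sketch below illustrates that technique in plain Lua; `build_csr` and `neighbours` are hypothetical names, and nodes are assumed to be numbered 1..n. Edge attributes can be stored in further arrays parallel to `targets`, and the arrays could be moved into FFI buffers to keep them off the GC heap.

```lua
-- Build a CSR graph from an edge list { {from, to}, ... }.
-- Node v's out-edges occupy targets[offsets[v] .. offsets[v + 1] - 1].
local function build_csr(n, edges)
  -- Count the out-degree of each node.
  local degree = {}
  for v = 1, n do degree[v] = 0 end
  for _, e in ipairs(edges) do
    degree[e[1]] = degree[e[1]] + 1
  end

  -- Prefix sums give each node's first slot in `targets`.
  local offsets = { 1 }
  for v = 1, n do
    offsets[v + 1] = offsets[v] + degree[v]
  end

  -- Place each edge target using a moving cursor per source node.
  local cursor, targets = {}, {}
  for v = 1, n do cursor[v] = offsets[v] end
  for _, e in ipairs(edges) do
    local from = e[1]
    targets[cursor[from]] = e[2]
    cursor[from] = cursor[from] + 1
  end
  return { n = n, offsets = offsets, targets = targets }
end

-- Collect the out-neighbours of node v (a hot path would loop inline).
local function neighbours(g, v)
  local out = {}
  for i = g.offsets[v], g.offsets[v + 1] - 1 do
    out[#out + 1] = g.targets[i]
  end
  return out
end
```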

agentzh

Hello!

On Fri, May 13, 2016 at 4:27 AM, Roman Alexeev wrote:
>  I tried this solution
> (https://github.com/openresty/lua-nginx-module#data-sharing-within-an-nginx-worker).
> It works well, but only if lua_code_cache is on, and the data is initialized
> on the first user request (in content_by_lua).
>  I think your solution with init_worker_by_lua is better, and I will
> implement it.

Well, you can always require your Lua module in init_by_lua* to warm
it up early instead of in the first request. Also, lua_code_cache off
is sloooooooooooooow; do not disable the Lua code cache for production
usage.
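If the expensive load sits behind an accessor rather than in the module body, the same advice applies: call the accessor once from init_by_lua* so the first client request does not pay for it. A minimal sketch of such a module in plain Lua, with hypothetical names (`M.get`, `load_graph`) and a stub loader standing in for the real dataset:

```lua
-- Hypothetical graph.lua module. The data lives in a module-level
-- upvalue, so there is one copy per Lua VM.
local M = {}

local graph = nil
local load_count = 0  -- only to make the example observable

-- Stand-in for reading and deserializing the real multi-GB dataset.
local function load_graph()
  load_count = load_count + 1
  return { [1] = { 2 }, [2] = {} }
end

-- Builds the graph on first use and returns the cached copy afterwards.
function M.get()
  if graph == nil then
    graph = load_graph()
  end
  return graph
end

function M.loads()
  return load_count
end

-- A real graph.lua would end with `return M`; it is omitted here so the
-- example runs as a single chunk.
```

In nginx.conf the warm-up could be as small as `init_by_lua_block { require("graph").get() }`, assuming the module is saved as graph.lua somewhere on lua_package_path.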

Regards,
-agentzh