Hello!
On Thu, Oct 29, 2015 at 3:41 AM, Thibault Charbonnier wrote:
> However, in an openresty context, each request being sandboxed and handled
> by its own coroutine, it seems to me there is no way to persist that data,
Well, you can share data within each worker process via Lua module-level variables:
https://github.com/openresty/lua-nginx-module#data-sharing-within-an-nginx-worker
or across all the workers of an nginx server instance via lua_shared_dict:
https://github.com/openresty/lua-nginx-module#lua_shared_dict
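For instance, a minimal sketch of the first mechanism (the module name and fields here are made up for illustration):

```lua
-- mydata.lua: per-worker data sharing via module-level variables.
-- The "cache" table below lives as long as the worker process does
-- and is visible to every request handled by that worker.
local _M = {}

local cache = {}

function _M.get(key)
    return cache[key]
end

function _M.set(key, value)
    cache[key] = value
end

return _M
```

For the second mechanism, you declare a zone in nginx.conf, e.g. `lua_shared_dict cluster_data 1m;` (the zone name is made up), and then all workers can read and write it via `ngx.shared.cluster_data:get()` / `:set()`.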
> 1. I don't know how this would affect performance: on each query the driver
> would now have to retrieve a socket from the connection pool (or create
> one), make sure the remote node is responding, and if it is not, access the
> cluster informations from the shared dictionary (this is where I am unsure
> about performance degradation), and test nodes one by one (naive
> implementation of what the other Cassandra drivers are doing).
>
Shm accesses are usually VERY fast unless you have a huge number of
entries in an individual store and/or you have to do (expensive)
serialization and deserialization before/after every shm access.
Shm accesses usually involve no context switches, no I/O operations,
no syscalls (cold shm zones might introduce page faults in the very
beginning, but that's transparent to userland and is another story).
Anyway, you can always benchmark things yourself (using lua-resty-core
can boost performance a lot, since it allows LuaJIT to actually
JIT-compile the ngx.shared.DICT method calls).
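Enabling lua-resty-core is a one-liner in nginx.conf (the library name is real; this sketch assumes it is installed and on the Lua module search path):

```lua
init_by_lua_block {
    -- swaps in FFI-based, JIT-compilable implementations for many
    -- ngx.* API calls, including the ngx.shared.DICT methods
    require "resty.core"
}
```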
> 2. The cluster's information is now eventually vulnerable from user code,
> which could access the shared memory zone (even call 'flush_all()' and
> destroy it all).
>
Well, this one is easy. Just do something like this in the top-level
scope of your own Lua module:
local store = ngx.shared.my_store
ngx.shared.my_store = nil
It's a bit hacky, but it should work (we might encapsulate this with
a proper API call in ngx_lua someday).
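Put together, a sketch of the full encapsulation pattern (reusing the "my_store" zone name from above; the accessor functions are made up):

```lua
-- cluster_store.lua: grab a private reference to the shm zone at
-- module load time, then remove it from ngx.shared so user code
-- cannot reach it (or flush_all() it) directly.
local store = ngx.shared.my_store
ngx.shared.my_store = nil

local _M = {}

function _M.get(key)
    return store:get(key)
end

function _M.set(key, value, ttl)
    return store:set(key, value, ttl)
end

return _M
```

User code that goes through this module sees only get/set, while `ngx.shared.my_store` itself evaluates to nil everywhere after the module is first loaded.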
> 3. (minor) Since the shared memory zone only stores string values, would it
> be preferable to dump a table representation of all the cluster's data in a
> single key and serialize/deserialize it or store multiple keys with smaller
> chunks of data. Not sure what is more performant here (again, not sure of
> the impact of calling the shared memory zone).
>
If you have to store complicated data structures as whole values, it's
recommended to use lua-resty-lrucache, which is a Lua-VM-level cache
that supports arbitrarily complicated Lua values (nested tables, and
even Lua function objects). Depending on the use cases and
requirements, it can be combined with lua_shared_dict.
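A sketch of that combination as a two-level cache (the zone name "cluster", the key scheme, and the 30s TTL are all made-up assumptions; JSON is just one possible serialization):

```lua
-- Two-level lookup: lua-resty-lrucache holds already-deserialized
-- Lua tables per worker; lua_shared_dict holds the serialized form
-- shared across all workers.
local lrucache = require "resty.lrucache"
local cjson = require "cjson.safe"

local lru = assert(lrucache.new(200))  -- at most 200 items per worker
local shm = ngx.shared.cluster

local function get_peers(key)
    local peers = lru:get(key)     -- L1 hit: no deserialization at all
    if peers then
        return peers
    end

    local json = shm:get(key)      -- L2: pay the shm + decode cost
    if not json then
        return nil
    end

    peers = cjson.decode(json)
    lru:set(key, peers, 30)        -- keep the decoded table for 30s
    return peers
end
```

This way the serialization cost is paid only on per-worker cache misses rather than on every request.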
Generally, if you can avoid serialization/deserialization, then avoid
it, since it's inevitably expensive. For example, the
lua-resty-upstream-healthcheck library stores multiple keys holding
smaller chunks of data:
https://github.com/openresty/lua-resty-upstream-healthcheck
Another interesting (but a bit dangerous) way is to store raw C data
structs via FFI cdata values in lua_shared_dict, which is the approach
used by the lua-resty-limit-traffic library:
https://github.com/openresty/lua-resty-limit-traffic
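A hypothetical sketch of that trick (the struct layout, zone name "state", and helper names are all made up; it is "dangerous" because the byte layout must match exactly on both the write and read sides):

```lua
-- Pack a C struct's raw bytes into a shm value, skipping textual
-- serialization entirely.
local ffi = require "ffi"

ffi.cdef[[
    typedef struct {
        uint64_t  last_seen;    /* ms timestamp of last contact */
        uint32_t  fail_count;   /* consecutive failures */
    } node_state_t;
]]

local state_size = ffi.sizeof("node_state_t")
local state_buf = ffi.new("node_state_t")
local shm = ngx.shared.state

local function save(key, last_seen, fail_count)
    state_buf.last_seen = last_seen
    state_buf.fail_count = fail_count
    -- copy the struct's raw bytes into the shm zone as a string value
    return shm:set(key, ffi.string(state_buf, state_size))
end

local function load(key)
    local raw = shm:get(key)
    if not raw or #raw ~= state_size then
        return nil
    end
    -- reinterpret the string's bytes as a pointer to the struct
    local s = ffi.cast("const node_state_t *", raw)
    return tonumber(s.last_seen), tonumber(s.fail_count)
end
```

No parsing happens on either side, so reads and writes stay cheap even under heavy request rates.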
Regards,
-agentzh