Hello!
On Tue, Jul 26, 2016 at 8:02 PM, Shivakumar Gokaram wrote:
> We are using a shared dictionary to store large amounts of data. The data set
> size is about 20 - 25 M records, with the file containing all the
> records being 2.5GB in size.
> When I load the data from this file (key/value), NGINX seems to occupy a
> lot of memory (buff/cache), approx 10GB. I have checked the code and
> ensured there are no duplicate entries added, etc.
>
Because there are many different memory metrics in a modern *NIX
system, what kind of memory metric are you referring to? Is it virtual
memory size (VIRT) or resident set size (RSS)?
Your latest private email seems to provide more useful details; please
allow me to quote it here:
> I want to store a file with key/value pairs and the file size is about 3.1GB (output of ls -lah).
> The file contains 69M lines of the format key,val.
> I am reading this file in Lua and storing it into an nginx shared dictionary. I am facing a "no memory" error when
> I set the Lua dictionary size to anything less than 12GB. It works fine if the size is set to 12GB+.
> This is almost 4x the size of my data file.
Okay, this paragraph contains much more detailed information. So we
can do the math here:
A 3.1GB file with 69M lines, one key-value pair per line, means each
key-value pair is about 48.2 bytes on average. Excluding the comma
separator and the trailing newline, that leaves about 46 bytes of real
data per pair.
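The arithmetic above can be checked with a quick sketch (assuming
"3.1GB" really means GiB, i.e. 1024^3 bytes, which is how ls -lah
reports sizes):

```python
# Back-of-the-envelope check of the average key-value pair size.
# Assumption: "3.1GB" is GiB (1024**3 bytes), as printed by `ls -lah`.
file_size = 3.1 * 1024 ** 3   # total file size in bytes
num_pairs = 69e6              # one "key,val" pair per line

avg_line = file_size / num_pairs
print(round(avg_line, 1))     # bytes per line, incl. comma and newline -> 48.2
print(round(avg_line - 2))    # bytes of real data per pair -> 46
```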
You said that lua_shared_dict requires 12GB of room to accommodate all
these 69M key-value pairs, which means each key-value pair needs about
186.7 bytes on average. Subtracting the true data size, 46 bytes,
leaves an average per-pair memory overhead of about 140 bytes.
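The same kind of sketch gives the implied per-pair footprint and
overhead (again treating GB as GiB; the 46-byte payload figure comes
from the file-size math above):

```python
# Per-pair memory footprint implied by the 12 GiB lua_shared_dict size.
dict_size = 12 * 1024 ** 3    # bytes the shared dict needs to fit everything
num_pairs = 69e6
payload = 46                  # real data bytes per pair, from the file-size math

per_pair = dict_size / num_pairs
print(round(per_pair, 1))     # total bytes per pair -> 186.7
print(per_pair - payload)     # roughly 140 bytes of overhead per pair
```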
The true memory overhead (including the red-black tree, expiration
queues and timestamps, and the key-length and value-length fields) for
each key-value pair in lua_shared_dict is currently 76 bytes on a
typical 64-bit Linux system, plus a little overhead from the nginx
slab allocator (which should be very small for each pair). So what
you're seeing is about 1.8x the real overhead, which is still not
wildly divergent. I'm not sure whether it's a calculation error or
something else.
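For completeness, that 1.8x figure is just the ratio of the observed
per-pair overhead to the documented struct overhead:

```python
# Ratio of observed per-pair overhead to the known lua_shared_dict overhead.
observed = 140    # bytes per pair, derived from the 12 GiB dict size
documented = 76   # bytes per pair on a typical 64-bit Linux system
print(round(observed / documented, 1))   # -> 1.8
```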
Oh, BTW, I've just rearranged the lua_shared_dict node struct layout,
which saves 8 bytes of overhead for each key-value pair, at least on
x86_64 systems:
https://github.com/openresty/lua-nginx-module/commit/da08f59a
So now it's just 68 bytes per key-value pair. You can try this patch
out, though don't expect to see a dramatic difference, obviously ;)
Regards,
-agentzh