On Wednesday, February 19, 2014 10:13:35 PM UTC+2, agentzh wrote:
> I was actually thinking about writing a JSON library in pure Lua ;)
Oh, I hear you. Something more along the lines of this:
https://github.com/fperrad/lua-MessagePack/blob/master/src/MessagePack.lua
In other words, a reimplementation / fork of the C lib. This is an area where I'm a total newbie.
It seems like a hell of a lot more work than what I did by simply interfacing with the C lib through FFI
from Lua. While I agree that it is the way to go when you need to squeeze out the last bits of
performance, I'm not sure it is practical/pragmatic. What I mean is that you end up
forking the project (the C lib), and once you do that, keeping your code up to date gets more difficult
(vs. just updating the lib and making small adjustments on the Lua side if needed). Pure Lua is
optimal from a deployment and installation perspective, but you also lose a lot when
you are not able to directly use tried and true C libs.
For core features, which I think JSON is (in a web development stack), I can see the upsides
of a pure Lua implementation. Thinking about OpenResty and its progress, I believe that bringing in
lua-resty-cjson type libs that interface with C libs could push the project further, much
more quickly. And we could always go back when needed and rewrite the most performance-critical
parts of the libs in pure Lua.
Btw, have you seen this:
https://github.com/umegaya/ffiex
I think that is something OpenResty could look at when implementing package
management (i.e. supplying C code as sources with e.g. lua-resty-cjson, and compiling
it with ffiex.csrc, at runtime?).
> For 1, we should use LuaJIT 2.1's table.new() to preallocate table
> space (because table auto grow introduces new allocations and GC
> overhead!). Also we should use table.clear() for recycling Lua tables
> (for the decoder) if applicable. Also, recycle internal buffers
> wherever possible and avoid unnecessary intermediate data structures.
Ok, I will look at this. I see that I could use cJSON_GetArraySize() to get the narr or nrec sizes (is narr the number of array elements, and nrec the number of hash keys?). But what about table.clear() (the sizes could be very different for each structure and substructure)?
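
A rough sketch of what the preallocation and recycling could look like on the Lua side. In LuaJIT 2.1, table.new(narr, nrec) takes the number of array slots and hash slots to preallocate, and table.clear(tab) empties a table while keeping its allocated sizes. The helper names build_array and with_scratch, and the value_at callback, are made up for illustration; n stands in for a count such as the one cJSON_GetArraySize() returns. The fallbacks are only there so the sketch also runs where the extensions are missing.

```lua
-- table.new and table.clear are LuaJIT 2.1 extensions; fall back to
-- plain Lua equivalents so the sketch also runs on other interpreters.
local ok_new, new_tab = pcall(require, "table.new")
if not ok_new then
    new_tab = function(narr, nrec) return {} end
end
local ok_clear, clear_tab = pcall(require, "table.clear")
if not ok_clear then
    clear_tab = function(t)
        for k in pairs(t) do t[k] = nil end
    end
end

-- Hypothetical helper: fill a preallocated array with n decoded values.
-- value_at(i) is an assumed callback returning the i-th element.
local function build_array(n, value_at)
    local t = new_tab(n, 0)       -- n array slots, 0 hash slots
    for i = 1, n do
        t[i] = value_at(i)
    end
    return t
end

-- Hypothetical scratch table that is reused instead of reallocated.
local scratch = new_tab(0, 8)     -- 0 array slots, 8 hash slots
local function with_scratch(fill)
    clear_tab(scratch)            -- drop old contents before reuse
    fill(scratch)
    return scratch
end
```

Since table.clear() preserves the allocated array/hash sizes, recycling probably pays off mostly for internal tables whose shape stays similar between uses, rather than for the decoded result tables that are handed back to the caller.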
> For 2, we also need a stronger JIT compiler in LuaJIT. My employer,
> CloudFlare, is going to sponsor more features in the JIT compiler. The
> most interesting ones for a fast JSON library (and alike) are
> 1. JIT compile hash table iteration loops (i.e., "for k, v in
> pairs(tb) do ... end")
> 2. a string.buffer API (for both efficient input stream parsing and
> output stream).
Sounds great.
> In addition, I think it is good to have a JIT-able table.emptyhash()
> primitive that returns a boolean value indicating whether a hash
> table part of a Lua table is empty. This will outperform your
> "is_array" function a lot.
Yes, I would be more than happy to drop that is_array function.
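
For context, a pure Lua array check generally has to walk every key with pairs(), which is exactly the kind of loop that is not JIT compiled yet. The following is a hypothetical sketch of such a check, not the actual lua-resty-cjson code: it treats a table as an array when its keys are exactly 1..n.

```lua
-- Hypothetical array check, not the actual lua-resty-cjson implementation.
-- Returns true when every key is a positive integer and the keys are
-- exactly 1..n with no gaps (an empty table counts as an array here).
local function is_array(t)
    local max, count = 0, 0
    for k in pairs(t) do
        if type(k) ~= "number" or k < 1 or k % 1 ~= 0 then
            return false          -- non-numeric, non-positive or fractional key
        end
        if k > max then max = k end
        count = count + 1
    end
    -- count distinct positive integer keys with maximum max:
    -- they cover 1..max exactly when max == count
    return max == count
end
```

The cost is linear in the number of keys for every encoded table, which is why a primitive that only inspects the hash part would be so much cheaper.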
> Another thing that could be the bottleneck here is stringifying Lua
> numbers to strings because stringifying floating point numbers is
> complicated and what makes it even worse is that LuaJIT currently uses
> libc's sprintf() to do this conversion, which is horribly slow.
Converting Lua numbers (or anything other than strings) to strings is only
done for non-string keys in JS objects (which should be rare). Everything else
(value and string encoding) is done in C code, with no conversion on the Lua side.
> I think it'd be a fun exercise to profile your lua-resty-cjson library
Let's see if I can get some time to try these.
Regards
Aapo