Hello!
On Thu, Sep 24, 2015 at 2:05 AM, bjoe2k4 wrote:
> when i execute this code in a location via content_by_lua_file, then i get
> the following results:
>
> string_find: 0.002s
> string_match: 0.684s
>
The main difference here is that the string.find() builtin function
has been JIT compilable since LuaJIT v2.1 when the pattern is searched
for as a plain string [1]. string.match(), on the other hand, is still
NYI ("Not Yet Implemented") [2], so it cannot be JIT compiled (yet).
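To make the difference concrete, below is a minimal hypothetical pair
of loops of the kind presumably being timed (not your exact code; the
string s and the loop counts are made up). Only the first loop is
eligible for JIT compilation:
local s = ("x"):rep(100) .. "needle"
-- plain-string search: the 4th "plain" argument (or a pattern without
-- magic characters) lets LuaJIT 2.1 compile this loop
for i = 1, 1e6 do
    local from = string.find(s, "needle", 1, true)
end
-- string.match() is an NYI fast function, so this loop stays interpreted
for i = 1, 1e6 do
    local cap = string.match(s, "needle")
end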
It's easiest to see this by using the jit.v and/or jit.dump modules
shipped with LuaJIT. For example, we can add the following lines of
Lua code to the beginning of your Lua file:
local jit_v = require "jit.v"
jit_v.on("/dev/stderr")
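Or, if you want the full dump including the bytecode, IR, and machine
code of each trace, the jit.dump module can be used in the same way
(just a sketch; the output path is up to you):
local jit_dump = require "jit.dump"
-- passing nil keeps the default dump flags (trace events, bytecode, IR, mcode)
jit_dump.on(nil, "/dev/stderr")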
Then running your sample Lua code with the "resty" command-line
utility (or the full-blown OpenResty nginx) produces the following
output from the LuaJIT JIT compiler:
$ resty a.lua
[TRACE 1 a.lua:21 loop]
string_find: 0.002s
[TRACE --- (1/0) a.lua:22 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- (1/0) a.lua:22 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- (1/0) a.lua:22 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- (1/0) a.lua:22 -- NYI: FastFunc string.match at a.lua:51]
[TRACE 2 (1/0) a.lua:22 -- fallback to interpreter]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
[TRACE --- a.lua:50 -- NYI: FastFunc string.match at a.lua:51]
string_match: 0.269s
It's obvious that the string.find() loop gets JIT compiled, producing
trace #1, while the string.match() loop repeatedly aborts JIT
compilation and falls back to the (slower) LuaJIT interpreter.
Alternatively, we can also see all of this in on-CPU flame graphs:
1) on-CPU flame graph for the case running string.find:
http://agentzh.org/misc/flamegraph/2015-09-25/jit-string-find.svg
Here we have no meaningful backtraces at the bottom of the graph,
which means the samples are initiated by JIT-compiled code [3].
Another interesting thing worth noting here is that there are also no
visible frames for the lj_str_find() C function, which does the plain
substring search for string.find(). This means that we are *not*
really testing the performance of string.find() here! What's
happening? Let's check out the actual IR code [4] generated by the
LuaJIT JIT compiler:
---- TRACE 1 IR
0001 > int SLOAD #7 CRI
0002 > int LE 0001 +2147483646
0003 int SLOAD #6 CI
0004 > fun SLOAD #3 T
0005 > str SLOAD #4 T
0006 > fun EQ 0004 a.lua:45
0007 int FLOAD 0005 str.len
0008 p32 STRREF 0005 +0
0012 p32 CALLN lj_str_find (0008 [0x41f18748] 0007 +1 )
0013 > p32 NE 0012 [NULL]
0016 + int ADD 0003 +1
0017 > int LE 0016 0001
0018 ------ LOOP ------------
0019 + int ADD 0016 +1
0020 > int LE 0019 0001
0021 int PHI 0016 0019
Okay, there is a lj_str_find C function call, but it is placed outside
the loop! Yeah, the JIT compiler of LuaJIT is definitely smart enough
to notice that this call produces loop-invariant (constant) results
here [5].
So your naive benchmark code is not testing what it is supposed to
test! It is really just timing an empty loop! This is why there are so
many misleading benchmark cases out there in the wild [6].
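If the goal really is to measure string.find() itself, one way (just a
hypothetical sketch, not the only one) is to make the call's arguments
depend on the loop counter and to consume its result, so that neither
loop-invariant code motion nor dead-code elimination can throw the
call away:
local s = ("abcd"):rep(1000) .. "xyz"
local total = 0
for i = 1, 1e7 do
    -- the varying init position keeps the call inside the loop
    local from = string.find(s, "xyz", (i % 100) + 1, true)
    total = total + (from or 0)
end
-- using the accumulated result keeps the loop body from being eliminated
print(total)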
2) on-CPU flame graph for the case running string.match:
http://agentzh.org/misc/flamegraph/2015-09-25/interp-string-match.svg
So almost all the samples in the graph have complete backtraces
(starting from the C main function). We also see frames inside the
LuaJIT interpreter itself, like lj_BC_FUNCC, which is the interpreter's
handler for one of the LuaJIT bytecode instructions.
Interpreted code is definitely slower than compiled code. Another hot
spot is the lj_str_new function call, which creates new Lua strings.
This is because string.match() needs to return a (new) Lua string
object, and creating a Lua string is definitely more expensive than
directly using primitive, unboxed Lua numbers (which are not GC
objects at all).
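To make that concrete (a trivial illustration, not from the benchmark
code):
-- string.find() returns plain numbers (no GC objects involved):
local from, to = string.find("hello, world", "world", 1, true)
-- from == 8, to == 12
-- string.match() returns a Lua string object, which has to be
-- created/interned via lj_str_new in the interpreter:
local word = string.match("hello, world", "w%a+")
-- word == "world"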
At this point, we can make perfect sense of the benchmark results by
using the right tools (on-CPU flame graphs and the LuaJIT compiler's
trace dumps).
> However, when i reverse the two 'bench' functions (or reload the previous
> page with 'lua_code_cache on;') i get the following:
>
> string_match: 0.632s
> string_find: 2.416s
>
This cannot be explained either without applying the same set of
tools. So let me elaborate on the details below.