boy

  • Apr 7, 2020
  • Joined Apr 3, 2020
  • Environment
    OpenResty 1.15, with LuaJIT from the openresty branch (January 2020). The problem still exists when we use the newest version (https://github.com/openresty/luajit2).

    Description of the problem
    We access our server over HTTP with two URLs (only these two consecutive URLs trigger the error). With high probability the server then raises an error such as "attempt to perform arithmetic on local 'id' (a boolean value)".

    Code analysis
    session is a table defined in a Lua module (it can be considered a global variable), and business is a table too. session.business['id'] is a number, and its value is 0 when initialization begins.
    session.lua:
    function _M:GetBusinessinfo(key)
        if key then
            return self.business[key]
        else
            return self.business
        end
    end
    rewrite.lua:
    local id = session:GetBusinessinfo('id')
    local test = id + 1

    The error occurs at "id + 1". We printed some information as follows; the type of "id" is boolean and its value is true.
    ngx.log(ngx.ERR,"type:",type(id),",value",id)

    Then we printed the table contents another way, as follows:
    local info = ''
    for k, v in pairs(session.business) do
        info = info .. '[' .. tostring(k) .. ']' .. tostring(v)
    end
    ngx.log(ngx.ERR, info)

    We found that the type of "session.business['id']" is number and the value is 0, which is correct.

    In summary, iterating over the table gives the correct value, but reading the value directly with session.business['id'] gives the wrong one.
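The comparison described above could be automated with a small helper. This is only a sketch: `check_lookup` is a hypothetical name that is not part of our code, and the `business` table here is a stand-in assumed to hold `id = 0` as in the report.

```lua
-- Hypothetical helper: compares the value obtained by direct indexing
-- with the value seen for the same key while iterating with pairs().
local function check_lookup(tbl, key)
    local direct = tbl[key]
    local iterated
    local found = false
    for k, v in pairs(tbl) do
        if k == key then
            iterated = v
            found = true
            break
        end
    end
    -- Consistent if both paths agree, or if the key is absent on both paths.
    local consistent = (found and rawequal(direct, iterated))
        or (not found and direct == nil)
    return consistent, direct, iterated
end

-- Stand-in for session.business (assumed shape, id starts at 0).
local business = { id = 0 }
local ok, direct, iterated = check_lookup(business, 'id')
print(ok, type(direct), type(iterated))
```

Logging the first return value in the failing request would show whether the direct read and the iteration disagree within the same call.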

    Then we wrapped these two lines of code in a function in a Lua module, as follows, and the error no longer occurs.

    session.lua:
    function _M:AddSliceId()
        local id = self.business['id']
        if id then
            self.business['id'] = id + 1
        end
    end
    rewrite.lua:
    session:AddSliceId()

    Gdb analysis (after the error has happened)
    a. The memory of session is as follows:
    lval (TValue*)0x40031477b240
    ...
    key:
    string: "business"(len 8 )
    value:
    table: (TValue*)0x4003169120f8
    ...

    Continuing, we print 0x4003169120f8:
    lval (TValue*)0x4003169120f8
    ...
    key:
    string: "id"(len 2)
    value:
    number 0

    This proves that the table is correct in memory.

    b. ltrace
    ltrace
    Found 51 traces

    Then, looping over the results, we can find GetBusinessinfo:
    ltrace 31
    (GCtrace*) 0x40035009f3d0
    machine code size:1524
    machine code start addr:0x400008cef020
    machine code end addr: 0x400008cef614
    @/path/session.lua:323

    The relevant part of session.lua is as follows:
    session.lua:
    323:function _M:GetBusinessinfo(key)
    324: if key then
    325: return self.business[key]
    326: else
    327: return self.business
    328: end
    329:end

    So we can be sure that the error comes from JIT-compiled code (not from the interpreter).
    We then turned the JIT off at the beginning of the code, as follows:

    if jit then
        jit.off()
        jit.flush()
    end

    With the JIT off, the error does not occur.
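A narrower experiment than disabling the JIT globally might be to keep only the suspect function interpreted. The sketch below uses LuaJIT's `jit.off(func)`; the `session` table is a minimal stand-in with an assumed shape, not our real module, and we have not confirmed that this alone avoids the error.

```lua
-- Stand-in for the session module from the report (assumed shape).
local session = { business = { id = 0 } }

function session:GetBusinessinfo(key)
    if key then
        return self.business[key]
    end
    return self.business
end

-- The `jit` library exists only under LuaJIT, so guard the calls.
if jit then
    jit.off(session.GetBusinessinfo)  -- keep this one function interpreted
    jit.flush()                       -- discard traces compiled so far
end

local id = session:GetBusinessinfo('id')
print(type(id), id + 1)
```

If the error disappears with only this function excluded, that would point at a trace involving it rather than at the JIT as a whole.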

    That is our analysis. Can anyone help us resolve this problem? Thank you.

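      1. Environment
        OpenResty 1.15; LuaJIT is the branch maintained by OpenResty (January 2020); the hardware is arm64.
      2. Problem
        Out of several hundred devices, only a few have a process that keeps reporting the error "attempt to perform arithmetic on local 'id' (a boolean value)". The error is raised on the second line of the code below.
      3. Code (session is a table defined in a Lua module that can be considered a global variable; business is a table inside session)
        local id = session.business['id']
        local test = id + 1
      4. gdb analysis
        a. According to gdb, the memory of session is as follows:
        lval (TValue*)0x40031477b240
        ...
        key:
        string: "business"(len 8 )
        value:
        table: (TValue*)0x4003169120f8
        ...
        b. Continuing, we print the contents of 0x4003169120f8:
        lval (TValue*)0x4003169120f8
        ...
        key:
        string: "id"(len 2)
        value:
        number 0
        c. The memory information above was printed at the moment the error was raised (with a gdb breakpoint set). In other words, the value stored under id in the business table is still the number 0, but when it is read out it has become the boolean true. (Of several hundred devices, only a few show the problem, and only in one particular process; once the problem appears, the error is reported continuously.)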