Hello OpenResty friends,
My name is Balinder and I work as an NGINX systems engineer and Lua programmer at News Corp. Our Redirect Router, written a few years ago by another chap, is a highly optimised web redirect tool for us. It uses Redis hashes to store redirect rules, and every EC2 instance (1 GB RAM, 1 CPU) runs a Redis slave from which the rules are read. An admin adds a rule in Redis, and NGINX/Lua reads it and acts on it. A rule can be a proxy, a redirect, GeoIP handling, whitelisting or blocking an IP range, etc.; it is all done in OpenResty.
For example, http://website1.com/a/b/c might be redirected (HTTP 301) to http://website2.com/ using these rules.
We have one issue with the production servers: some of the machines run at 100% CPU all the time. We have six machines and traffic is roughly 300 requests/sec, though it varies. We don't think that is much traffic for a highly optimised system, so we suspect the 100% CPU is caused by a loop that never exits. For example, a gdb backtrace of a single worker process shows:
#0 0x00007fb48cd0a68c in lj_tab_get (L=L@entry=0x405d6510, t=t@entry=0x41a0f9a0, key=key@entry=0x7fff8521c200) at lj_tab.c:427
#1 0x00007fb48cd0b92b in lj_meta_tget (L=0x405d6510, o=0x405f2d30, k=0x7fff8521c200) at lj_meta.c:142
#2 0x00007fb48cd05583 in lj_vmeta_tgetv () from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
It's doing something with a table.
That's not very much help, but it's a start.
I really want to find the culprit loop and find out why it is NOT exiting. We know it is a single process because that worker process never comes out of the loop. The lj_tab_get function seems to run a while loop, but it has a MAXACTION of 1000. We don't have a loop in our Lua code either. So do you have any idea why a single process would get its knickers in a twist? Thank you in advance for any insights into this issue. I am pasting below the function that seems to be called when the system is running at 100% CPU:
function lj_tab_get(t, key) {
    MAXACTION = 1000
    node = hashgcref(t, key)
    node_key = $node->key->gcr->gcptr32
    find_count = 0

    while (node_key != key) {
        node = $node->next->ptr32
        node_key = $node->key->gcr->gcptr32
        find_count++
        if (find_count > MAXACTION) {
            return 0
        }
    }
    return node
}
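
To make clear how I read the function above, here is a small self-contained C model of the same kind of chain walk. It is only a sketch and not LuaJIT's code: the Node struct, chain_get and MAX_STEPS are names I made up for illustration, and the real table layout is certainly more involved. It assumes the lookup is a walk along a linked chain of nodes with a step limit like MAXACTION.

```c
#include <stddef.h>

#define MAX_STEPS 1000  /* hypothetical bound, mirroring MAXACTION above */

/* Hypothetical, simplified hash-chain node; not LuaJIT's real Node layout. */
typedef struct Node {
    int key;
    struct Node *next;
} Node;

/* Walk the chain starting at head, looking for key.
 * Returns the matching node, or NULL if the chain ends or the
 * step limit is exceeded (for example because the chain is cyclic). */
Node *chain_get(Node *head, int key)
{
    int steps = 0;
    for (Node *n = head; n != NULL; n = n->next) {
        if (n->key == key)
            return n;
        if (++steps > MAX_STEPS)
            return NULL;  /* give up instead of spinning forever */
    }
    return NULL;
}
```

In this model, even a chain whose last node points back to the first returns NULL for a missing key after MAX_STEPS steps, so a single call cannot spin forever. If the real lookup is bounded in the same way, I would guess that something is calling it over and over rather than one call never returning, but that is only my assumption.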