Hello!
On Sun, Jul 27, 2014 at 10:58 PM, rvsw wrote:
> One of the TODO items for the sregex library at
> https://github.com/openresty/sregex is to add support for UTF-8. Is there a
> timeline for this feature?
Which particular UTF-8 regex features are you interested in?
Matching a literal UTF-8 byte sequence already works. For example:
replace_filter '你好' 'hello';
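
To illustrate why the literal case needs no special support, here is a rough sketch in Python (not sregex itself, and the sample body string is made up for the example). A byte-oriented matcher sees the UTF-8 string as an ordinary sequence of octets, so a literal pattern matches as long as both sides use the same encoding:

```python
import re

# Pattern and subject are both raw UTF-8 octets, so the matcher never
# needs to know where one character ends and the next begins.
pattern = re.escape("你好".encode("utf-8"))
body = "<p>你好, world</p>".encode("utf-8")  # hypothetical response body

replaced = re.sub(pattern, b"hello", body)
print(replaced.decode("utf-8"))
```

This only holds when the pattern and the data are encoded the same way. It says nothing about character-aware features.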
> If not, what is the proposed strategy to
> implement this? E.g., do we use libraries like libiconv to convert UTF-8 into
> ASCII (for the English language)? Or will there be native code which will
> support other character sets as well?
For full UTF-8 support, like making "." match a whole UTF-8 character
instead of a single octet, or supporting Unicode classes like "\p{Han}",
the sregex engine itself has to be changed. Third-party libraries like
libiconv will not really be helpful here.
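
The octet-versus-character difference can be sketched in Python (again only an illustration, not sregex code). Python's `re` module matches bytes patterns octet by octet and str patterns by code point. It has no "\p{Han}", so the explicit range below is a rough stand-in that covers only the basic CJK Unified Ideographs block:

```python
import re

text = "你好"
raw = text.encode("utf-8")  # 6 octets, 3 per character

# Byte-oriented engine: "." consumes a single octet, which is only
# the first third of the first character.
byte_dot = re.match(rb".", raw).group()

# Character-oriented engine: "." consumes one whole code point.
char_dot = re.match(r".", text).group()

# Approximation of a Han class using an explicit code point range
# (an assumption for this sketch, not the full Unicode Han script).
han_chars = re.findall(r"[\u4e00-\u9fff]", "hi 你好")

print(byte_dot, char_dot, han_chars)
```

Getting the second and third behaviours out of a byte-oriented engine means teaching the engine itself about UTF-8 sequences. Converting the input to another encoding beforehand does not do that.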
If you just want to convert the response body data stream from one
character encoding to another, then you can use the ngx_iconv module:
https://github.com/calio/iconv-nginx-module
Regards,
-agentzh