On Wed, Nov 13, 2013 at 12:47 AM, Yichun Zhang (agentzh)
<age...@gmail.com> wrote:
> Hello!
>
> On Tue, Nov 12, 2013 at 4:51 AM, Rohit Yadav wrote:
>>
>> The issue is with decoding the string on the Python end; both the user
>> agent and referrer URL strings are not UTF-8 encoded as expected, so I
>> tried ISO-8859-1, which worked for some URLs but broke for other
>> encodings. What is the default encoding of these strings, and how can we
>> force them to be UTF-8 in the JSON-encoded string on the Lua end?
>
> You didn't mention what JSON library you're using on the Lua side.
Oh yes, that was indeed lua-cjson.
>
> Assuming you're using lua-cjson, you should always feed valid UTF-8
> strings into cjson.encode() yourself. The lua-cjson library does not
> do any encoding conversion here. Also, lua-cjson does not bother
> escaping multi-byte UTF-8 character sequences. See lua-cjson's manual
> for more details:
>
> http://www.kyne.com.au/~mark/software/lua-cjson-manual.html
>
>> Looks
>> like someone else also got such an issue [1].
>>
>> [1] http://forum.nginx.org/read.php?2,243106,243106#msg-243106
>>
>
> Neither Nginx nor ngx_lua tries to make sense of the character
> encoding of any input data, or performs any encoding conversion, unless
> you explicitly configure it to do so. Both just offer the raw
> (binary) data from the wire by default. It is up to you to make sense
> of the character encoding actually used in the binary strings.
Thanks for the information. I was unsure whether nginx forces any default
encoding, since some of the URL strings were UTF-8 encoded while others
were ISO-8859-1. Good, this clears it up.
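
For anyone who finds this thread later, here is a minimal sketch of how
the Python end could cope with the mixed encodings, applied to the raw
bytes of a field before treating them as text. It is only one possible
approach: `decode_raw` is a hypothetical helper name, and the fallback to
ISO-8859-1 is my assumption about the non-UTF-8 strings, not something
nginx or lua-cjson guarantees.

```python
def decode_raw(raw: bytes) -> str:
    """Decode bytes as UTF-8 when valid, else fall back to ISO-8859-1.

    Hypothetical helper: the fallback encoding is an assumption, since
    the raw bytes carry no label saying which encoding the client used.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this never fails,
        # but the result is only a guess at what the client meant.
        return raw.decode("iso-8859-1")


if __name__ == "__main__":
    print(decode_raw("café".encode("utf-8")))
    print(decode_raw("café".encode("iso-8859-1")))
```

Trying strict UTF-8 first matters here because ISO-8859-1 accepts any byte
sequence, so using it first would silently mangle the strings that really
are UTF-8.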
Regards.
>
> Regards,
> -agentzh.