HTML Parser In Lua

alexl · 2016-04-28T00:09:53+00:00

alexl

Hello again :)

I have a question about HTML parsing. I need to get the first-level tags of a page, i.e. the tags nested directly inside <head>. For example, if the page looks like this:

<html>
<head>
<title>Welcome to nginx!</title>
<someTag>
<WRONGTAG>
This tag should be ignored
</WRONGTAG>
</someTag>
<style>
body {width: 35em;}
</style>
</head>
<body>
</body>
</html>

I would like to get an array containing only <title>, </title>, <someTag>, </someTag>, <style>, </style>.

Is there an efficient way of doing this?

For now, I extract the <head> section of the page with string.sub, iterate over every tag in it with string.gmatch(headOnly, '(<[^<>]*>)'), and add only the first-level tags to a table.
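A minimal sketch of the approach described above, using only Lua's standard string library. The function name first_level_tags and the VOID table are hypothetical names chosen for illustration, not the original code. The sketch assumes reasonably well-formed markup: every non-void element is closed, and no '<' or '>' appears inside attribute values, comments or scripts.

```lua
-- Hypothetical helper: collect the tags that are direct children of <head>.

-- Void elements never get a closing tag, so they must not change the depth.
-- (Illustrative subset only; extend as needed.)
local VOID = { base = true, link = true, meta = true }

local function first_level_tags(html)
  -- Isolate the contents of <head>...</head> (case-sensitive for brevity).
  local head = html:match("<head[^>]*>(.-)</head>")
  if not head then return {} end

  local tags, depth = {}, 0
  for tag in head:gmatch("<[^<>]*>") do
    local closing, name = tag:match("^<(/?)([%w:%-]+)")
    if not name then
      -- Comment, doctype or similar: ignore it.
    elseif closing == "/" then
      -- A closing tag belongs to the first level if it returns us to depth 0.
      depth = depth - 1
      if depth == 0 then tags[#tags + 1] = tag end
    else
      -- An opening tag belongs to the first level if we are at depth 0.
      if depth == 0 then tags[#tags + 1] = tag end
      local self_closing = tag:sub(-2) == "/>"
      if not (VOID[name:lower()] or self_closing) then depth = depth + 1 end
    end
  end
  return tags
end

-- Usage with the page from the question:
local page = [[
<html>
<head>
<title>Welcome to nginx!</title>
<someTag>
<WRONGTAG>
This tag should be ignored
</WRONGTAG>
</someTag>
<style>
body {width: 35em;}
</style>
</head>
<body>
</body>
</html>
]]
print(table.concat(first_level_tags(page), " "))
-- prints: <title> </title> <someTag> </someTag> <style> </style>
```

This walks the <head> section once and keeps a depth counter, so the cost grows linearly with the size of the section. It is a pattern-matching shortcut, not a real HTML parser, and will miscount on markup that breaks the assumptions listed above.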

It works, but I have yet to do any load testing, and I'm afraid (among other things) that this is very inefficient.

Any ideas? Could pattern matching help?

aapo.talvensaari

On Wednesday, 27 April 2016 19:09:53 UTC+3, al...@chameleonx.com wrote:

> I have a question regarding HTML parser.

I would use this:

https://github.com/craigbarnes/lua-gumbo