Hello again :)
I have a question about HTML parsing. I need to get the first-level tags nested directly inside the <head> of a page. That is, if the page looks like this:
<html>
<head>
<title>Welcome to nginx!</title>
<someTag>
<WRONGTAG>
This tag should be ignored
</WRONGTAG>
</someTag>
<style>
body {width: 35em;}
</style>
</head>
<body>
</body>
</html>
I would like to get an array containing only <title>, </title>, <someTag>, </someTag>, <style>, </style>.
Is there an efficient way of doing this?
For now, I cut the <head> section out of the page with string.sub, iterate over every tag with string.gmatch(headOnly, '(<[^<>]*>)'), and add only the first-level tags to a table.
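For illustration, the approach just described might look roughly like the sketch below. The function name first_level_head_tags and the depth counter are hypothetical, and the sketch assumes well-formed, lowercase <head>/</head> markup with no comments, no unclosed tags such as <meta> or <link>, and no '<' or '>' characters inside <style> or <script> content.

```lua
-- Sketch only: collect the first-level tags found inside <head>.
-- Assumes every opening tag has a matching closing tag (or ends in "/>").
local function first_level_head_tags(html)
  -- Isolate the contents of <head>...</head> ("." also matches newlines in Lua patterns).
  local head = html:match("<head[^>]*>(.-)</head>")
  if not head then return {} end

  local tags, depth = {}, 0
  for tag in head:gmatch("<[^<>]*>") do
    if tag:sub(2, 2) == "/" then
      -- Closing tag: step back one level, keep it if that returns us to level 0.
      depth = depth - 1
      if depth == 0 then tags[#tags + 1] = tag end
    elseif tag:sub(-2) == "/>" then
      -- Self-closing tag: keep it at level 0, nesting depth is unchanged.
      if depth == 0 then tags[#tags + 1] = tag end
    else
      -- Opening tag: keep it at level 0, then descend one level.
      if depth == 0 then tags[#tags + 1] = tag end
      depth = depth + 1
    end
  end
  return tags
end

-- Usage with a shortened version of the page above.
local page = [[
<html>
<head>
<title>Welcome to nginx!</title>
<someTag>
<WRONGTAG>
This tag should be ignored
</WRONGTAG>
</someTag>
<style>
body {width: 35em;}
</style>
</head>
<body>
</body>
</html>
]]

for _, tag in ipairs(first_level_head_tags(page)) do
  print(tag)
end
```

This makes a single pass over the <head> text, so the cost grows linearly with the size of that section; the depth counter is what filters out <WRONGTAG>.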
It works, but I have yet to do any load testing, and I'm afraid (among other things) that this is very inefficient.
Any ideas? Could pattern matching help?