You're right that each match will generate new strings for the color code and pipes, but because identical strings are reused from Lua's string pool, this technique doesn't generate nearly as much garbage. It works especially well when the number of unique color codes you might encounter is limited, such as with syntax-highlighted tokens.
Lua still has to generate them, hash them, check the string table for an existing entry, and call the callback function. String operations are known to be expensive in Lua compared to other simple operations, in CPU time rather than in memory. By comparison, even four chained gsubs are more efficient.
Also, if you are going to use callback functions, you can do this more easily: you can "fake" branches and get a much simpler function like this:
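If you want to check the CPU-time claim on your own input, a rough harness along these lines might do. It is only a sketch: stripChained, stripCallback, measure and the sample string are hypothetical names made up for illustration, the chained variant uses just two gsubs rather than the four mentioned in this thread, and os.clock assumes a standalone Lua interpreter (inside the game client you would have to swap in whatever timer is available there).

```lua
-- Hypothetical helpers for illustration; none of these names come from the thread.
local COLOR = "|c%x%x%x%x%x%x%x%x"

-- Variant A: plain string replacements, one gsub per escape type.
local function stripChained(s)
    s = s:gsub(COLOR, "")
    s = s:gsub("|r", "")
    return s
end

-- Variant B: a single gsub that calls a Lua function for every match.
local function stripCallback(s)
    return (s:gsub("|([|cr])(%x?%x?%x?%x?%x?%x?%x?%x?)", function(c, arg)
        if c == "r" then
            return arg
        elseif c == "c" and #arg == 8 then
            return ""
        end
    end))
end

-- Run fn(input) `rounds` times and return the elapsed CPU time in seconds.
local function measure(fn, input, rounds)
    local start = os.clock()
    for _ = 1, rounds do
        fn(input)
    end
    return os.clock() - start
end

-- Made-up sample resembling syntax-highlighted text.
local sample = ("|cff00ff00local|r x = |cffff8000\"text\"|r "):rep(50)

print("chained :", measure(stripChained, sample, 2000))
print("callback:", measure(stripCallback, sample, 2000))
```

The absolute numbers will vary by machine and Lua version, so treat them as a relative comparison only.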
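print(str:gsub("|([|cr])(%x?%x?%x?%x?%x?%x?%x?%x?)", function(c, arg)
    if c == "r" then
        -- "|r": drop the escape, keep any hex digits the pattern swallowed after it.
        return arg;
    elseif c == "c" and #arg == 8 then
        -- "|c" plus exactly 8 hex digits: drop the whole color code.
        return "";
    end
    -- Anything else (e.g. "||") returns nil, so gsub keeps the match unchanged.
end));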
I had already thought about such a solution while answering Iroared, but expected far worse performance -- or rather, for gsub to perform better. I'm pretty sure it handles all edge cases, although I have not tested it thoroughly because, as it stands, it takes 15-30% longer to parse input than my current 4 gsubs (yes, I've profiled it). It works because the "|" are consumed and you don't call gsub a second time.
Of course, this also won't work with the more complex escape sequences unless you manage to come up with a monster pattern like Phanx's that covers everything and then test the match again in the callback. (I would like to see that code though; it should be fun. You'd have to go recursive for link text -- I think.)
If there were a simple replacement function without patterns in Lua, though, there would be no challenge ;)
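To make the point about consumed pipes concrete, here is a small sketch that wraps the callback version in a function and compares it with a naive two-pass chain. The names stripColors and naive, as well as the sample strings, are made up for illustration, and the naive chain is not the 4-gsub version mentioned above.

```lua
-- Callback version from the post, wrapped in a (hypothetically named) function.
local function stripColors(str)
    return (str:gsub("|([|cr])(%x?%x?%x?%x?%x?%x?%x?%x?)", function(c, arg)
        if c == "r" then
            return arg
        elseif c == "c" and #arg == 8 then
            return ""
        end
        -- nil return: gsub leaves the match (e.g. an escaped "||") untouched
    end))
end

-- Naive comparison: two independent passes that know nothing about "||".
local function naive(str)
    return (str:gsub("|c%x%x%x%x%x%x%x%x", ""):gsub("|r", ""))
end

print(stripColors("|cffff0000Red|rbeef"))  --> Redbeef
print(stripColors("||cff00ff00 text"))     --> ||cff00ff00 text
print(naive("||cff00ff00 text"))           --> | text
```

In the second call the escaped pipe is matched as one unit together with the hex digits after it, and the nil return leaves it alone. The naive chain instead sees the second "|" as the start of a color code and eats it.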
I'm not aware of any API function to remove escape sequences, but it would be trivial to do it yourself:
...
Thanks a lot!
I simplified your example and use it like this:
local function unescape(String)
    local Result = tostring(String)
    Result = gsub(Result, "|c........", "") -- Remove color start.
    Result = gsub(Result, "|r", "") -- Remove color end.
    Result = gsub(Result, "|H.-|h(.-)|h", "%1") -- Remove links.
    Result = gsub(Result, "|T.-|t", "") -- Remove textures.
    Result = gsub(Result, "{.-}", "") -- Remove raid target icons.
    return Result
end
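For anyone who wants to try this outside the game, here is a quick sketch of how the function behaves on a few inputs. It assumes plain Lua, where gsub has to be aliased from string.gsub by hand (the snippet above calls a bare gsub, which presumably exists as a global in the client). The item link and texture path are invented sample data.

```lua
local gsub = string.gsub -- assumption: needed in plain Lua, where gsub is not a global

local function unescape(String)
    local Result = tostring(String)
    Result = gsub(Result, "|c........", "") -- Remove color start.
    Result = gsub(Result, "|r", "") -- Remove color end.
    Result = gsub(Result, "|H.-|h(.-)|h", "%1") -- Remove links.
    Result = gsub(Result, "|T.-|t", "") -- Remove textures.
    Result = gsub(Result, "{.-}", "") -- Remove raid target icons.
    return Result
end

-- Made-up chat line containing a color, a link, a texture and a raid icon.
local line = "|cffff0000Hello|r |Hitem:1234|h[Sword]|h|TInterface\\Icons\\foo:16|t{star}!"
print(unescape(line))            --> Hello [Sword]!

-- Two things to keep in mind about the looser patterns:
print(unescape("|cRed text|r"))  --> (empty) "|c........" accepts any 8 characters, not just hex
print(unescape("{not an icon}")) --> (empty) "{.-}" strips any text in braces
```

If either of the last two cases matters for your input, tightening the color pattern to "|c%x%x%x%x%x%x%x%x" would be one option. What to do about braces depends on which icon names you expect.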