This seems to be the hardest for me to understand so I'm guessing this is one of the most talked about topics. What I'm trying to do is find the last occurrence of ", " in a string. This is my last try:
local ows = "Marie (Patsy), Peter (Trapdoor), Tom (Blackmagé)"
print (ows:sub(ows:find(", [%w%s()]-$")))
and it doesn't work. If I change the "é" to an "e" it's fine. So I'm thinking I'm going about it all the wrong way. Would anyone have any suggestions for how I can look for the last ", "?
If all you want is the stuff after the last comma in a comma separated list:
local function helper(...)
    -- walk every value returned by strsplit and keep the last one
    local last = ""
    for i = 1, select("#", ...) do
        last = select(i, ...)
    end
    return strtrim(last)
end
local ows = "Marie (Patsy), Peter (Trapdoor), Tom (Blackmagé)"
local last = helper(strsplit(",", ows))
print(last)
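For trying this outside the game, where strsplit and strtrim are not available, a single pattern can probably do the same job. This is only a sketch: the helper name last_item is made up for illustration, and it assumes the items are separated by commas with optional whitespace after each comma. Trailing whitespace is not trimmed.

```lua
-- Hypothetical helper (not from the thread): return the text after the last comma.
local function last_item(list)
    -- ",%s*" consumes the comma and any whitespace after it;
    -- "([^,]*)$" captures the comma-free tail of the string.
    -- Falls back to the whole string when there is no comma at all.
    return list:match(",%s*([^,]*)$") or list
end

-- "\195\169" is the UTF-8 byte sequence for "é"
local ows = "Marie (Patsy), Peter (Trapdoor), Tom (Blackmag\195\169)"
print(last_item(ows))
print(last_item("Solo"))
```

Because the set [^,] is defined by exclusion, it does not care which bytes make up the accented character.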
local ows = "Marie (Patsy), Peter (Trapdoor), Tom (Blackmagé)"
print(ows:find(",[^,]-$"))
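Special tokens like %w don't "work" on accented characters in WoW: they only match plain ASCII letters and digits. You'd have to use a more literal expression like "[A-Za-z]", plus the high bytes used by accented characters, to go this route.
Things like %w depend on the locale/language in use, which for WoW is hardcoded to be American English, more or less. WoW strings are UTF-8, so "é" is really the two bytes \195\169, and neither of them counts as alphanumeric.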
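A small sketch of what is going on, assuming the string is UTF-8 encoded and the default "C" locale is in effect (as in a stock Lua interpreter). Adding the byte range \128-\255 to the set is just one possible workaround; it is an assumption for illustration, not something confirmed in the posts above.

```lua
-- "\195\169" is the UTF-8 byte sequence for "é"
local ows = "Marie (Patsy), Peter (Trapdoor), Tom (Blackmag\195\169)"

-- Under the "C" locale, %w only covers ASCII letters and digits,
-- so neither byte of "é" matches it.
print(("\195\169"):find("%w"))

-- The original pattern therefore cannot reach the end of the string...
print(ows:find(", [%w%s()]-$"))

-- ...while also allowing bytes 128-255 lets the multibyte character through.
local first, last = ows:find(", [%w%s()\128-\255]-$")
print(first, last)
```

The wider set accepts any non-ASCII byte, so it is permissive rather than precise; it does not validate that the bytes form proper UTF-8.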
I'm not sure, but I seem to have problems with accented characters in other areas. Trying to make just the first letter a capital with the rest lower case has caused me grief when the first letter is accented. Would you think this is related?
local function helper(...)
    -- select("#", ...) is the argument count, so this returns the last argument
    local i = select("#", ...)
    return select(i, ...)
end
local ows = "Marie (Patsy), Peter (Trapdoor), Tom (Blackmagé)"
local last = helper(strsplit(",", ows))
print(last)
No need to iterate over all the values just to get the last one.
With respect to names with special characters: they are a PITA.
Here is the code I use in Prat to create the patterns to match them either as "foo" or "Foo":
(Thanks to Mikk, Arrowmaster, and Xinhuan for hacking it up with me one morning)
[php]MULTIBYTE_FIRST_CHAR = "^([\192-\255]?%a?[\128-\191]*)"

function GetNamePattern(name)
    local u = name:match(MULTIBYTE_FIRST_CHAR):upper()
    if not u or u:len() == 0 then Prat.Print("GetNamePattern: name error", name) return end
    local l = u:lower()
    local namepat
    if u == l then
        -- upper and lower forms are identical: no case alternation needed
        namepat = name:lower()
    elseif u:len() == 1 then
        -- single-byte first character
        namepat = "["..u..l.."]"..name:sub(2):lower()
    elseif u:len() > 1 then
        -- multibyte first character: one set per byte, holding the upper and lower byte
        namepat = ""
        for i=1,u:len() do
            namepat = namepat .. "[" .. u:sub(i,i)..l:sub(i,i).."]"
        end
        namepat = namepat .. name:sub(u:len()+1)
    end
    return "%f[%a\192-\255]"..namepat.."%f[^%a\128-\255]"
end

AnyNamePattern = "%f[%a\192-\255]([%a\192-\255]+)%f[^%a\128-\255]"[/php]
If you happen to use Prat, you can print out the result of Prat.GetNamePattern("Somename").
To capitalize the first letter in a name:
[php]MULTIBYTE_FIRST_CHAR = "^([\192-\255]?%a?[\128-\191]*)"
name = string.gsub(name, MULTIBYTE_FIRST_CHAR, string.upper, 1)[/php]
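To see what MULTIBYTE_FIRST_CHAR actually captures, here is a short sketch in plain Lua. The sample names are made up, and the non-ASCII characters are written as decimal byte escapes so the UTF-8 encoding is explicit. Note that outside WoW, string.upper is typically not UTF-8 aware, so only the ASCII name is really capitalized here.

```lua
local MULTIBYTE_FIRST_CHAR = "^([\192-\255]?%a?[\128-\191]*)"

-- Sample names (illustrative only), with non-ASCII characters as raw UTF-8 bytes
local names = {
    "tom",               -- plain ASCII
    "\195\169lan",       -- starts with "é" (2 bytes)
    "\228\184\173abc",   -- starts with a CJK character (3 bytes)
}

for _, name in ipairs(names) do
    local first = name:match(MULTIBYTE_FIRST_CHAR)
    -- #first is the byte length of the first character, which is not always 1
    print(#first, #name)
end

-- gsub with a limit of 1 applies string.upper to the captured first character only
local capitalized = string.gsub("tom", MULTIBYTE_FIRST_CHAR, string.upper, 1)
print(capitalized)
```

The pattern works by taking an optional lead byte (192-255), an optional ASCII letter, and then any continuation bytes (128-191), so it grabs exactly one character whether it is 1, 2, or 3 bytes long.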
IIRC, there is a library around to deal with UTF-8 strings in WoW, since its Lua implementation is not configured to handle them properly. (I found it in the old wowace svn.)
BTW, Blizzard's own code sometimes fails to take this into account, like in the battleground grouped join/left messages. :|
Small reminder, Blizzard has patched string.upper and string.lower to be UTF-8 aware. But nothing that uses string pattern matching is UTF-8 aware, just string.upper and string.lower.
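To illustrate the second point with a sketch (plain Lua, with the accented letter written as explicit UTF-8 bytes): pattern items such as "." consume one byte at a time, so they can cut a multibyte character in half.

```lua
local e_acute = "\195\169"  -- UTF-8 bytes for "é"

-- The length operator counts bytes, not characters
print(#e_acute)

-- "." matches a single byte, so this captures only the lead byte \195
local first_byte = e_acute:match("^(.)")
print(first_byte:byte())

-- Matching the lead byte plus its continuation bytes keeps the character whole
local whole = e_acute:match("^([\192-\255][\128-\191]*)")
print(whole == e_acute)
```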
So there is no need for utf8lib anymore if you only use it for upper/lower?
If you are only going to string.upper the first character, you still need a pattern/utf8lib to tell what the first char is as the first char might really be 3 bytes long.
4 bytes is the maximum for a character in UTF-8, though I'm not sure what actually uses 4 bytes. The 26-letter English alphabet uses 1 byte per letter, most of the accented letters used by other languages use 2 bytes, Cyrillic uses 2 bytes, and CJK (Chinese, Japanese, Korean) characters generally use 3 bytes.
Unicode characters U+0 to U+7F (127) are encoded in 1 byte.
Unicode characters U+80 to U+7FF (2047) are encoded in 2 bytes.
Unicode characters U+800 to U+FFFF (65535) are encoded in 3 bytes.
Unicode characters U+10000 to U+10FFFF are encoded in 4 bytes.
The encoding accepts more (up to 0x1FFFFF) in 4 bytes, but Unicode has no characters above U+10FFFF. All characters in the BMP (Basic Multilingual Plane) are encoded in at most 3 bytes.
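The ranges above translate directly into a lookup. A minimal sketch, with a made-up function name, that returns how many bytes UTF-8 needs for a given code point:

```lua
-- Hypothetical helper: number of bytes UTF-8 needs for a Unicode code point
local function utf8_byte_count(cp)
    if cp <= 0x7F then
        return 1
    elseif cp <= 0x7FF then
        return 2
    elseif cp <= 0xFFFF then
        return 3
    elseif cp <= 0x10FFFF then
        return 4
    end
    return nil -- outside the Unicode range
end

print(utf8_byte_count(0x41))     -- "A"
print(utf8_byte_count(0xE9))     -- "é"
print(utf8_byte_count(0x4E2D))   -- a CJK ideograph
print(utf8_byte_count(0x1D11E))  -- a Plane 1 character
```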
Per the Unicode standard, the Basic Multilingual Plane (BMP, or Plane 0) contains the common-use characters for all the modern scripts of the world as well as many historical and rare characters. By far the majority of all Unicode characters for almost all textual data can be found in the BMP.
Plane 1, or the SMP (Supplementary Multilingual Plane) is dedicated to the encoding of lesser-used historic scripts, special-purpose invented scripts, and special notational systems, which either could not be fit into the BMP or would see very infrequent usage.
Examples of characters in Plane 1 include musical symbols, many archaic scripts, lots of supplementary mathematical symbols, and Mahjong and Domino tiles.
For instance, U+1D11E is MUSICAL SYMBOL G CLEF, or ????, and is encoded in UTF-8 like this: "\240\157\132\158".
Edit: I give up. The forum can't handle this unicode character :-)
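Since the forum eats the character itself, the byte sequence can at least be checked by hand. Below is a sketch of a UTF-8 encoder written with plain arithmetic so it should run on Lua 5.1 (no bit library assumed). The function name utf8_encode is made up for this example, and it does no validation of surrogates or out-of-range values.

```lua
-- Hypothetical helper: encode one Unicode code point as a UTF-8 byte string
local function utf8_encode(cp)
    local floor = math.floor
    if cp < 0x80 then
        return string.char(cp)
    elseif cp < 0x800 then
        return string.char(
            0xC0 + floor(cp / 0x40),
            0x80 + cp % 0x40)
    elseif cp < 0x10000 then
        return string.char(
            0xE0 + floor(cp / 0x1000),
            0x80 + floor(cp / 0x40) % 0x40,
            0x80 + cp % 0x40)
    else
        return string.char(
            0xF0 + floor(cp / 0x40000),
            0x80 + floor(cp / 0x1000) % 0x40,
            0x80 + floor(cp / 0x40) % 0x40,
            0x80 + cp % 0x40)
    end
end

-- Print the bytes of U+1D11E, MUSICAL SYMBOL G CLEF
print(utf8_encode(0x1D11E):byte(1, -1))
```

Each continuation byte carries 6 bits of the code point, which is why the divisors are powers of 64.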
The following pattern is giving me a headache lately: I would like to match a single quote preceded by whitespace or by nothing (start of string), followed by any run of characters, and ending with another single quote followed by whitespace or by nothing (end of string). So basically I am trying to capture the longest possible chain between quotes, even if the quotes are at the start or at the end of the string, while skipping [['s ]]. Something like:
local str = [[1st Kvaldir Vessel 'The Serpent's Maw']]
str = str:match("[ ^.]'.+'[ ^.]")
print(str) ---> 'The Serpent's Maw'
Yet sets like [ ^.] don't work, anchoring with ^ and $ is no good obviously and frontier patterns did not help much either. Any suggestions are greatly appreciated!
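One approach that might work here, offered as a sketch rather than an answer tested against every case: pad the string with a space on each side so that "start of string" and "end of string" look like whitespace, then use a greedy .+ between the quotes. The function name quoted_chunk is made up for the example.

```lua
-- Hypothetical helper: longest 'quoted' chunk delimited by whitespace or string edges
local function quoted_chunk(str)
    -- Padding makes the string edges behave like whitespace, so a single
    -- pattern covers quotes at the start, the middle, or the end.
    local padded = " " .. str .. " "
    -- "%s" before the opening quote rules out the apostrophe in "Serpent's";
    -- greedy ".+" extends to the last quote that is followed by whitespace.
    return padded:match("%s('.+')%s")
end

print(quoted_chunk([[1st Kvaldir Vessel 'The Serpent's Maw']]))
print(quoted_chunk([['Quoted' at the start]]))
```

Inside a set, ^ only negates when it comes first and . is a literal dot, which is why [ ^.] cannot stand in for "space or start of string".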
To find the last comma, anchor on a run of non-comma characters at the end of the string, like this. The [^,] is the complement of the set [,], so it matches every character that [,] would not match.
local ows = "Marie (Patsy), Peter (Trapdoor), Tom (Blackmagé)"
print(ows:find(",[^,]-$"))
This will print 32 and 49, the start and end positions of the match. 32 is the number you want.
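Once the position of the last comma is known, string.sub can cut the string on either side of it. A short sketch, with the "é" written as explicit UTF-8 bytes and assuming the separator is always a comma followed by one space:

```lua
local ows = "Marie (Patsy), Peter (Trapdoor), Tom (Blackmag\195\169)"

-- find returns the start and end indices; only the start is needed here
local comma_pos = ows:find(",[^,]-$")
print(comma_pos)

-- Everything before the last comma, and everything after it (skipping ", ")
local head = ows:sub(1, comma_pos - 1)
local tail = ows:sub(comma_pos + 2)
print(head)
print(tail)
```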
Why didn't I think of that?!?
You spelled "iterate" wrong, by the way. :p
Now that's handy. I may end up stealing, er, "leveraging", that bit of code the next time I have to deal with player names.