Q: How do you match umlauts in Perl?
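A: By character code if you need to. Strictly speaking, umlauts have no ASCII value: they sit above 127, in the upper half of ISO 8859-1. Besides, why wouldn't the Perl tokenizer accept extended ASCII characters?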
A: Or if you want a RegularExpression to match ordinary letters and umlauts, use PosixCharacterClasses such as [:alpha:], [:alnum:], [:upper:] and [:lower:]. In Perl these only work inside a bracketed character class, e.g. [[:alpha:]].
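A minimal sketch of the character-class approach, assuming Perl 5.14 or later (for the /u modifier, which forces Unicode rules so that ü counts as a letter however the string is stored). The helper names is_all_letters and is_capitalized are made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # the source file itself contains UTF-8 literals

# Hypothetical helper: true if $word consists only of letters.
sub is_all_letters {
    my ($word) = @_;
    return $word =~ /\A[[:alpha:]]+\z/u ? 1 : 0;
}

# Hypothetical helper: true if $word is one upper-case letter
# followed by one or more lower-case letters.
sub is_capitalized {
    my ($word) = @_;
    return $word =~ /\A[[:upper:]][[:lower:]]+\z/u ? 1 : 0;
}

print is_all_letters("Müller") ? "letters\n" : "not letters\n";
```

Without /u (or a utf8-flagged string, or a suitable locale), older Perls may treat the characters 128-255 as non-letters, which is probably why the question comes up at all.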
Q: What are the AsciiValuesForUmlauts?
A: Here's a fragment that shows one way to do this. First determine the sets of upper- and lower-case characters under the current locale:
 require 5.004;
 use locale;
 $lc = join '', grep { uc($_) ne $_ } map { chr($_) } 1..255;
 $uc = join '', grep { lc($_) ne $_ } map { chr($_) } 1..255;
Next, build a regular expression:
 $link = "((?:[$uc][$lc]+){2,})[^$uc$lc]";
Then
 $text =~ m/$link/;
will match ShöüldWörk, assuming the active locale is ISO 8859-1 based and the word is followed by at least one non-letter character (the trailing [^$uc$lc] has to match something). -- DaveSmith
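For comparison, here is a sketch of the same link pattern written with Unicode properties instead of locale-derived character lists. This is a swapped-in technique, not the fragment above: it assumes a Perl with \p{Lu} and \p{Ll} support and text held as character strings, and find_link is a made-up name. A negative lookahead replaces the trailing non-letter class, so a link at the very end of the text also matches.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first WikiWord-style link in $text,
# or undef if there is none.
# \p{Lu} = any upper-case letter, \p{Ll} = any lower-case letter.
sub find_link {
    my ($text) = @_;
    # Two or more "Upper + lowers" groups, not followed by another letter.
    return $text =~ /((?:\p{Lu}\p{Ll}+){2,})(?![\p{Lu}\p{Ll}])/ ? $1 : undef;
}

# "Sh\x{F6}\x{FC}ldW\x{F6}rk" spells the example word with ö (F6) and ü (FC).
my $found = find_link("See Sh\x{F6}\x{FC}ldW\x{F6}rk");
print defined $found ? "found a link\n" : "no link\n";
```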
In MicrosoftWindows, you can type these characters by holding down the Alt key while typing the four-digit number, leading zero included, on the numeric keypad (Num Lock on). Note: these are in the ISO 8859-1 CharacterSet, which is no longer used on this Wiki as of May 2006. EditHint: The table should be updated for the server's reported UtfEight encoding. See UtfEightValuesForUmlauts.
À = 0192 Á = 0193 Â = 0194 Ã = 0195 Ä = 0196 Å = 0197 Æ = 0198 Ç = 0199
È = 0200 É = 0201 Ê = 0202 Ë = 0203 Ì = 0204 Í = 0205 Î = 0206 Ï = 0207
Ð = 0208 Ñ = 0209 Ò = 0210 Ó = 0211 Ô = 0212 Õ = 0213 Ö = 0214 × = 0215
Ø = 0216 Ù = 0217 Ú = 0218 Û = 0219 Ü = 0220 Ý = 0221 Þ = 0222 ß = 0223
à = 0224 á = 0225 â = 0226 ã = 0227 ä = 0228 å = 0229 æ = 0230 ç = 0231
è = 0232 é = 0233 ê = 0234 ë = 0235 ì = 0236 í = 0237 î = 0238 ï = 0239
ð = 0240 ñ = 0241 ò = 0242 ó = 0243 ô = 0244 õ = 0245 ö = 0246 ÷ = 0247
ø = 0248 ù = 0249 ú = 0250 û = 0251 ü = 0252 ý = 0253 þ = 0254 ÿ = 0255
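The numbers in the table are simply the ISO 8859-1 code points 192 to 255, so the list can be regenerated rather than typed. A small sketch (alt_code_table is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical helper: one "char = 0NNN" entry per ISO 8859-1
# code point from 192 to 255, zero-padded to four digits.
sub alt_code_table {
    return map { sprintf '%s = %04d', chr($_), $_ } 192 .. 255;
}

my @entries = alt_code_table();
```

To print the entries on a UTF-8 terminal, one would first call binmode(STDOUT, ':encoding(UTF-8)'), otherwise Perl writes the characters as single ISO 8859-1 bytes.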
It might be more stable if Wiki sent out HTML character entities such as &aacute; and &auml; instead of the raw characters.
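A rough sketch of what such an output filter could look like. It is not how this Wiki works: it emits numeric references (&#225;) rather than named ones (&aacute;), which saves a lookup table, and to_html_entities is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside 7-bit ASCII
# with a decimal numeric character reference.
sub to_html_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# \x{E1} is a-acute, \x{E4} is a-umlaut.
my $safe = to_html_entities("\x{E1} \x{E4}");
```

The result is pure ASCII, so it survives whatever CharacterSet the server happens to announce.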
Here's fun: find an AlphabetThatUsesYumlaut