Ascii Values For Umlauts

Q: How do you match umlauts in Perl?

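A: By character code if you need to (strictly speaking, umlauts lie outside 7-bit ASCII; the values on this page are ISO 8859-1 codes). Besides, why wouldn't the Perl tokenizer accept extended ASCII characters?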

A: Or if you want a RegularExpression to match ordinary letters and umlauts, use PosixCharacterClasses such as [:alpha:], [:alnum:], [:upper:] and [:lower:].
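
A minimal sketch of the character-class approach, assuming Perl 5.14 or later so that the /u modifier is available. The sample text and variable names are made up for illustration, and the umlauts are written as escapes so the file encoding does not matter.

```perl
use 5.014;          # needed for the /u regex modifier
use strict;
use warnings;

# "Über ShöüldWörk 42", written with escapes (assumed sample text)
my $text = "\x{DC}ber Sh\x{F6}\x{FC}ldW\x{F6}rk 42";

# POSIX classes are only valid inside a bracketed class: [[:alpha:]]
# /u asks for Unicode rules, so code points 192..255 count as letters
my @words = $text =~ /([[:alpha:]]+)/gu;
my @caps  = $text =~ /([[:upper:]])/gu;

printf "%d words, %d capitals\n", scalar @words, scalar @caps;
```

Without /u (or "use locale" with a suitable locale), Perl may apply ASCII-only rules to such a string and the umlauts would not be treated as letters.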

Q: What are the AsciiValuesForUmlauts?

A: Here's a fragment that shows one way to match them. First determine the sets of upper- and lower-case characters in the current locale:

 require 5.004;
 use locale;
 $lc = join '', grep { uc($_) ne $_ } map { chr($_) } 1..255;
 $uc = join '', grep { lc($_) ne $_ } map { chr($_) } 1..255;
next build a regular expression
 $link = "((?:[$uc][$lc]+){2,})[^$uc$lc]";
then
 $text =~ m/$link/;
will match ShöüldWörk, as long as a non-letter follows it and the locale in effect treats the umlauts as letters. -- DaveSmith
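
A locale-independent variant of the same idea might look like the sketch below. It swaps the computed character lists for POSIX classes under /u, which assumes Perl 5.14 or later, and it uses a lookahead instead of a trailing non-letter so that a name at the very end of the text also matches. The sample text is invented.

```perl
use 5.014;
use strict;
use warnings;

# Two or more capitalised words run together, not followed by another letter
my $link = qr/((?:[[:upper:]][[:lower:]]+){2,})(?![[:alpha:]])/u;

# "see ShöüldWörk", with the name at the end of the string
my $text = "see Sh\x{F6}\x{FC}ldW\x{F6}rk";
my ($name) = $text =~ $link;

print defined $name ? "matched\n" : "no match\n";
```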


In MicrosoftWindows, you can type these characters by holding down the <alt> key while typing the four-digit number, including the leading zero, on the numeric keypad (Num Lock on). Note: these are in the ISO 8859-1 CharacterSet, which has not been used on this Wiki since May 2006. EditHint: The table should be updated for the server's reported UtfEight encoding. See UtfEightValuesForUmlauts.

 À = 0192 Á = 0193 Â = 0194 Ã = 0195 Ä = 0196 Å = 0197 
 Æ = 0198 Ç = 0199 
 È = 0200 É = 0201 Ê = 0202 Ë = 0203 
 Ì = 0204 Í = 0205 Î = 0206 Ï = 0207 
 Ð = 0208 Ñ = 0209 
 Ò = 0210 Ó = 0211 Ô = 0212 Õ = 0213 Ö = 0214 
 × = 0215 Ø = 0216 
 Ù = 0217 Ú = 0218 Û = 0219 Ü = 0220 
 Ý = 0221 Þ = 0222 ß = 0223 
 à = 0224 á = 0225 â = 0226 ã = 0227 ä = 0228 å = 0229 
 æ = 0230 ç = 0231 
 è = 0232 é = 0233 ê = 0234 ë = 0235 
 ì = 0236 í = 0237 î = 0238 ï = 0239 
 ð = 0240 ñ = 0241 
 ò = 0242 ó = 0243 ô = 0244 õ = 0245 ö = 0246 
 ÷ = 0247 ø = 0248 
 ù = 0249 ú = 0250 û = 0251 ü = 0252 
 ý = 0253 þ = 0254 ÿ = 0255
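
The values in the table are simply the ISO 8859-1 code points padded to four digits, so they can be looked up with chr and ord. A small sketch, where %alt_code is a made-up name:

```perl
use strict;
use warnings;

# Map each character in the range 192..255 to its four-digit code
my %alt_code = map { ( chr($_), sprintf('%04d', $_) ) } 192 .. 255;

# Look up the German umlauts and sharp s by code point
for my $cp (0xC4, 0xD6, 0xDC, 0xE4, 0xF6, 0xFC, 0xDF) {
    printf "code point %d (hex %02X) = Alt+%s\n", $cp, $cp, $alt_code{ chr $cp };
}
```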


It might be more stable if Wiki sent out HTML character entities, such as &aacute; and &auml;, instead of raw characters.
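
As a rough illustration of that idea, the sketch below replaces every non-ASCII character with a numeric character reference. It uses numeric references rather than named entities like &auml; to stay within core Perl, and to_char_refs is a hypothetical helper, not something this Wiki actually runs.

```perl
use strict;
use warnings;

# Hypothetical helper: turn each non-ASCII character into &#NNN;
sub to_char_refs {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $s;
}

# "ShöüldWörk" written with escapes
print to_char_refs("Sh\x{F6}\x{FC}ldW\x{F6}rk"), "\n";   # Sh&#246;&#252;ldW&#246;rk
```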


Here's fun: find an AlphabetThatUsesYumlaut


CategoryTable


Last edited July 11, 2006.