Unicode—character set of character sets
Unicode is a character set that includes the characters from all other character sets. It provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use. (www.unicode.org/standard/WhatIsUnicode.html)
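As a small illustration of the "one number per character" idea, the following Python sketch looks up the code point of a few characters and converts it back. The helper name `code_point_label` is hypothetical and used only for this example; `ord` and `chr` are Python built-ins.

```python
def code_point_label(ch):
    """Return the code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

for ch in ("A", "£", "ÿ"):
    # chr() is the inverse of ord(): the number maps back to the same character.
    assert chr(ord(ch)) == ch
    print(repr(ch), code_point_label(ch))
```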
In practice, Unicode text is stored and exchanged in a Unicode transformation format (UTF). Three such formats exist:
Format | Description | Example |
---|---|---|
UTF-8 | Each character occupies from one to four bytes. The single-byte characters are identical to seven-bit ASCII. | XML files |
UTF-16 | Most characters occupy two bytes. Some characters use surrogate pairs and thus occupy four bytes. | Internal coding in Windows programs |
UTF-32 | Each character occupies four bytes. | Internal coding in UNIX programs |
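The byte counts in the table above can be checked with a short Python sketch. The sample characters and the helper name `encoded_sizes` are chosen for illustration only. The big-endian codec variants are used so that no byte order mark is added to the output.

```python
# Sample characters: U+0041, U+00E9, U+20AC, and U+1F600 (outside the
# Basic Multilingual Plane, so UTF-16 needs a surrogate pair for it).
samples = ["A", "é", "€", "😀"]

def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16, and UTF-32."""
    codecs = ("utf-8", "utf-16-be", "utf-32-be")
    return tuple(len(ch.encode(codec)) for codec in codecs)

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Characters in the ASCII range take one byte in UTF-8, while a character such as U+1F600 takes four bytes in all three formats.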
For more information about Unicode, see www.unicode.org.
The following table shows a few of the Unicode characters along with their names and their hexadecimal values in UTF-16 and UTF-8:
UTF-16 | UTF-8 | Char. | Name | UTF-16 | UTF-8 | Char. | Name | |
---|---|---|---|---|---|---|---|---|
00 20 | 20 | | space | 00 A0 | C2 A0 | | no-break space | |
00 21 | 21 | ! | exclamation mark | 00 A1 | C2 A1 | ¡ | inverted exclamation mark | |
00 22 | 22 | " | quotation mark | 00 A2 | C2 A2 | ¢ | cent sign | |
00 23 | 23 | # | number sign | 00 A3 | C2 A3 | £ | pound sign | |
00 24 | 24 | $ | dollar sign | 00 A4 | C2 A4 | ¤ | currency sign | |
00 25 | 25 | % | percent sign | 00 A5 | C2 A5 | ¥ | yen sign | |
00 26 | 26 | & | ampersand | 00 A6 | C2 A6 | ¦ | broken bar or broken vertical bar | |
00 27 | 27 | ' | apostrophe or apostrophe-quote | 00 A7 | C2 A7 | § | section sign | |
00 28 | 28 | ( | left parenthesis or opening parenthesis | 00 A8 | C2 A8 | ¨ | diaeresis | |
00 29 | 29 | ) | right parenthesis or closing parenthesis | 00 A9 | C2 A9 | © | copyright sign | |
00 2A | 2A | * | asterisk | 00 AA | C2 AA | ª | feminine ordinal indicator | |
00 2B | 2B | + | plus sign | 00 AB | C2 AB | « | left-pointing double angle quotation mark or left pointing guillemet | |
00 2C | 2C | , | comma | 00 AC | C2 AC | ¬ | not sign | |
00 2D | 2D | - | hyphen-minus | 00 AD | C2 AD | | soft hyphen | |
00 2E | 2E | . | full stop or period | 00 AE | C2 AE | ® | registered sign or registered trade mark sign | |
00 2F | 2F | / | solidus (or slash or virgule) | 00 AF | C2 AF | ¯ | macron | |
00 30 | 30 | 0 | digit 0 | 00 B0 | C2 B0 | ° | ring above, degree sign | |
00 31 | 31 | 1 | digit 1 | 00 B1 | C2 B1 | ± | plus-minus sign | |
00 32 | 32 | 2 | digit 2 | 00 B2 | C2 B2 | ² | superscript two | |
00 33 | 33 | 3 | digit 3 | 00 B3 | C2 B3 | ³ | superscript three | |
00 34 | 34 | 4 | digit 4 | 00 B4 | C2 B4 | ´ | acute accent | |
00 35 | 35 | 5 | digit 5 | 00 B5 | C2 B5 | µ | micro sign | |
00 36 | 36 | 6 | digit 6 | 00 B6 | C2 B6 | ¶ | pilcrow sign or paragraph sign | |
00 37 | 37 | 7 | digit 7 | 00 B7 | C2 B7 | · | middle dot | |
00 38 | 38 | 8 | digit 8 | 00 B8 | C2 B8 | ¸ | cedilla | |
00 39 | 39 | 9 | digit 9 | 00 B9 | C2 B9 | ¹ | superscript one | |
00 3A | 3A | : | colon | 00 BA | C2 BA | º | masculine ordinal indicator | |
00 3B | 3B | ; | semicolon | 00 BB | C2 BB | » | right-pointing double angle quotation mark or right pointing guillemet | |
00 3C | 3C | < | less-than sign | 00 BC | C2 BC | ¼ | vulgar fraction one quarter | |
00 3D | 3D | = | equals sign | 00 BD | C2 BD | ½ | vulgar fraction one half | |
00 3E | 3E | > | greater-than sign | 00 BE | C2 BE | ¾ | vulgar fraction three quarters | |
00 3F | 3F | ? | question mark | 00 BF | C2 BF | ¿ | inverted question mark | |
00 40 | 40 | @ | commercial at | 00 C0 | C3 80 | À | Latin capital letter A with grave | |
00 41 | 41 | A | capital letter A | 00 C1 | C3 81 | Á | Latin capital letter A with acute | |
00 42 | 42 | B | capital letter B | 00 C2 | C3 82 | Â | Latin capital letter A with circumflex | |
00 43 | 43 | C | capital letter C | 00 C3 | C3 83 | Ã | Latin capital letter A with tilde | |
00 44 | 44 | D | capital letter D | 00 C4 | C3 84 | Ä | Latin capital letter A with diaeresis | |
00 45 | 45 | E | capital letter E | 00 C5 | C3 85 | Å | Latin capital letter A with ring above | |
00 46 | 46 | F | capital letter F | 00 C6 | C3 86 | Æ | Latin capital letter AE or Latin capital ligature AE | |
00 47 | 47 | G | capital letter G | 00 C7 | C3 87 | Ç | Latin capital letter C with cedilla | |
00 48 | 48 | H | capital letter H | 00 C8 | C3 88 | È | Latin capital letter E with grave | |
00 49 | 49 | I | capital letter I | 00 C9 | C3 89 | É | Latin capital letter E with acute | |
00 4A | 4A | J | capital letter J | 00 CA | C3 8A | Ê | Latin capital letter E with circumflex | |
00 4B | 4B | K | capital letter K | 00 CB | C3 8B | Ë | Latin capital letter E with diaeresis | |
00 4C | 4C | L | capital letter L | 00 CC | C3 8C | Ì | Latin capital letter I with grave | |
00 4D | 4D | M | capital letter M | 00 CD | C3 8D | Í | Latin capital letter I with acute | |
00 4E | 4E | N | capital letter N | 00 CE | C3 8E | Î | Latin capital letter I with circumflex | |
00 4F | 4F | O | capital letter O | 00 CF | C3 8F | Ï | Latin capital letter I with diaeresis | |
00 50 | 50 | P | capital letter P | 00 D0 | C3 90 | Ð | Latin capital letter Eth | |
00 51 | 51 | Q | capital letter Q | 00 D1 | C3 91 | Ñ | Latin capital letter N with tilde | |
00 52 | 52 | R | capital letter R | 00 D2 | C3 92 | Ò | Latin capital letter O with grave | |
00 53 | 53 | S | capital letter S | 00 D3 | C3 93 | Ó | Latin capital letter O with acute | |
00 54 | 54 | T | capital letter T | 00 D4 | C3 94 | Ô | Latin capital letter O with circumflex | |
00 55 | 55 | U | capital letter U | 00 D5 | C3 95 | Õ | Latin capital letter O with tilde | |
00 56 | 56 | V | capital letter V | 00 D6 | C3 96 | Ö | Latin capital letter O with diaeresis | |
00 57 | 57 | W | capital letter W | 00 D7 | C3 97 | × | multiplication sign | |
00 58 | 58 | X | capital letter X | 00 D8 | C3 98 | Ø | Latin capital letter O with stroke | |
00 59 | 59 | Y | capital letter Y | 00 D9 | C3 99 | Ù | Latin capital letter U with grave | |
00 5A | 5A | Z | capital letter Z | 00 DA | C3 9A | Ú | Latin capital letter U with acute | |
00 5B | 5B | [ | left square bracket or opening square bracket | 00 DB | C3 9B | Û | Latin capital letter U with circumflex | |
00 5C | 5C | \ | reverse solidus or backslash | 00 DC | C3 9C | Ü | Latin capital letter U with diaeresis | |
00 5D | 5D | ] | right square bracket or closing square bracket | 00 DD | C3 9D | Ý | Latin capital letter Y with acute | |
00 5E | 5E | ^ | circumflex accent | 00 DE | C3 9E | Þ | Latin capital letter Thorn | |
00 5F | 5F | _ | low line or spacing underscore | 00 DF | C3 9F | ß | Latin small letter sharp s | |
00 60 | 60 | ` | grave accent | 00 E0 | C3 A0 | à | Latin small letter a with grave | |
00 61 | 61 | a | small letter a | 00 E1 | C3 A1 | á | Latin small letter a with acute | |
00 62 | 62 | b | small letter b | 00 E2 | C3 A2 | â | Latin small letter a with circumflex | |
00 63 | 63 | c | small letter c | 00 E3 | C3 A3 | ã | Latin small letter a with tilde | |
00 64 | 64 | d | small letter d | 00 E4 | C3 A4 | ä | Latin small letter a with diaeresis | |
00 65 | 65 | e | small letter e | 00 E5 | C3 A5 | å | Latin small letter a with ring above | |
00 66 | 66 | f | small letter f | 00 E6 | C3 A6 | æ | Latin small letter ae or Latin small ligature ae | |
00 67 | 67 | g | small letter g | 00 E7 | C3 A7 | ç | Latin small letter c with cedilla | |
00 68 | 68 | h | small letter h | 00 E8 | C3 A8 | è | Latin small letter e with grave | |
00 69 | 69 | i | small letter i | 00 E9 | C3 A9 | é | Latin small letter e with acute | |
00 6A | 6A | j | small letter j | 00 EA | C3 AA | ê | Latin small letter e with circumflex | |
00 6B | 6B | k | small letter k | 00 EB | C3 AB | ë | Latin small letter e with diaeresis | |
00 6C | 6C | l | small letter l | 00 EC | C3 AC | ì | Latin small letter i with grave | |
00 6D | 6D | m | small letter m | 00 ED | C3 AD | í | Latin small letter i with acute | |
00 6E | 6E | n | small letter n | 00 EE | C3 AE | î | Latin small letter i with circumflex | |
00 6F | 6F | o | small letter o | 00 EF | C3 AF | ï | Latin small letter i with diaeresis | |
00 70 | 70 | p | small letter p | 00 F0 | C3 B0 | ð | Latin small Icelandic letter eth | |
00 71 | 71 | q | small letter q | 00 F1 | C3 B1 | ñ | Latin small letter n with tilde | |
00 72 | 72 | r | small letter r | 00 F2 | C3 B2 | ò | Latin small letter o with grave | |
00 73 | 73 | s | small letter s | 00 F3 | C3 B3 | ó | Latin small letter o with acute | |
00 74 | 74 | t | small letter t | 00 F4 | C3 B4 | ô | Latin small letter o with circumflex | |
00 75 | 75 | u | small letter u | 00 F5 | C3 B5 | õ | Latin small letter o with tilde | |
00 76 | 76 | v | small letter v | 00 F6 | C3 B6 | ö | Latin small letter o with diaeresis | |
00 77 | 77 | w | small letter w | 00 F7 | C3 B7 | ÷ | division sign | |
00 78 | 78 | x | small letter x | 00 F8 | C3 B8 | ø | Latin small letter o with stroke | |
00 79 | 79 | y | small letter y | 00 F9 | C3 B9 | ù | Latin small letter u with grave | |
00 7A | 7A | z | small letter z | 00 FA | C3 BA | ú | Latin small letter u with acute | |
00 7B | 7B | { | left curly bracket or opening curly bracket | 00 FB | C3 BB | û | Latin small letter u with circumflex | |
00 7C | 7C | \| | vertical line or vertical bar | 00 FC | C3 BC | ü | Latin small letter u with diaeresis | |
00 7D | 7D | } | right curly bracket or closing curly bracket | 00 FD | C3 BD | ý | Latin small letter y with acute | |
00 7E | 7E | ~ | tilde | 00 FE | C3 BE | þ | Latin small letter thorn | |
00 7F | 7F | [DEL] | delete (control character) | 00 FF | C3 BF | ÿ | Latin small letter y with diaeresis |
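The C2 and C3 lead bytes in the right-hand half of the table follow from the two-byte UTF-8 bit layout (110xxxxx 10xxxxxx), which covers U+0080 through U+07FF. The following Python sketch, with the hypothetical helper `utf8_two_byte`, derives the bytes by hand and compares them with Python's own encoder. It is meant as an illustration of the layout, not as a replacement for a real encoder.

```python
def utf8_two_byte(code_point):
    """Hand-encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte form covers U+0080..U+07FF only")
    lead = 0xC0 | (code_point >> 6)      # 110xxxxx carries the upper bits
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx carries the low six bits
    return bytes([lead, trail])

for cp in (0xA0, 0xBF, 0xC0, 0xFF):
    manual = utf8_two_byte(cp)
    # Cross-check against the built-in UTF-8 encoder.
    assert manual == chr(cp).encode("utf-8")
    print(f"{cp:04X} ->", " ".join(f"{b:02X}" for b in manual))
```

Code points U+0080 through U+00BF produce the lead byte C2, and U+00C0 through U+00FF produce C3, which matches the table. The UTF-16 column corresponds to the big-endian byte order, where these characters are the code point written as two bytes.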