Unicode--character set of character sets

Unicode is a character set designed to include the characters of all other character sets. It provides a unique number, called a code point, for every character, no matter what the platform, no matter what the program, no matter what the language.
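
As a small illustration of the "unique number" idea, the Python sketch below converts a few characters to their numbers and back using the built-in functions ord and chr. The sample characters and the helper name code_point_label are arbitrary choices made for this example, not something taken from the text above.

```python
def code_point_label(ch: str) -> str:
    """Return the number of a single character in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# Sample characters: A, e with acute, euro sign.
# Escapes are used so that this source text stays pure ASCII.
for ch in ["A", "\u00e9", "\u20ac"]:
    number = ord(ch)            # character -> number
    assert chr(number) == ch    # number -> character gives the same character back
    print(code_point_label(ch), "=", number)
```

The same character yields the same number on any platform where this runs, which is the property the paragraph above describes.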

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use. (www.unicode.org/standard/WhatIsUnicode.html)

In practical application, Unicode is represented in a Unicode transformation format (UTF). Three such formats exist:

| Format | Description | Example use |
| --- | --- | --- |
| UTF-8 | Each character occupies from one to four bytes. The single-byte characters are identical to seven-bit ASCII. | XML files |
| UTF-16 | Most characters occupy two bytes. Some characters use surrogate pairs and thus occupy four bytes. | Internal coding in Windows programs |
| UTF-32 | Each character occupies four bytes. | Internal coding in UNIX programs |
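
The byte counts in the table can be checked with a short Python sketch. It encodes a few sample characters with Python's standard codecs and counts the resulting bytes. The big-endian codec variants ("utf-16-be", "utf-32-be") are chosen here only so that no byte-order mark is added to the output. The names CODECS, utf_sizes and hex_bytes and the sample characters are assumptions made for this example.

```python
# Map the format names used in the table to Python codec names.
CODECS = {"UTF-8": "utf-8", "UTF-16": "utf-16-be", "UTF-32": "utf-32-be"}


def utf_sizes(ch: str) -> dict:
    """Return the number of bytes one character occupies in each format."""
    return {name: len(ch.encode(codec)) for name, codec in CODECS.items()}


def hex_bytes(ch: str, codec: str) -> str:
    """Return the encoded bytes as upper-case hex pairs separated by spaces."""
    return ch.encode(codec).hex(" ").upper()


# A, e with acute, euro sign, and one character outside the 16-bit range.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", utf_sizes(ch))

# A character above U+FFFF needs a surrogate pair (two 16-bit units) in UTF-16.
print(hex_bytes("\U0001F600", "utf-16-be"))
```

For the last sample, U+1F600, the UTF-16 result is four bytes long (D8 3D DE 00), which is the surrogate-pair case mentioned in the table.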

For more information about Unicode, see www.unicode.org.

The following table shows a few of the Unicode characters along with their names and their hexadecimal values in UTF-16 and UTF-8:

| UTF-16 | UTF-8 | Char. | Name | UTF-16 | UTF-8 | Char. | Name |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 00 20 | 20 | [SP] | space | 00 A0 | C2 A0 | [NBSP] | no-break space |
| 00 21 | 21 | ! | exclamation mark | 00 A1 | C2 A1 | ¡ | inverted exclamation mark |
| 00 22 | 22 | " | quotation mark | 00 A2 | C2 A2 | ¢ | cent sign |
| 00 23 | 23 | # | number sign | 00 A3 | C2 A3 | £ | pound sign |
| 00 24 | 24 | $ | dollar sign | 00 A4 | C2 A4 | ¤ | currency sign |
| 00 25 | 25 | % | percent sign | 00 A5 | C2 A5 | ¥ | yen sign |
| 00 26 | 26 | & | ampersand | 00 A6 | C2 A6 | ¦ | broken bar or broken vertical bar |
| 00 27 | 27 | ' | apostrophe or apostrophe-quote | 00 A7 | C2 A7 | § | section sign |
| 00 28 | 28 | ( | left parenthesis or opening parenthesis | 00 A8 | C2 A8 | ¨ | diaeresis |
| 00 29 | 29 | ) | right parenthesis or closing parenthesis | 00 A9 | C2 A9 | © | copyright sign |
| 00 2A | 2A | * | asterisk | 00 AA | C2 AA | ª | feminine ordinal indicator |
| 00 2B | 2B | + | plus sign | 00 AB | C2 AB | « | left-pointing double angle quotation mark or left-pointing guillemet |
| 00 2C | 2C | , | comma | 00 AC | C2 AC | ¬ | not sign |
| 00 2D | 2D | - | hyphen-minus | 00 AD | C2 AD | [SHY] | soft hyphen |
| 00 2E | 2E | . | full stop or period | 00 AE | C2 AE | ® | registered sign or registered trade mark sign |
| 00 2F | 2F | / | solidus (or slash or virgule) | 00 AF | C2 AF | ¯ | macron |
| 00 30 | 30 | 0 | digit 0 | 00 B0 | C2 B0 | ° | degree sign |
| 00 31 | 31 | 1 | digit 1 | 00 B1 | C2 B1 | ± | plus-minus sign |
| 00 32 | 32 | 2 | digit 2 | 00 B2 | C2 B2 | ² | superscript two |
| 00 33 | 33 | 3 | digit 3 | 00 B3 | C2 B3 | ³ | superscript three |
| 00 34 | 34 | 4 | digit 4 | 00 B4 | C2 B4 | ´ | acute accent |
| 00 35 | 35 | 5 | digit 5 | 00 B5 | C2 B5 | µ | micro sign |
| 00 36 | 36 | 6 | digit 6 | 00 B6 | C2 B6 | ¶ | pilcrow sign or paragraph sign |
| 00 37 | 37 | 7 | digit 7 | 00 B7 | C2 B7 | · | middle dot |
| 00 38 | 38 | 8 | digit 8 | 00 B8 | C2 B8 | ¸ | cedilla |
| 00 39 | 39 | 9 | digit 9 | 00 B9 | C2 B9 | ¹ | superscript one |
| 00 3A | 3A | : | colon | 00 BA | C2 BA | º | masculine ordinal indicator |
| 00 3B | 3B | ; | semicolon | 00 BB | C2 BB | » | right-pointing double angle quotation mark or right-pointing guillemet |
| 00 3C | 3C | < | less-than sign | 00 BC | C2 BC | ¼ | vulgar fraction one quarter |
| 00 3D | 3D | = | equals sign | 00 BD | C2 BD | ½ | vulgar fraction one half |
| 00 3E | 3E | > | greater-than sign | 00 BE | C2 BE | ¾ | vulgar fraction three quarters |
| 00 3F | 3F | ? | question mark | 00 BF | C2 BF | ¿ | inverted question mark |
| 00 40 | 40 | @ | commercial at | 00 C0 | C3 80 | À | Latin capital letter A with grave |
| 00 41 | 41 | A | capital letter A | 00 C1 | C3 81 | Á | Latin capital letter A with acute |
| 00 42 | 42 | B | capital letter B | 00 C2 | C3 82 | Â | Latin capital letter A with circumflex |
| 00 43 | 43 | C | capital letter C | 00 C3 | C3 83 | Ã | Latin capital letter A with tilde |
| 00 44 | 44 | D | capital letter D | 00 C4 | C3 84 | Ä | Latin capital letter A with diaeresis |
| 00 45 | 45 | E | capital letter E | 00 C5 | C3 85 | Å | Latin capital letter A with ring above |
| 00 46 | 46 | F | capital letter F | 00 C6 | C3 86 | Æ | Latin capital letter AE or Latin capital ligature AE |
| 00 47 | 47 | G | capital letter G | 00 C7 | C3 87 | Ç | Latin capital letter C with cedilla |
| 00 48 | 48 | H | capital letter H | 00 C8 | C3 88 | È | Latin capital letter E with grave |
| 00 49 | 49 | I | capital letter I | 00 C9 | C3 89 | É | Latin capital letter E with acute |
| 00 4A | 4A | J | capital letter J | 00 CA | C3 8A | Ê | Latin capital letter E with circumflex |
| 00 4B | 4B | K | capital letter K | 00 CB | C3 8B | Ë | Latin capital letter E with diaeresis |
| 00 4C | 4C | L | capital letter L | 00 CC | C3 8C | Ì | Latin capital letter I with grave |
| 00 4D | 4D | M | capital letter M | 00 CD | C3 8D | Í | Latin capital letter I with acute |
| 00 4E | 4E | N | capital letter N | 00 CE | C3 8E | Î | Latin capital letter I with circumflex |
| 00 4F | 4F | O | capital letter O | 00 CF | C3 8F | Ï | Latin capital letter I with diaeresis |
| 00 50 | 50 | P | capital letter P | 00 D0 | C3 90 | Ð | Latin capital letter Eth |
| 00 51 | 51 | Q | capital letter Q | 00 D1 | C3 91 | Ñ | Latin capital letter N with tilde |
| 00 52 | 52 | R | capital letter R | 00 D2 | C3 92 | Ò | Latin capital letter O with grave |
| 00 53 | 53 | S | capital letter S | 00 D3 | C3 93 | Ó | Latin capital letter O with acute |
| 00 54 | 54 | T | capital letter T | 00 D4 | C3 94 | Ô | Latin capital letter O with circumflex |
| 00 55 | 55 | U | capital letter U | 00 D5 | C3 95 | Õ | Latin capital letter O with tilde |
| 00 56 | 56 | V | capital letter V | 00 D6 | C3 96 | Ö | Latin capital letter O with diaeresis |
| 00 57 | 57 | W | capital letter W | 00 D7 | C3 97 | × | multiplication sign |
| 00 58 | 58 | X | capital letter X | 00 D8 | C3 98 | Ø | Latin capital letter O with stroke |
| 00 59 | 59 | Y | capital letter Y | 00 D9 | C3 99 | Ù | Latin capital letter U with grave |
| 00 5A | 5A | Z | capital letter Z | 00 DA | C3 9A | Ú | Latin capital letter U with acute |
| 00 5B | 5B | [ | left square bracket or opening square bracket | 00 DB | C3 9B | Û | Latin capital letter U with circumflex |
| 00 5C | 5C | \ | reverse solidus or backslash | 00 DC | C3 9C | Ü | Latin capital letter U with diaeresis |
| 00 5D | 5D | ] | right square bracket or closing square bracket | 00 DD | C3 9D | Ý | Latin capital letter Y with acute |
| 00 5E | 5E | ^ | circumflex accent | 00 DE | C3 9E | Þ | Latin capital letter Thorn |
| 00 5F | 5F | _ | low line or spacing underscore | 00 DF | C3 9F | ß | Latin small letter sharp s |
| 00 60 | 60 | ` | grave accent | 00 E0 | C3 A0 | à | Latin small letter a with grave |
| 00 61 | 61 | a | small letter a | 00 E1 | C3 A1 | á | Latin small letter a with acute |
| 00 62 | 62 | b | small letter b | 00 E2 | C3 A2 | â | Latin small letter a with circumflex |
| 00 63 | 63 | c | small letter c | 00 E3 | C3 A3 | ã | Latin small letter a with tilde |
| 00 64 | 64 | d | small letter d | 00 E4 | C3 A4 | ä | Latin small letter a with diaeresis |
| 00 65 | 65 | e | small letter e | 00 E5 | C3 A5 | å | Latin small letter a with ring above |
| 00 66 | 66 | f | small letter f | 00 E6 | C3 A6 | æ | Latin small letter ae or Latin small ligature ae |
| 00 67 | 67 | g | small letter g | 00 E7 | C3 A7 | ç | Latin small letter c with cedilla |
| 00 68 | 68 | h | small letter h | 00 E8 | C3 A8 | è | Latin small letter e with grave |
| 00 69 | 69 | i | small letter i | 00 E9 | C3 A9 | é | Latin small letter e with acute |
| 00 6A | 6A | j | small letter j | 00 EA | C3 AA | ê | Latin small letter e with circumflex |
| 00 6B | 6B | k | small letter k | 00 EB | C3 AB | ë | Latin small letter e with diaeresis |
| 00 6C | 6C | l | small letter l | 00 EC | C3 AC | ì | Latin small letter i with grave |
| 00 6D | 6D | m | small letter m | 00 ED | C3 AD | í | Latin small letter i with acute |
| 00 6E | 6E | n | small letter n | 00 EE | C3 AE | î | Latin small letter i with circumflex |
| 00 6F | 6F | o | small letter o | 00 EF | C3 AF | ï | Latin small letter i with diaeresis |
| 00 70 | 70 | p | small letter p | 00 F0 | C3 B0 | ð | Latin small Icelandic letter eth |
| 00 71 | 71 | q | small letter q | 00 F1 | C3 B1 | ñ | Latin small letter n with tilde |
| 00 72 | 72 | r | small letter r | 00 F2 | C3 B2 | ò | Latin small letter o with grave |
| 00 73 | 73 | s | small letter s | 00 F3 | C3 B3 | ó | Latin small letter o with acute |
| 00 74 | 74 | t | small letter t | 00 F4 | C3 B4 | ô | Latin small letter o with circumflex |
| 00 75 | 75 | u | small letter u | 00 F5 | C3 B5 | õ | Latin small letter o with tilde |
| 00 76 | 76 | v | small letter v | 00 F6 | C3 B6 | ö | Latin small letter o with diaeresis |
| 00 77 | 77 | w | small letter w | 00 F7 | C3 B7 | ÷ | division sign |
| 00 78 | 78 | x | small letter x | 00 F8 | C3 B8 | ø | Latin small letter o with stroke |
| 00 79 | 79 | y | small letter y | 00 F9 | C3 B9 | ù | Latin small letter u with grave |
| 00 7A | 7A | z | small letter z | 00 FA | C3 BA | ú | Latin small letter u with acute |
| 00 7B | 7B | { | left curly bracket or opening curly bracket | 00 FB | C3 BB | û | Latin small letter u with circumflex |
| 00 7C | 7C | \| | vertical line or vertical bar | 00 FC | C3 BC | ü | Latin small letter u with diaeresis |
| 00 7D | 7D | } | right curly bracket or closing curly bracket | 00 FD | C3 BD | ý | Latin small letter y with acute |
| 00 7E | 7E | ~ | tilde | 00 FE | C3 BE | þ | Latin small letter thorn |
| 00 7F | 7F | [DEL] | delete (control character) | 00 FF | C3 BF | ÿ | Latin small letter y with diaeresis |
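
The hexadecimal columns of the table can be reproduced with a few lines of Python. The sketch below also derives the two-byte UTF-8 form by hand, to show why, for example, 00 A0 in UTF-16 becomes C2 A0 in UTF-8. The function names utf8_two_byte and table_row are hypothetical helpers written for this example, and the UTF-16 column is assumed to be in big-endian byte order, which matches the values shown.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("only code points U+0080..U+07FF take two bytes")
    lead = 0xC0 | (cp >> 6)       # 110xxxxx: marker bits plus the upper 5 bits
    trail = 0x80 | (cp & 0x3F)    # 10xxxxxx: marker bits plus the lower 6 bits
    return bytes([lead, trail])


def table_row(cp: int) -> tuple:
    """Return the (UTF-16, UTF-8) hex columns for one code point."""
    ch = chr(cp)
    utf16 = ch.encode("utf-16-be").hex(" ").upper()
    utf8 = ch.encode("utf-8").hex(" ").upper()
    return utf16, utf8


# The hand-written encoder agrees with Python's codec over the whole range.
assert all(utf8_two_byte(cp) == chr(cp).encode("utf-8") for cp in range(0x80, 0x800))

# Print the hex columns for the first rows of each half of the table.
for cp in (0x20, 0x21, 0xA0, 0xA1):
    print(*table_row(cp), sep=" | ")
```

Characters below U+0080 keep their single ASCII byte in UTF-8, which is why the left half of the table shows one byte and the right half shows two.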
