Understanding Unicode

Computers represent everything as numbers. Each character is stored as a number, which is eventually drawn as a character on the screen. On legacy systems it was a major problem to write programs for languages other than English, primarily because the ASCII encoding does not have enough characters. As a result, internationalization of applications became a big issue.

ASCII and CodePage mechanism:
ASCII itself is a 7-bit code that defines 128 characters. Text was stored with one byte, or 8 bits, per character, which allows 2^8 or 256 different values, so the upper 128 values were used for additional characters. If a program is to be written for a different language, this set of extra characters has to be replaced with a different one. Windows initially used a scheme called code pages, with a different code page for each language. A Chinese version of Windows, for example, uses a Chinese code page. The problem is that only one code page can be in use at a time. So if a person in Europe connects to a US server, he will see only the characters of the server's code page, and vice versa.
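
To make the one-code-page-at-a-time problem concrete, here is a minimal sketch in C++. The enum CodePage and the function decode_byte are hypothetical names invented for this illustration, not part of any Windows API, and only a small range of two real code pages (Windows-1252 and Windows-1251) is covered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical identifiers for two single-byte Windows code pages.
enum class CodePage { Windows1252, Windows1251 };

// Decode one byte to a Unicode code point under the given code page.
// This sketch covers only bytes below 0x80 (plain ASCII in both pages)
// and bytes 0xE0..0xFF.
char32_t decode_byte(std::uint8_t byte, CodePage page) {
    assert(byte < 0x80 || byte >= 0xE0);  // other ranges are not modelled here
    if (byte < 0x80) {
        return byte;  // the ASCII half is shared by both code pages
    }
    switch (page) {
    case CodePage::Windows1252:
        // 0xE0..0xFF coincide with U+00E0..U+00FF (accented Latin letters).
        return byte;
    case CodePage::Windows1251:
        // 0xE0..0xFF map to U+0430..U+044F (Cyrillic small letters).
        return 0x0430 + (byte - 0xE0);
    }
    return 0xFFFD;  // not reached for the two enum values above
}
```

The same byte 0xE9 decodes to U+00E9 (Latin small e with acute) under Windows-1252 and to U+0439 (Cyrillic small short i) under Windows-1251, so a file is only readable if both sides agree on the code page.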

Multi-Byte Character Set or Double-Byte Character Set:
One solution proposed for the above problem was a multi-byte character set. In this scheme, a character may be represented by either a single byte or two bytes. For a two-byte character, the first byte (the lead byte) indicates that a second byte follows, so applications always have to check whether a byte is a lead byte. Visual C++ provides the function "isleadbyte(int c)" to check whether a character is a lead byte.
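
The lead-byte check can be sketched as follows. This is not the library's isleadbyte, which depends on the current locale; is_lead_byte_cp932 and count_characters_cp932 are hypothetical helpers, hard-coded on the assumption that the text is in code page 932 (Shift-JIS), where lead bytes fall in the ranges 0x81..0x9F and 0xE0..0xFC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a lead-byte test, fixed to code page 932.
bool is_lead_byte_cp932(unsigned char c) {
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

// Count characters (not bytes) in a CP932 string by consuming the trail
// byte that follows every lead byte.
std::size_t count_characters_cp932(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (is_lead_byte_cp932(static_cast<unsigned char>(bytes[i]))) {
            assert(i + 1 < bytes.size());  // a lead byte needs a trail byte
            ++i;  // the trail byte belongs to the same character
        }
        ++count;
    }
    return count;
}
```

For the five bytes 0x41 0x93 0xFA 0x96 0x7B (the letter A followed by two Japanese characters) the function counts three characters. The last byte, 0x7B, is also the ASCII code for "{", which is why code that ignores lead bytes can misread a trail byte as a separate character.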

Unicode:
Finally, the big companies joined together to devise a new strategy for this issue. A new character encoding scheme was designed with 16 bits per character, which can support 2^16 or 65536 characters. The Unicode standard is published by the Unicode Consortium. Although the original goal of the Unicode Consortium was to produce a 16-bit encoding standard, it ended up defining three different encoding forms.
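UTF-8: This encoding uses 8-bit units, with one to four bytes per character. Its advantage is that plain ASCII text is already valid UTF-8, so Unicode characters transformed into UTF-8 are compatible with much existing software.
UTF-16: This is the originally planned form using 16-bit units. Characters beyond the first 65536 are represented by a pair of 16-bit units (a surrogate pair).
UTF-32: This uses a fixed 32 bits for every character and is used where memory is not a constraint.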

Data in any of the three forms can be transformed into the others without any loss, because all of them encode a common repertoire of characters.
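
As a rough illustration of how one character looks in each form, the sketch below encodes a single code point in UTF-8 and UTF-16 and decodes the UTF-16 form back again. In UTF-32 the code point is stored as it is, in one 32-bit unit. The function names encode_utf8, encode_utf16 and decode_utf16 are made up for this example and handle one character at a time.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);  // ASCII range: one byte, unchanged
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16 (one or two 16-bit units).
std::u16string encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        char32_t v = cp - 0x10000;
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return out;
}

// Decode the one- or two-unit sequence produced by encode_utf16.
char32_t decode_utf16(const std::u16string& units) {
    assert(units.size() == 1 || units.size() == 2);
    if (units.size() == 1) {
        return units[0];
    }
    return 0x10000 + ((static_cast<char32_t>(units[0]) - 0xD800) << 10)
                   + (static_cast<char32_t>(units[1]) - 0xDC00);
}
```

For the euro sign U+20AC this gives the three UTF-8 bytes E2 82 AC and the single UTF-16 unit 20AC. For U+1F600, which lies outside the first 65536 characters, it gives the four UTF-8 bytes F0 9F 98 80 and the UTF-16 surrogate pair D83D DE00, and decoding that pair returns U+1F600 again.
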
Note:
Windows NT/2000/XP use Unicode as their native character set. So even if a program passes data in ASCII, it is internally converted to Unicode, processed, converted back to ASCII and returned.
Programs often need to convert between MBCS/DBCS and Unicode. Anyone who needs to learn the conversion procedures can find them documented on Microsoft MSDN (provided the page has not been moved to a different location).

In fact, the common goal of this whole effort is internationalization. But having a common character set solves only part of the problem. Other issues, such as the formats of dates, times, numbers and currencies, along with other local conventions, also have to be taken care of.
