Chapter 8. Unicode Support
What is Unicode
From Unicode.org: "Computers ... store letters and other characters by assigning
a number for each one. Before Unicode was invented, there were hundreds of different encoding systems
for assigning these numbers. No single encoding could contain enough characters...
Unicode provides a unique number for every character, no matter what the platform,
no matter what the program, no matter what the language."
For example, the basic Latin letter "A" has the code Hex 0041 (65),
the Russian letter "Ж" has the code Hex 0416 (1046),
and the Chinese character "㊥" has the code Hex 32A5 (12965).
For more information on Unicode, visit www.unicode.org.
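As a quick check of the numbers above, the following Python sketch prints the code point of each sample character in hex and decimal. The helper name code_point is only illustrative, and Python is used here purely as a convenient calculator; it is not part of AspUpload.

```python
def code_point(ch):
    """Return the Unicode code point of ch as (4-digit hex string, decimal)."""
    n = ord(ch)
    return f"{n:04X}", n


# Latin "A", the Russian letter U+0416, and the character U+32A5
for ch in ("A", "\u0416", "\u32a5"):
    hex_code, dec_code = code_point(ch)
    print(f"Hex {hex_code} ({dec_code})")
```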
In the UTF-8 encoding, Unicode characters in the range Hex 0000 to 007F are
encoded simply as bytes 00 to 7F. This means that files and strings which contain only
7-bit ASCII characters have the same encoding under both ASCII and UTF-8.
Therefore, the Unicode character 0041 ("A") is encoded in UTF-8 as the single byte Hex 41.
Unicode characters in the range Hex 0080 to 07FF
are encoded as a sequence of two bytes: 110xxxxx 10xxxxxx. (The x bit positions
are filled with the bits of the character code number in binary representation.)
For example, the Unicode character 0416 (Ж),
or Binary 00000100 00010110, is encoded as 11010000 10010110,
or Hex D0 96.
Unicode characters in the range Hex 0800 to FFFF are encoded
as a sequence of three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
For example, the Unicode character 32A5 (㊥) is
encoded as Hex E3 8A A5.
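The three byte layouts described above can be sketched in a few lines of Python. This is a minimal illustration limited to the range Hex 0000 to FFFF covered in this chapter. The function name utf8_encode is hypothetical, and the sketch rejects the surrogate range D800 to DFFF, which has no UTF-8 encoding. Each result is cross-checked against Python's built-in encoder.

```python
def utf8_encode(code_point):
    """Encode one code point in the range 0x0000-0xFFFF as UTF-8 bytes."""
    if code_point < 0 or code_point > 0xFFFF:
        raise ValueError("this sketch only covers Hex 0000 to FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point <= 0x7F:
        # 0xxxxxxx: the byte is the code itself
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx: top 5 bits, then low 6 bits
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx: top 4 bits, middle 6 bits, low 6 bits
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for cp in (0x0041, 0x0416, 0x32A5):
    encoded = utf8_encode(cp)
    # Cross-check against the built-in UTF-8 encoder
    assert encoded == chr(cp).encode("utf-8")
    print(f"{cp:04X} ->", " ".join(f"{b:02X}" for b in encoded))
```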
If you anticipate using Unicode characters in text data or the names
of files you are uploading, you should instruct your browser
to POST all the information in the UTF-8 format. This is done by including the
following tag in the header of your page:
<HEAD>
On the AspUpload side, you must enable UTF-8 translation by setting the property
Upload.CodePage to 65001 (a Win32-defined value for CP_UTF8):
The Upload.CodePage property can also be set to valid code page values such as 1251 (Cyrillic),
1255 (Hebrew), 1256 (Arabic), etc. Every time the CodePage property
is set, AspUpload will attempt to translate the text data
and file names into Unicode using the specified code page
by invoking the Win32 function MultiByteToWideChar.
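To see why the code page matters, the sketch below decodes the same raw bytes under different code pages. It is a Python analogy rather than AspUpload's implementation: Python's bytes.decode stands in for MultiByteToWideChar, and the CODECS table and to_unicode helper are assumptions made for this illustration.

```python
# Illustrative mapping from Win32 code page numbers to Python codec names
CODECS = {65001: "utf-8", 1251: "cp1251", 1255: "cp1255", 1256: "cp1256"}


def to_unicode(raw, code_page):
    """Translate raw bytes into a Unicode string using the given code page."""
    return raw.decode(CODECS[code_page])


raw = b"\xd0\x96"
# Under UTF-8 the two bytes form one character, U+0416
print(ascii(to_unicode(raw, 65001)))
# Under code page 1251 the same two bytes are two separate characters
print(ascii(to_unicode(raw, 1251)))
# In code page 1251, U+0416 is the single byte Hex C6
print(ascii(to_unicode(b"\xc6", 1251)))
```

Choosing a code page that does not match what the browser actually sent produces the wrong characters, as the second line of output shows.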
The code samples unicode.asp and unicode_upload.asp
demonstrate AspUpload's Unicode support. Both files are shown here:
unicode.asp
<h3>File and Text Items</h3>
unicode_upload.asp
' Enable UTF-8 translation
Files:<BR>
<P>
Note that this script uses Server.HTMLEncode on file names and text items.
This converts Unicode strings to a format understandable by
a browser, such as &#1055;&#1077;&#1088;&#1089;&#1080;&#1094; for the name Персиц.
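The kind of conversion described above can be approximated in Python as follows. This is only a sketch of the idea, not the behavior of Server.HTMLEncode itself: the helper name html_numeric_refs is hypothetical, and it assumes that every non-ASCII character should become a decimal numeric character reference.

```python
import html


def html_numeric_refs(text):
    """Escape HTML special characters, then replace each non-ASCII
    character with a decimal numeric character reference."""
    escaped = html.escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


name = "\u041f\u0435\u0440\u0441\u0438\u0446"  # six Cyrillic letters
print(html_numeric_refs(name))
```

Because the result is pure ASCII, it displays correctly regardless of the character set the page itself is served in.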
Click the link below to run this code sample:
http://localhost/aspupload/08_unicode/unicode.asp
Copyright © 1998 - 2001 Persits Software, Inc. All Rights Reserved. AspUpload® is a registered trademark of Persits Software, Inc.