Working with Character Sets

In the past, most computers used the same character set to represent uppercase and lowercase English letters, digits, and punctuation characters. This character set is known as ASCII. However, ASCII is a very limited character set and cannot support a variety of alphabets. To accommodate computer users worldwide, different character sets were developed. These newer character sets are often identified by a number, such as code page 850 or ISO-8859-1.

Character sets are composed of code points, which are the numbers a computer uses to identify each character. For example, in ASCII, when you type a capital A, the computer sees its code point, which is the number 65; if you type a B, the computer sees 66. Both code page 850 and ISO-8859-1 include accented characters, but code page 850 uses code point 130 for the character é, and ISO-8859-1 uses 233 for the same character. To eliminate this confusion, an effort is underway to create a universal character set that includes every character from every language. This character set is called Unicode.
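The code point values mentioned above can be checked with a short Python sketch. This is only an illustration using Python's built-in codecs; it is not part of NetObjects Fusion, and the variable names are chosen for this example.

```python
# Code points for the ASCII letters mentioned above.
code_a = ord("A")
code_b = ord("B")
print(code_a, code_b)  # 65 66

# The same character, é, is stored as a different number
# in code page 850 and in ISO-8859-1.
e_acute = "\u00e9"  # é
cp850_byte = e_acute.encode("cp850")[0]
latin1_byte = e_acute.encode("iso-8859-1")[0]
print(cp850_byte, latin1_byte)  # 130 233

# Reading byte 130 with the wrong character set gives a different character.
as_cp850 = bytes([130]).decode("cp850")
as_latin1 = bytes([130]).decode("iso-8859-1")
print(repr(as_cp850), repr(as_latin1))  # 'é' '\x82'
```

The last two lines show why a browser needs to know which character set a page uses: the byte alone does not identify the character.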

Characters display correctly in NetObjects Fusion because they are stored in Unicode. When NetObjects Fusion publishes or previews a Web page, it converts the text from Unicode to the character set selected for the website or for the individual Web page.

Suppose you type Greek characters on a Web page, set the page character set to Western European (ISO-8859-1), and preview the Web page. Because their particular code points do not have equivalents in the Western European character set, the Greek characters may appear as question marks. If you want to guarantee that the Greek characters on the NetObjects Fusion Web page display correctly when you preview or publish, you should choose a character set that includes Greek characters. This character set is then inserted in the charset parameter in the generated HTML META tag, which tells the browser how to interpret and display the characters.
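The question-mark behavior can be reproduced outside NetObjects Fusion with Python's codecs. The sketch below is an approximation of the conversion, not the product's actual implementation, and the `meta_tag` helper and its exact tag format are assumptions made for illustration.

```python
greek = "αβγ"

# Western European (ISO-8859-1) has no Greek letters. With errors="replace",
# Python substitutes a question mark for each character it cannot convert.
as_western = greek.encode("iso-8859-1", errors="replace")
print(as_western)  # b'???'

# A Greek character set can represent the same text without loss.
as_greek = greek.encode("iso-8859-7")
round_trip = as_greek.decode("iso-8859-7")
print(round_trip == greek)  # True


def meta_tag(charset: str) -> str:
    """Build a META tag that declares the charset (hypothetical format)."""
    return f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'


print(meta_tag("ISO-8859-7"))
```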

If a Web page contains languages that use different character sets, for example, English on the right and Greek on the left, you can use Unicode (UTF-8) or two-byte Unicode (UCS-2) as the character set for the page to ensure that the browser interprets all characters correctly. Remember that Unicode is still evolving; it is not yet complete, but it does include code points for most characters in languages commonly used on computers today.
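As a rough illustration of mixed-language text in Unicode, the Python sketch below encodes one string that contains both English and Greek letters. Python has no codec named UCS-2, so the example assumes `utf-16-le` as a stand-in, which produces the same two bytes per character for the characters used here.

```python
mixed = "abc αβγ"  # English and Greek in one string

utf8_bytes = mixed.encode("utf-8")
two_byte = mixed.encode("utf-16-le")  # stand-in for UCS-2 (assumption)

# UTF-8 uses one byte for each ASCII character and two for each Greek letter.
print(len(utf8_bytes))  # 10
# The two-byte form uses two bytes for every character in this string.
print(len(two_byte))  # 14

# Both encodings convert back to the original text without loss.
utf8_ok = utf8_bytes.decode("utf-8") == mixed
two_byte_ok = two_byte.decode("utf-16-le") == mixed
print(utf8_ok, two_byte_ok)  # True True
```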

Available Character Sets

The following character sets are included with NetObjects Fusion:

  Baltic (CP-1257)
  Central European (ISO-8859-2)
  Central European (Windows-1250)
  Chinese Simplified (GB2312)
  Chinese Traditional (BIG5)
  Cyrillic (ISO-8859-5)
  Cyrillic (KOI8-R)
  Cyrillic (Windows-1251)
  Greek (ISO-8859-7)
  Greek (Windows-1253)
  Japanese (EUC-JP)
  Japanese (ISO-2022-JP)
  Japanese (SHIFT_JIS)
  Korean (KSC5601)
  Turkish (ISO-8859-9)
  Turkish (Windows-1254)
  Unicode (UCS-2)
  Unicode (UTF-7)
  Unicode (UTF-8)
  Western European (CP437)
  Western European (CP850)
  Western European (ISO-8859-1)
  Western European (ISO-8859-15)
  Western European (Windows-1252)

The following character sets are not included with NetObjects Fusion, but are supported once installed in your operating system. You can download language kits at www.microsoft.com or install them from your Windows installation disks.

  Western European (CP-437)
  Western European (CP-850)
  Central European (CP-852)
  Cyrillic (CP-866)
  Greek (CP-869)
  Greek (CP-737)
  Cyrillic (KOI8-R)
  Cyrillic (Windows-1251)
  Greek (ISO-8859-7)
  Turkish (CP-857)

Setting the Site's Character Set

  1. In any view, from the Tools menu, choose Options > Current Site.
  2. On the General tab of the Current Site Options dialog, select a character set from the Character set drop-down list.

To ensure that characters display properly:

  1. Load a localized operating system. For example, for a Cyrillic site, you must use a Cyrillic OS.
  2. Set the site's character set.
  3. Change the font for each SiteStyle element on the Graphics tab. This includes banner, buttons, and so on.
  4. Publish the site to see the correct characters in NetObjects Fusion and the browser.

If you type characters that are not included in the selected character set, when you preview or publish the page, you might see question marks in place of unknown characters.
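One way to anticipate these question marks is to check the text against the chosen character set before publishing. The Python sketch below is a hypothetical helper, not a NetObjects Fusion feature; the function name `unsupported_characters` is an assumption for this example.

```python
def unsupported_characters(text: str, charset: str) -> list[str]:
    """Return the distinct characters in text that charset cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            # Record each unsupported character once, in order of appearance.
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "café αβ"
print(unsupported_characters(sample, "iso-8859-1"))  # ['α', 'β']
print(unsupported_characters(sample, "iso-8859-7"))  # ['é']
print(unsupported_characters(sample, "utf-8"))       # []
```

An empty list suggests that the text can be converted to the selected character set without substitutions.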

Setting the Character Set for an Individual Page

You can override the default website character set and choose a different character set for individual Web pages.

  1. In Page or Site view, right-click the page and select Page Character Set from the shortcut menu.
  2. Select a character set for the page from the Page Character Set drop-down list. This character set is applied only to this Web page.

Setting the Character Set for a Section

  1. Go to Site view and select a section.
  2. Right-click the parent page of the section and select Section Character Set from the shortcut menu.
  3. Select a character set for the section from the Section Character Set drop-down list.

URL and Page Name Character Requirements

The characters in a URL are limited to those represented in lower ASCII, which includes uppercase and lowercase English letters, numbers, and common English punctuation. You cannot use accented characters or other special characters in a URL.

You can, however, name pages using accented characters. The names on the banner and buttons display in the correct international form. In Publish view, however, file names, page names, and folders change to names with unaccented characters.

For Japanese, NetObjects Fusion uses the ASCII characters that are normally converted to a Japanese character. For other Asian languages, a numeric file name is assigned.
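The unaccented file names described above can be approximated with Python's standard `unicodedata` module. This is a minimal sketch of one common technique (decompose, then drop non-ASCII characters); the rules that NetObjects Fusion actually applies are not documented here, and `ascii_file_name` is a hypothetical name.

```python
import unicodedata


def ascii_file_name(page_name: str) -> str:
    """Strip accents from page_name, keeping only ASCII characters."""
    # NFKD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", page_name)
    # Dropping everything outside ASCII removes the combining marks.
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(ascii_file_name("Café"))   # Cafe
print(ascii_file_name("Señor"))  # Senor
```

With this approach, names written entirely in a non-Latin script reduce to an empty string, which is one reason a tool might fall back to a numeric file name for such pages.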