UniClinic: Internationalization and Unicode Tutorial

XenCraft: Your Source for International and Unicode Training

This tutorial, created by Tex Texin and Richard Gillam, two leaders in software and Web internationalization, is available to be presented at your site. UniClinic can be customized specifically for your organization and its development environment.

AGENDA: Internationalization and Unicode Tutorial

Networking and Objective Setting
Attendees will introduce themselves and state their goals in attending the tutorial. Speakers will introduce themselves and review objectives, customizations, and logistics for the tutorial.

Introduction: What are the business drivers for internationalization?

  • Business Without Borders
  • Opportunities Internationally
  • Opportunities on the Web
  • Business and Economic Forces at Work
  • ROI

Technological Drivers for Unicode and Internationalization

  • In software applications
  • On the World Wide Web
  • Multilingual applications

World Tour: Regional Customs Affecting Software Design and Implementation, and Efficient Solutions

  • Graphics
  • Data Formats (Calendars, Dates, Times, Numbers, Currency, Addresses, etc.)
  • Linguistic Software Requirements (Externalization, Argument Substitution, Text Expansion, Word Order, Collation, etc.)
  • Rendering, Fonts, Writing Directions (Bidirectional, Vertical)
  • Input methods

Writing Systems Around the World
A survey of languages and writing systems, including ideographic, bidirectional, and complex scripts (e.g., Chinese, Japanese, Korean, Thai, Indic, Hebrew, Arabic, and others).

Models of Character Encoding

  • Character Sets and Character Encodings: What are they? What problems do they create?
  • Unicode and its Repertoire
  • Character-Glyph Model
  • Combining Characters
  • Unicode Encoding Model and its encodings: Scalar Values, CEF, CES, UTF-8, UTF-16, Surrogates, UTF-32, BOM, etc.
  • Character properties (alphabetic, numeric, direction, case, etc.)
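
As an illustration of the encoding-model topics listed above, here is a minimal Python sketch showing how one scalar value maps to different byte sequences under UTF-8, UTF-16, and UTF-32, and how a UTF-16 surrogate pair is computed. The function names `surrogate_pair` and `encoded_forms` are hypothetical, chosen only for this example; they are not part of the tutorial materials.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits select the low surrogate
    return high, low


def encoded_forms(ch: str) -> dict[str, str]:
    """Return one character's scalar value and its bytes in three encoding schemes."""
    return {
        "scalar": f"U+{ord(ch):04X}",
        "UTF-8": ch.encode("utf-8").hex(" "),
        "UTF-16BE": ch.encode("utf-16-be").hex(" "),
        "UTF-32BE": ch.encode("utf-32-be").hex(" "),
    }


if __name__ == "__main__":
    # ASCII letter, BMP symbol, and a supplementary-plane character.
    for ch in ("A", "\u20ac", "\U0001F600"):
        print(encoded_forms(ch))
    print([hex(unit) for unit in surrogate_pair(0x1F600)])
```

The big-endian codec names are used so that no byte order mark (BOM) is prepended; encoding with plain `"utf-16"` would add one.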

Design Decisions

  • Choosing the right UTF-n
  • Migration to Unicode: programming changes for Unicode enabling
  • Transcoding: converting legacy encodings to Unicode
  • Typical problems with encoding conversions
  • Characters that look alike: how to choose the right character
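
To make the transcoding and look-alike items above concrete, the following is a small Python sketch, not taken from the tutorial itself. It converts legacy bytes to UTF-8, shows the garbling that results when the wrong source encoding is assumed, and uses character names to tell apart letters that look identical. The helper name `transcode` and the sample word are assumptions made for illustration.

```python
import unicodedata


def transcode(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Legacy Windows-1252 bytes for "café" (the e-acute is the single byte 0xE9).
legacy = "caf\u00e9".encode("cp1252")
utf8 = transcode(legacy, "cp1252")

# A typical conversion problem: reading UTF-8 bytes as if they were cp1252
# turns the two-byte sequence for the e-acute into two unrelated characters.
garbled = utf8.decode("cp1252")

# Three capital letters that usually render identically but are distinct characters.
lookalikes = ["A", "\u0391", "\u0410"]
names = [unicodedata.name(ch) for ch in lookalikes]

if __name__ == "__main__":
    print(legacy, utf8, garbled)
    for ch, name in zip(lookalikes, names):
        print(f"U+{ord(ch):04X} {name}")
```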

Unicode Algorithms - Part I

  • Bidirectional Algorithm
  • Line-Breaking
  • Regular Expressions and Unicode

Unicode Algorithms - Part II

  • UCA: Unicode Collation Algorithm
  • Tailoring collations
  • Canonical Forms and Normalization
  • When is normalization required or important?
  • Choosing a normalization form
  • Private Use Area, Gaiji Characters
  • Unicode compression
  • Comparing compression approaches
  • Working in small spaces: Efficient storage for Unicode tables
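
The normalization items in the list above can be illustrated with a brief Python sketch built on the standard `unicodedata` module. It is offered as an assumption-laden example rather than tutorial content: the helper `same_text` is a hypothetical name, and comparing under NFC is only one reasonable choice of form.

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single precomposed character
decomposed = "e\u0301"   # letter e followed by a combining acute accent


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    # Canonically equivalent strings differ code point by code point
    # until they are normalized.
    print(composed == decomposed)
    print(same_text(composed, decomposed))
    # Compatibility forms also fold characters such as the "fi" ligature.
    print(unicodedata.normalize("NFKC", "\ufb01"))
```

Canonical forms (NFC, NFD) preserve distinctions such as ligatures, while compatibility forms (NFKC, NFKD) fold them, which is one reason the choice of form depends on the application.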

Migration Techniques

  • Migration tools
  • Estimating Unicode migration projects
  • Unicode footprint requirements (disk, memory, etc.)
  • Unicode and Databases (data types, field widths, indexes, queries, collation, database drivers, etc.)
  • Multilingual text processing and issues

Unicode on the Wire

  • Protocols and Standards on the Internet and the Web (e-mail, URLs, etc.): HTTP, IRI, IDN, Mail (MIME)
  • HTML, XML, XHTML
  • Encoding declarations and encoding negotiation
  • Unicode versus Markup
  • Reference Processing model

Unicode in Programming Languages

  • Identifiers
  • Parsers
  • SQL
  • Java
  • C/C++
  • C#
  • Perl
  • Debugging tips and tools

Localization with Unicode
Tools, Globalization Management Systems (GMS), and translation memory that support Unicode

Unicode and Real-World Issues

  • Surrogates on Windows
  • GB18030
  • Oracle, SQL Server
  • Security