A Technical Overview of ISO/IEC 14651:2016 — International String Ordering and Comparison (CAN/CSA‑ISO/IEC 14651‑18)

From the Common Template Table to Tailorable Collation: A Guide for Implementers and Standards Compliance

Consistent and predictable ordering of character strings is a cornerstone of internationalized software, database systems, and electronic data interchange. Without a universally accepted method, sorting multilingual text can produce contradictory results across platforms, undermining user trust and data integrity. ISO/IEC 14651:2016 — Information technology — International string ordering and comparison — Method for comparing character strings and description of the common template tailorable ordering — provides the authoritative framework for achieving global consistency in string comparison. This standard has been adopted in many jurisdictions, including Canada as CAN/CSA‑ISO/IEC 14651‑18, which is identical to the international edition.

Scope of the Standard

ISO/IEC 14651 defines a universal method for comparing character strings based on a multi‑level weighting scheme. Its primary goal is to enable deterministic ordering of text in any language that can be represented by the Universal Coded Character Set (UCS), which is specified in ISO/IEC 10646 and whose repertoire and code points are kept synchronized with the Unicode Standard. The standard applies to all characters in the UCS, including letters, digits, symbols, and ideographic characters, covering the vast majority of scripts used worldwide.

The core deliverable of ISO/IEC 14651 is the Common Template Table (CTT), a reference table of collation weights that serves as a neutral, language‑independent starting point. Because no single ordering can satisfy every linguistic or cultural requirement, the standard expressly permits tailoring, that is, modifying the CTT to meet the needs of a specific locale (e.g., Swedish, Japanese, or Slovak). The standard also specifies conformance requirements, both for implementations that use the CTT unchanged and for those that tailor it.

Technical Requirements and Collation Model

Multi‑Level Comparison

String comparison in ISO/IEC 14651 proceeds through a series of independent weighting levels. At each level, characters (or sequences of characters) are compared using their assigned weights from the CTT or a tailored variant. The levels are evaluated in order: a difference at an earlier level determines the final order, and later levels are consulted only when the earlier levels produce a tie.

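  • Level 1 (base letters): alphabetic or syllabic base characters; case, diacritics, and punctuation are ignored. Example: “abc” equals “ABC”.
  • Level 2 (diacritics): accents and diacritical marks. Example: “coté” precedes “côte” when accents are compared from the start of the word; French tailorings that compare accents from the end of the word reverse this, so “côte” precedes “coté”.
  • Level 3 (case): uppercase vs. lowercase distinctions. Example: “a” precedes “A” (or vice versa, depending on locale).
  • Level 4 (punctuation and special characters): punctuation marks, symbols, and other supplementary characteristics. Example: “a b” precedes “a‑b”, because space sorts before hyphen in the default ordering.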
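The level‑by‑level rule and the accent example can be illustrated with a small Python sketch. The weights are toy values derived from Unicode canonical decomposition (each base letter gets a primary weight from its lowercase code point, a common secondary of 0x20, and a tertiary of 0x02 or 0x08 for case; each combining accent gets only a secondary weight). They are assumptions for illustration, not CTT values, and the `backward_secondary` flag is a hypothetical parameter that compares accents from the end of the word in the style of French ordering.

```python
import unicodedata

def elements(s):
    """Toy collation elements (primary, secondary, tertiary); NOT CTT weights."""
    out = []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            out.append((0, ord(ch), 0x02))             # accent: secondary weight only
        else:
            tertiary = 0x08 if ch.isupper() else 0x02  # lowercase before uppercase
            out.append((ord(ch.lower()), 0x20, tertiary))
    return out

def compare(a, b, backward_secondary=False):
    """Return -1, 0 or 1; a later level is consulted only if earlier levels tie."""
    ea, eb = elements(a), elements(b)
    for level in range(3):
        wa = [e[level] for e in ea if e[level]]  # zero = ignorable at this level
        wb = [e[level] for e in eb if e[level]]
        if level == 1 and backward_secondary:
            wa, wb = wa[::-1], wb[::-1]          # compare accents from the end
        if wa != wb:
            return -1 if wa < wb else 1
    return 0

print(compare("coté", "côte"))                           # -1: forward accent order
print(compare("coté", "côte", backward_secondary=True))  # 1: French-style order
print(compare("abc", "ABC"))                             # -1: tie until level 3
```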

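Each character in the string is mapped to one or more collation elements, each a four‑level weight tuple: a single character can expand to several elements, and a sequence of characters (a contraction) can map to a single element. The comparison engine typically concatenates the level‑1 weights of all elements, then all level‑2 weights, and so on, inserting between levels a separator that is lower than any weight. The result is a sort key, a sequence of numbers that can be compared lexicographically and gives the same order as the level‑by‑level comparison.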
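A minimal sketch of sort‑key construction follows, under toy assumptions: the weights are illustrative values derived from canonical decomposition rather than taken from the CTT, only three levels are modelled, and 0 serves as the level separator because every real weight is non‑zero.

```python
import unicodedata

def elements(s):
    """Toy collation elements (primary, secondary, tertiary); NOT CTT weights."""
    out = []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            out.append((0, ord(ch), 0x02))   # accent: no primary weight
        else:
            out.append((ord(ch.lower()), 0x20, 0x08 if ch.isupper() else 0x02))
    return out

def sort_key(s, levels=3):
    """Concatenate level-1 weights, a 0 separator, level-2 weights, and so on."""
    key = []
    elems = elements(s)
    for level in range(levels):
        if level:
            key.append(0)                    # separator sorts below every weight
        key.extend(e[level] for e in elems if e[level])
    return tuple(key)

words = ["côté", "cote", "côte", "coté", "Cote"]
print(sorted(words, key=sort_key))  # ['cote', 'Cote', 'coté', 'côte', 'côté']
```

Because each key is computed once per string, sorting with `key=sort_key` avoids re‑deriving weights on every pairwise comparison, which is the usual reason for precomputing keys.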

Common Template Table (CTT)

The CTT is the heart of ISO/IEC 14651. It provides a minimal, widely acceptable set of collation weights for the characters of the UCS, ordering Latin letters in the basic a to z sequence of the English alphabet and supplementing it with diacritic and case ordering that follows common European conventions. The table consists of entries, each pairing a character (or sequence of characters) with its collation code: one or more collation elements, each supplying the weights for the four levels. The CTT is deliberately simple so that it can be tailored to meet more specific linguistic rules.
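The entry structure can be sketched with a parser for a hypothetical, simplified line format (not the official CTT syntax): hexadecimal code points, a semicolon, then one bracketed collation element per weight tuple, with `%` starting a comment. All weight values below are invented. The entry for U+FB00 (LATIN SMALL LIGATURE FF) shows a collation code made of more than one element.

```python
import re

# Hypothetical format: <hex code points> ; [w1.w2.w3.w4] [w1.w2.w3.w4] ... % comment
SAMPLE = """
0066 ; [1D10.0020.0002.0066]                       % LATIN SMALL LETTER F
0046 ; [1D10.0020.0008.0046]                       % LATIN CAPITAL LETTER F
FB00 ; [1D10.0020.0004.FB00] [1D10.0020.0004.FB00] % LATIN SMALL LIGATURE FF
"""

HEX = r"([0-9A-Fa-f]+)"
ELEMENT = re.compile(r"\[" + r"\.".join([HEX] * 4) + r"\]")

def parse_table(text):
    """Map each character (or character sequence) to its list of weight tuples."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        chars, weights = line.split(";", 1)
        key = "".join(chr(int(cp, 16)) for cp in chars.split())
        table[key] = [tuple(int(w, 16) for w in m.groups())
                      for m in ELEMENT.finditer(weights)]
    return table

table = parse_table(SAMPLE)
print(table["f"])            # [(7440, 32, 2, 102)]
print(len(table["\ufb00"]))  # 2 collation elements for one character
```

Keying the dictionary by strings rather than single code points leaves room for contraction entries, whose left‑hand side lists several code points.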

Tailoring

Tailoring is the process of deriving a locale‑specific collation from the CTT. For example, in Swedish the letter Å should order after Z, whereas in German Ö is often treated as a variant of O. Common tailoring operations include:

  • Reordering characters (e.g., moving accented letters to a different position).
  • Changing the relative weights of diacritics or case.
  • Adding contractions (e.g., “ch” treated as a single letter in traditional Spanish).
  • Ignoring certain punctuation for specific applications.
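Two of these operations, reordering and contractions, can be sketched at the primary level alone. The root order here is a toy a to z list standing in for the CTT, and `tailor` is a hypothetical helper; real rule syntaxes (for example, CLDR's) are far richer. Characters missing from the table fall back to their base letter, so the root order treats “å” as a variant of “a”.

```python
import unicodedata

ROOT = list("abcdefghijklmnopqrstuvwxyz")  # toy stand-in for the CTT, primary level only

def tailor(order, unit, after):
    """Hypothetical helper: copy `order` with `unit` placed right after `after`."""
    new = [u for u in order if u != unit]
    new.insert(new.index(after) + 1, unit)
    return new

def primary_weights(order):
    return {unit: i + 1 for i, unit in enumerate(order)}

def primary_key(s, weights):
    """Longest-match segmentation, so a contraction such as 'ch' is one unit."""
    s = s.lower()
    longest = max(len(u) for u in weights)
    key, i = [], 0
    while i < len(s):
        for n in range(min(longest, len(s) - i), 0, -1):
            if s[i:i + n] in weights:
                key.append(weights[s[i:i + n]])
                i += n
                break
        else:
            # Not in the table: fall back to the base letter (accent ignored here)
            key.append(weights.get(unicodedata.normalize("NFD", s[i])[0], 0))
            i += 1
    return key

root = primary_weights(ROOT)
swedish = primary_weights(tailor(tailor(tailor(ROOT, "å", "z"), "ä", "å"), "ö", "ä"))
spanish_trad = primary_weights(tailor(ROOT, "ch", "c"))

words = ["zebra", "ål", "apa"]
print(sorted(words, key=lambda w: primary_key(w, root)))     # ['ål', 'apa', 'zebra']
print(sorted(words, key=lambda w: primary_key(w, swedish)))  # ['apa', 'zebra', 'ål']
print(sorted(["dedo", "chico", "cuna"], key=lambda w: primary_key(w, spanish_trad)))
# ['cuna', 'chico', 'dedo']
```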

Tailored collations that claim conformance to ISO/IEC 14651 must still adhere to the general structure of the multi‑level comparison method and must document the deviations from the CTT.

Implementation Tip: When implementing the CTT in software, store the weight tuples in a compressed array indexed by character code (for example, a two‑stage lookup table). Many production libraries build tailored versions from a rule file that modifies the CTT, either ahead of time or when the locale is loaded. Use the latest version of the standard (including any amendments) to ensure full support for newly added Unicode characters.
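One common way to build such a compressed array is a two‑stage table: the high bits of the code point select a block, and identical blocks (typically the many blocks with no entries) are stored only once. The sketch assumes a block size of 128 code points and an all‑zero default tuple for characters without an entry; both are arbitrary choices, and the weight values are hypothetical.

```python
BLOCK_SHIFT = 7                  # assumption: 128 code points per block
BLOCK_SIZE = 1 << BLOCK_SHIFT
DEFAULT = (0, 0, 0, 0)           # assumption: weights for code points with no entry

def build_two_stage(weights):
    """weights: dict mapping code point -> weight tuple. Returns (index, blocks)."""
    index, blocks, seen = [], [], {}
    last = max(weights, default=0)
    for start in range(0, last + 1, BLOCK_SIZE):
        block = tuple(weights.get(cp, DEFAULT) for cp in range(start, start + BLOCK_SIZE))
        if block not in seen:    # identical blocks (e.g. all-default) are stored once
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks

def lookup(index, blocks, cp):
    high = cp >> BLOCK_SHIFT
    if high >= len(index):       # beyond the last populated block
        return DEFAULT
    return blocks[index[high]][cp & (BLOCK_SIZE - 1)]

# Hypothetical weights for two characters far apart in the code space
weights = {0x0061: (0x1C47, 0x20, 0x02, 0x61), 0x4E00: (0xFB40, 0x20, 0x02, 0x4E00)}
index, blocks = build_two_stage(weights)
print(len(index), len(blocks))   # 157 3
```

Here 157 index entries share just three stored blocks; a production table would also choose the block size to balance index length against the number of distinct blocks.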

Implementation Highlights

Integration into Software Systems

ISO/IEC 14651 forms the basis for string comparison in numerous operating systems (e.g., the locale data of Linux’s glibc) and relational databases (e.g., PostgreSQL’s ICU collation support). The standard is often realized through the Unicode Collation Algorithm (UCA) defined in Unicode Technical Standard #10, which is kept closely aligned with ISO/IEC 14651. As a result, an implementation that conforms to the UCA, using the DUCET or a documented tailoring of it, will generally also conform to ISO/IEC 14651.

Performance and Storage

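Computing collation keys for strings can be computationally intensive, so implementers often precompute sort keys for persistent storage, particularly in databases. The size of a collation key grows with the total number of collation elements across all characters and with the number of levels stored: with four levels and uncompressed 16‑bit weights, each element contributes up to eight bytes, so even for typical Latin‑script strings a key can be several times the length of the string. A common optimization is to store a level in 8 bits instead of 16 when all of that level’s weights in the table fit in the smaller range; the width must be fixed per table rather than chosen per string, so that keys remain comparable byte by byte.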
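The effect of per‑level widths on key size can be sketched with Python's `struct` module. The sketch assumes fixed widths for the whole table (16 bits for primary and quaternary weights, 8 bits for secondary and tertiary weights) and uses invented collation elements. Big‑endian packing makes byte order match numeric order, and a zero separator written at the same width as its level compares lower than any real weight.

```python
import struct

# Assumption: one byte width per level, fixed for the whole table.
LEVEL_WIDTHS = (2, 1, 1, 2)
FORMATS = {1: "B", 2: ">H"}      # big-endian, so byte order matches numeric order

def pack_sort_key(elements, widths=LEVEL_WIDTHS):
    """Pack 4-level weight tuples into a byte string comparable with plain `<`."""
    out = bytearray()
    for level, width in enumerate(widths):
        fmt = FORMATS[width]
        for element in elements:
            if element[level]:                     # zero = ignorable at this level
                out += struct.pack(fmt, element[level])
        if level < len(widths) - 1:
            out += struct.pack(fmt, 0)             # separator, below every real weight
    return bytes(out)

# Hypothetical collation elements (invented values, not CTT weights)
a, b, c = (0x1C47, 0x20, 0x02, 0x61), (0x1C60, 0x20, 0x02, 0x62), (0x1C7A, 0x20, 0x02, 0x63)
A = (0x1C47, 0x20, 0x08, 0x41)

print(len(pack_sort_key([a, b, c])))                # 22 bytes with mixed widths
print(len(pack_sort_key([a, b, c], (2, 2, 2, 2))))  # 30 bytes with 16 bits everywhere
print(pack_sort_key([a]) < pack_sort_key([A]))      # True: differs only at level 3
```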

Caution: Tailoring introduces the risk of inconsistency if two systems use different modifications of the CTT for the same locale. Always document the precise tailoring rules and, whenever possible, use standardized locale data repositories such as the Common Locale Data Repository (CLDR).

Compliance and Adoption

CAN/CSA‑ISO/IEC 14651‑18 is the Canadian adoption of the international standard, published by the Canadian Standards Association (CSA). This document is identical to ISO/IEC 14651:2016 and is an important resource for software vendors and organizations operating within Canada that require a domestically recognized version of the standard for procurement or regulatory purposes. Similar adoptions exist in other countries under their national standards bodies.

Conformance to ISO/IEC 14651 can be claimed at two levels:

  • Baseline conformance — a system that compares strings according to the CTT without any tailoring.
  • Tailored conformance — a system that uses a tailored ordering but follows the multi‑level method and documents the differences from the CTT.

Because the standard is aligned with the Unicode Collation Algorithm, many existing collation implementations—such as those in the ICU library (International Components for Unicode)—already meet the technical requirements. Compliance testing typically involves comparing sort orders against the reference examples provided in the standard’s annexes.

Why It Matters: Adopting ISO/IEC 14651 ensures that multilingual data exchanged between applications and jurisdictions will order consistently, facilitating easier data merging, reporting, and search. Its wide acceptance in industry and government reduces the friction of internationalization.

Common Pitfall: Never implement a custom sorting routine based on raw Unicode code point values. Such ordering is rarely linguistically correct and will almost certainly fail conformance tests. Always rely on a standard collation engine that implements ISO/IEC 14651 or the equivalent UCA.
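The difference is easy to see in Python, whose default string sort compares code points. The `toy_collation_key` function below is a hypothetical, crude three‑level key built from canonical decomposition; it only illustrates the multi‑level idea and is not a substitute for a conforming collation engine.

```python
import unicodedata

def toy_collation_key(s):
    """Crude three-level key for illustration; NOT an ISO/IEC 14651 implementation."""
    decomposed = unicodedata.normalize("NFD", s)
    bases = [ch for ch in decomposed if not unicodedata.combining(ch)]
    primary = "".join(bases).casefold()                        # letters, no case/accents
    secondary = tuple(ord(ch) if unicodedata.combining(ch) else 0 for ch in decomposed)
    tertiary = tuple(ch.isupper() for ch in bases)             # lowercase first
    return primary, secondary, tertiary

words = ["zebra", "Éclair", "apple", "Banana"]
print(sorted(words))                         # ['Banana', 'apple', 'zebra', 'Éclair']
print(sorted(words, key=toy_collation_key))  # ['apple', 'Banana', 'Éclair', 'zebra']
```

Code‑point order puts every uppercase ASCII letter before every lowercase one and pushes accented capitals such as “É” past “z”; even the crude multi‑level key avoids both problems.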

Frequently Asked Questions

Q: How does ISO/IEC 14651 relate to the Unicode Collation Algorithm (UCA)?
A: ISO/IEC 14651 and the UCA (Unicode Technical Standard #10) specify essentially the same collation method. The UCA is more detailed and supplies the Default Unicode Collation Element Table (DUCET), whose ordering is kept aligned with the CTT. Any UCA‑compliant implementation also conforms to ISO/IEC 14651, provided the weights used are aligned with the CTT.
Q: What is the Common Template Table, and how often is it updated?
A: The Common Template Table is the reference weight table provided in the standard. It is revised in step with new editions of ISO/IEC 10646 and Unicode. The 2016 edition (ISO/IEC 14651:2016 plus its amendments) covers characters up to Unicode 8.0. Newer versions of the UCA extend coverage further, and these updates are typically incorporated into the next revision of the international standard.
Q: Can I tailor the collation for a language that is not yet supported?
A: Yes. The standard is designed to be tailorable. You can create a custom collation for any language by modifying the CTT, as long as you follow the multi‑level comparison rules and document the deviations. It is recommended to contribute such tailoring to the Common Locale Data Repository (CLDR) to promote interoperability.
Q: Is ISO/IEC 14651 relevant for legacy character sets that are not Unicode?
A: The standard is defined in terms of characters from ISO/IEC 10646 (UCS). For other encoding schemes, the characters must first be mapped to the corresponding UCS code points before applying the comparison method. The collation weights themselves are defined for UCS characters only.
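In Python, for example, the mapping step amounts to decoding bytes into a `str`, whose elements are UCS code points; the comparison method is then applied to that string. The sketch also normalizes to NFD, one common way to make canonically equivalent spellings (a precomposed “é” versus “e” plus a combining accent) compare alike. The helper name `to_ucs` is hypothetical.

```python
import unicodedata

def to_ucs(data: bytes, encoding: str) -> str:
    """Map legacy-encoded bytes to UCS characters, then normalize to NFD."""
    return unicodedata.normalize("NFD", data.decode(encoding))

latin1_record = b"caf\xe9"               # "café" in ISO 8859-1
utf8_record = "café".encode("utf-8")     # same text, different bytes
koi8_record = "мир".encode("koi8_r")     # Cyrillic text in KOI8-R

print(latin1_record == utf8_record)                                      # False
print(to_ucs(latin1_record, "latin-1") == to_ucs(utf8_record, "utf-8"))  # True
print(to_ucs(koi8_record, "koi8_r"))                                     # мир
```

Comparing the raw bytes would treat the two “café” records as different; after mapping to UCS they are the same character sequence and receive the same collation weights.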


Technical Article – 2026. This content is provided for informational purposes and does not replace the official text of ISO/IEC 14651:2016 or CAN/CSA‑ISO/IEC 14651‑18.
