Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Consistent and predictable ordering of character strings is a cornerstone of internationalized software, database systems, and electronic data interchange. Without a universally accepted method, sorting multilingual text can produce contradictory results across platforms, undermining user trust and data integrity. ISO/IEC 14651:2016 — Information technology — International string ordering and comparison — Method for comparing character strings and description of the common template tailorable ordering — provides the authoritative framework for achieving global consistency in string comparison. This standard has been adopted in many jurisdictions, including Canada as CAN/CSA‑ISO/IEC 14651‑18, which is identical to the international edition.
ISO/IEC 14651 defines a universal method for comparing character strings based on a multi‑level weighting scheme. Its primary goal is to enable deterministic ordering of text in any language that can be represented by the Universal Coded Character Set (UCS), which is specified in ISO/IEC 10646 and whose character repertoire and code points are kept synchronized with the Unicode Standard. The standard applies to all characters in the UCS, including letters, digits, symbols, and ideographic characters, covering the vast majority of scripts used worldwide.
The core deliverable of ISO/IEC 14651 is the Common Template Table (CTT), a reference table of collation weights that serves as a neutral, language‑independent starting point. Because no single ordering can satisfy every linguistic or cultural requirement, the standard expressly permits tailoring — the modification of the CTT to meet the needs of a specific locale (e.g., Swedish, Japanese, or Slovak). The standard also specifies the conformance requirements for both minimal and enhanced implementations.
String comparison in ISO/IEC 14651 proceeds through a series of independent weighting levels. At each level, characters (or sequences of characters) are compared using their assigned weights from the CTT or a tailored variant. The levels are evaluated in order: a difference at an earlier level determines the final order, and later levels are consulted only when the earlier levels produce a tie.
| Level | Name | Basis of Comparison | Example of Distinction |
|---|---|---|---|
| L1 | Base letters | Alphabetic or syllabic base characters (ignores case, diacritics, punctuation) | “abc” equals “ABC” |
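| L2 | Diacritics | Accents and diacritical marks | “côte” precedes “coté” when accents are compared from the end of the string (French convention); the order reverses when they are compared from the start |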
| L3 | Case | Uppercase vs. lowercase distinctions | “a” precedes “A” (or vice‑versa, depending on locale) |
| L4 | Punctuation & special | Punctuation marks, symbols, and other supplementary characteristics | “a‑b” and “a b” differ only at this level (hyphen vs. space) |
Each character in the string is assigned a four‑level weight tuple (or a sequence of tuples for complex scripts). The comparison engine consumes these weights level by level, character by character, effectively producing a sequence of numeric weights (a sort key) that can be compared lexicographically.
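The level-by-level comparison can be illustrated with a minimal Python sketch. The weight values, the `ACCENT_WEIGHTS` table, and the function names are hypothetical and are not taken from the CTT; the sketch only handles simple Latin text with at most one accent per letter. The `backward_accents` flag is an assumption added to show how the direction of the level‑2 scan changes the result.

```python
import unicodedata

# Hypothetical toy accent weights (NOT CTT values): none, acute, grave, circumflex.
ACCENT_WEIGHTS = {"": 0, "\u0301": 1, "\u0300": 2, "\u0302": 3}

def weights(ch):
    """Return a (base, accent, case) tuple for a letter or digit, else None."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    if not base.isalnum():
        return None  # ignored at levels 1-3, handled at level 4
    mark = decomposed[1:2]  # first combining mark only; enough for this toy
    return (ord(base.lower()), ACCENT_WEIGHTS.get(mark, 9), 1 if base.isupper() else 0)

def sort_key(text, backward_accents=False):
    """Build a four-level key; Python compares the tuple level by level."""
    text = unicodedata.normalize("NFC", text)
    per_char = [(ch, weights(ch)) for ch in text]
    strong = [w for _, w in per_char if w is not None]
    level1 = tuple(w[0] for w in strong)
    level2 = tuple(w[1] for w in strong)
    if backward_accents:
        level2 = level2[::-1]  # compare accents starting from the end of the string
    level3 = tuple(w[2] for w in strong)
    # Level 4: position and code point of each character skipped above.
    level4 = tuple((i, ord(ch)) for i, (ch, w) in enumerate(per_char) if w is None)
    return (level1, level2, level3, level4)

words = ["coté", "côte", "cote", "Cote", "côté"]
forward = sorted(words, key=sort_key)
backward = sorted(words, key=lambda s: sort_key(s, backward_accents=True))
```

Because the key is a tuple of tuples, a difference at level 1 always wins, and later levels only break ties, which mirrors the evaluation order described above.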
The CTT is the heart of ISO/IEC 14651. It provides a default, widely acceptable set of collation weights for the characters of the UCS. For Latin letters the base order follows the plain A to Z alphabet, supplemented with diacritic and case ordering that follows common European conventions. The table consists of entries, each containing a character code followed by its collation code (a sequence of one or more unsigned integer values), which defines the weights at the four levels. The CTT is deliberately neutral so that it can be tailored to meet more specific linguistic rules.
Tailoring is the process of deriving a locale‑specific collation from the CTT. For example, in Swedish the letter Å should order after Z, whereas in German Ö is often treated as a variant of O. Common tailoring operations include:
Tailored collations that claim conformance to ISO/IEC 14651 must still adhere to the general structure of the multi‑level comparison method and must document the deviations from the CTT.
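As a rough illustration of tailoring, the following Python sketch starts from a toy default rule in which accented letters share the primary weight of their base letter, then derives a Swedish‑like variant that gives å, ä, and ö their own primary weights after z. All function names and weight values are hypothetical; a real tailoring would be expressed as a delta against the CTT rather than against code points.

```python
import unicodedata

def default_primary(ch):
    """Toy untailored rule: strip diacritics, so 'å' shares the weight of 'a'."""
    base = unicodedata.normalize("NFD", ch.lower())[0]
    return ord(base)

def make_tailored_primary(extra_letters):
    """Return a primary-weight function placing extra_letters, in order, after 'z'."""
    overrides = {ch: ord("z") + 1 + i for i, ch in enumerate(extra_letters)}

    def primary(ch):
        ch = unicodedata.normalize("NFC", ch.lower())
        return overrides.get(ch, default_primary(ch))

    return primary

def key_with(primary):
    """Build a sort-key function from a primary-weight function."""
    def key(text):
        text = unicodedata.normalize("NFC", text)
        # The raw text is used only as a simple tie-breaker in this sketch.
        return (tuple(primary(c) for c in text), text)
    return key

words = ["zebra", "ånger", "apa", "öl", "orm"]
untailored = sorted(words, key=key_with(default_primary))
swedish_like = sorted(words, key=key_with(make_tailored_primary("åäö")))
```

Keeping the tailoring as a small table of overrides on top of the default makes the deviations easy to document, which is what the conformance requirement above asks for.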
ISO/IEC 14651 forms the basis for string comparison in numerous operating systems (e.g., Linux’s glibc locale framework) and relational databases (e.g., PostgreSQL’s ICU collation support). The standard is often realized through the Unicode Collation Algorithm (UCA) defined in Unicode Technical Standard #10, whose default weight table (DUCET) is kept synchronized with the CTT. As a result, an implementation that follows the UCA with the default table or a comparable tailoring will generally produce orderings consistent with ISO/IEC 14651, although conformance to each specification is claimed separately.
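From application code, the platform's collation is usually reached through the locale API rather than by reading weight tables directly. The sketch below uses Python's standard `locale` module; the helper name `sort_with_locale` is hypothetical, and which locale names are available (for example `sv_SE.UTF-8`) depends on what is installed on the system, so only the always-present `C` locale is assumed here.

```python
import locale

def sort_with_locale(words, locale_name="C"):
    """Sort words with the platform collation for locale_name.

    The previous LC_COLLATE setting is restored afterwards. locale.Error is
    raised if locale_name is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares like strcoll.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)

# In the "C" locale, ordering follows code points, so uppercase sorts first.
c_sorted = sort_with_locale(["b", "a", "C"])
```

Note that `setlocale` changes process-wide state, so this pattern is best suited to single-threaded scripts; libraries such as ICU expose collator objects that avoid that limitation.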
Computing collation keys can be expensive, so implementers often precompute sort keys for persistent storage, particularly in databases. The total size of a collation key is proportional to the cumulative number of weight tuples across all characters; for typical Latin‑script strings this is roughly 4–8 times the string length. Optimizations include using 16‑bit or 8‑bit compression for each level when the weights fit within a smaller integer range.
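One way to picture a stored sort key is as the per-level weights serialized into a single byte string that can be compared with a plain byte comparison. The following Python sketch is an assumption about layout, not a format defined by the standard: `pack_key` and the `widths` parameter are hypothetical, weights must be positive so that a zero can act as the level separator, and the byte width of each level is fixed for the whole table rather than chosen per string.

```python
def pack_key(levels, widths=(2, 1, 1)):
    """Serialize per-level weight lists into bytes that sort like the levels.

    levels: one list of positive integer weights per level.
    widths: bytes used per weight at each level (fixed for the whole table).
    """
    out = bytearray()
    for level_weights, width in zip(levels, widths):
        for w in level_weights:
            if not 0 < w < 256 ** width:
                raise ValueError(f"weight {w} does not fit in {width} byte(s)")
            out += w.to_bytes(width, "big")  # big-endian keeps numeric order
        out += (0).to_bytes(width, "big")    # separator, lower than any weight
    return bytes(out)

levels = [[5, 6], [1, 1], [1, 2]]
compact = pack_key(levels)                  # 16-bit level 1, 8-bit levels 2-3
wide = pack_key(levels, widths=(2, 2, 2))   # every level stored as 16-bit
```

In this example the compact layout takes 12 bytes against 18 for the uniform 16-bit layout, which shows where the savings from narrower secondary and tertiary weights come from.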
CAN/CSA‑ISO/IEC 14651‑18 is the Canadian adoption of the international standard, published by the Canadian Standards Association (CSA). This document is identical to ISO/IEC 14651:2016 and is an important resource for software vendors and organizations operating within Canada that require a domestically recognized version of the standard for procurement or regulatory purposes. Similar adoptions exist in other countries under their national standards bodies.
Conformance to ISO/IEC 14651 can be claimed at two levels:
Because the standard is aligned with the Unicode Collation Algorithm, many existing collation implementations—such as those in the ICU library (International Components for Unicode)—already meet the technical requirements. Compliance testing typically involves comparing sort orders against the reference examples provided in the standard’s annexes.
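A check of this kind can be sketched as a small helper that walks a list of strings given in their expected order and reports the first adjacent pair that a candidate key function sorts the wrong way round. The helper name is hypothetical, and the sample lists below are placeholders rather than data from the standard's annexes.

```python
def first_order_violation(expected_order, key):
    """Return the first adjacent pair (a, b) with key(a) > key(b), else None."""
    for a, b in zip(expected_order, expected_order[1:]):
        if key(a) > key(b):
            return (a, b)
    return None

# Placeholder data: plain code-point comparison against two short lists.
ok = first_order_violation(["a", "b", "c"], key=str)
bad = first_order_violation(["b", "a", "c"], key=str)
```

Reporting the offending pair rather than a bare pass or fail makes it easier to trace a mismatch back to a specific table entry or tailoring rule.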
Technical Article – 2026. This content is provided for informational purposes and does not replace the official text of ISO/IEC 14651:2016 or CAN/CSA‑ISO/IEC 14651‑18.