Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
The standard IEC 15475-3-04 (technically identical to ISO/IEC 15475-3:2004 and adopted as CAN/CSA-ISO/IEC 15475-3:04) specifies the binding of the Ada programming language to the Universal Multiple-Octet Coded Character Set (UCS). UCS is defined by ISO/IEC 10646, whose character repertoire and code point assignments are kept synchronized with the Unicode Standard. This binding allows Ada programs to handle text in virtually all of the world’s writing systems. This article covers the scope, key technical requirements, implementation considerations, and compliance notes of the standard.
The standard is part of a multi‑part series (ISO/IEC 15475) that defines language bindings to UCS for several programming languages. Part 3 specifically targets Ada, providing both normative requirements and informative guidance for compiler vendors, tool developers, and application programmers who need to build internationalized Ada software.
IEC 15475-3-04 covers the following aspects of Ada’s integration with UCS:
- Wide_Wide_Character and Wide_Wide_String as 32‑bit types capable of holding any UCS code point (U+0000 to U+10FFFF).
- Text I/O packages (Ada.Text_IO, Ada.Wide_Text_IO, and Ada.Wide_Wide_Text_IO) to support UCS‑encoded text streams.

The standard mandates a set of concrete requirements that Ada implementations must satisfy to claim conformance. The table below summarises the principal normative provisions:
| Feature | Normative Requirement | Remarks |
|---|---|---|
| Wide_Wide_Character range | Must represent all UCS code points from U+0000 through U+10FFFF (i.e., the whole 21‑bit Unicode code space). | Ada’s Wide_Wide_Character is an enumeration (character) type typically stored in 32 bits; the standard prohibits using it for codes beyond U+10FFFF. |
| String type Wide_Wide_String | Must be an array of Wide_Wide_Character with full UCS semantics. | Operations on Wide_Wide_String (e.g., indexing, concatenation) must preserve the integer code point values. |
| Source encoding | The implementation shall accept at least one of UTF‑8, UTF‑16, or UTF‑32; if the source encoding is not capable of representing a character, a compile‑time error must be raised. | Some implementations may accept mixed encodings; the standard encourages a single, well‑documented default encoding. |
| Identifier characters | Identifiers may contain any UCS letter (L* categories in Unicode), any digit (Nd category), and the underscore, following Ada’s rules for identifiers (first character must be a letter). | The set of allowed identifier characters is determined by the Ada language standard (ISO/IEC 8652:1995 with amendments) and further clarified by this binding. |
| Text file I/O | Wide‑wide text files (type Wide_Wide_Text_File) shall support reading and writing of UCS‑encoded text; the default encoding can be locale‑dependent, but the implementation must provide a mechanism to specify UTF‑8 or UTF‑16 explicitly. | This requirement is crucial for cross‑platform portability of Ada applications dealing with multilingual text. |
| Environment access | The packages Ada.Command_Line and Ada.Environment_Variables shall return Wide_Wide_String values for arguments and environment variables, using the system’s native UCS encoding. | On systems that use a non‑UCS native encoding (e.g., legacy 8‑bit code pages), the implementation must convert transparently. |
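The code point range in the first table row can be illustrated with a small Ada sketch. The helper In_UCS_Range and the constant Max_Code_Point are hypothetical names chosen for this example; they are not taken from the standard. The sketch only shows that Wide_Wide_Character can hold positions above U+10FFFF, so an application may want to check the bound itself.

```ada
with Ada.Text_IO;

procedure Code_Point_Range is
   --  Upper bound of the UCS code space, U+10FFFF.
   Max_Code_Point : constant := 16#10FFFF#;

   --  Hypothetical helper (not from the standard): True when C lies
   --  inside U+0000 .. U+10FFFF.
   function In_UCS_Range (C : Wide_Wide_Character) return Boolean is
   begin
      return Wide_Wide_Character'Pos (C) <= Max_Code_Point;
   end In_UCS_Range;

   --  U+20AC EURO SIGN, inside the Basic Multilingual Plane.
   Euro   : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#20AC#);
   --  U+1D11E MUSICAL SYMBOL G CLEF, above U+FFFF.
   Clef   : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#1D11E#);
   --  First position past the UCS code space.
   Beyond : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#110000#);
begin
   Ada.Text_IO.Put_Line
     ("Euro in range:   " & Boolean'Image (In_UCS_Range (Euro)));
   Ada.Text_IO.Put_Line
     ("Clef in range:   " & Boolean'Image (In_UCS_Range (Clef)));
   Ada.Text_IO.Put_Line
     ("Beyond in range: " & Boolean'Image (In_UCS_Range (Beyond)));
end Code_Point_Range;
```

A check of this kind is most useful at input boundaries, where values arrive as raw integers before being converted with Wide_Wide_Character'Val.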
Compilers that implement only Ada 95 provide only Wide_Character (16‑bit) support and will not conform to this standard without additional packages provided by the vendor.

To make use of UCS facilities, developers should ensure that the compiler can process Ada source files containing non‑ASCII characters. Many Ada 2005‑compliant compilers (e.g., GNAT, Janus/Ada, ObjectAda) provide options for selecting the source encoding; GNAT, for example, uses the command‑line switch -gnatW8 for UTF‑8 source encoding. The standard does not prescribe a specific pragma, but recommends that implementations document how the source encoding is determined (e.g., by a BOM at the beginning of the file, via a project file setting, or a command‑line option).
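As a minimal sketch of a source file that depends on the encoding setting, the following program contains a non‑ASCII string literal. It assumes GNAT, a file saved as UTF‑8, and the -gnatW8 switch mentioned above; the file name greeting.adb is chosen for illustration only, and other compilers will need their own equivalent option.

```ada
--  greeting.adb: save as UTF-8 and build with, for example:
--     gnatmake -gnatW8 greeting.adb
with Ada.Wide_Wide_Text_IO;

procedure Greeting is
   --  Six Cyrillic letters typed directly in the source text.
   Hello : constant Wide_Wide_String := "Привет";
begin
   --  Print the literal, then its length in code points.
   Ada.Wide_Wide_Text_IO.Put_Line (Hello);
   Ada.Wide_Wide_Text_IO.Put_Line
     (Integer'Wide_Wide_Image (Hello'Length));
end Greeting;
```

If the compiler reads the file with a different encoding, it may treat each byte of the literal as a separate character, so the reported length is a quick way to spot a mismatch.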
Conformant run‑time libraries must supply the packages Ada.Strings.Wide_Wide_Unbounded, Ada.Strings.Wide_Wide_Bounded, and Ada.Strings.Wide_Wide_Fixed, all operating on Wide_Wide_String. Programmers should be aware that these packages treat characters as whole code points; surrogate pairs or combining sequences are not specially handled unless the application explicitly processes them.
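The code point orientation of these packages can be seen in a short sketch using Ada.Strings.Wide_Wide_Unbounded. The sample text is an assumption for illustration: the letter "e" followed by U+0301 (a combining accent) and U+1D11E (a character above U+FFFF).

```ada
with Ada.Strings.Wide_Wide_Unbounded;
with Ada.Text_IO;

procedure Code_Point_Strings is
   use Ada.Strings.Wide_Wide_Unbounded;

   --  "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as one
   --  accented letter, but stored as two code points.
   E_Acute : constant Wide_Wide_String :=
     (Wide_Wide_Character'Val (16#0065#),
      Wide_Wide_Character'Val (16#0301#));

   --  U+1D11E MUSICAL SYMBOL G CLEF: above U+FFFF, yet a single element.
   Clef : constant Wide_Wide_String :=
     (1 => Wide_Wide_Character'Val (16#1D11E#));

   Text : Unbounded_Wide_Wide_String :=
     To_Unbounded_Wide_Wide_String (E_Acute);
begin
   Append (Text, Clef);

   --  Length and Index count code points, not displayed characters.
   Ada.Text_IO.Put_Line
     ("Code points:  " & Natural'Image (Length (Text)));
   Ada.Text_IO.Put_Line
     ("Clef at index:" & Natural'Image (Index (Text, Clef)));
end Code_Point_Strings;
```

An application that needs to count user‑perceived characters has to group combining sequences itself, since the standard string packages do not.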
The standard references the Ada language’s Ada.Characters.Handling package; however, the classification functions (Is_Letter, Is_Digit, etc.) must be extended to work for the full UCS set. Implementors typically use the Unicode Character Database (UCD) to derive the necessary category tables. Performance‑critical applications may need to pre‑load these tables.
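One way to see full‑range classification in practice is the Ada 2012 package Ada.Wide_Wide_Characters.Handling. This package is not named in the text above and is assumed here to be available in the implementation; the sample characters are chosen for illustration.

```ada
with Ada.Text_IO;
with Ada.Wide_Wide_Characters.Handling;

procedure Classify is
   use Ada.Wide_Wide_Characters.Handling;

   --  U+0436 CYRILLIC SMALL LETTER ZHE (a letter).
   Zhe         : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#0436#);
   --  U+0665 ARABIC-INDIC DIGIT FIVE (a decimal digit).
   Arabic_Five : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#0665#);
   --  U+20AC EURO SIGN (a currency symbol, not a letter).
   Euro        : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#20AC#);
begin
   Ada.Text_IO.Put_Line
     ("Zhe is a letter:        " & Boolean'Image (Is_Letter (Zhe)));
   Ada.Text_IO.Put_Line
     ("Arabic five is a digit: " & Boolean'Image (Is_Digit (Arabic_Five)));
   Ada.Text_IO.Put_Line
     ("Euro is a letter:       " & Boolean'Image (Is_Letter (Euro)));
end Classify;
```

The results depend on the version of the Unicode Character Database built into the run‑time library, so newly assigned characters may be classified differently across compiler releases.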
| Scenario | Potential Issue | Mitigation |
|---|---|---|
| File names with non‑Latin characters | Systems with legacy file systems may not support UCS in file names; Ada.Directories may return garbage. | Use only ASCII fallback names when interacting with such systems, or rely on the implementation’s conversion layer. |
| Source code encoding mismatches | A file saved as UTF‑8 but read with a UTF‑16 encoding can cause compile‑time errors or wrong identifier recognition. | Always specify the source encoding explicitly in the project configuration and ensure all tools (editor, compiler) agree. |
| Mixing Wide_String and Wide_Wide_String | Conversion between these types is not always defined for code points above U+FFFF. | Use explicit conversion functions provided by Ada.Strings.UTF_Encoding or the packages defined in the implementation. |
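The last row of the table can be sketched with the Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Wide_Strings, assuming the implementation provides it. The example encodes a single code point above U+FFFF as UTF‑16 (held in a Wide_String) and as UTF‑8, then decodes it back; the procedure and object names are chosen for illustration.

```ada
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Convert_Above_BMP is
   use Ada.Strings.UTF_Encoding;
   package WW renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  U+1D11E MUSICAL SYMBOL G CLEF does not fit in one Wide_Character.
   Clef : constant Wide_Wide_String :=
     (1 => Wide_Wide_Character'Val (16#1D11E#));

   --  The overload of Encode is selected by the declared result type.
   As_UTF_16 : constant UTF_16_Wide_String := WW.Encode (Clef);
   As_UTF_8  : constant UTF_8_String       := WW.Encode (Clef);

   --  Decoding the UTF-16 form restores the original code point.
   Round_Trip : constant Wide_Wide_String := WW.Decode (As_UTF_16);
begin
   Ada.Text_IO.Put_Line
     ("UTF-16 code units:" & Natural'Image (As_UTF_16'Length));
   Ada.Text_IO.Put_Line
     ("UTF-8 bytes:      " & Natural'Image (As_UTF_8'Length));
   Ada.Text_IO.Put_Line
     ("Round trip equal: " & Boolean'Image (Round_Trip = Clef));
end Convert_Above_BMP;
```

Going through an explicit encoding step makes the surrogate pair visible in the Wide_String, rather than relying on an element‑by‑element type conversion that has no defined result for such code points.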
Use Wide_Wide_String as your default string type for all user‑facing text. This ensures that your application can handle text in any language without being rewritten later. The memory overhead (four bytes per code point) is generally acceptable on modern hardware.

A compiler or runtime library is considered conformant to IEC 15475-3-04 if it satisfies all the mandatory requirements listed in the standard’s normative clauses. The standard does not provide a formal conformance test suite, but implementors should be prepared to demonstrate:

- Support for the Wide_Wide_Character type and its operations as defined in the Ada 2005 language standard, including all string manipulation packages operating on Wide_Wide_String.

The standard is harmonized with other parts of the Ada language standards and with the general UCS / Unicode framework. A declaration of conformance should cite the exact version of the Ada language standard (e.g., Ada 2005) and the UCS version (typically ISO/IEC 10646:2003).
This article is provided for informational purposes and reflects the author’s understanding of the standard as of 2026. Always refer to the official published text of IEC 15475-3-04 (CAN/CSA-ISO/IEC 15475-3:04) for authoritative requirements.