IEC 15475-3-04: Technical Guide to Ada Binding for the Universal Coded Character Set

Scope, Requirements, and Compliance for Ada Programming Language Integration with ISO/IEC 10646 (UCS)

The standard IEC 15475-3-04 (technically identical to ISO/IEC 15475-3:2004 and adopted as CAN/CSA-ISO/IEC 15475-3:04) specifies the binding of the Ada programming language to the Universal Multiple-Octet Coded Character Set (UCS). UCS is defined by ISO/IEC 10646, whose character repertoire and code point assignments are kept synchronized with the Unicode Standard. This binding allows Ada programs to handle text in virtually all of the world’s writing systems. This article covers the standard’s scope, key technical requirements, implementation considerations, and compliance notes.

The standard is part of a multi‑part series (ISO/IEC 15475) that defines language bindings to UCS for several programming languages. Part 3 specifically targets Ada, providing both normative requirements and informative guidance for compiler vendors, tool developers, and application programmers who need to build internationalized Ada software.

Scope

IEC 15475-3-04 covers the following aspects of Ada’s integration with UCS:

  • Character and string types – Defines the semantics and representation of Wide_Wide_Character and Wide_Wide_String as 32‑bit types capable of holding any UCS code point (U+0000 to U+10FFFF).
  • Source code representation – Specifies that Ada source files can be encoded in UTF‑8, UTF‑16, or any other encoding that maps to the logical character set of the implementation, provided that the mapping is lossless for all permitted characters.
  • Identifier rules – Allows identifiers to contain letters from any script supported by UCS, including Latin, Cyrillic, Arabic, Han, etc., subject to the Ada language rules for identifier composition.
  • Literals – Character literals and string literals can contain any UCS character, with the restriction that the actual representation in source code must be allowed by the encoding used.
  • Standard input/output – Extends the Ada input/output libraries (Ada.Text_IO, Ada.Wide_Text_IO, and Ada.Wide_Wide_Text_IO) to support UCS‑encoded text streams.
  • Environment interface – Defines how operating system file names, command line arguments, and environment variables that may contain UCS characters are mapped to Ada string types.
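
As a minimal sketch of the first item, the program below stores a code point from outside the Basic Multilingual Plane in a Wide_Wide_String. The subtype name UCS_Character and the procedure name are hypothetical and used only for illustration; they are not taken from the standard.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Code_Point_Demo is
   --  Hypothetical subtype: restricts values to the UCS range U+0000 .. U+10FFFF.
   subtype UCS_Character is Wide_Wide_Character
     range Wide_Wide_Character'Val (0) .. Wide_Wide_Character'Val (16#10FFFF#);

   --  U+1D11E MUSICAL SYMBOL G CLEF lies above U+FFFF.
   G_Clef : constant UCS_Character := Wide_Wide_Character'Val (16#1D11E#);

   --  One array element per code point, so this string has four elements.
   Text : constant Wide_Wide_String := "Ada" & G_Clef;
begin
   Put_Line ("Length:" & Integer'Image (Text'Length));
   Put_Line ("Last code point:"
             & Integer'Image (Wide_Wide_Character'Pos (Text (Text'Last))));
end Code_Point_Demo;
```

The length printed is 4 because each element of a Wide_Wide_String is a whole code point, regardless of how many bytes the character would need in UTF‑8 or UTF‑16.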

Technical Requirements

The standard mandates a set of concrete requirements that Ada implementations must satisfy to claim conformance. The table below summarises the principal normative provisions:

| Feature | Normative Requirement | Remarks |
| --- | --- | --- |
| Wide_Wide_Character range | Must represent all UCS code points from U+0000 through U+10FFFF (i.e., the whole 21‑bit Unicode code space). | Ada’s Wide_Wide_Character is a character (enumeration) type, typically stored in 32 bits, not a modular type; the standard prohibits using it for codes beyond U+10FFFF. |
| String type Wide_Wide_String | Must be an array of Wide_Wide_Character with full UCS semantics. | Operations on Wide_Wide_String (e.g., indexing, concatenation) must preserve the integer code point values. |
| Source encoding | The implementation shall accept at least one of UTF‑8, UTF‑16, or UTF‑32; if the source encoding is not capable of representing a character, a compile‑time error must be raised. | Some implementations may accept mixed encodings; the standard encourages a single, well‑documented default encoding. |
| Identifier characters | Identifiers may contain any UCS letter (L* categories in Unicode), any digit (Nd category), and the underscore, following Ada’s rules for identifiers (first character must be a letter). | The set of allowed identifier characters is determined by the Ada language standard (ISO/IEC 8652:1995 with amendments) and further clarified by this binding. |
| Text file I/O | Wide‑wide text files (type Wide_Wide_Text_File) shall support reading and writing of UCS‑encoded text; the default encoding can be locale‑dependent, but the implementation must provide a mechanism to specify UTF‑8 or UTF‑16 explicitly. | This requirement is crucial for cross‑platform portability of Ada applications dealing with multilingual text. |
| Environment access | The package Ada.Command_Line and Ada.Environment_Variables shall return Wide_Wide_String values for arguments and environment variables, using the system’s native UCS encoding. | On systems that use a non‑UCS native encoding (e.g., legacy 8‑bit code pages), the implementation must convert transparently. |

Important – The standard does not mandate the behavior of string concatenation or character classification across full UCS range in all packages; implementors should consult the Ada 2005 Core Language Reference Manual (ISO/IEC 8652:1995/Amd 1:2007) together with this binding for full details. Some older Ada 95 compilers may have only Wide_Character (16‑bit) support and will not conform to this standard without additional packages provided by the vendor.
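
The language-defined Ada.Command_Line package in Ada 2005 and Ada 2012 returns String values, so on an implementation that does not convert for you, a program may have to decode arguments itself. The sketch below does that under the assumption that the platform delivers arguments as UTF‑8 bytes; it needs an Ada 2012 compiler for Ada.Strings.UTF_Encoding, and the procedure name is hypothetical.

```ada
with Ada.Command_Line;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Show_Arguments is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
begin
   for I in 1 .. Ada.Command_Line.Argument_Count loop
      declare
         --  Assumption: arguments arrive as UTF-8; other platforms may differ.
         Arg : constant Wide_Wide_String :=
           UTF.Decode (Ada.Command_Line.Argument (I));
      begin
         Ada.Text_IO.Put_Line
           ("Argument" & Integer'Image (I) & " has"
            & Integer'Image (Arg'Length) & " code point(s)");
      end;
   end loop;
exception
   --  Decode raises Encoding_Error on malformed input.
   when Ada.Strings.UTF_Encoding.Encoding_Error =>
      Ada.Text_IO.Put_Line ("An argument is not valid UTF-8");
end Show_Arguments;
```

Environment variables read through Ada.Environment_Variables.Value can be decoded the same way, subject to the same assumption about the platform encoding.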

Implementation Considerations

Compiler Support and Pragmas

To make use of UCS facilities, developers should ensure that the compiler can process Ada source files containing non‑ASCII characters. Ada 2005‑compliant compilers such as GNAT, Janus/Ada, and ObjectAda each provide their own way to select the source encoding; GNAT, for example, accepts the command‑line switch -gnatW8 for UTF‑8 source. The standard does not prescribe a specific pragma, but recommends that implementations document how the source encoding is determined (e.g., by a BOM at the beginning of the file, via a project file setting, or a command line option).
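
For illustration, here is a small source file that uses a Greek identifier and string literal. It is a sketch that assumes GNAT and a file saved as UTF‑8; the pragma shown is GNAT-specific, and the file and procedure names are hypothetical.

```ada
--  Save this file as UTF-8 under the name greeting.adb.
--  With GNAT it can be built with: gnatmake -gnatW8 greeting.adb
pragma Wide_Character_Encoding (UTF8);  --  GNAT-specific pragma

with Ada.Wide_Wide_Text_IO;

procedure Greeting is
   --  Identifier and literal both contain non-ASCII (Greek) letters.
   Χαιρετισμός : constant Wide_Wide_String := "Γειά σου, κόσμε";
begin
   Ada.Wide_Wide_Text_IO.Put_Line (Χαιρετισμός);
end Greeting;
```

Other compilers select the source encoding by their own means, so the pragma and the switch should be replaced by whatever the vendor documents.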

Runtime Library Extensions

Conformant run‑time libraries must supply the packages Ada.Strings.Wide_Wide_Unbounded, Ada.Strings.Wide_Wide_Bounded, and Ada.Strings.Wide_Wide_Fixed, all operating on Wide_Wide_String. Programmers should be aware that these packages treat each element as one whole code point. Because a Wide_Wide_Character can hold any code point, supplementary characters do not need surrogate pairs, but a base character followed by combining marks still occupies several elements and is not handled as a unit unless the application explicitly processes it.
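
The code-point view of these packages can be seen in a short sketch. It builds "cafe" with a combining acute accent and shows that the length counts code points rather than visible characters; the procedure and object names are hypothetical.

```ada
with Ada.Strings.Wide_Wide_Fixed;
with Ada.Strings.Wide_Wide_Unbounded; use Ada.Strings.Wide_Wide_Unbounded;
with Ada.Text_IO;

procedure Code_Point_Strings is
   --  "e" followed by U+0301 COMBINING ACUTE ACCENT:
   --  one visible character, two array elements.
   E_Acute : constant Wide_Wide_String :=
     "e" & Wide_Wide_Character'Val (16#0301#);

   Buffer : Unbounded_Wide_Wide_String :=
     To_Unbounded_Wide_Wide_String ("caf");
begin
   Append (Buffer, E_Acute);

   --  Length counts code points: 3 + 2 = 5, although a reader sees 4 characters.
   Ada.Text_IO.Put_Line ("Length:" & Natural'Image (Length (Buffer)));

   --  Searching works on code points too; the accent is found as its own element.
   Ada.Text_IO.Put_Line
     ("Index of accent:"
      & Natural'Image
          (Ada.Strings.Wide_Wide_Fixed.Index
             (To_Wide_Wide_String (Buffer),
              E_Acute (E_Acute'Last .. E_Acute'Last))));
end Code_Point_Strings;
```

Applications that need to count or slice by user-perceived characters have to add their own segmentation logic on top of these packages.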

Character Classification

The standard references the Ada language’s Ada.Characters.Handling package, which operates on the 8‑bit Character type; the classification functions (Is_Letter, Is_Digit, etc.) must therefore be extended to work for the full UCS set. Implementors typically use the Unicode Character Database (UCD) to derive the necessary category tables. Because these tables are large, performance‑critical applications may need to pre‑load them.
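
As one possible illustration, Ada 2012 and later define the package Ada.Wide_Wide_Characters.Handling, which offers classification functions for Wide_Wide_Character. The sketch below assumes an Ada 2012 compiler; whether a given implementation meets the binding's requirements through this package is not something the sketch can show.

```ada
with Ada.Wide_Wide_Characters.Handling; use Ada.Wide_Wide_Characters.Handling;
with Ada.Text_IO;

procedure Classify is
   --  U+0416 CYRILLIC CAPITAL LETTER ZHE and U+0663 ARABIC-INDIC DIGIT THREE.
   Zhe   : constant Wide_Wide_Character := Wide_Wide_Character'Val (16#0416#);
   Three : constant Wide_Wide_Character := Wide_Wide_Character'Val (16#0663#);
begin
   Ada.Text_IO.Put_Line
     ("Zhe is a letter:  " & Boolean'Image (Is_Letter (Zhe)));
   Ada.Text_IO.Put_Line
     ("Three is a digit: " & Boolean'Image (Is_Digit (Three)));
   --  Case mapping is also defined beyond Latin-1.
   Ada.Text_IO.Put_Line
     ("Lower-case Zhe code point:"
      & Integer'Image (Wide_Wide_Character'Pos (To_Lower (Zhe))));
end Classify;
```

The results depend on the Unicode tables built into the run‑time library, so they can vary between compiler versions for recently added characters.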

Portability Pitfalls

| Scenario | Potential Issue | Mitigation |
| --- | --- | --- |
| File names with non‑Latin characters | Systems with legacy file systems may not support UCS in file names; Ada.Directories may return garbage. | Use only ASCII fallback names when interacting with such systems, or rely on the implementation’s conversion layer. |
| Source code encoding mismatches | A file saved as UTF‑8 but read with a UTF‑16 encoding can cause compile‑time errors or wrong identifier recognition. | Always specify the source encoding explicitly in the project configuration and ensure all tools (editor, compiler) agree. |
| Mixing Wide_String and Wide_Wide_String | Conversion between these types is not always defined for code points above U+FFFF. | Use explicit conversion functions provided by Ada.Strings.UTF_Encoding or the packages defined in the implementation. |

Tip – For new Ada projects, consider using Wide_Wide_String as your default string type for all user‑facing text. This ensures that your application can handle any language without later rewriting. The memory overhead (typically four bytes per character) is generally acceptable on modern hardware.
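
The third pitfall can be made concrete with the language-defined package Ada.Characters.Conversions (Ada 2005), which offers a test and a lossy conversion. This is a sketch of one way to narrow a string safely, not a method prescribed by the standard; the procedure and object names are hypothetical.

```ada
with Ada.Characters.Conversions; use Ada.Characters.Conversions;
with Ada.Text_IO;

procedure Narrowing is
   --  "A" followed by U+1F600, which has no 16-bit Wide_Character equivalent.
   Source : constant Wide_Wide_String :=
     "A" & Wide_Wide_Character'Val (16#1F600#);
begin
   if Is_Wide_String (Source) then
      Ada.Text_IO.Put_Line ("Every code point fits in Wide_Character");
   else
      declare
         --  Lossy fallback: code points above U+FFFF become the substitute.
         Narrow : constant Wide_String :=
           To_Wide_String (Source, Substitute => '?');
      begin
         Ada.Text_IO.Put_Line
           ("Narrowed with substitution; last position ="
            & Integer'Image (Wide_Character'Pos (Narrow (Narrow'Last))));
      end;
   end if;
end Narrowing;
```

Checking with Is_Wide_String first lets the program decide whether a lossy conversion is acceptable instead of silently replacing characters.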

Compliance Notes

A compiler or runtime library is considered conformant to IEC 15475-3-04 if it satisfies all the mandatory requirements listed in the standard’s normative clauses. The standard does not provide a formal conformance test suite, but implementors should be prepared to demonstrate:

  • Complete support for the Wide_Wide_Character type and its operations as defined in the Ada 2005 language standard, including all string manipulation packages operating on Wide_Wide_String.
  • Correct reading and writing of at least one UCS encoding (typically UTF‑8) in source files, with appropriate diagnostic messages when illegal or incomplete sequences are encountered.
  • Proper handling of UCS characters in identifiers, as defined by the Ada 2005 language rules (the binding does not extend the set of allowed characters beyond what Ada 2005 already permits; it merely clarifies the mapping to UCS).
  • Documentation of the default source file encoding, the mechanism to change it, and any restrictions (e.g., maximum number of characters in an identifier).
  • Proof that the I/O packages can produce and consume text files that adhere to UCS encoding conventions (e.g., using the BOM when required).
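
One way to exercise the encoding and BOM behaviour mentioned above is a round trip through UTF‑8 using the Ada 2012 package Ada.Strings.UTF_Encoding. The sketch works on in-memory strings rather than files, and its procedure and object names are hypothetical; it is not a conformance test.

```ada
with Ada.Strings.UTF_Encoding;                    use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure BOM_Round_Trip is
   package WW renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  "UCS " followed by U+4E2D, a Han character.
   Original : constant Wide_Wide_String :=
     "UCS " & Wide_Wide_Character'Val (16#4E2D#);

   --  Encode as UTF-8 and prepend the three-byte BOM (EF BB BF).
   Encoded : constant UTF_8_String :=
     WW.Encode (Original, Output_BOM => True);
begin
   --  Encoding inspects the leading bytes and reports the scheme they announce.
   Ada.Text_IO.Put_Line
     ("Detected: " & Encoding_Scheme'Image (Encoding (Encoded)));
   Ada.Text_IO.Put_Line ("Bytes:" & Integer'Image (Encoded'Length));
   --  Decode skips a leading BOM, so the original text comes back unchanged.
   Ada.Text_IO.Put_Line
     ("Round trip ok: " & Boolean'Image (WW.Decode (Encoded) = Original));
end BOM_Round_Trip;
```

The same Encode and Decode calls can wrap byte-oriented file I/O when an implementation's text I/O packages do not offer an explicit encoding option.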

The standard is harmonized with other parts of the Ada language standards and with the general UCS / Unicode framework. A declaration of conformance should cite the exact version of the Ada language standard (e.g., Ada 2005) and the UCS version (typically ISO/IEC 10646:2003).

Compliance Benefit – Adopting a conformant Ada implementation gives you confidence that your software can be localized for markets using Latin, Greek, Cyrillic, Arabic, Hebrew, Devanagari, CJK scripts, and many others. This dramatically reduces the cost of bringing products to global markets and simplifies the maintenance of multilingual code.
Important – Non‑conformant implementations may silently truncate characters above U+FFFF, misinterpret source code identifiers, or produce garbled output. Always verify that your compiler explicitly states support for ISO/IEC 15475-3:2004 before relying on full UCS capabilities.

Frequently Asked Questions

Q: What is the relationship between IEC 15475-3-04 and Unicode?
A: The standard references ISO/IEC 10646, which is essentially identical to Unicode. Therefore, any Unicode character valid in a given version is also a valid UCS character. Ada code conforming to this standard can seamlessly use Unicode characters in strings, identifiers, and files, provided the underlying platform supports them.

Q: Does IEC 15475-3-04 apply only to Ada 2005 or also to later versions (Ada 2012, Ada 2022)?
A: The standard was published in 2004 and targets Ada 2005. However, Ada 2012 and Ada 2022 incorporate the same wide‑wide character support and enhance the language libraries. The binding provisions from this standard remain valid for those later versions; in fact, most commercial Ada compilers now follow the later standards but still comply with the binding requirements defined here.

Q: Are there any plans to create a new edition of this standard?
A: As of 2026, no revision of ISO/IEC 15475-3 has been published. The core requirements are considered stable, as Unicode updates mainly add characters without changing the fundamental model. Users should monitor the ISO/IEC and CSA websites for any amendments or revisions.

This article is provided for informational purposes and reflects the author’s understanding of the standard as of 2026. Always refer to the official published text of IEC 15475-3-04 (CAN/CSA-ISO/IEC 15475-3:04) for authoritative requirements.
