Technical Guide to CAN/CSA-ISO/IEC 13249-2:04 (ISO/IEC 13249-2): SQL/MM Full-Text Data Types and Routines

Scope of CAN/CSA-ISO/IEC 13249-2:04

The international standard ISO/IEC 13249, Information technology — Database languages — SQL multimedia and application packages (SQL/MM), is a multi-part framework extending the SQL database language (ISO/IEC 9075) to handle complex multimedia data types. Part 2, Full-Text, specifically defines the packages for managing, searching, and analyzing textual documents at a granular linguistic level. In Canada, this is formally adopted as CAN/CSA-ISO/IEC 13249-2:04.

Unlike standard SQL pattern matching (LIKE operator), ISO/IEC 13249-2 introduces abstract data types (ADTs), user-defined routines (UDRs), and information schema views tailored for robust text analysis. The standard covers stemming, stop-word handling, thesaurus expansion, and relevance ranking. It is designed to provide a database-portable method for implementing enterprise full-text search features without relying entirely on vendor-specific proprietary syntax.

Tip: The Full-Text package is language-aware. Correctly configuring the LANGUAGE attribute on the FULLTEXT data type allows the built-in routines to apply appropriate stemming algorithms and stop-word filters, dramatically improving search accuracy over simple tokenization approaches.

Technical Architecture and Core Data Types

The Full-Text Abstract Data Type

The central component of the standard is the FULLTEXT SQL structured type. This ADT is designed to encapsulate a document’s content along with its linguistic properties, including language identifier, character encoding, and metadata properties. The standard defines a formal type hierarchy that allows for subtypes to be created for specific document formats (e.g., plain text, HTML, XML).

Routines, Predicates, and Functions

ISO/IEC 13249-2 standardizes a suite of routines to interact with the FULLTEXT type. Key operations include evaluating text containment, computing relevance scores, and managing thesauri. These are defined as abstract routines with specific signatures, allowing vendors to implement optimized internal algorithms while maintaining syntactic and semantic conformance.

Component / Feature	Description	ISO/IEC 13249-2 Clause
`FULLTEXT` Data Type	SQL structured type for encapsulating text documents with linguistic metadata	5.1
`CONTAINS` Predicate	Returns TRUE if a full-text query expression matches the target document	6.3
`RANK` Function	Computes a relevance score (e.g., TF-IDF or BM25) for ranking results	6.4
`THESAURUS` Routines	Data type and management routines for hierarchical synonym and term expansion	7.2
`STARTS_WITH` Predicate	Evaluates prefix matches on full-text columns	6.5

Warning: While the standard defines the abstract functional signatures, implementations often extend the full-text query language used within CONTAINS. Portability is best achieved by limiting logic to the core operators (AND, OR, AND NOT, NEAR) defined in Clause 6.3. Proprietary syntax for fuzzy or regex searches generally breaks cross-DBMS compatibility.

Implementation Highlights and Usage

Query Language and Text Operators

The standard defines a rich full-text query language that integrates directly into the SQL WHERE clause. The CONTAINS predicate accepts a character string literal representing the query expression. This expression supports Boolean operators, proximity operators (NEAR), phrase searching, and wildcard prefixes. The RANK function then calculates a relevance score, typically utilizing term frequency and inverse document frequency algorithms.

-- Conceptual SQL compliant with SQL/MM Full-Text SELECT doc_id, title, RANK(document_body, '"international standard" AND (database OR SQL)') AS relevance FROM documents WHERE CONTAINS(document_body, '"international standard" AND (database OR SQL)') AND language = 'ENGLISH' ORDER BY relevance DESC;

Schema and Metadata Compliance

The standard includes full definitions for information schema views specific to full-text processing. This allows administration tools and applications to introspect the database to discover which tables have FULLTEXT columns, what languages are configured, and what thesauri are available. Conformant implementations expose this metadata, which is critical for cross-platform management utilities.

Best Practice: For high-compliance enterprise deployments, enforce the use of the THESAURUS data type for domain-specific vocabulary expansion. This allows queries to automatically include broader terms, narrower terms, and synonyms, enabling robust semantic search without external keyword expansion libraries.

Compliance and Certification Notes

Compliance with CAN/CSA-ISO/IEC 13249-2:04 is achieved through conformance to the SQL/MM Full-Text package within a DBMS. The standard defines specific feature tables and conformance requirements. Vendors typically claim support at a feature level (e.g., T011, T012 for core full-text features in SQL:2011+).

Core Conformance: Support for the FULLTEXT data type and the CONTAINS predicate.
Enhanced Conformance: Support for RANK, THESAURUS, and advanced text operations.
Testing: Conformance testing generally involves verifying the specific schemas and routines defined in the standard’s test annexes.

Caution: CAN/CSA-ISO/IEC 13249-2:04 specifically adopts the text of ISO/IEC 13249-2:2003. Later amendments or revisions (e.g., the 2011 edition) are not reflected in this adoption. Always verify the exact revision level a DBMS claims for conformance against the specific 2003 base text to ensure consistent behavior.

Modern relational database systems such as IBM Db2, Oracle Database, and MariaDB implement significant portions of the SQL/MM Full-Text standard. However, strict compliance across all defined features is uncommon. The standard remains a critical reference for designing portable full-text database schemas and applications.

Frequently Asked Questions

Q: How does the RANK function differ from simple counting or sorting of results?
A: RANK is defined by a standardized abstract algorithm that considers term frequency (TF), inverse document frequency (IDF), and document length normalization. This provides a mathematically grounded relevance score, unlike simple alphabetical or metadata-based sorting. The exact internal algorithm is vendor-specific but must conform to the functional signature and semantic requirements of ISO/IEC 13249-2 Clause 6.4.

Q: Is the THESAURUS data type simply a lookup table?
A: No. The THESAURUS type specified in Clause 7 defines hierarchical relationships (Broader Term, Narrower Term, Related Term) and supports multi-lingual thesauri. Queries using CONTAINS can automatically expand search terms using these hierarchies, enabling powerful semantic search capabilities without external keyword expansion services or complex application-level logic.

Q: What is the relationship between CAN/CSA-ISO/IEC 13249-2:04 and later versions of the SQL standard?
A: CAN/CSA-ISO/IEC 13249-2:04 is strictly the Canadian adoption of ISO/IEC 13249-2:2003. While subsequent SQL standards (SQL:2008 through SQL:2023) have incorporated some core full-text capabilities into the main SQL framework, the specific FULLTEXT ADT, thesaurus routines, and structured type hierarchy remain uniquely defined within the SQL/MM package structure of this specific standard.

© 2026 Standards Council of Canada / Conseil canadien des normes. This article is an independent technical overview for informational purposes and is not an official publication of ISO, IEC, or the CSA Group.

📥 Standard Documents Download

🔒

Please wait 10 seconds, the download links will appear after the ad loads