Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
The international standard ISO/IEC 13249, Information technology — Database languages — SQL multimedia and application packages (SQL/MM), is a multi-part framework extending the SQL database language (ISO/IEC 9075) to handle complex multimedia data types. Part 2, Full-Text, specifically defines the packages for managing, searching, and analyzing textual documents at a granular linguistic level. In Canada, this is formally adopted as CAN/CSA-ISO/IEC 13249-2:04.
Unlike standard SQL pattern matching (LIKE operator), ISO/IEC 13249-2 introduces abstract data types (ADTs), user-defined routines (UDRs), and information schema views tailored for robust text analysis. The standard covers stemming, stop-word handling, thesaurus expansion, and relevance ranking. It is designed to provide a database-portable method for implementing enterprise full-text search features without relying entirely on vendor-specific proprietary syntax.
LANGUAGE attribute on the FULLTEXT data type allows the built-in routines to apply appropriate stemming algorithms and stop-word filters, dramatically improving search accuracy over simple tokenization approaches. The central component of the standard is the FULLTEXT SQL structured type. This ADT is designed to encapsulate a document’s content along with its linguistic properties, including language identifier, character encoding, and metadata properties. The standard defines a formal type hierarchy that allows for subtypes to be created for specific document formats (e.g., plain text, HTML, XML).
ISO/IEC 13249-2 standardizes a suite of routines to interact with the FULLTEXT type. Key operations include evaluating text containment, computing relevance scores, and managing thesauri. These are defined as abstract routines with specific signatures, allowing vendors to implement optimized internal algorithms while maintaining syntactic and semantic conformance.
| Component / Feature | Description | ISO/IEC 13249-2 Clause |
|---|---|---|
FULLTEXT Data Type | SQL structured type for encapsulating text documents with linguistic metadata | 5.1 |
CONTAINS Predicate | Returns TRUE if a full-text query expression matches the target document | 6.3 |
RANK Function | Computes a relevance score (e.g., TF-IDF or BM25) for ranking results | 6.4 |
THESAURUS Routines | Data type and management routines for hierarchical synonym and term expansion | 7.2 |
STARTS_WITH Predicate | Evaluates prefix matches on full-text columns | 6.5 |
CONTAINS. Portability is best achieved by limiting logic to the core operators (AND, OR, AND NOT, NEAR) defined in Clause 6.3. Proprietary syntax for fuzzy or regex searches generally breaks cross-DBMS compatibility. The standard defines a rich full-text query language that integrates directly into the SQL WHERE clause. The CONTAINS predicate accepts a character string literal representing the query expression. This expression supports Boolean operators, proximity operators (NEAR), phrase searching, and wildcard prefixes. The RANK function then calculates a relevance score, typically utilizing term frequency and inverse document frequency algorithms.
-- Conceptual SQL compliant with SQL/MM Full-Text SELECT doc_id, title, RANK(document_body, '"international standard" AND (database OR SQL)') AS relevance FROM documents WHERE CONTAINS(document_body, '"international standard" AND (database OR SQL)') AND language = 'ENGLISH' ORDER BY relevance DESC;
The standard includes full definitions for information schema views specific to full-text processing. This allows administration tools and applications to introspect the database to discover which tables have FULLTEXT columns, what languages are configured, and what thesauri are available. Conformant implementations expose this metadata, which is critical for cross-platform management utilities.
THESAURUS data type for domain-specific vocabulary expansion. This allows queries to automatically include broader terms, narrower terms, and synonyms, enabling robust semantic search without external keyword expansion libraries. Compliance with CAN/CSA-ISO/IEC 13249-2:04 is achieved through conformance to the SQL/MM Full-Text package within a DBMS. The standard defines specific feature tables and conformance requirements. Vendors typically claim support at a feature level (e.g., T011, T012 for core full-text features in SQL:2011+).
FULLTEXT data type and the CONTAINS predicate.RANK, THESAURUS, and advanced text operations.Modern relational database systems such as IBM Db2, Oracle Database, and MariaDB implement significant portions of the SQL/MM Full-Text standard. However, strict compliance across all defined features is uncommon. The standard remains a critical reference for designing portable full-text database schemas and applications.
RANK function differ from simple counting or sorting of results?RANK is defined by a standardized abstract algorithm that considers term frequency (TF), inverse document frequency (IDF), and document length normalization. This provides a mathematically grounded relevance score, unlike simple alphabetical or metadata-based sorting. The exact internal algorithm is vendor-specific but must conform to the functional signature and semantic requirements of ISO/IEC 13249-2 Clause 6.4. THESAURUS data type simply a lookup table?THESAURUS type specified in Clause 7 defines hierarchical relationships (Broader Term, Narrower Term, Related Term) and supports multi-lingual thesauri. Queries using CONTAINS can automatically expand search terms using these hierarchies, enabling powerful semantic search capabilities without external keyword expansion services or complex application-level logic. FULLTEXT ADT, thesaurus routines, and structured type hierarchy remain uniquely defined within the SQL/MM package structure of this specific standard. © 2026 Standards Council of Canada / Conseil canadien des normes. This article is an independent technical overview for informational purposes and is not an official publication of ISO, IEC, or the CSA Group.