ISO 25964-1:2011 — Thesauri for Information Retrieval

Comprehensive guide to thesaurus construction, maintenance, and interoperability

1. Understanding Thesaurus Structures for Information Retrieval

ISO 25964-1:2011 provides comprehensive guidance on the construction, maintenance, and management of thesauri for information retrieval systems. A thesaurus is a controlled vocabulary arranged in a known order with clearly displayed relationships between concepts. This standard replaces ISO 2788 and ISO 5964, unifying monolingual and multilingual guidelines. Three fundamental relationship types are defined: hierarchical (BT/NT), associative (RT), and equivalence (USE/UF). The hierarchical relationship subdivides into generic (is-a), whole-part, and instance relationships.

Always begin thesaurus design with facet analysis to identify fundamental concept categories for comprehensive coverage and logical consistency.
Relationship | Tag     | Example             | Rule
Generic      | BTG/NTG | Vehicles to Cars    | Every NTG is a kind of its BTG
Whole-Part   | BTP/NTP | Europe to France    | NTP is part of BTP
Instance     | BTI/NTI | Planets to Mars     | NTI is a named instance of BTI
Associative  | RT      | Diagnosis/Treatment | Non-hierarchical link
Equivalence  | USE/UF  | Cars UF Automobiles | Synonym control
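
The reciprocity implied by the paired tags in the table can be sketched in a few lines of Python. This is only an illustration, not something taken from the standard: the `Thesaurus` class, its method names, and the "Sports cars" entry are assumptions made for the example, and plain labels stand in for both concepts and terms to keep it short.

```python
from collections import defaultdict

# Each tag paired with its reciprocal; RT is its own reciprocal.
INVERSE = {
    "BTG": "NTG", "NTG": "BTG",
    "BTP": "NTP", "NTP": "BTP",
    "BTI": "NTI", "NTI": "BTI",
    "RT": "RT",
    "USE": "UF", "UF": "USE",
}


class Thesaurus:
    """Hypothetical in-memory store: links[source][tag] -> set of targets."""

    def __init__(self):
        self.links = defaultdict(lambda: defaultdict(set))

    def relate(self, source, tag, target):
        """Record a link together with its reciprocal."""
        self.links[source][tag].add(target)
        self.links[target][INVERSE[tag]].add(source)

    def broader(self, label):
        """Direct broader entries, whatever the hierarchical subtype."""
        found = set()
        for tag in ("BTG", "BTP", "BTI"):
            found |= self.links[label][tag]
        return found

    def ancestors(self, label):
        """Every entry reachable by following broader links upward."""
        seen = set()
        stack = [label]
        while stack:
            for parent in self.broader(stack.pop()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


t = Thesaurus()
t.relate("Cars", "BTG", "Vehicles")
t.relate("Sports cars", "BTG", "Cars")
t.relate("France", "BTP", "Europe")
t.relate("Mars", "BTI", "Planets")
t.relate("Diagnosis", "RT", "Treatment")
t.relate("Automobiles", "USE", "Cars")

print(sorted(t.links["Vehicles"]["NTG"]))   # ['Cars']
print(sorted(t.links["Cars"]["UF"]))        # ['Automobiles']
print(sorted(t.ancestors("Sports cars")))   # ['Cars', 'Vehicles']
```

Storing the reciprocal at write time means a display of "Vehicles" can list its narrower terms without scanning every entry.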

2. Vocabulary Control and Term Selection

Term selection is critical in thesaurus construction. ISO 25964-1 provides guidance on the grammatical form of terms, capitalization, punctuation, and special characters. Compound terms receive special attention, with decision trees for choosing between keeping a compound as a single term (pre-coordination) and splitting it into simpler terms that are combined at search time (post-coordination). The equivalence relationship also covers quasi-synonyms: terms whose meanings differ in ordinary usage but are treated as equivalent for retrieval, which keeps the vocabulary at a manageable size. Clause 8 details this approach. Cross-language equivalence in multilingual thesauri requires careful handling because languages do not always divide concepts in the same way. The standard also covers scope notes for clarifying term meaning and usage, reciprocal scope notes for paired terms, and the disambiguation of homographs with parenthetical qualifiers.
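
As a rough illustration of how USE references and parenthetical qualifiers behave at lookup time, here is a minimal Python sketch. The term lists, the "Mercury" homographs, and the `resolve` function are all made up for the example; the standard does not prescribe any such function.

```python
# Hypothetical entry vocabulary: non-preferred term -> preferred term.
USE = {
    "automobiles": "Cars",
    "motor cars": "Cars",
}

# Hypothetical preferred terms; homographs carry parenthetical qualifiers.
PREFERRED = {
    "Cars",
    "Mercury (planet)",
    "Mercury (element)",
    "Mercury (mythology)",
}


def resolve(entry):
    """Map what a user typed to preferred terms.

    Returns one item for an exact or USE match, several items when the
    entry is a homograph that needs disambiguation, and an empty list
    when the entry is unknown.
    """
    key = entry.strip().lower()
    for term in PREFERRED:
        if term.lower() == key:
            return [term]
    if key in USE:
        return [USE[key]]
    # Compare against the part of each term that precedes the qualifier.
    return sorted(t for t in PREFERRED if t.split(" (")[0].lower() == key)


print(resolve("Automobiles"))  # ['Cars']
print(resolve("mercury"))
# ['Mercury (element)', 'Mercury (mythology)', 'Mercury (planet)']
```

Returning every qualified candidate for an ambiguous entry lets the interface ask the user which meaning was intended instead of guessing.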

Avoid excessively long compound terms. Post-coordination improves indexing consistency and search flexibility while reducing maintenance overhead significantly.
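
Post-coordination can be pictured as intersecting the postings of simple terms at search time instead of maintaining a compound term such as "Steel bridge corrosion". The sketch below assumes a hypothetical postings table and document ids; it shows the idea only and is not a retrieval engine.

```python
# Hypothetical postings: preferred term -> ids of documents indexed with it.
POSTINGS = {
    "Bridges": {1, 2, 5},
    "Steel": {2, 3, 5},
    "Corrosion": {2, 4},
}


def search_all(*terms):
    """Post-coordinate search: documents indexed with every given term."""
    sets = [POSTINGS.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


print(search_all("Bridges", "Steel"))               # {2, 5}
print(search_all("Bridges", "Steel", "Corrosion"))  # {2}
```

With three simple terms the searcher can form any combination, whereas each pre-coordinated compound would need its own entry, scope note, and relationships.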

3. Data Model, Exchange Formats, and Interoperability

Clause 15 introduces a formal data model, presented as a UML class diagram with accompanying tables, and a matching XML Schema. Key entities include Concept, Term, Relationship, and Note, and the model supports multilingual thesauri. The model is compatible with SKOS for linked data applications. The standard also covers presentation formats, including alphabetical, systematic, and graphical displays. Clause 16 addresses integration with indexing and searching applications. For multilingual thesauri, guidance is provided on Unicode encoding, sorting orders, and cross-language equivalence representation. Clauses 17-18 cover exchange formats and protocols, including Z39.50 for distributed search environments.
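
To make the entity list concrete, the following Python sketch models the four entities named above as dataclasses. The attribute names (`lang`, `preferred`, `role`, `target_id`, and so on) are assumptions chosen for readability; they are not the class or attribute names used in the standard's own model.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    text: str
    lang: str               # language tag such as "en" or "fr"
    preferred: bool = True


@dataclass
class Note:
    text: str
    lang: str
    kind: str = "scope"     # hypothetical note type label


@dataclass
class Relationship:
    role: str               # e.g. "BTG", "NTP", "RT"
    target_id: str          # identifier of the related concept


@dataclass
class Concept:
    identifier: str
    terms: list = field(default_factory=list)
    notes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def preferred_term(self, lang):
        """Return the preferred term text for a language, or None."""
        for term in self.terms:
            if term.lang == lang and term.preferred:
                return term.text
        return None


cars = Concept(
    identifier="c001",
    terms=[
        Term("Cars", "en"),
        Term("Automobiles", "en", preferred=False),
        Term("Voitures", "fr"),
    ],
    notes=[Note("Road vehicles designed to carry a small number of passengers.", "en")],
    relationships=[Relationship("BTG", "c000")],
)

print(cars.preferred_term("en"))  # Cars
print(cars.preferred_term("fr"))  # Voitures
print(cars.preferred_term("de"))  # None
```

Hanging all language versions off one concept identifier is what lets a multilingual thesaurus keep its relationships in one place while the labels vary by language.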

Adopting the ISO 25964-1 data model reduces the cost of system migrations and data exchanges. SKOS compatibility is valuable for semantic web applications.

Frequently Asked Questions

Q: What is the difference between ISO 25964-1 and ISO 2788?
A: ISO 25964-1 supersedes both ISO 2788 and ISO 5964 with unified guidance including facet analysis, formal data models, and XML exchange formats.
Q: Can the standard be used to build taxonomies or ontologies?
A: Its principles apply to taxonomies. Ontologies are addressed in ISO 25964-2, which covers interoperability between thesauri and other vocabulary types.
Q: How are homographs handled?
A: Parenthetical qualifiers disambiguate them. Each meaning is a separate concept with its own unique identifier and scope note.
Q: What software supports this standard?
A: Tools such as Synaptica, PoolParty, and VocBench support the data model and offer SKOS/RDF export.
