Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
At the heart of IEC 62605 is a hierarchical data model that organizes an e-dictionary into nested containers. The top level is the Dictionary Body, which contains one or more Entry elements. Each entry contains a Headword (the lexical form being looked up) and zero or more Body sections holding the lexical content. The data elements that make up an entry are:
| Data Element | Description | Attributes / Features |
|---|---|---|
| Headword | The primary search term of the entry, including spelling and possible variant forms | Spelling, phonetic transcription, syllable division, part of speech, etymology |
| Pronunciation | Pronunciation information for the headword | IPA transcription, audio file reference, pronunciation variants |
| Sense | A specific meaning or usage of the word | Definition, example sentence, register label, subject label |
| Translation | The target-language equivalent in bilingual dictionaries | Target word, grammatical information, usage constraints |
| Example | A sentence demonstrating usage in context | Example text, translation, corpus source |
| Collocation | Word combinations that frequently co-occur with the headword | Collocate word, collocation type (verb+object, modifier, etc.) |
| Cross-reference | Link to another entry within the dictionary | Reference type (see also, compare, synonym, antonym) |
| Multimedia | Image or audio resources associated with the entry | File reference, media type, usage context |
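The containment hierarchy described above can be sketched as plain Python dataclasses. This is a minimal illustration under stated assumptions, not the standard's schema: the class and field names (`DictionaryBody`, `Entry`, `Sense`, `Pronunciation`, `CrossReference`, `entry_id`) are hypothetical and cover only a few of the data elements in the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pronunciation:
    ipa: str                           # IPA transcription, e.g. "/rʌn/"
    audio_src: Optional[str] = None    # optional audio file reference


@dataclass
class Sense:
    definition: str
    examples: List[str] = field(default_factory=list)
    subject: Optional[str] = None      # subject label, e.g. "computer science"


@dataclass
class CrossReference:
    target_id: str                     # entry_id of another entry in the same dictionary
    ref_type: str = "see also"         # see also / compare / synonym / antonym


@dataclass
class Entry:
    entry_id: str
    headword: str                      # the lexical form being looked up
    pronunciations: List[Pronunciation] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)          # zero or more
    xrefs: List[CrossReference] = field(default_factory=list)


@dataclass
class DictionaryBody:
    entries: List[Entry] = field(default_factory=list)

    def lookup(self, headword: str) -> List[Entry]:
        """Return a list, since several entries may share one headword."""
        return [e for e in self.entries if e.headword == headword]

    def resolve(self, xref: CrossReference) -> Optional[Entry]:
        """Follow a cross-reference to its target entry, or None if it is dangling."""
        return next((e for e in self.entries if e.entry_id == xref.target_id), None)


run = Entry(
    entry_id="eng-run",
    headword="run",
    pronunciations=[Pronunciation("/rʌn/", "run_us.wav")],
    senses=[Sense("to move using your legs, faster than walking",
                  ["She runs every morning."])],
    xrefs=[CrossReference("eng-walk", "compare")],
)
walk = Entry(entry_id="eng-walk", headword="walk")
body = DictionaryBody([run, walk])

print(body.resolve(run.xrefs[0]).headword)  # → walk
```

Keeping cross-references as IDs rather than direct object pointers mirrors how links are stored in a serialized file, at the cost of a lookup when the link is followed.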
The standard uses an XML namespace mechanism to extend the lexical data representation. The core namespace defines the base dictionary structural elements, while optional extension namespaces enable specialized semantic tagging for domain-specific lexicons (medical, legal, technical, scientific).
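To illustrate how a core namespace and an extension namespace can sit side by side in one file, the sketch below parses a small document with Python's standard `xml.etree.ElementTree`. Both namespace URIs and the `med:specialty` element are hypothetical placeholders, not the identifiers defined by IEC 62605.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the standard defines its own.
NS = {
    "core": "urn:example:e-dictionary:core",
    "med": "urn:example:e-dictionary:medical",
}

DOC = """<dictionary xmlns="urn:example:e-dictionary:core"
            xmlns:med="urn:example:e-dictionary:medical">
  <entry id="eng-angina">
    <headword>angina</headword>
    <sense n="1">
      <definition>chest pain caused by reduced blood flow to the heart</definition>
      <med:specialty>cardiology</med:specialty>
    </sense>
  </entry>
  <entry id="eng-run">
    <headword>run</headword>
    <sense n="1">
      <definition>to move using your legs, faster than walking</definition>
    </sense>
  </entry>
</dictionary>"""

root = ET.fromstring(DOC)

rows = []
for entry in root.findall("core:entry", NS):
    headword = entry.findtext("core:headword", namespaces=NS)
    for sense in entry.findall("core:sense", NS):
        # Extension elements are optional: findtext returns None when absent,
        # so a reader that only understands the core namespace loses nothing.
        specialty = sense.findtext("med:specialty", namespaces=NS)
        rows.append((headword, specialty))

print(rows)  # → [('angina', 'cardiology'), ('run', None)]
```

Because unprefixed elements inherit the default (core) namespace, a generic reader can ignore every element outside it and still render the entry.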
Every IEC 62605-compatible dictionary file includes a Dictionary Metadata header that provides identifying and usage information for the dictionary:
| Metadata Field | Description | Example Value |
|---|---|---|
| Dictionary identifier | Unique identifier for the dictionary | ISBN 978-0-19-957112-3 |
| Source language | Language of the headwords | en-GB (British English) |
| Target language | Language of translations (bilingual) | zh-CN (Simplified Chinese) |
| Dictionary type | Classification of the dictionary type | monolingual / bilingual / thesaurus / encyclopedic |
| Total entries | Approximate number of headwords | 150,000 |
| Version | Version number of the dictionary content | 2.1.0 |
| Copyright and license | Intellectual property information | Creative Commons BY-NC-SA 4.0 |
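A consuming application might load this header into a record and run a few consistency checks before indexing the dictionary. The sketch below is hypothetical: the field names mirror the table, and the checks (the type must be one of the four listed values, a bilingual dictionary needs a target language, an ISBN-13 identifier must have a valid check digit) are illustrative rules, not requirements quoted from the standard.

```python
from dataclasses import dataclass
from typing import List, Optional

DICTIONARY_TYPES = {"monolingual", "bilingual", "thesaurus", "encyclopedic"}


def isbn13_is_valid(isbn: str) -> bool:
    """ISBN-13 check: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    digits = [c for c in isbn if c in "0123456789"]
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


@dataclass
class DictionaryMetadata:
    identifier: str
    source_language: str               # language tag, e.g. "en-GB"
    dictionary_type: str
    total_entries: int
    version: str
    license: str
    target_language: Optional[str] = None   # only meaningful for bilingual dictionaries

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the header looks sane."""
        issues = []
        if self.dictionary_type not in DICTIONARY_TYPES:
            issues.append(f"unknown dictionary type: {self.dictionary_type!r}")
        if self.dictionary_type == "bilingual" and not self.target_language:
            issues.append("bilingual dictionary without a target language")
        if self.identifier.startswith("ISBN") and not isbn13_is_valid(self.identifier):
            issues.append(f"invalid ISBN-13: {self.identifier}")
        if self.total_entries < 0:
            issues.append("negative entry count")
        return issues


meta = DictionaryMetadata(
    identifier="ISBN 978-0-19-957112-3",
    source_language="en-GB",
    target_language="zh-CN",
    dictionary_type="bilingual",
    total_entries=150_000,
    version="2.1.0",
    license="Creative Commons BY-NC-SA 4.0",
)
print(meta.problems())  # → []
```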
IEC 62605 uses XML Schema (XSD) to define the dictionary format. A typical entry XML structure (simplified):
```xml
<entry id="eng-run">
  <headword>run</headword>
  <pronunciation>
    <ipa>/rʌn/</ipa>
    <audio src="run_us.wav" />
  </pronunciation>
  <sense n="1">
    <definition>to move using your legs, faster than walking</definition>
    <example>She runs every morning.</example>
  </sense>
  <sense n="2">
    <definition>to operate or control a machine or system</definition>
    <example>He runs the printing press.</example>
  </sense>
</entry>
```
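Because the format is plain XML, any standard parser can take an entry apart. Here is a minimal sketch using Python's `xml.etree.ElementTree` on the simplified entry above. It assumes exactly the element names shown there, which may differ from the normative schema, and the `render_entry` function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ENTRY_XML = """<entry id="eng-run">
  <headword>run</headword>
  <pronunciation>
    <ipa>/rʌn/</ipa>
    <audio src="run_us.wav" />
  </pronunciation>
  <sense n="1">
    <definition>to move using your legs, faster than walking</definition>
    <example>She runs every morning.</example>
  </sense>
  <sense n="2">
    <definition>to operate or control a machine or system</definition>
    <example>He runs the printing press.</example>
  </sense>
</entry>"""


def render_entry(xml_text: str) -> str:
    """Flatten one <entry> into plain text, roughly as a simple reader might list it."""
    entry = ET.fromstring(xml_text)
    headword = entry.findtext("headword", default="")
    ipa = entry.findtext("pronunciation/ipa", default="")
    lines = [f"{headword} {ipa}".rstrip()]

    # This sketch sorts senses by their n attribute instead of relying on document order.
    senses = sorted(entry.findall("sense"), key=lambda s: int(s.get("n", "0")))
    for sense in senses:
        lines.append(f"  {sense.get('n')}. {sense.findtext('definition', default='')}")
        for example in sense.findall("example"):
            lines.append(f"     e.g. {example.text}")

    audio = entry.find("pronunciation/audio")
    if audio is not None:
        lines.append(f"  [audio: {audio.get('src')}]")
    return "\n".join(lines)


print(render_entry(ENTRY_XML))
```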
IEC 62605 is designed to enable e-dictionaries to be ported seamlessly across different devices and operating systems. The standard achieves this through three main mechanisms:
In practice, creating an IEC 62605-format dictionary involves the following workflow:
Q1: How does IEC 62605 differ from LMF (ISO 24613 Lexical Markup Framework)?
A: Both can represent lexical data in XML, but they serve different purposes. ISO 24613 (LMF) is a natural language processing and computational linguistics standard developed by ISO/TC 37; it focuses on detailed markup of morphological, syntactic, and semantic information for computational lexicons. IEC 62605 is a consumer electronics standard developed by IEC TC 100; it focuses on exchanging and displaying dictionary content across consumer device platforms (e-readers, mobile apps). The key distinction: LMF is optimized for machine processability, while IEC 62605 is optimized for rendering and navigation.
Q2: How does IEC 62605 support multilingual character sets?
A: Because it is based on XML with UTF-8 or UTF-16 encoding, IEC 62605 can represent text in any script encoded in Unicode, including Latin, Chinese, Arabic, Cyrillic, and Devanagari. Entries can contain right-to-left text (e.g., Arabic) or text intended for vertical layout (e.g., traditional Japanese), although how such text is displayed depends on the reader platform. For CJK languages, the standard supports Zhuyin, Pinyin, and Kana pronunciation annotations.
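As a quick check of the encoding point, the sketch below parses a hypothetical Chinese entry with a Pinyin annotation and an Arabic translation from UTF-8 bytes using only Python's standard library. The `<pinyin>` and `<translation lang="...">` element names are assumptions for illustration; the standard's actual annotation elements may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: element names <pinyin> and <translation lang="..."> are illustrative.
ENTRY = """<?xml version="1.0" encoding="UTF-8"?>
<entry id="zho-shu">
  <headword>书</headword>
  <pinyin>shū</pinyin>
  <translation lang="en">book</translation>
  <translation lang="ar">كتاب</translation>
</entry>"""

# A dictionary file is stored as bytes; encode as UTF-8 to match the declaration.
raw = ENTRY.encode("utf-8")

# ET.fromstring accepts bytes and decodes them according to the XML declaration.
entry = ET.fromstring(raw)

headword = entry.findtext("headword")
pinyin = entry.findtext("pinyin")
translations = {t.get("lang"): t.text for t in entry.findall("translation")}

print(headword, pinyin, translations["en"])  # → 书 shū book

# Arabic is kept in logical (reading) order; right-to-left display is the renderer's job.
print(translations["ar"][0] == "\u0643")     # → True (ARABIC LETTER KAF)
```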
Q3: Can IEC 62605 dictionaries use Digital Rights Management (DRM)?
A: The standard itself does not include DRM mechanisms, but the dictionary metadata can carry copyright and license information that consuming software can use to enforce usage restrictions. DRM encryption and access control are implemented at the distribution-channel level (for example, app stores or content servers), not in the dictionary file format. XML content can still be encrypted in transit or at rest using standard mechanisms such as W3C XML Encryption.
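To make the metadata-based approach concrete, the hypothetical sketch below reads a license string such as the "Creative Commons BY-NC-SA 4.0" example value from the metadata header and gates two features on it. The function names and the conservative policy (unrecognized licenses are refused) are assumptions of this sketch; it is a usage check inside cooperating software, not DRM.

```python
import re
from typing import Set

CC_CLAUSES = {"BY", "SA", "NC", "ND"}


def is_creative_commons(license_text: str) -> bool:
    text = license_text.upper()
    return text.startswith("CC ") or "CREATIVE COMMONS" in text


def cc_clauses(license_text: str) -> Set[str]:
    """Pull CC clause codes out of strings like 'Creative Commons BY-NC-SA 4.0'."""
    tokens = set(re.split(r"[\s\-]+", license_text.upper()))
    return tokens & CC_CLAUSES


def may_export_commercially(license_text: str) -> bool:
    """Hypothetical policy: allow commercial export only under a CC license without NC.

    Unrecognized licenses are treated conservatively and refused.
    """
    return is_creative_commons(license_text) and "NC" not in cc_clauses(license_text)


def may_redistribute_modified(license_text: str) -> bool:
    """Hypothetical policy: allow redistributing edited content only without ND."""
    return is_creative_commons(license_text) and "ND" not in cc_clauses(license_text)


header_license = "Creative Commons BY-NC-SA 4.0"   # value from the metadata header
print(cc_clauses(header_license) == {"BY", "NC", "SA"})  # → True
print(may_export_commercially(header_license))           # → False
print(may_redistribute_modified(header_license))         # → True
```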
Q4: Can I export only a subset of a dictionary for a specific purpose?
A: Yes. The IEC 62605 data model supports selective extraction of entry subsets through XPath/XQuery queries. For example, all entries tagged with the “computer science” subject label can be extracted to create a specialized terminology glossary. The standard also defines profiles, which package a subset of a large dictionary as a lightweight version optimized for a specific usage context (e.g., a 5,000-word core-vocabulary version for embedded devices or beginning learners).
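A minimal sketch of both ideas using Python's `xml.etree.ElementTree`. ElementTree implements only a small subset of XPath, so the filtering is done with Python predicates rather than a full XPath/XQuery engine. The `<subject>` element and the `rank` attribute used for the core-vocabulary profile are hypothetical, as are the helper names.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable

DOC = """<dictionary>
  <entry id="eng-cache" rank="4200">
    <headword>cache</headword>
    <sense n="1">
      <subject>computer science</subject>
      <definition>a small, fast store for recently used data</definition>
    </sense>
  </entry>
  <entry id="eng-run" rank="120">
    <headword>run</headword>
    <sense n="1">
      <definition>to move using your legs, faster than walking</definition>
    </sense>
  </entry>
  <entry id="eng-compiler" rank="9800">
    <headword>compiler</headword>
    <sense n="1">
      <subject>computer science</subject>
      <definition>a program that translates source code into another form</definition>
    </sense>
  </entry>
</dictionary>"""

Predicate = Callable[[ET.Element], bool]


def extract(root: ET.Element, keep: Predicate) -> ET.Element:
    """Build a new <dictionary> containing deep copies of the entries that pass `keep`."""
    subset = ET.Element(root.tag, dict(root.attrib))
    for entry in root.findall("entry"):
        if keep(entry):
            subset.append(copy.deepcopy(entry))  # leave the source tree untouched
    return subset


def has_subject(label: str) -> Predicate:
    """Keep an entry if any of its senses carries the given subject label."""
    return lambda entry: any(s.text == label for s in entry.iter("subject"))


def core_profile(max_rank: int) -> Predicate:
    """Keep only entries whose (hypothetical) frequency rank is within max_rank."""
    def keep(entry: ET.Element) -> bool:
        rank = entry.get("rank")
        return rank is not None and int(rank) <= max_rank
    return keep


root = ET.fromstring(DOC)
glossary = extract(root, has_subject("computer science"))
lite = extract(root, core_profile(5000))
print([e.findtext("headword") for e in glossary])  # → ['cache', 'compiler']
print([e.findtext("headword") for e in lite])      # → ['cache', 'run']
```

`ET.tostring(glossary, encoding="unicode")` serializes the extracted subset back to XML text so it can be packaged as a separate file.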