Locales

Version: 1.0.0
Date: 2005-12-12
Changes: Initial version.

Abstract

Overview of the design and implementation of the ePublisher locales XML file.

1   Introduction

The "locales.xml" file provides the following information:

  1. Strings and messages.
  2. Search indexing parameters.
  3. Index grouping and display order information.

1.1   Strings and messages

Strings specified in the <Strings> section of "locales.xml" use C#/.NET composite format placeholders: {0}, {1}, {2}, and so on. Each placeholder is replaced at run time by the argument at the corresponding position.

<Locales>
 <Locale name="de" lcid="0x0407" codepage="1252">
  <Strings>
   <String name="MissingFile" value="File '{0}' is missing." />
  </Strings>
 </Locale>
</Locales>
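
The placeholder substitution can be sketched in Python, whose str.format happens to accept the same simple positional placeholders. This is only an illustration under stated assumptions: ePublisher itself uses .NET formatting, and the helper name format_message and the file name "toc.html" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the <Strings> example above.
LOCALES_XML = """
<Locales>
 <Locale name="de" lcid="0x0407" codepage="1252">
  <Strings>
   <String name="MissingFile" value="File '{0}' is missing." />
  </Strings>
 </Locale>
</Locales>
"""

def format_message(root, locale, name, *args):
    """Look up a named string for a locale and fill in its placeholders.

    Hypothetical helper, not part of ePublisher.
    """
    for loc in root.findall("Locale"):
        if loc.get("name") != locale:
            continue
        for string in loc.findall("Strings/String"):
            if string.get("name") == name:
                # str.format treats simple {0}, {1} placeholders the way
                # .NET composite formatting does.
                return string.get("value").format(*args)
    raise KeyError(f"{locale}/{name}")

root = ET.fromstring(LOCALES_XML)
message = format_message(root, "de", "MissingFile", "toc.html")
print(message)
```

Keeping the placeholders in the string value lets a translator reorder them to suit the grammar of the target language without touching the calling code.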

1.2   Search indexing parameters

To create a usable search index for generated content, most search engines require users to specify a minimum word length and a list of stop words.

The minimum search word length ensures that words shorter than N characters are disregarded. For languages such as Japanese and Chinese, the minimum search word length is often 1.

The stop word list consists of words that are so common in the target language/locale that they should be excluded from search indices. For English, the list includes words such as "in", "the", "but", and "not". Stop word lists cannot be translated exactly between languages, so the list for each language and locale must be reviewed by a native speaker.

<Locales>
 <Locale name="de" lcid="0x0407" codepage="1252">
  <Search>
   <MinimumWordLength value="3" />
   <StopWords>
    in the but not
   </StopWords>
  </Search>
 </Locale>
</Locales>
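
A minimal sketch of how an indexer might apply these two settings follows. It assumes that stop words are separated by whitespace, as the example suggests, and that comparison is case-insensitive; the function names load_search_settings and indexable_words are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the <Search> example above.
SEARCH_XML = """
<Locales>
 <Locale name="de" lcid="0x0407" codepage="1252">
  <Search>
   <MinimumWordLength value="3" />
   <StopWords>
    in the but not
   </StopWords>
  </Search>
 </Locale>
</Locales>
"""

def load_search_settings(root, locale):
    """Return (minimum word length, set of stop words) for a locale."""
    for loc in root.findall("Locale"):
        if loc.get("name") == locale:
            search = loc.find("Search")
            min_len = int(search.find("MinimumWordLength").get("value"))
            # Assumption: stop words are whitespace-separated.
            stop_words = set(search.findtext("StopWords", default="").split())
            return min_len, stop_words
    raise KeyError(locale)

def indexable_words(text, min_len, stop_words):
    """Keep only words that are long enough and are not stop words."""
    words = text.lower().split()
    return [w for w in words if len(w) >= min_len and w not in stop_words]

min_len, stop_words = load_search_settings(ET.fromstring(SEARCH_XML), "de")
words = indexable_words("The file is not in the output folder", min_len, stop_words)
print(words)
```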

1.3   Index grouping and display order information

Every language has conventions for directing users to alternate index entries and grouping index entries.

<Locales>
 <Locale name="de" lcid="0x0407" codepage="1252">
  <Index>
   ...
  </Index>
 </Locale>
</Locales>

Directing users to alternate index entries requires knowledge of text patterns in the target language/locale. Consider the following index:

Oceans
  Atlantic
  Indian
  Pacific

Seas
  Caribbean
  Aegean
  See Also Oceans

The last entry, "See Also Oceans", directs users to another section. For online help systems, ePublisher must convert this hint into an active hyperlink. ePublisher supports "SeeAlsoPrefix" elements: an index entry whose leading characters match a prefix is treated as a See Also entry.

<Index>
 <SeeAlsoExpressions>
  <SeeAlsoPrefix value="See Also " />
 </SeeAlsoExpressions>
</Index>
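
The prefix test can be sketched as follows. This assumes a plain, case-sensitive match on the leading characters; the helper name see_also_target is hypothetical, and the way ePublisher resolves the target into a hyperlink is not shown.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the <SeeAlsoExpressions> example above.
INDEX_XML = """
<Index>
 <SeeAlsoExpressions>
  <SeeAlsoPrefix value="See Also " />
 </SeeAlsoExpressions>
</Index>
"""

def see_also_target(entry, prefixes):
    """Return the referenced entry name, or None for an ordinary entry."""
    for prefix in prefixes:
        # Assumption: the match is a case-sensitive prefix comparison.
        if entry.startswith(prefix):
            return entry[len(prefix):].strip()
    return None

root = ET.fromstring(INDEX_XML)
prefixes = [p.get("value") for p in root.findall("SeeAlsoExpressions/SeeAlsoPrefix")]
entries = ["Caribbean", "Aegean", "See Also Oceans"]
targets = {entry: see_also_target(entry, prefixes) for entry in entries}
print(targets)
```

Because the prefixes live in "locales.xml", a locale can list several <SeeAlsoPrefix> elements to cover variants of the phrase without any change to the matching logic.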

Creating index entry groupings requires the ability to define both index sections and groups within a given section. For English, consider this index:

Numerics
  1
  2

A
  apple

B
  ball

Symbols
  &
  ?

To specify the top-level sections and their order in "locales.xml", create <Section> elements with position attributes.

<Index>
 <Sections>
  <Section position="1">
  </Section>
  <Section position="2">
  </Section>
  <Section position="3">
  </Section>
 </Sections>
</Index>

Next, define the default group name for each section with a <DefaultGroup> element:

<Index>
 <Sections>
  <Section position="1">
   <DefaultGroup name="Numerics" />
  </Section>
  <Section position="2">
   <DefaultGroup name="Letters" />
  </Section>
  <Section position="3">
   <DefaultGroup name="Symbols" />
  </Section>
 </Sections>
</Index>

If the original index were generated right now, it would look something like this:

Numerics
  &
  ?
  apple
  ball
  1
  2

Letters

Symbols

The <DefaultGroup> name is placed at the top of each section, and the first section without a <Members> element becomes the default section for all unmatched index entries. Here no section has a <Members> element, so every entry lands in the first section, "Numerics".

To assign specific characters to specific sections, use the <Members> element:

<Index>
 <Sections>
  <Section position="1">
   <DefaultGroup name="Numerics" />
   <Members>
    <Member match="1" />
    <Member match="2" />
   </Members>
  </Section>
  <Section position="2">
   <DefaultGroup name="Letters" />
  </Section>
  <Section position="3">
   <DefaultGroup name="Symbols" />
   <Members>
    <Member match="&amp;" />
    <Member match="?" />
   </Members>
  </Section>
 </Sections>
</Index>

Generating the index at this point would yield:

Numerics
  1
  2

Letters
  apple
  ball

Symbols
  &
  ?

The index entries "1" and "2" match <Member> elements for the "Numerics" group. The index entries "&" and "?" match <Member> elements for the "Symbols" group. Finally, "apple" and "ball" match no <Member> elements and are therefore pushed into the first <Section> without a <Members> element, "Letters".
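
The section assignment just described can be sketched as follows. This assumes that a <Member> is compared against the first character of an entry and that entries keep their input order; the function name assign_sections is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same configuration as above. The ampersand is written as &amp; so that
# the sample parses as well-formed XML.
INDEX_XML = """
<Index>
 <Sections>
  <Section position="1">
   <DefaultGroup name="Numerics" />
   <Members>
    <Member match="1" />
    <Member match="2" />
   </Members>
  </Section>
  <Section position="2">
   <DefaultGroup name="Letters" />
  </Section>
  <Section position="3">
   <DefaultGroup name="Symbols" />
   <Members>
    <Member match="&amp;" />
    <Member match="?" />
   </Members>
  </Section>
 </Sections>
</Index>
"""

def section_name(section):
    return section.find("DefaultGroup").get("name")

def assign_sections(root, entries):
    """Map each section's default group name to the entries it receives."""
    sections = sorted(root.findall("Sections/Section"),
                      key=lambda s: int(s.get("position")))
    result = {section_name(s): [] for s in sections}
    # The first section without <Members> collects all unmatched entries.
    default = next(s for s in sections if s.find("Members") is None)
    for entry in entries:
        target = default
        for section in sections:
            matches = [m.get("match") for m in section.findall("Members/Member")]
            # Assumption: a member matches the entry's first character.
            if entry[:1] in matches:
                target = section
                break
        result[section_name(target)].append(entry)
    return result

assigned = assign_sections(ET.fromstring(INDEX_XML),
                           ["&", "?", "apple", "ball", "1", "2"])
print(assigned)
```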

So how does one create additional groups within a section? With <Group> elements:

<Index>
 <Sections>
  <Section position="1">
   <DefaultGroup name="Numerics" />
   <Members>
    <Member match="1" />
    <Member match="2" />
   </Members>
  </Section>
  <Section position="2">
   <DefaultGroup name="Letters" />
   <Group name="A" sort="A" />
   <Group name="B" sort="B" />
  </Section>
  <Section position="3">
   <DefaultGroup name="Symbols" />
   <Members>
    <Member match="&amp;" />
    <Member match="?" />
   </Members>
  </Section>
 </Sections>
</Index>

This yields:

Numerics
  1
  2

Letters

A
  apple

B
  ball

Symbols
  &
  ?

ePublisher then silently drops any empty groups or sections from the generated index:

Numerics
  1
  2

A
  apple

B
  ball

Symbols
  &
  ?

Note that the <Group> element has both a name attribute and a sort attribute. The sort attribute determines where the group falls in the sort order, so the displayed name does not have to. For example:

<Index>
 <Sections>
  <Section position="1">
   <DefaultGroup name="Numerics" />
   <Members>
    <Member match="1" />
    <Member match="2" />
   </Members>
  </Section>
  <Section position="2">
   <DefaultGroup name="Letters" />
   <Group name="First Letter" sort="A" />
   <Group name="Awesome Letters" sort="B" />
  </Section>
  <Section position="3">
   <DefaultGroup name="Symbols" />
   <Members>
    <Member match="&amp;" />
    <Member match="?" />
   </Members>
  </Section>
 </Sections>
</Index>

This allows users to have complex group names:

Numerics
  1
  2

First Letter
  apple

Awesome Letters
  ball

Symbols
  &
  ?

If one of the <Group> elements is removed, its entries fall back to the <DefaultGroup>. Removing the "First Letter" group, as in:

<Index>
 <Sections>
  <Section position="1">
   <DefaultGroup name="Numerics" />
   <Members>
    <Member match="1" />
    <Member match="2" />
   </Members>
  </Section>
  <Section position="2">
   <DefaultGroup name="Letters" />
   <Group name="Awesome Letters" sort="B" />
  </Section>
  <Section position="3">
   <DefaultGroup name="Symbols" />
   <Members>
    <Member match="&amp;" />
    <Member match="?" />
   </Members>
  </Section>
 </Sections>
</Index>

generates:

Numerics
  1
  2

Letters
  apple

Awesome Letters
  ball

Symbols
  &
  ?
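
The grouping behavior within a section can be sketched as follows. This is one plausible reading that is consistent with the outputs shown above, not a description of ePublisher's actual algorithm: it assumes an entry joins the last group whose sort value orders at or before the entry, falls back to the <DefaultGroup> when no group qualifies, and that empty groups are dropped. It uses a simple case-insensitive comparison rather than real locale collation, and the function name group_entries is hypothetical.

```python
from bisect import bisect_right

def group_entries(default_group, groups, entries):
    """Place entries into groups within one section.

    groups is a list of (name, sort) pairs taken from <Group> elements.
    Returns a list of (group name, entries) pairs with empty groups dropped.
    """
    ordered = sorted(groups, key=lambda g: g[1].casefold())
    keys = [sort.casefold() for _, sort in ordered]
    # The default group is listed first, followed by groups in sort order.
    buckets = {default_group: []}
    for name, _ in ordered:
        buckets[name] = []
    for entry in sorted(entries, key=str.casefold):
        # Assumption: pick the last group whose sort key is <= the entry.
        i = bisect_right(keys, entry.casefold())
        name = ordered[i - 1][0] if i > 0 else default_group
        buckets[name].append(entry)
    return [(name, items) for name, items in buckets.items() if items]

both = group_entries("Letters",
                     [("First Letter", "A"), ("Awesome Letters", "B")],
                     ["ball", "apple"])
one = group_entries("Letters",
                    [("Awesome Letters", "B")],
                    ["ball", "apple"])
print(both)
print(one)
```

With both groups defined, "apple" and "ball" land in "First Letter" and "Awesome Letters". With "First Letter" removed, "apple" orders before the only remaining sort key and so falls back to "Letters".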

DevCenter/Documentation/Locales (last edited 2008-02-13 06:18:28 by localhost)