Improve Interpretation of Word Character Styles
Summary
WWeP 2010.2 incorrectly reports Word character styles as overrides. It inserts redundant span/@style attributes in the HTML, inflating the size of the output. This behavior should be fixed.
Detailed Description
I have noticed that WWeP frequently misinterprets Word character styles as style overrides. The result is that WWeP inserts redundant CSS style attributes in the <span> tags. In documents that contain many character styles, this behavior can greatly inflate the output size.
I found a workaround, but is there a way to fix the problem permanently?
Use Cases
In a large CHM project, the Word 2003 source documents contain approximately 15,000 instances of a character style called "autolink char". In Word, the style is defined as "default paragraph font + italic". I used the Word Formatting in Use tool to verify that the documents contain pure style formatting. They do not contain even a single instance of direct character or paragraph formatting.
In WWeP, I observed the following behavior:
1. The Styles Report lists every instance of the character style as an override.
2. In the *.wif files, each TextRun element that refers to a character style contains a Style element:
<TextRun id="7000009" stylename="autolink char">
  <Style>
    <Attribute name="font-family" value="Times New Roman" />
    <Attribute name="font-size" value="10pt" />
    <Attribute name="font-weight" value="normal" />
    <Attribute name="font-style" value="italic" />
    <Attribute name="font-variant" value="normal" />
    <Attribute name="text-transform" value="none" />
    <Attribute name="text-decoration-underline" value="none" />
    <Attribute name="text-decoration-line-through" value="none" />
    <Attribute name="vertical-align" value="baseline" />
  </Style>
  <Text value="Some Text" />
</TextRun>
3. In the generated CSS file, the style is formatted as follows:
span.autolink_char {
  font-style: italic;
  font-variant: normal;
  font-weight: normal;
  text-transform: none;
  vertical-align: baseline;
}
4. In the generated HTML code, every one of the 15,000 instances is surrounded by <span> tags, like this:
<span class="autolink_char" style="font-variant: normal; font-weight: normal; text-transform: none; vertical-align: baseline">Some Text</span>
The span/@style attribute appears to be completely redundant. It contains a subset of the style definition that exists in the CSS file.
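The redundancy can be checked mechanically. The following is a minimal Python sketch, not part of WWeP, that compares the inline style from the HTML sample with the rule from the CSS sample. The function names are hypothetical, and the parser is deliberately naive: it assumes simple "property: value" pairs with no semicolons inside values.

```python
def parse_declarations(css_text):
    """Parse 'prop: value; prop: value' into a dict of normalized pairs."""
    declarations = {}
    for chunk in css_text.split(";"):
        if ":" not in chunk:
            continue  # skip empty pieces, e.g. after a trailing semicolon
        prop, value = chunk.split(":", 1)
        declarations[prop.strip().lower()] = value.strip().lower()
    return declarations


def redundant_properties(class_rule, inline_style):
    """Return the inline properties whose values duplicate the class rule."""
    rule = parse_declarations(class_rule)
    inline = parse_declarations(inline_style)
    return sorted(p for p, v in inline.items() if rule.get(p) == v)


# Values copied from the CSS and HTML samples in this report.
class_rule = ("font-style: italic; font-variant: normal; font-weight: normal; "
              "text-transform: none; vertical-align: baseline")
inline_style = ("font-variant: normal; font-weight: normal; "
                "text-transform: none; vertical-align: baseline")

duplicates = redundant_properties(class_rule, inline_style)
# True when every inline declaration is already covered by the class rule.
fully_redundant = len(duplicates) == len(parse_declarations(inline_style))
print(duplicates, fully_redundant)
```

For the sample values, every inline declaration duplicates the class rule, so the attribute could be dropped without changing how the text renders.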
Workaround
I can get rid of the span/@style attribute by setting the offending CSS properties to Do Not Emit. When I applied this solution to the "autolink char" style, the CHM output size was reduced by 24%.
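For a rough sense of how much raw markup the redundant attribute adds, the short Python sketch below multiplies the length of the attribute from the HTML sample by the approximate instance count. It estimates uncompressed HTML only, so the result is not directly comparable to the 24% reduction measured on the compressed CHM file. The variable names are hypothetical.

```python
INSTANCES = 15_000  # approximate count of "autolink char" instances in the report

# The redundant attribute as it appears in the generated HTML, including
# the leading space that separates it from the class attribute.
redundant_attr = (' style="font-variant: normal; font-weight: normal; '
                  'text-transform: none; vertical-align: baseline"')

bytes_per_instance = len(redundant_attr.encode("utf-8"))
total_bytes = bytes_per_instance * INSTANCES

print(f"{bytes_per_instance} bytes per span, "
      f"about {total_bytes / 1024:.0f} KiB across {INSTANCES} spans")
```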