XML Transform Changes
1 Migration Issues
For the 9.2 release of ePublisher Pro, AutoMap, and Express, we have moved from version 1.1 of Microsoft's .NET platform to version 2.0. .NET 2.0 includes many improvements related to execution speed and overall application memory usage.
Unfortunately, certain XSL transform behaviors have changed from .NET 1.1 to .NET 2.0. As a result, existing ePublisher 9.0 and 9.1 formats may not behave as expected.
2 Formats and Override Resolution
ePublisher 9.2 introduces new ways to work with formats.
- Project Target Overrides
- Shared Transform Overrides
- Project Formats
- Standalone Stationery
- User Format Limitations
2.1 Project Target Overrides
In versions 9.0 and 9.1, ePublisher allowed users to override files, images, and transforms on a per target basis. This was implemented with a "Formats" folder in the project directory.
For 9.2, the existing "Formats" folder has been renamed "Targets" to reflect more accurately that these overrides are applied on a per-target basis. A "Formats" folder still exists, but it now holds per-format overrides. See Project Formats below.
2.3 Project Formats
As hinted above, projects may now contain a "Formats" folder. Any files in the project "Formats" folder override underlying User and Application format files. For products such as Express, this is important as Express does not support User and Application formats.
Users can even override "format.wwfmt" files at this level to introduce new stages and pipelines or modify existing ones.
2.4 Standalone Stationery
Stationery that includes a "Formats" folder becomes a self-contained snapshot of a given format and all dependent transforms, including those in the "Shared" directory.
2.5 User Format Limitations
To avoid issues with user formats overriding application formats, user formats must have unique names. If an application format is found with a certain name, say "WebWorks Help 5.0", then any user formats with that same name will not be available in ePublisher projects.
3 XSL and .NET
As ePublisher relies on the .NET XSL transform engine, changes in .NET can alter the behavior and performance of ePublisher.
3.1 XSL Behavior Changes
3.1.1 Node Fragments
3.1.1.1 .NET 1.1
In .NET 1.1, the XSL runtime often blurred the line between XML node fragments and XML node sets, so the following code worked in .NET 1.1 XSL transforms:
<xsl:variable name="VarResult">
  <html:html>
    <html:head>
      <html:title>Wow!</html:title>
    </html:head>
    <html:body>
      <html:h2>Wow!</html:h2>
    </html:body>
  </html:html>
</xsl:variable>
<xsl:variable name="VarWriteDocument" select="wwexsldoc:Document($VarResult, 'C:\test.html', 'utf-8')" />
3.1.1.2 .NET 2.0
The XSLT 1.0 specification states that node fragments (result tree fragments) are distinct from node sets and that XSL processors should not allow them to be used interchangeably. The .NET 2.0 XSL runtime strictly enforces this policy. Therefore, XSL developers must explicitly convert node fragments to node sets with the Microsoft msxsl:node-set() extension function.
3.1.2 Empty Element Handling
3.1.2.1 .NET 1.1
In .NET 1.1, XML elements were always written literally. So, XSL code to emit a non-empty element:
<xsl:element name="script" namespace="http://www.w3.org/1999/xhtml">
  <xsl:attribute name="src">
    <xsl:value-of select="'scripts/webworks.js'" />
  </xsl:attribute>
</xsl:element>
results in:
<script src="scripts/webworks.js"></script>
This meant that <img> elements could be problematic because they must be empty elements:
<xsl:element name="img" namespace="http://www.w3.org/1999/xhtml">
  <xsl:attribute name="src">
    <xsl:value-of select="'images/pretty.jpg'" />
  </xsl:attribute>
</xsl:element>
results in:
<img src="images/pretty.jpg"></img>
This is not appropriate for HTML or XHTML output. Therefore, two work-arounds were used in ePublisher 9.0 and 9.1:
Attribute Value Templates:

<xsl:variable name="VarSrc" select="'images/pretty.jpg'" />
<img src="{$VarSrc}" />

wwexsldoc:MakeEmptyElement() extension method:

<xsl:variable name="VarImgAsXML">
  <xsl:element name="img" namespace="http://www.w3.org/1999/xhtml">
    <xsl:attribute name="src">
      <xsl:value-of select="'images/pretty.jpg'" />
    </xsl:attribute>
  </xsl:element>
</xsl:variable>
<xsl:variable name="VarImg" select="msxsl:node-set($VarImgAsXML)/*" />
<xsl:copy-of select="wwexsldoc:MakeEmptyElement($VarImg)" />
Either method produces the required empty element:
<img src="images/pretty.jpg" />
3.1.2.2 .NET 2.0
In .NET 2.0, the XSL/XML runtime automatically collapses any element that has no content into an empty-element tag. So there is no need for work-arounds such as Attribute Value Templates and wwexsldoc:MakeEmptyElement():
<xsl:element name="img" namespace="http://www.w3.org/1999/xhtml">
  <xsl:attribute name="src">
    <xsl:value-of select="'images/pretty.jpg'" />
  </xsl:attribute>
</xsl:element>
results in:
<img src="images/pretty.jpg" />
Unfortunately, this is a problem for elements such as <script>, which is required to be a non-empty element in HTML and XHTML:
<xsl:element name="script" namespace="http://www.w3.org/1999/xhtml">
  <xsl:attribute name="src">
    <xsl:value-of select="'scripts/webworks.js'" />
  </xsl:attribute>
</xsl:element>
results in:
<script src="scripts/webworks.js" />
There is no way to force a non-empty element without introducing white space or text. While this might work for <script> elements, it is not a suitable work-around for <a> elements.
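The white-space approach mentioned above can be sketched as follows. This is a minimal illustration, not a recommended pattern: it assumes that a single space inside the <script> element is acceptable in the output, and it binds the element to the standard XHTML namespace URI.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:element name="script" namespace="http://www.w3.org/1999/xhtml">
      <!-- Attributes must be added before any child content. -->
      <xsl:attribute name="src">
        <xsl:value-of select="'scripts/webworks.js'" />
      </xsl:attribute>
      <!-- The content of xsl:text is never stripped from a stylesheet,
           so this single space becomes a text node and the element is
           no longer empty when it is serialized. -->
      <xsl:text> </xsl:text>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

The same space placed inside an <a> element would become part of the page content, which is likely why the approach does not generalize.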
3.1.3 White Space Preservation
3.1.3.1 .NET 1.1
XSL allows pretty printing of XML output as long as raw text is not emitted within a given context. So, consider the following XSL code:
<html:div>
  <html:span style="color: red;">
    <xsl:value-of select="'F'" />
  </html:span>
  <html:span style="color: blue;">
    <xsl:value-of select="'irst'" />
  </html:span>
  <html:span style="color: black;">
    <xsl:value-of select="' letter and word are different.'" />
  </html:span>
</html:div>
When emitted to XML, HTML, or XHTML, the result is:
<div>
  <span>F</span>
  <span>irst</span>
  <span>letter and word are different.</span>
</div>
While this is perfectly valid XML, where white space significance has different rules, this will not render properly in web browsers:
F irst letter and word are different.
To side-step this issue, for .NET 1.1 we emitted an empty text block to prevent the XSL/XML runtime from pretty printing in such a situation:
<html:div>
  <html:span style="color: red;">
    <xsl:value-of select="'F'" />
  </html:span>
  <xsl:text></xsl:text>
  <html:span style="color: blue;">
    <xsl:value-of select="'irst'" />
  </html:span>
  <xsl:text></xsl:text>
  <html:span style="color: black;">
    <xsl:value-of select="' letter and word are different.'" />
  </html:span>
</html:div>
which results in the expected:
First letter and word are different.
3.1.3.2 .NET 2.0
In .NET 2.0, the same code that optimizes empty elements also strips out empty text blocks. So the above work-around no longer works since:
<html:div>
  <html:span style="color: red;">
    <xsl:value-of select="'F'" />
  </html:span>
  <xsl:text></xsl:text>
  <html:span style="color: blue;">
    <xsl:value-of select="'irst'" />
  </html:span>
  <xsl:text></xsl:text>
  <html:span style="color: black;">
    <xsl:value-of select="' letter and word are different.'" />
  </html:span>
</html:div>
results in:
F irst letter and word are different.
There is no way to modify this behavior without introducing invalid markup into the output short of disabling all XML/XSL pretty print features.
3.2 XSL Behavior Solution
To address XSL behavior changes for ePublisher 9.2 and to guard against future incompatibilities, ePublisher fully controls the output behavior of all XML/XSL output instead of deferring to the .NET runtime.
3.2.1 Node Fragments
There is no simple workaround for the node set requirement. The only solution is to ensure that all possible node fragments are converted to node sets before further processing is performed:
<xsl:variable name="VarResultAsXML">
  <html:html>
    <html:head>
      <html:title>Wow!</html:title>
    </html:head>
    <html:body>
      <html:h2>Wow!</html:h2>
    </html:body>
  </html:html>
</xsl:variable>
<xsl:variable name="VarResult" select="msxsl:node-set($VarResultAsXML)" />
<xsl:variable name="VarWriteDocument" select="wwexsldoc:Document($VarResult, 'C:\test.html', 'utf-8')" />
3.2.2 Empty Element Handling
Empty element handling is dependent upon the type of output being generated by a style sheet. Standard XSL defines output methods for 'xml', 'html', and 'text'. XHTML is conspicuously absent from this list. This leads to problems such as:
<script src="scripts/webworks.js"></script>
being emitted as:
<script src="scripts/webworks.js" />
Perfectly valid XML, but not valid XHTML.
To avoid this issue in the future, ePublisher now supports an 'xhtml' output method. The output method is selected with the fourth argument of the wwexsldoc:Document() method:
wwexsldoc:Document($VarResult, 'C:file.out', 'utf-8', 'html');
wwexsldoc:Document($VarResult, 'C:file.out', 'utf-8', 'text');
wwexsldoc:Document($VarResult, 'C:file.out', 'utf-8', 'xhtml');
wwexsldoc:Document($VarResult, 'C:file.out', 'utf-8', 'xml');
For each method, the following behaviors are defined:
- HTML:
  Force the following elements to always emit open/close elements:
  <html></html> <head></head> <body></body> <script></script> <a></a>
- Text:
  Unchanged from the standard .NET behavior.
- XHTML:
  Allow only the following elements to emit as empty elements:
  <area /> <br /> <hr /> <img /> <link /> <meta /> <param />
- XML:
  Unchanged from the standard .NET behavior.
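As a rough sketch of how the 'xhtml' output method might be used in a complete transform, the style sheet below builds a small page and writes it with wwexsldoc:Document(). The namespace URI bound to the wwexsldoc prefix is an assumption (this document does not state it) and should be copied from a transform that ships with ePublisher. The html prefix is bound to the standard XHTML namespace, and the output path is purely illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. The wwexsldoc namespace URI below is an assumption;
     verify it against an existing ePublisher transform. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:msxsl="urn:schemas-microsoft-com:xslt"
                xmlns:html="http://www.w3.org/1999/xhtml"
                xmlns:wwexsldoc="urn:WebWorks-XSLT-Extension-ExslDocument"
                exclude-result-prefixes="msxsl wwexsldoc">

  <xsl:template match="/">
    <!-- Build the page as a node fragment. -->
    <xsl:variable name="VarPageAsXML">
      <html:html>
        <html:head>
          <html:title>Wow!</html:title>
          <html:script src="scripts/webworks.js" />
        </html:head>
        <html:body>
          <html:img src="images/pretty.jpg" />
        </html:body>
      </html:html>
    </xsl:variable>

    <!-- .NET 2.0 requires an explicit conversion to a node set. -->
    <xsl:variable name="VarPage" select="msxsl:node-set($VarPageAsXML)" />

    <!-- The fourth argument selects the 'xhtml' output method. -->
    <xsl:variable name="VarWriteDocument" select="wwexsldoc:Document($VarPage, 'C:\test.html', 'utf-8', 'xhtml')" />
  </xsl:template>

</xsl:stylesheet>
```

Under the XHTML rules listed above, the <script> element should be written with separate open and close tags, while <img> remains an empty element.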
3.2.3 White Space Preservation
To enable control over white space between XML elements, ePublisher 9.2 introduces a <wwexsldoc:NoBreak /> element:
<html:div>
  <html:span style="color: red;">
    <xsl:value-of select="'F'" />
  </html:span>
  <wwexsldoc:NoBreak />
  <html:span style="color: blue;">
    <xsl:value-of select="'irst'" />
  </html:span>
  <wwexsldoc:NoBreak />
  <html:span style="color: black;">
    <xsl:value-of select="' letter and word are different.'" />
  </html:span>
</html:div>
which results in the expected:
First letter and word are different.
4 URI Escaping
In .NET 1.1, it was possible to work with URIs without escape-encoding all characters. For example, a call to determine a relative URI such as:
<xsl:value-of select="wwuri:GetRelativeTo('file:///C:/With A Space.html', 'http://www.webworks.com')" />
results in:
With A Space.html
In .NET 2.0, URIs are always escape encoded, so the same call above returns:
With%20A%20Space.html
By and large, this is a good improvement. However, it causes issues with the current Java-based formats: JavaHelp 1.1.3, JavaHelp 2.0, and Oracle Help.
Quadralay will update the Java-based formats to work around this issue, but the change in escaping might cause side effects in your own projects.
Utility methods to encode/decode URIs have been added to the "wwuri" extension namespace:
EscapeUri(string unescapedUri)
EscapeData(string unescapedString)
Unescape(string escapedString)
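For a transform that still needs the unescaped form, one possible approach is to pass the result of wwuri:GetRelativeTo() through wwuri:Unescape(). The sketch below assumes that Unescape() reverses percent-encoding such as %20, and the namespace URI bound to the wwuri prefix is an assumption that should be checked against an existing ePublisher transform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. The wwuri namespace URI below is an assumption;
     verify it against an existing ePublisher transform. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:wwuri="urn:WebWorks-XSLT-Extension-URI"
                exclude-result-prefixes="wwuri">

  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- In .NET 2.0 this value is escape encoded. -->
    <xsl:variable name="VarEscaped" select="wwuri:GetRelativeTo('file:///C:/With A Space.html', 'http://www.webworks.com')" />

    <!-- Decode it where a literal file name is required. -->
    <xsl:variable name="VarUnescaped" select="wwuri:Unescape($VarEscaped)" />

    <xsl:value-of select="$VarUnescaped" />
  </xsl:template>

</xsl:stylesheet>
```

Keeping the escaped value for links and decoding only where a literal file name is needed limits the places where the two forms can be confused.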