YAWC Pro Administration Guide
Overview
XML is a highly versatile mark-up language that can be used to describe a variety of data for a large variety of applications and implementations. Consequently, YAWC Pro is highly customizable. It can be configured to output valid XML for any DTD, as well as post-process your XML using a custom XSLT style sheet. YAWC can verify your document structure according to your specifications and can insert arbitrary meta-data about your document.
In this document we will discuss how YAWC works with your system and interacts with your applications. We will summarize the files and libraries that YAWC uses, what they do and where you can get them.
We will also detail YAWC's configuration files and settings, and how you can customize them to work for you and your organization.
Architecture
YAWC takes the form of a Dynamically Linked Library (DLL). This means that there is no actual application (EXE) associated with YAWC. The program is 'linked' from within other applications and is executed as if it is completely integrated with the application. In most cases, YAWC is run from within Word using Word's scripting language (VBA), and takes the form of an extra menu. YAWC can also be run from within stand-alone applications, such as yawcBatch.exe (see Appendix A), and XMetal (see the Programmers Reference Guide).
This approach means that YAWC can be integrated into the current authoring environment. We will discuss this later in this guide. First we will look at what software and libraries YAWC uses and where it stores its configuration files.
Requirements
Software Recommendations
Although YAWC may be run from an application other than Word, YAWC requires that Word is installed and operational. We recommend that Word is open before YAWC is invoked from another application. Naturally, this is implied if you are running YAWC from within Word.
YAWC requires MS-Word 97 or later. YAWC has been tested with Word 97 and 2000. Word 2000 is preferred for features such as HTML importing. YAWC is also compatible with Word XP, but has not been extensively tested.
YAWC has been tested mainly with Windows 98, NT 4 and 2000, but also runs with Windows 95 and Windows XP. YAWC also runs correctly under non-English versions of Word and Windows.
Libraries Used
The following libraries are required to run YAWC. The setup program should automatically install these into C:\Windows\System\.
YawcCOM.dll
This is the actual YAWC program. Naturally, this is required to run YAWC.
TidyCom.dll
A utility to convert HTML to XHTML, based on Dave Raggett's HTML Tidy program.
Standard Libraries
Visual Basic 6.0 Runtime Files.
http://download.microsoft.com/download/vb60pro/Redist/sp4/win98/EN-US/VBRun60sp4.exe
File Locations
YAWC normally stores its files in C:\Program Files\YAWC\. Here is a summary of the files in the YAWC folder.
Yawc.log
YAWC writes details of each document conversion to the log file yawc.log. For each operation, the following information is recorded:
- the full filename of the current document;
- the version of YAWC, including the version number, edition and debug mode indicator;
- the full filename of the configuration file;
- the time at which the operation was started;
- the type of operation.
The conversion operation also appends extra useful information:
- the full filenames of the XSLT stylesheet(s);
- the intermediate XML filename;
- the output XML filename;
- the conversion time.
Each operation also logs whether it succeeded or failed, and why. This is the first place to check when YAWC does not complete an operation. The log file is especially useful for debugging conversion errors: errors are often reported here in more detail than in the YAWC progress window, particularly errors that occur while applying the XSLT stylesheet(s). Debug messages and warnings are also recorded here.
Yawc.dot
This file is a copy of Word's normal.dot template file, and is used in converting tables, as the normal.dot file can become corrupt and prevent successful conversion. Do not edit, modify or delete this file.
YawcBatch.exe
This program is a batch conversion utility, which can be used to convert a number of files together. This is documented in Appendix A.
Target DTD Folders
The sub-folders of YAWC's folder contain the set of settings for each individual conversion type. For example, there may be sub-folders such as 'docbook', 'wml', 'html' or something relevant to your organization. In each of these are the configuration files relevant to that type of conversion.
YAWC Pro supplies Word to XML mappings for two target DTDs, HTML and Simplified DocBook. Each mapping is contained in its own folder, and consists of a number of different files.
- YAWC Configuration file
- Word template file
- One or more XSLT post-processing scripts
- A sample Word document
- A generic HTML template (in the HTML folder only)
- A target DTD file
Configuration file (yawcXXXX.ini)
YAWC uses a configuration file to describe the style to element mapping, and other conversion parameters, for each target DTD. The default name of the configuration file is yawc.ini, but this can be over-ridden by setting an alternative value for the yawcConfigFile document property in the Word file for conversion.
Word Template File
This template should be attached to any Word documents that you wish to convert to this DTD. A shortcut to this file should be placed in Word's template directory. This allows the user to easily create a new document based on this template by clicking File->New in Word.
The template contains the YAWC toolbar and menu, and sets up a number of default document property values for documents based on the template, which are particularly relevant to HTML, as they include Dublin Core Metadata Element values.
XSLT Files
There should be at least one XSLT stylesheet in the folder. This stylesheet is used to post-process each XML file after it has been converted by YAWC. This is known as the 'pass 1' stylesheet. Depending on your edition of YAWC, there may be more XSLT files for subsequent passes and/or an import stylesheet for importing HTML into Word. The names of these files are specified in the configuration file.
HTML Files
If you are using YAWC to output HTML, there may be one or more HTML template files in this directory. These files specify the layout and navigation surrounding the content HTML generated by YAWC. These filenames are specified in the configuration file and in the XSLT files.
We will not be covering HTML conversion in this guide. Instead you should refer to the Webmasters Guide.
Manual Installation
The YAWC setup program will usually install YAWC correctly. However, a manual installation is sometimes needed, for example for non-English versions of Word or for customized configurations. This section details the steps needed to install YAWC manually. You may need to download the relevant libraries from the YAWC website, http://www.yawcpro.com/, or to create the appropriate files by hand.
Installing TidyCom
YAWC needs TidyCOM (a utility to convert HTML to XHTML). Download TidyCOM from http://perso.wanadoo.fr/ablavier/TidyCOM/TidyCOM.zip. Use Winzip (http://www.winzip.com) or a similar program to extract the 'TidyCOM.dll' file into your Windows system directory. Use the supplied 'regsvr32.exe' program to register the dll file. To do this, click 'Start' -> 'Run' and type the following: "regsvr32 c:\winnt\system32\TidyCOM.dll" (or wherever you placed the TidyCOM.dll file).
Installing MSXML
YAWC needs MSXML Version 3.0. To find out what version of MSXML you have installed, there is a utility at http://www.bayes.co.uk/xml/index.xml called 'MSXML Sniffer'.
MSXML v3.0 can be downloaded from http://msdn.microsoft.com/xml/ or http://download.microsoft.com/download/xml/Install/3.0/WIN98Me/EN-US/msxml3.exe. Full installation instructions are provided with the package.
Creating a Word Template
To create a new Word template for use in converting Word documents into your own DTD, copy the existing YAWC HTML Word template (HTML\HTML-en.dot) and place it in a new directory. This ensures that the new template already has the YAWC toolbar and menu. Edit the user-defined styles to suit your own requirements, and add any default text and styles you require. Create a shortcut to the file, and move the shortcut to the Word template directory (C:\Program Files\Microsoft Office\Templates in the English edition), to make it easy for users to access.
Testing the Template
Close this new template. Open a new document by clicking 'File' -> 'New', and select the shortcut to your template. Type some sample text using different styles and save the document. Run YAWC>Export, and check that content has been converted and placed in a file in the same directory as the saved document, with a .xml or .htm suffix.
Customization
While there are already pre-defined conversion sets for major formats such as Docbook, WML and HTML, YAWC can be configured to output any XML format, simply by creating a new set of configuration files.
This section focuses on how to configure YAWC to convert your existing and new Word documents into XML useful to you and your organisation. We assume that you already have an XML format defined and, optionally, a DTD that YAWC can validate the XML against.
We will first examine the Docbook conversion set, which comes bundled with the default installation of YAWC, then demonstrate a new, custom conversion type and show how this can be integrated with Word.
Configuration File
The configuration file is stored in each of the subfolders of YAWC. The configuration file can theoretically have any name, but, to avoid confusion, we recommend calling it yawcXXXX.ini where XXXX describes the target DTD, e.g. yawcSDocBook.ini. If the filename is not yawc.ini, you must set the yawcConfigFile document property in your Word template to the name of the file. If this property is undefined, YAWC defaults to yawc.ini.
The configuration file is made up of sections detailing distinct options. Each section is described below, and contains either a block of text (in the case of the prolog and epilog) or a name/value property pair on a single line. The name of each property is the first part of the line, and the value is the second part. A single tab character must separate the property name and value. The property name, which may be a user-defined style name, may contain spaces. Any whitespace in the value part of the property is treated as part of the value.
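The tab-separated property format described above can be illustrated with a short Python sketch. This is not YAWC's own parser; the function names and the style name "My Custom Style" are hypothetical, and the sketch assumes that the first tab on a line is the separator.

```python
def parse_property_line(line):
    """Split one configuration line into (name, value) at the first tab."""
    text = line.rstrip("\r\n")              # drop only the line terminator
    name, sep, value = text.partition("\t")
    if not sep or not name:
        raise ValueError(f"expected 'name<TAB>value', got {line!r}")
    return name, value                      # whitespace in the value is kept


def parse_properties(lines):
    """Collect property lines into a dict, skipping blank lines."""
    properties = {}
    for line in lines:
        if line.strip():
            name, value = parse_property_line(line)
            properties[name] = value
    return properties


# Hypothetical sample lines; "My Custom Style" is an invented style name.
sample = [
    "wdStyleNormal\tpara\n",
    "My Custom Style\tnote\n",
    "yawcBoldElement\temphasis role=\"strong\"\n",
]
config = parse_properties(sample)
print(config["My Custom Style"])
```

Splitting on the first tab only is what allows both the property name and the value to contain spaces.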
Prolog and Epilog
The prolog contains fixed XML content, which is prepended at the beginning of the initial XML file output. Similarly, the epilog contains fixed XML content, which is appended at the end of the initial XML file output.
These sections allow you to place fixed markup around your document content. The epilog and prolog determine the header and footer of the initial XML file, i.e. the temporary XML file that is generated directly by YAWC from your Word document. So, depending on your XSLT transformation, this may or may not be in the final XML file.
The prolog and epilog must contain the start and end tags of the root element. Failure to do this will cause YAWC to output non-well-formed XML. The prolog should also contain the XML declaration and specify the character encoding.
The special processing instruction <?yawc custom-properties?> can also be included in the prolog. During the conversion process, it is replaced by all user-defined document properties with non-null values.
[prolog]
<?xml version="1.0" encoding="US-ASCII"?>
<article>
[/prolog]
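To see why the prolog and epilog must open and close the root element, the following Python sketch concatenates a prolog, some stand-in content and an epilog, then checks the result for well-formedness. The helper name, the body text and the epilog value "</article>" are assumptions made for illustration; YAWC performs its own assembly internally.

```python
import xml.etree.ElementTree as ET


def is_well_formed(prolog, body, epilog):
    """Return True if prolog + body + epilog parses as one XML document."""
    document = (prolog + body + epilog).encode("ascii")
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


prolog = '<?xml version="1.0" encoding="US-ASCII"?>\n<article>\n'
body = "<para>Converted content.</para>\n"   # stand-in for YAWC's output
good_epilog = "</article>\n"                 # assumed matching epilog
bad_epilog = ""                              # root end tag forgotten

print(is_well_formed(prolog, body, good_epilog))
print(is_well_formed(prolog, body, bad_epilog))
```

The first check succeeds and the second fails, because the root element opened in the prolog is never closed.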
Element Mappings
A large portion of the configuration file concerns itself with mapping Word styles into XML elements or tags. This is evidently the most important part of the configuration file, and is crucial to the correct output of your XML. It is divided into four separate groups.
Sections
This section defines the mapping between Word heading styles and XML hierarchical container elements. YAWC uses the predefined Word Heading styles to identify different sections and levels of nesting. This is not required for HTML, but is for DocBook, and most other structured DTDs.
The YAWC keywords yawcLevel1ElementName, yawcLevel2ElementName, yawcLevel3ElementName and yawcLevel4ElementName specify the names of the XML container elements that YAWC should wrap around each section.
yawcLevel1ElementName part
yawcLevel2ElementName chapter
yawcLevel3ElementName sect1
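The wrapping of heading levels in container elements can be sketched in Python as follows. This only illustrates the nesting idea; the function name and the use of a "title" child element are assumptions, and YAWC's actual output may differ.

```python
from xml.sax.saxutils import escape

# Mapping taken from the example above: heading level -> container element.
LEVEL_ELEMENTS = {1: "part", 2: "chapter", 3: "sect1"}


def wrap_sections(headings):
    """Turn a flat list of (level, title) headings into nested containers."""
    out = []
    open_levels = []                 # stack of currently open heading levels
    for level, title in headings:
        # Close every open container at the same or a deeper level.
        while open_levels and open_levels[-1] >= level:
            out.append(f"</{LEVEL_ELEMENTS[open_levels.pop()]}>")
        out.append(f"<{LEVEL_ELEMENTS[level]}>")
        out.append(f"<title>{escape(title)}</title>")   # assumed child element
        open_levels.append(level)
    # Close whatever is still open at the end of the document.
    while open_levels:
        out.append(f"</{LEVEL_ELEMENTS[open_levels.pop()]}>")
    return "".join(out)


outline = [(1, "Basics"), (2, "Install"), (2, "Configure"), (1, "Reference")]
print(wrap_sections(outline))
```

A stack of open levels is enough here because a new heading always closes every container at its own level or deeper before opening its own.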
List Styles
This section defines the mapping between Word list styles and XML elements. Lists require special handling in YAWC in order to wrap the appropriate list container element around the list item elements and to support nested lists.
Note that the Word styles 'List', 'List 2', etc, and 'List Continue', 'List Continue 2' etc, are not treated as lists, and are handled in the next section. The XSLT stylesheet may also be required to do some amount of post-processing in order to make the 'List Continue' styles work correctly, by embedding them as paragraphs in the previous list item.
The YAWC keywords yawcBulletedListElementName and yawcNumberedListElementName map the list styles to XML elements.
yawcBulletedListElementName itemizedlist
Paragraph Styles
This section defines the mapping between Word paragraph-level styles and XML elements. The Word styles can be Word built-in styles (denoted by a prefix of wdStyle) or user-defined styles. If a style is not listed here, then YAWC maps the style to the element specified in the wdStyleNormal property, with a class attribute specifying the name of the style, for example <p class="user-style">.
wdStyleNormal para
Acknowledgement ackno
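The fallback rule for unlisted styles can be sketched in a few lines of Python. The function name is hypothetical, and the sketch models only the lookup, not YAWC's conversion itself.

```python
from xml.sax.saxutils import quoteattr

# Mapping taken from the example above.
PARAGRAPH_STYLES = {"wdStyleNormal": "para", "Acknowledgement": "ackno"}


def start_tag_for_style(style, mappings=PARAGRAPH_STYLES):
    """Return the start tag for a paragraph with the given Word style."""
    element = mappings.get(style)
    if element is not None:
        return f"<{element}>"
    # Unlisted style: use the wdStyleNormal element plus a class attribute.
    fallback = mappings["wdStyleNormal"]
    return f"<{fallback} class={quoteattr(style)}>"


print(start_tag_for_style("Acknowledgement"))
print(start_tag_for_style("user-style"))
```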
Character Styles
This section defines the mapping between Word character-level styles and XML elements. The special YAWC keywords yawcItalicElement, yawcBoldElement, yawcUnderlineElement, yawcSuperscriptElement, yawcSubscriptElement and yawcBookmarkElement are used to map the standard unnamed character styles to elements.
yawcItalicElement emphasis
yawcBoldElement emphasis role="strong"
Metadata
Options for configuring the document's meta-data output are contained in the 'METAInformation' section. Each line in this section represents a single metadata element, i.e. its name and its value. Each metadata element may be either built-in or custom.
The built-in metadata elements are those defined in the Dublin Core Metadata Element Set, and their value can be changed on a per-document basis by the YAWC>Edit Metadata... command. The following are valid built-in meta-tags: DC.Title, DC.Publisher, DC.Date.Created, DC.Identifier, DC.Description, DC.Creator, DC.Type, DC.Subject, DC.Date.Modified, DC.Contributor, DC.Format, DC.Source, DC.Language, DC.Relation, DC.Coverage, DC.Rights and DC.Keywords.
Note: the examples in this section are taken from a typical HTML configuration file.
DC.Format text/html
Note: The DC.Type and DC.Creator metadata element values are slightly unusual in that they describe the contents of the dropdown list box to be displayed in the dialog box, rather than default values. These property values are enclosed in square brackets, with a vertical bar delimiting each value choice. The first value is the default should none be chosen in the dialog box.
DC.Creator [Creator-1|Creator-2|Creator-3|Creator-4]
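As an illustration of the bracketed choice-list syntax, the Python sketch below splits such a value into its choices and picks the first as the default. The function name is hypothetical and the sketch does not reflect how YAWC itself reads the value.

```python
def parse_choices(value):
    """Return (default, choices) for a value such as '[A|B|C]'."""
    if not (value.startswith("[") and value.endswith("]")):
        raise ValueError(f"not a bracketed choice list: {value!r}")
    choices = value[1:-1].split("|")
    return choices[0], choices          # the first choice is the default


default, choices = parse_choices("[Creator-1|Creator-2|Creator-3|Creator-4]")
print(default)
print(choices)
```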
The second type are custom metadata elements. These can take any name, but cannot be modified by the user in the standard dialog box (although you could create a special dialog box to do so). An example of the use of custom meta-tags is for storing organisation-wide data, which should be contained in every document, e.g. a postal address or phone number.
Organisation.Telephone 555-1234
Both built-in and custom metadata elements can contain static text, a YAWC variable, or a combination of the two. A variable is a keyword enclosed in curly braces, and is replaced by normal text when the document is converted. The variables usually hold information about the version of YAWC, the user or the document, e.g. "{createdDate}". A full list of these variables is available in the next section under "Customizing Metadata".
A useful way for the user to overwrite a custom meta-tag is to define a custom property in the document with the same name as the meta-tag. YAWC will use this value instead of the value in the configuration file. This also applies to built-in properties, which are in fact stored internally in this way. An alternative method for outputting custom metadata is to define a custom property with that name in the document template, so each document created using this template will inherit this custom property and therefore output this metadata.
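The interaction between configuration values, document properties and variables can be sketched in Python as follows. The function names, the sample values and the choice to leave unknown variables untouched are all assumptions made for illustration; they are not taken from YAWC.

```python
import re

VARIABLE = re.compile(r"\{(\w+)\}")


def expand(value, variables):
    """Replace each {name} with its value; unknown names are left as-is."""
    return VARIABLE.sub(lambda m: variables.get(m.group(1), m.group(0)), value)


def resolve_metadata(config_meta, document_properties, variables):
    """A document property with the same name overrides the configured value."""
    resolved = {}
    for name, value in config_meta.items():
        chosen = document_properties.get(name, value)
        resolved[name] = expand(chosen, variables)
    return resolved


# Hypothetical inputs for illustration only.
config_meta = {
    "DC.Format": "text/html",
    "DC.Date.Created": "Created {createdDate}",
    "Organisation.Telephone": "555-1234",
}
document_properties = {"Organisation.Telephone": "555-9876"}
variables = {"createdDate": "2001-05-14"}

metadata = resolve_metadata(config_meta, document_properties, variables)
print(metadata)
```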
Conversion Options
The "ConversionOptions" section relates to how YAWC applies XSLT stylesheets to your document. Depending on your version of YAWC, you can have either one or many transformations. The options in this section are specified using an XML data format, and each child element ("yawcXSLTTransformation") describes an individual transformation. The "pass" attribute of this element dictates the order in which the stylesheets are applied to your XML file. The yawcXSLTTransformation element uses the following children to describe the transformation.
<yawcXSLTTransformation pass="1">
...
</yawcXSLTTransformation>
Note: If the conversion options section has no yawcXSLTTransformation elements, the intermediate XML file (in your temporary folder) will simply be copied to the destination folder.
yawcStylesheet
This is the name of the stylesheet that is applied to the original XML file to form the new XML file. The filename may be an absolute path or a filename relative to the conversion type folder.
<yawcStylesheet>yawcSDocBook.xsl</yawcStylesheet>
yawcProcessor
The XSLT processor to use to apply this XSLT transformation. Currently, the only supported processor is "MSXML3".
<yawcProcessor>MSXML3</yawcProcessor>
yawcOutputFileExtension
The suffix to replace .doc in the name of the output file after transformation. The name will then be the same as the document name, but with this extension. The default is .xml for DocBook and .htm for HTML. Depending on your version of YAWC, you may not be able to change this value.
<yawcOutputFileExtension>.xml</yawcOutputFileExtension>
yawcStripHTMLContentType
Removes the "http-equiv" meta-tag that is automatically generated by html transformations. May be "yes" or "no".
<yawcStripHTMLContentType>yes</yawcStripHTMLContentType>
yawcKeepTempFile
Specifies whether to remove the temporary XML file (in your temporary folder) after conversion is completed. May be "yes" or "no", and defaults to "no". It may be useful not to delete this file while you are developing your XSLT stylesheet. This option can be used in any transformation, but it makes sense to use it in the pass 1 transformation.
<yawcKeepTempFile>no</yawcKeepTempFile>
param
One or more elements describing the parameters to be passed to the XSLT stylesheet. These are empty elements that take the "name" and "select" attributes.
<param name="deleteEmptyParagraphs" select="'yes'" />
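To show how the child elements described above fit together, the Python sketch below reads a set of transformations and orders them by their "pass" attribute. The wrapper element name "ConversionOptions", the file name "second.xsl" and the function name are assumptions; only the yawcXSLTTransformation element and its children come from this guide.

```python
import xml.etree.ElementTree as ET

# Hypothetical options block; the root element name is an assumption.
SAMPLE = """
<ConversionOptions>
  <yawcXSLTTransformation pass="2">
    <yawcStylesheet>second.xsl</yawcStylesheet>
    <yawcProcessor>MSXML3</yawcProcessor>
  </yawcXSLTTransformation>
  <yawcXSLTTransformation pass="1">
    <yawcStylesheet>yawcSDocBook.xsl</yawcStylesheet>
    <yawcProcessor>MSXML3</yawcProcessor>
    <yawcOutputFileExtension>.xml</yawcOutputFileExtension>
    <yawcKeepTempFile>no</yawcKeepTempFile>
    <param name="deleteEmptyParagraphs" select="'yes'" />
  </yawcXSLTTransformation>
</ConversionOptions>
"""


def load_transformations(xml_text):
    """Return one dict per transformation, sorted by the pass attribute."""
    root = ET.fromstring(xml_text.strip())
    passes = []
    for node in root.iter("yawcXSLTTransformation"):
        passes.append({
            "pass": int(node.get("pass")),
            "stylesheet": node.findtext("yawcStylesheet"),
            "processor": node.findtext("yawcProcessor"),
            # yawcKeepTempFile defaults to "no" when it is absent.
            "keep_temp": node.findtext("yawcKeepTempFile", default="no") == "yes",
            "params": {p.get("name"): p.get("select")
                       for p in node.findall("param")},
        })
    return sorted(passes, key=lambda item: item["pass"])


transformations = load_transformations(SAMPLE)
for item in transformations:
    print(item["pass"], item["stylesheet"])
```

Sorting by the numeric pass value means the order of the elements in the file does not matter.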
Verification Options
YAWC supports a limited form of checking of the Word document, which is not equivalent to XML validation. Two checks are carried out. A structural check ensures that heading styles are nested correctly. A sequence check ensures that the first styles used in the document conform to the order defined in the yawcVerifySequence property. This enables YAWC to enforce an order of styles that must be contained in the document. If the document does not meet the criteria described by this option, then YAWC will not convert it. The YAWC>Verify command (not available in YAWC Lite) will display the errors to the user.
The syntax of this property is similar to a DTD content model description: a comma-separated list of style names, each either a Word built-in style or a user-defined style. The order of styles here dictates the order in which the styles must appear in the document. A "?" character at the end of a style name indicates that this style is optional. A "%" character at the end indicates that any style may be next.
yawcVerifySequence wdStyleTitle,%
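One possible reading of the sequence check is sketched below in Python. It is an interpretation, not YAWC's algorithm: it assumes that "%" appears as a standalone item that accepts everything after it, that each listed style matches exactly one paragraph, and that without "%" no further styles are allowed. The style name "Abstract" is hypothetical.

```python
def check_sequence(spec, styles):
    """Check a document's paragraph styles, in order, against a sequence spec."""
    remaining = list(styles)
    for token in spec.split(","):
        token = token.strip()
        if token == "%":
            return True                  # any style may follow from here on
        optional = token.endswith("?")
        name = token[:-1] if optional else token
        if remaining and remaining[0] == name:
            remaining.pop(0)             # style found in the expected position
        elif not optional:
            return False                 # required style missing or misplaced
    return not remaining                 # assumed: nothing may follow the spec


print(check_sequence("wdStyleTitle,%", ["wdStyleTitle", "wdStyleNormal"]))
print(check_sequence("wdStyleTitle,%", ["wdStyleNormal"]))
```

With the example property above, a document passes only if its first paragraph uses the Title style; anything may follow it.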
The XSLT Stylesheet
Ordinarily, YAWC will output well-formed XML directly from Word, which may immediately meet your requirements. Often, however, you will need to perform some post-processing on this XML file. For example, the Docbook XSLT stylesheet will merge "List Continue" blocks into correct hierarchical list blocks. You may also want to consolidate data from the document into document headers, for example the document title, or to rename the meta-information to something other than the YAWC default meta elements.
If you are happy with the output from YAWC and do not need to apply any XSLT stylesheet, then don't define the yawcPass1Stylesheet option in the configuration file. Alternatively, use your stylesheet to perform an 'identity transform', i.e. simply copy all the nodes and attributes from the original XML file to the new XML file. This may be useful for improving the appearance of the final XML file, e.g. by setting the 'indent' XSLT option.
Document Properties
Word document properties are used by YAWC to store metadata about a document. This meta-information is then transformed into meta elements during conversion. However, there are two important custom properties that are not related to the document's meta-information.
yawcConfigFile
Specifies the configuration file that YAWC uses. This filename may be an absolute path or a relative filename. YAWC will look in the conversion type folder for the configuration file with this name. If this custom property is omitted, YAWC will assume the file is called yawc.ini.
yawcHTMLTemplate
Specifies the HTML template to be used in the HTML XSLT transformation. Simply said, this property overrides the HTML template param option in the configuration file. YAWC also recognises the custom property yawcHTMLTemplateFile as having the same function, but this notation is deprecated.
These two custom properties should be stored in your document template for your organisation. Having multiple Word templates each with different values of these attributes allows for greater flexibility in the document conversions. For example each section of your website may have a different appearance and may need a different yawcHTMLTemplate. In fact, each individual document may define these custom properties.
Creating a New Conversion Set
To create a new conversion set, make a new folder in your YAWC folder, with a name relevant to your organisation or conversion type. This folder will contain the configuration file(s), the XSLT file(s), the Word template(s) and the HTML template(s), if any. The easiest way to create your configuration file is to copy it from another conversion set and modify it to suit your needs. Rename this file to something relevant, i.e. yawcXXXX.ini, where XXXX is the name of the folder that it is in. If you have any XSLT files or HTML templates, place them in this directory and ensure that they have the same filenames as specified in your configuration file. Finally, create your Word template as described in the next section.
Creating Word Templates
The easiest way to create a Word template for your conversion set is to copy an existing YAWC Word template (e.g. DocBook-en.dot). In this way you do not need to set up the menus and macro stubs. Copy the Word template into your new folder and rename it. The filename must have the same name as the folder.
For convenience, it is a good idea to place a Windows shortcut to this file in Word's template directory. This allows the user to create a new document based on this template by clicking File->New in Word.
After this is done, we need to customize this template to be useful for our new conversion set. To do this, first open the template in Word.
Document Properties
Use the command YAWC>Options to change the values of the yawcConfigFile and perhaps the yawcHTMLTemplate properties for this template.
Customizing Menus
You may need or wish to modify the YAWC menu to suit your needs. To do this, click on the 'Tools' menu in Word then the 'Customize' item and finally 'Commands'. You may need to select your document template from the 'Save In' drop down list in order to display the YAWC menu. To rename or remove the menu or menu items, simply right click on the item and select delete or type in the new name.
Advanced YAWC Options
While the level of YAWC configuration provided by the configuration file and custom properties is adequate for most YAWC administrators, it is possible to set more advanced YAWC options, which control the way that YAWC operates. These options are set by using Word's Visual Basic Editor.
To set these options, open your template in Word and open the Visual Basic Editor (Alt-F11). To the left of this window, you will see a list of modules for the template. Click on "yawcVBA" and you will see some Visual Basic code. You may notice that the reference to the YAWC object is called yawcObj. This object contains a reference to another object, yawcOptions. By setting attributes of this object, it is possible to set some advanced YAWC options.
For example, if you do not want YAWC to write a log file, place the following line before the "yawcObj.Convert" line:
yawcObj.yawcOptions.writeLog = false
A full list of advanced YAWC options is available in the YAWC Programmer's Guide.
Customizing Metadata
As we said before, it is possible to have YAWC variables in the 'METAInformation' section of your configuration file, e.g. '{createdDate}'. These variables are used to output various attributes of YAWC, the user or the document in the form of meta-tags. The following lists show which variables are available for use in your configuration file, with a brief description of each.
YAWC Variables
Note: some of these options depend on your operating system, and whether your computer is connected to a network or not.
- {applicationBrandname}
The application name set by the advanced option of the same name. Default is "YAWC".
- {computerName}
The hostname of the computer that this copy of YAWC is running on.
- {errMessages}
Deprecated.
- {exeName}
The name of the YAWC application. This should be 'yawccom.dll' in most cases.
- {generator}
The YAWC generator string used by the standard "generator" meta-tag. This is normally "YAWC" followed by YAWC's website's URL.
- {iniFile}
The configuration file used by YAWC to convert this document.
- {logFile}
The file to which YAWC logged this conversion.
- {originalDocument}
The full file name of the Word document that this file was converted from.
- {os}
The operating system that this document was converted on.
- {userDomain}
The domain name of the network that this computer is connected to.
- {username}
The Windows NT username of the user that converted this document.
- {windowsDirectory}
The Windows directory path. Usually "C:\windows" or "C:\winnt".
- {xmlFilename}
The XML file name of the final XML document.
- {yawcCompanyName}
The company name that this computer was registered to.
- {yawcEdition}
The edition of YAWC, i.e. Lite, Plus, Pro or Enterprise.
- {yawcVersion}
The unique version string that identifies this copy of YAWC. Consists of the string "YAWC", the application brandname, the version number (major.minor.revision), the edition (see above), the name of the application and a debug mode identifier.
Document Variables
Note: these properties apply to the original Word document and some may be inaccurate when the document is converted to XML.
- {appname}
The name of the application that created this document. Normally "MSWORD".
- {attachedTemplate}
The file name of the template attached to this document.
- {bytes}
The size (in bytes) of this Word document.
- {category}
The category of this document, as described by the document properties dialog.
- {characters}
The number of characters in the document.
- {charactersWithSpaces}
The number of characters (including spaces) in the document.
- {comments}
Comments about this document, as described by the document properties dialog.
- {company}
The company that owns the document, as described by the document properties dialog.
- {createdDate}
The date this document was created.
- {format}
The format of this document, i.e. DOC, HTML, etc.
- {hyperlinkBase}
The hyperlink base property, as described by the document properties dialog.
- {keywords}
Keywords for this document, as described by the document properties dialog.
- {lastAuthor}
The name of the last person to edit this document.
- {lastPrintedDate}
The date that this document was last printed.
- {lines}
The number of lines in the document.
- {manager}
The manager of this document, as described by the document properties dialog.
- {modifiedDate}
The date this document was last modified.
- {pages}
The number of pages in the document.
- {paras}
The number of paragraphs in the document.
- {revision}
The revision number of this document, i.e. how many times it has been saved.
- {security}
The security of this document.
- {subject}
The subject of the document, as described by the document properties dialog.
- {title}
The title of the document, as described by the document properties dialog.
- {totalEditingTime}
The total editing time for this document.
- {words}
The number of words in the document.
- {titleFromDocument}
The title of the document taken from the document's first paragraph with the 'Title' style (if any).
Troubleshooting
| Problem | Cause | Solution |
| Tables do not convert under Word 97. | Word's HTML converter is not installed. | Install the converter using the Microsoft Office setup program. |
| Tables do not convert under Word 97 or Word 2000. | Yawc.dot is corrupt. | Reinstall YAWC. |
| Error message: "Could not access Word document". | Word is not running or is not installed. | Install or run Word. |
| Error message: "Could not load xxx.dll". | xxx.dll is not installed or is corrupt. | Download the library package from http://www.yawcpro.com/. |
| Error message: "Could not find INI file". | INI file is missing or a file has been named incorrectly. | |
| I can't convert to HTML using YAWC Lite. | YAWC Lite is an XML converter only. | Upgrade to YAWC Pro or YAWC Enterprise. |
| YAWC enters the 'yawcInfo' and 'generator' meta-tags into my document even though they don't exist in my configuration file. | You are using YAWC Lite. YAWC Lite inserts these automatically. | Upgrade to YAWC Pro or YAWC Enterprise. |
| I can't find my YAWC log file. It's not in the usual place. | YAWC may not be able to find its home folder. | Check your temporary folder for 'yawc.log'. |

