If you have ever converted a document from an unstructured format, such as MS Word or unstructured FrameMaker, to DITA, you know that most conversion tools are style-based.

They all require you to map the styles of the unstructured document to DITA elements. Your conversion mapping may look something like this:

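| MS Word Style                | DITA Element | Wrap into Parent Element |
|------------------------------|--------------|--------------------------|
| Heading1, Heading2, Heading3 | <title>      | <concept> or <task>      |
| BodyText                     | <p>          |                          |
| Bullet1                      | <li>         | <ul>                     |
| Numbered                     | <cmd>        | <step>                   |
| Bold                         | <uicontrol>  |                          |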
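
To make the idea concrete, here is a minimal Python sketch of what a style-based mapping boils down to. It is only an illustration under stated assumptions: the `STYLE_MAP` dictionary and the `convert_paragraph` function are hypothetical names made up for this example, and they do not reflect how any particular conversion tool is implemented.

```python
from xml.sax.saxutils import escape

# Hypothetical style map mirroring part of the table above:
# MS Word style -> (DITA element, optional parent wrapper).
STYLE_MAP = {
    "BodyText": ("p", None),
    "Bullet1": ("li", "ul"),
    "Numbered": ("cmd", "step"),
}


def convert_paragraph(style, text):
    """Wrap one styled paragraph in the DITA element mapped to its style."""
    if style not in STYLE_MAP:
        # An unmapped or ad hoc style is exactly where a style-based
        # conversion breaks down.
        raise KeyError(f"no mapping for style {style!r}")
    element, parent = STYLE_MAP[style]
    xml = f"<{element}>{escape(text)}</{element}>"
    if parent:
        # Simplification: a real converter would merge adjacent list items
        # into a single parent instead of wrapping each one separately.
        xml = f"<{parent}>{xml}</{parent}>"
    return xml


print(convert_paragraph("Numbered", "Click OK"))
```

Note that the function has nothing to fall back on when it meets a style that is missing from the map, which is why inconsistent styling hurts this approach so much.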

As long as your documents are consistently and properly styled, you are probably in good shape, provided that you’ve invested enough time to create an accurate and precise conversion mapping. However, too often that’s not the case: documents contain manual ad hoc formatting along with repetitive and inconsistently used styles.

This means that defining the mapping won’t be enough. You’ll have to do quite an extensive cleanup before or after the conversion to get good-looking DITA.

Before I got involved in the development of DITAToo DITA CMS, I had been in the consultancy business for many years. I know very well what a nightmare it can be to prepare a document for conversion. So when we were working on DITAToo, we thought: “Does the conversion have to be style-based? What if we emulate the way a human recognizes content?” Indeed, we, as humans, can tell that something is a bulleted or numbered item just by looking at the text. We don’t need a special MS Word style to be applied to understand that this is a list item, a step in a procedure, or a paragraph. We just see it.
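
As a rough illustration of recognizing content by how it looks rather than by its style, here is a small Python sketch. It is not the DITAToo or ConverToo algorithm, which is not described in this post; the `classify` function and its two regular expressions are assumptions invented for this example, and they look only at the leading characters of a line of text.

```python
import re

# Hypothetical visual cues: a leading bullet character, or a leading
# number or single letter followed by "." or ")".
BULLET = re.compile(r"^\s*[•*\-]\s+")
NUMBER = re.compile(r"^\s*(\d+|[a-zA-Z])[.)]\s+")


def classify(line):
    """Guess what a line of text is from its appearance alone."""
    if BULLET.match(line):
        return "list item"
    if NUMBER.match(line):
        return "step"
    return "paragraph"


for sample in ["• Install the driver", "1. Open the file", "The file opens."]:
    print(sample, "->", classify(sample))
```

A real recognizer would need many more cues, such as indentation, font, and surrounding context, but even this toy version needs no style information at all.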

As a pure experiment, we created a Word-to-DITA conversion feature that was based on the visual representation of the content rather than on styles, and added it to the very first release of DITAToo DITA CMS. You could take any MS Word document, regardless of whether it was properly styled, upload it to the DITAToo content repository, and DITAToo automatically converted it to DITA without requiring anything from you. DITAToo automatically recognized the information type of the original content and converted it to DITA concepts and tasks.

It worked pretty well, but quite honestly, we didn’t expect what happened next. As DITAToo was gaining traction, in addition to customers who were buying DITAToo to manage DITA, we began to receive requests from companies that already had a DITA CMS in place or were not yet ready for a DITA CMS, but still wanted to migrate content from MS Word to DITA.

They were ready to buy DITAToo just for the conversion. They were impressed by the time they could save by eliminating the need to build a conversion mapping and wanted just this conversion feature. So we thought: “We have the conversion algorithm already, and it seems to have value on its own. Why don’t we productize this particular feature and make it a stand-alone product?”

This is how ConverToo was born.

We added a very simple user interface and some settings that allow you to fine-tune the conversion, and made the conversion algorithm more sophisticated.

For about a year, ConverToo was in stealth mode. We offered it primarily to our implementation partners and some other consultancy firms to let them convert their customers’ legacy MS Word documents much more efficiently. They became our early adopters, and the value of their feedback is hard to overestimate.

Based on this feedback, we kept improving the conversion algorithm and adding new features until today, when we decided that ConverToo is mature enough to be made available to everyone.

So now, whether you are a company moving to DITA, a consultancy firm converting legacy content for your customers, or an independent contractor migrating your client to DITA, you can use ConverToo to speed up your conversion work and make it more efficient.

We’ve recorded this short video to show you ConverToo in action. If you want to use ConverToo, just drop us a line at info@intuillion.com.

