alexdd/XMLDifferExperiment

XMLDifferExperiment

Experimental XML Diffing with XSLT

One of UWE’s key features will be managing and comparing document versions. Whereas commercial solutions employ a rather scientific approach (see e.g. here) or a rather complex one (see here), the so-called “diffing feature” in UWE will be implemented in a very simple way. UWE’s WYSIWYG editor is the only place where you can modify UWE documents. Thus, if we assign a unique ID to each element that we insert (paragraphs, lists, tables, images, …), we can use the following algorithm to mark changes when comparing two versions of one document:

FIRST STEP: analyze versions

if there is an ID in the new version which does not exist in the old version, then mark the element with this ID as NEW
if there is an ID in the old version which does not exist in the new version, then mark the element with this ID as DELETED
if there is an ID which exists in both versions, then compare the text content of both versions: if the content changed, mark the element with this ID as CHANGED, otherwise mark it as UNCHANGED
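
The three rules above can be sketched in Python as follows. This is only an illustration, not the project's XSLT implementation: the attribute name `id`, the helper names, and the sample documents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def index_by_id(root, id_attr="id"):
    """Map each ID to its element; elements without an ID are skipped."""
    return {el.get(id_attr): el for el in root.iter() if el.get(id_attr) is not None}

def text_of(el):
    """Concatenate all text inside an element, ignoring nested markup."""
    return "".join(el.itertext())

def analyze(old_root, new_root, id_attr="id"):
    """Return {id: status} following the three rules of the first step."""
    old = index_by_id(old_root, id_attr)
    new = index_by_id(new_root, id_attr)
    status = {}
    for key in new:
        if key not in old:
            status[key] = "new"
        elif text_of(old[key]) != text_of(new[key]):
            status[key] = "changed"
        else:
            status[key] = "unchanged"
    for key in old:
        if key not in new:
            status[key] = "deleted"
    return status

# Hypothetical sample versions: "b" was removed, "c" was edited, "d" was added.
old_doc = ET.fromstring('<doc><p id="a">one</p><p id="b">two</p><p id="c">three</p></doc>')
new_doc = ET.fromstring('<doc><p id="a">one</p><p id="c">3</p><p id="d">four</p></doc>')
result = analyze(old_doc, new_doc)
```

Note that this sketch compares the full text of an element including its descendants, so a container that carries its own ID would be reported as changed whenever one of its children changed.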

At this point we have marked the elements in both versions. But what we want is one single document in which all marked elements are merged in the correct order. Thus the next step is merging the old and the new version. This step essentially amounts to copying the elements which have been marked as DELETED from the old version into the new version. The tricky part is putting these elements into the right place, but with some magic XPath selectors we have coped with this problem successfully.

SECOND STEP: merging

traverse the new version: if the preceding sibling of the identical element (same ID) in the old version is marked as DELETED, then copy all directly preceding siblings which are marked as DELETED from the old version into the new version, just before the current element.

when traversing the new version: if all following siblings of the current element are marked as DELETED in the old version, then copy this block of DELETED elements just after the current element.
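
To make the two merge rules concrete, here is a minimal Python sketch that works on one flat list of sibling IDs instead of real XML nodes. The function name `merge_order` and the status label `kept` (meaning CHANGED or UNCHANGED, as decided in the first step) are assumptions for this example; the project itself uses XPath selectors in XSLT.

```python
def merge_order(old_ids, new_ids):
    """Return (id, status) pairs in merged order for one list of siblings."""
    old_pos = {key: i for i, key in enumerate(old_ids)}
    new_set = set(new_ids)

    def is_deleted(key):
        # An old element is DELETED if its ID no longer exists in the new version.
        return key not in new_set

    merged = []
    for key in new_ids:
        if key not in old_pos:
            merged.append((key, "new"))
            continue
        i = old_pos[key]
        # Rule 1: copy the unbroken run of DELETED siblings that directly
        # precedes this element in the old version, just before it.
        start = i
        while start > 0 and is_deleted(old_ids[start - 1]):
            start -= 1
        merged.extend((k, "deleted") for k in old_ids[start:i])
        merged.append((key, "kept"))
        # Rule 2: if every following sibling in the old version is DELETED,
        # copy that block just after the current element.
        tail = old_ids[i + 1:]
        if tail and all(is_deleted(k) for k in tail):
            merged.extend((k, "deleted") for k in tail)
    return merged

# Hypothetical example: "b", "x" and "y" were deleted, "d" was added.
merged = merge_order(["a", "b", "c", "x", "y"], ["a", "c", "d"])
```

Both rules need an element that survives in both versions as an anchor, so in this sketch a sibling list in which every old element was deleted produces no DELETED entries at all; such a case would need separate handling.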

Now we have one document with all elements marked. Everything so far could have been done using XSLT stylesheets. But when detecting atomic text changes we will need to use Python’s difflib.

THIRD STEP: copy old text of CHANGED elements into merged document in order to use Python’s difflib

After this step each CHANGED element will occur twice in the merged document, like this:

<elem diffing-status="changed" diffing-version="old">[...] some deleted text [...]</elem>
<elem diffing-status="changed" diffing-version="new">[...] [...]</elem>

FOURTH STEP: use Python’s difflib in XSLT stylesheet extension call on merged document

After this step each CHANGED element will occur only once and will contain tags inserted by the Python extension call, like so:

bla bla <del>some deleted text</del> bla bla
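
A word-level comparison with `difflib.SequenceMatcher` is one way such tags could be produced. The sketch below is an assumption about how the extension function might look, not the project's actual code: the function name `mark_changes`, the whitespace tokenization, and the `<ins>` tag for added text are all chosen for illustration.

```python
import difflib
from xml.sax.saxutils import escape

def mark_changes(old_text, new_text):
    """Wrap removed words in <del> and added words in <ins> (word-level diff)."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Escape the text so that the result stays well-formed XML.
        removed = escape(" ".join(old_words[i1:i2]))
        added = escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(added)
        if tag in ("delete", "replace"):
            parts.append(f"<del>{removed}</del>")
        if tag in ("insert", "replace"):
            parts.append(f"<ins>{added}</ins>")
    return " ".join(parts)

# The old and new text of one CHANGED element, as in the example above.
marked = mark_changes("bla bla some deleted text bla bla", "bla bla bla bla")
```

Splitting on whitespace keeps the markup readable but normalizes spacing; a character-level comparison would preserve whitespace at the cost of noisier output.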

FIFTH STEP: a simple XML to HTML transformation will visualize all changes: red, struck-through text for deleted elements and green text for added elements.

XMLDifferExperiment is licensed under the GNU Lesser General Public License, see file license.txt. Chinese group labels were extracted from the Unicode website, see UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE.
