Xml Subtractor

A few weeks ago, I had to work with two XML files, where the first was a superset of the second, and find only those nodes in the first file that were not in the second.

Mathematically expressed,

Given: xml2 ⊆ xml1
Find: xml1 − xml2

I had written a small program to do that and forgotten about it. Well, today I faced the same problem, but with two substantially larger XML files. I was glad to have the program from earlier; it saved a ton of time.

Groovy makes XML processing very simple, and perhaps one could create an “XML arithmetic library” using Groovy Categories et al., for what it's worth.

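// Note: on recent Groovy versions (4.x), XmlParser and XmlNodePrinter live in
// groovy.xml and need "import groovy.xml.*"; older versions find them in groovy.util.
void xmlMinus() {
    String xmlText1 = new File("filename1.xml").text
    String xmlText2 = new File("filename2.xml").text

    // Parse the file contents (not the yet-undefined xml1/xml2 variables)
    def xml1 = new XmlParser().parseText(xmlText1)
    def xml2 = new XmlParser().parseText(xmlText2)

    // For each child of xml2's root, remove the child of xml1's root with the same uniqueId
    xml2.each {
        def uniqueId = it.'@uniqueId'
        println "finding $uniqueId"
        def node = xml1.find { it.'@uniqueId' == uniqueId }
        xml1.children().remove(node)
    }

    // Write what is left of xml1 to a new file
    PrintWriter fw = new PrintWriter(new FileWriter("xml1-minus-xml2.xml"))
    XmlNodePrinter xnp = new XmlNodePrinter(fw)
    xnp.preserveWhitespace = true
    xnp.print(xml1)
    fw.close()
}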
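
For substantially large files, the find inside the loop rescans xml1 for every node of xml2. A possible variant, sketched below, collects the ids of xml2 into a Set first and then filters xml1 in one pass. The helper name subtractById, the Items root element and the inline XML strings are assumptions made for illustration, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlParser   // assumption: Groovy 3+; older versions find XmlParser in groovy.util without an import

// Hypothetical helper: removes from root1 every child whose uniqueId also appears under root2.
def subtractById(Node root1, Node root2) {
    // Collect the ids of xml2 once, so each membership check below is a set lookup.
    Set ids = root2.children().collect { it.'@uniqueId' } as Set
    // removeAll with a closure drops every child for which the closure returns true.
    root1.children().removeAll { it.'@uniqueId' in ids }
    root1
}

// Illustrative data only; the root element name "Items" is made up.
def xml1 = new XmlParser().parseText(
    '<Items><Item uniqueId="1"/><Item uniqueId="2"/><Item uniqueId="3"/><Item uniqueId="4"/></Items>')
def xml2 = new XmlParser().parseText(
    '<Items><Item uniqueId="3"/><Item uniqueId="1"/></Items>')

def remaining = subtractById(xml1, xml2).children().collect { it.'@uniqueId' }
println remaining   // prints [2, 4]
```

Like the original program, this modifies xml1 in place, so the result can be written out with XmlNodePrinter exactly as before.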

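Example (the enclosing root element of each file is omitted from the listings; the program compares the children of the root):

File Listing: filename1.xml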

<Item uniqueId="1">
  <Details />
</Item>
<Item uniqueId="2">
  <Details />
</Item>
<Item uniqueId="3">
  <Details />
</Item>
<Item uniqueId="4">
  <Details />
</Item>

File Listing: filename2.xml

<Item uniqueId="3">
  <Details />
</Item>
<Item uniqueId="1">
  <Details />
</Item>

File Listing: xml1-minus-xml2.xml

<Item uniqueId="2">
  <Details />
</Item>
<Item uniqueId="4">
  <Details />
</Item>
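
The “XML arithmetic” idea mentioned earlier could look roughly like the sketch below, which uses a Groovy category to give Node a minus method so that the example above can be written as xml1 - xml2. The category name XmlArithmetic and the Items root element are made up for illustration, the import assumes Groovy 3 or later, and this is only one way such a library might start.

```groovy
import groovy.xml.XmlParser   // assumption: Groovy 3+; older versions find XmlParser in groovy.util without an import

// Hypothetical category: inside use(XmlArithmetic) { ... }, "a - b" on two Nodes calls this method.
class XmlArithmetic {
    // Returns a copy of left without the children whose uniqueId also occurs in right.
    static Node minus(Node left, Node right) {
        Set ids = right.children().collect { it.'@uniqueId' } as Set
        Node result = (Node) left.clone()   // Node.clone() copies the children, so left is untouched
        result.children().removeAll { it.'@uniqueId' in ids }
        result
    }
}

// The example files from above, wrapped in a made-up "Items" root element.
def xml1 = new XmlParser().parseText('<Items>' +
    '<Item uniqueId="1"><Details/></Item>' +
    '<Item uniqueId="2"><Details/></Item>' +
    '<Item uniqueId="3"><Details/></Item>' +
    '<Item uniqueId="4"><Details/></Item>' +
    '</Items>')
def xml2 = new XmlParser().parseText('<Items>' +
    '<Item uniqueId="3"><Details/></Item>' +
    '<Item uniqueId="1"><Details/></Item>' +
    '</Items>')

// use(...) returns the value of the closure, here the difference node.
def diff = use(XmlArithmetic) { xml1 - xml2 }
def ids = diff.children().collect { it.'@uniqueId' }
println ids   // prints [2, 4]
```

Unlike the original program, this version leaves xml1 unchanged and returns a new node, which seems closer to how arithmetic operators usually behave.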