Changing the structure of an XML file

Besides cat(1), one of the most useful shell commands for interactive use is head(1), which truncates its input after a few lines. There are multiple generalizations of this idea for XML documents.

The xml-head(1) command has three main switches. The -t switch truncates the tags, i.e. it displays only the first few tags (but still generates well-formed XML). The -c switch truncates the text fields, i.e. it displays only the first few characters wherever text is present but leaves the tags as they are. The -n switch truncates lines, so that no text field exceeds a given number of lines. All three switches can be combined.


% xml-head -t 3 People.xml
<?xml version="1.0"?>
<People>
	<Person Name="Fred Davis">
		<Address>
			<LineOne>4 Bushy Street</LineOne>
</Address>
</Person>
</People>
% xml-head -c 2 People.xml
<?xml version="1.0"?>
<People>
	<Person Name="Fred Davis">
		<Address>
		<LineOne>4 </LineOne>
		<LineTwo>Gr</LineTwo>
		<County>Ma</County>
		<Country>Ir</Country>
		</Address>
		<TelNo>+3</TelNo>
	</Person>
</People>
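
The effect of the -c switch can be modelled in a few lines of Python. This is only a sketch of the idea, not how xml-head(1) is implemented: the function name truncate_text and the sample document are made up for illustration, and the sketch treats every text field the same way, including whitespace between tags, which may differ from what xml-head(1) does.

```python
import xml.etree.ElementTree as ET

def truncate_text(root, limit):
    """Cut every text field to at most `limit` characters, in place.

    Hypothetical helper: ElementTree stores character data in `text`
    (before the first child) and `tail` (after the closing tag), so
    both are truncated. Tags and attributes are left untouched.
    """
    for elem in root.iter():
        if elem.text:
            elem.text = elem.text[:limit]
        if elem.tail:
            elem.tail = elem.tail[:limit]
    return root

# Illustrative input, written on one line so no whitespace text is involved.
doc = ET.fromstring("<Person><TelNo>+353</TelNo><County>Mayo</County></Person>")
truncate_text(doc, 2)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

As with the -c switch, the tags survive intact and only the character data is shortened, so the result remains well-formed.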

Another way to modify the structure of an XML file is with xml-cut(1). In traditional Unix, the cut(1) command prints columns from an input file that is viewed as a table (the exact meaning of a column is determined by switches). To understand xml-cut(1), think of a fully indented XML file, where each level of indentation is printed in its own column:


         0           | 1  | 2  | 3  | 4
----------------------------------------
<?xml version="1.0"?>|    |    |    |
                     |<a> |    |    |
                     |    |<b> |    |
                     |    |    |<c> |
                     |    |    |    |xyz
                     |    |    |</c>|
                     |    |</b>|    |
                     |</a>|    |    |

Now we can print only columns 2 and 4 as follows:


% xml-echo -e '[a/b/c]xyz' | xml-cut -t 2,4
<?xml version="1.0"?>
<root>

	<b>
			xyz
		</b>

</root>

Note that the closing tag </b> in this example is out of alignment. This makes sense once you realize that the "xyz" text field really begins with the first newline after <c> and contains all the whitespace before </c>. That trailing whitespace, which originally indented </c>, now precedes </b> instead. As usual, xml-fmt(1) can be used to align the tags if necessary.
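
The column picture can be made concrete with a small Python sketch that assigns a column number to every tag and text field, following the table above: a tag sits in the column equal to its nesting depth, and text sits one column to the right of its enclosing tag. This illustrates the numbering only and is not the xml-cut(1) implementation; the function name depth_columns is hypothetical, attributes are dropped from the printed tags for brevity, and the input is written without indentation so that no whitespace text fields appear.

```python
import xml.parsers.expat

def depth_columns(xml_text):
    """Return a list of (column, token) pairs, one per tag or text field."""
    rows = []
    depth = 0  # column 0 is reserved for the XML declaration

    def start(name, attrs):
        nonlocal depth
        depth += 1
        rows.append((depth, f"<{name}>"))

    def end(name):
        nonlocal depth
        rows.append((depth, f"</{name}>"))
        depth -= 1

    def chars(data):
        # Text belongs one column to the right of the tag that contains it.
        rows.append((depth + 1, data))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each text field as a single chunk
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return rows

rows = depth_columns("<a><b><c>xyz</c></b></a>")
# Keep only columns 2 and 4, as in the example above.
selected = [token for column, token in rows if column in {2, 4}]
print(selected)
```

The selection keeps the <b> tags and the "xyz" text and drops everything in columns 1 and 3, which mirrors the output shown above apart from the <root> wrapper and the whitespace.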

Structural surgery can also be performed using xml-rm(1), xml-cp(1) and xml-mv(1). These commands remove, copy, and move entire subtrees of an XML document.


% xml-rm food.xml :/products/product[2]
<products>

  <product price="3">Chicken</product>
  
  <product price=".20">Apple</product>
  <product price="1.09">Milk (2 litres)</product>

</products>
% xml-cp food.xml :/products/product[2]/ \
        People.xml ://TelNo/
<?xml version="1.0"?>
<People>
	<Person Name="Fred Davis">
		<Address>
			<LineOne>4 Bushy Street</LineOne>
			<LineTwo>Green Road</LineTwo>
			<County>Mayo</County>
			<Country>Ireland</Country>
		</Address>
		<TelNo>Lobster</TelNo>
	</Person>
</People>
% xml-mv food.xml :/products/product[3] \
        food.xml :/products/product[1]/
<products>

  <product price="3"><product price=".20">Apple</product></product>
  <product price="11.50">Lobster</product>
  
  <product price="1.09">Milk (2 litres)</product>

</products>
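
For readers who think in terms of trees rather than paths, the remove and move operations can be sketched with Python's standard ElementTree module. This is an illustration under stated assumptions, not a description of how xml-rm(1) or xml-mv(1) work internally. The contents of food.xml are reconstructed from the outputs above (with whitespace omitted), the helper remove_subtree is hypothetical, and the move replaces the contents of the destination because that is what the output above suggests.

```python
import xml.etree.ElementTree as ET

# Assumed contents of food.xml, inferred from the transcripts above.
FOOD = (
    '<products>'
    '<product price="3">Chicken</product>'
    '<product price="11.50">Lobster</product>'
    '<product price=".20">Apple</product>'
    '<product price="1.09">Milk (2 litres)</product>'
    '</products>'
)

def remove_subtree(root, path):
    """Detach the first element matching `path` below root and return it."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    target = root.find(path)
    parent_of[target].remove(target)
    return target

# Remove: drop the second product, as in the xml-rm example.
root = ET.fromstring(FOOD)
remove_subtree(root, "product[2]")
remaining = [p.text for p in root]
print(remaining)

# Move: detach the third product and make it the sole content of the first.
root = ET.fromstring(FOOD)
dest = root.find("product[1]")
moved = remove_subtree(root, "product[3]")
dest.text = None              # discard the old text ("Chicken")
for child in list(dest):      # discard any old children, keep attributes
    dest.remove(child)
dest.append(moved)
moved_xml = ET.tostring(root, encoding="unicode")
print(moved_xml)
```

A copy works the same way, except that the source subtree is duplicated (for instance with copy.deepcopy) instead of being detached.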