
Opening and viewing existing XML files

We will work below with a simple XML file called food.xml. What can we do with it? The simplest thing we can do is send it to the terminal for display:

% xml-cat food.xml
<?xml version="1.0"?>
<root>
  <product price="3">Chicken</product>
  <product price="11.50">Lobster</product>
  <product price=".20">Apple</product>
  <product price="1.09">Milk (2 litres)</product>
</root>

Of course this is trivial, and we don't need a special command just to do this. We could simply use the ordinary cat(1) command. However, there are some differences here.

The first difference is that xml-cat(1) also checks the file food.xml for integrity. Whereas cat(1) prints whatever it finds in the file as-is, xml-cat(1) prints an error (and refuses to continue) as soon as it finds that food.xml is not well-formed XML. In this case the file is well formed, so we don't see an error message.

Thus we get an implicit guarantee from xml-cat(1): whatever it allows to be printed will be suitable for another XML processor to consume. The guarantee is weak, however. It is only a well-formedness guarantee, not a full validity guarantee. All the xml-coreutils(7) commands process well-formed XML documents and always ignore validity. This is because they are likely to be used on XML fragments, which don't usually carry their own validation specs.
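As a rough illustration of the well-formedness check, the sketch below feeds xml-cat(1) a deliberately broken file. The file broken.xml and its contents are hypothetical, and the script assumes that xml-cat(1) exits with a nonzero status when it refuses its input, which the text above implies but does not state.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical broken file: the <product> element is never closed,
# so the document is not well formed.
cat > "$workdir/broken.xml" <<'EOF'
<?xml version="1.0"?>
<products>
  <product price="3">Chicken
</products>
EOF

if command -v xml-cat >/dev/null 2>&1; then
    # Assumption: xml-cat exits nonzero when it rejects its input.
    if xml-cat "$workdir/broken.xml" >/dev/null 2>&1; then
        result="accepted"
    else
        result="rejected"
    fi
else
    result="skipped (xml-cat not installed)"
fi
echo "broken.xml: $result"
```

By contrast, cat(1) would print the same file without complaint, since it never looks at the structure.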

The second difference between cat(1) and xml-cat(1) is at first surprising: the existing top level element (called <products>) in the food.xml file is discarded, and replaced with a generic <root> tag. Why does this occur?

Just like with cat(1), the main task of xml-cat(1) is concatenation, i.e. taking two or more XML files as input and creating a single XML file which contains them all as output. But a well-formed XML file must contain exactly one top-level element, and therefore xml-cat(1) does the simplest thing it can to satisfy this constraint (as well as a few others we won't mention here): it removes the top-level tag from each input file, and wraps the output in a single <root> tag. You'll see this in action below. The generic root tag is also a handy reminder that the output is no longer associated with a DTD.
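The concatenation described above can be tried with a sketch like the following. Both input files are written by the script itself: the <products> wrapper in food.xml is reconstructed from the description in this section, and drinks.xml with its contents is purely hypothetical.

```shell
#!/bin/sh
# Work in a scratch directory so an existing food.xml is not overwritten.
workdir=$(mktemp -d)

# Assumed reconstruction of food.xml, with its original <products> root.
cat > "$workdir/food.xml" <<'EOF'
<?xml version="1.0"?>
<products>
  <product price="3">Chicken</product>
  <product price="11.50">Lobster</product>
  <product price=".20">Apple</product>
  <product price="1.09">Milk (2 litres)</product>
</products>
EOF

# Hypothetical second input with a different top-level element.
cat > "$workdir/drinks.xml" <<'EOF'
<?xml version="1.0"?>
<drinks>
  <drink price="1.50">Orange juice</drink>
</drinks>
EOF

if command -v xml-cat >/dev/null 2>&1; then
    # Concatenate both files into a single document on standard output.
    xml-cat "$workdir/food.xml" "$workdir/drinks.xml"
else
    echo "xml-cat not installed; skipping" >&2
fi
```

If xml-cat(1) behaves as described, neither <products> nor <drinks> should appear in the output, and the children of both should sit inside one <root> element.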

Although xml-cat(1) is nice for inspecting small XML files, for larger files a specialized viewer is essential. The xml-coreutils(7) include such a viewer, called xml-less(1). This is a terminal based interactive viewer, which is inspired by less(1), but with some extra advantages: because it understands the structure of XML files, it can do things that less(1) cannot, such as folding (press the TAB key), word wrapping (press the W key), showing or hiding attributes (press the A key), etc. You can try it out as follows:

% xml-less food.xml

One more command should be discussed straight away, and that is xml-fixtags(1). This command takes an XML file which is not necessarily well formed, and repairs it so that it becomes well-formed XML. It can be used to fix small problems, and can even convert an HTML file into XML. However, be warned that the repairs are "dumb", and the result may not be what you expect.

Aside from xml-fixtags(1), all the other xml-coreutils(7) commands expect their input XML files to be well formed, or will signal an error. This follows the XML standard modus operandi, and also prevents duplication of functionality.

% xml-fixtags food.xml | xml-less
% xml-fixtags --html xml_coreutils_tutorial.html | xml-fmt