Ruh-roh, XML! Introducing grex

I’m being haunted by XML. Or I must be, because this is the second Halloween that I’ve spent writing a little CLI tool to wrangle it. Last year it was xpath-cli, which evaluates XPath expressions on XML or HTML documents. It’s simple (just a tiny wrapper around libxml2), but it has seen me through many perils.

This year, the apparitions are back, and to fend them off I wrote grex. grex flattens an XML document into a line-oriented format that’s easy to work with using UNIX tools like grep, sed, awk, diff, cat, and so on. Afterwards you can use the --ungrex mode to convert back to XML (if you dare!). It’s directly inspired by gron, which does the same thing for JSON data.

Here’s an example. Given this XML document:

<mystery id="014" episode="Go Away Ghost Ship" location="Harbor">
  <suspect>Ghost of Redbeard</suspect>
  <clues>
    <clue location="Ghost Ship">Dry ice in storeroom</clue>
    <clue location="Ghost Ship">Wire-operated sword</clue>
    <clue location="Hidden Cave">Paper pirate hat</clue>
  </clues>
  <culprit>C.L. Magnus</culprit>
  <motive>Insurance fraud on own ships</motive>
</mystery>

grex produces:

/mystery/@episode = Go Away Ghost Ship
/mystery/@id = 014
/mystery/@location = Harbor
/mystery/suspect/text() = Ghost of Redbeard
/mystery/clues/clue[1]/@location = Ghost Ship
/mystery/clues/clue[1]/text() = Dry ice in storeroom
/mystery/clues/clue[2]/@location = Ghost Ship
/mystery/clues/clue[2]/text() = Wire-operated sword
/mystery/clues/clue[3]/@location = Hidden Cave
/mystery/clues/clue[3]/text() = Paper pirate hat
/mystery/culprit/text() = C.L. Magnus
/mystery/motive/text() = Insurance fraud on own ships

Each line starts with an XPath expression that describes a location in the XML tree, followed by an = sign and the value at that location. This format is easy to search and manipulate. Want to list all locations? Pipe grex’s output through grep '@location' | sed 's/.* = //' | sort -u. Want to see what changed between two files? Use diff <(grex before.xml) <(grex after.xml).
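To make the location query concrete without needing grex installed, here is a small sketch that runs the same pipeline over the flattened lines shown above. The file name mystery.grex and the function name list_locations are assumptions made up for this example; they are not part of grex.

```shell
# Flattened lines copied from the example output above.
# mystery.grex is a hypothetical file name used only for this sketch.
cat > mystery.grex <<'EOF'
/mystery/@episode = Go Away Ghost Ship
/mystery/@id = 014
/mystery/@location = Harbor
/mystery/suspect/text() = Ghost of Redbeard
/mystery/clues/clue[1]/@location = Ghost Ship
/mystery/clues/clue[1]/text() = Dry ice in storeroom
/mystery/clues/clue[2]/@location = Ghost Ship
/mystery/clues/clue[2]/text() = Wire-operated sword
/mystery/clues/clue[3]/@location = Hidden Cave
/mystery/clues/clue[3]/text() = Paper pirate hat
/mystery/culprit/text() = C.L. Magnus
/mystery/motive/text() = Insurance fraud on own ships
EOF

# Print each distinct value of any @location attribute:
# keep the matching lines, strip everything up to " = ", then de-duplicate.
list_locations() {
  grep '@location' | sed 's/.* = //' | sort -u
}

list_locations < mystery.grex
```

For the sample document this prints Ghost Ship, Harbor, and Hidden Cave, one per line. Note that the pattern also matches the location attribute on the root element, which is why Harbor shows up alongside the clue locations.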

The transformation is reversible[1]. You can flatten XML with grex, modify it with standard UNIX tools, then use grex --ungrex to convert back to XML. For example, say you want to nest the culprit’s name and the motive inside a single culprit element:

$ grex mystery.xml \
  | sed 's:/culprit/:/culprit/name/:' \
  | sed 's:/motive/:/culprit/motive/:' \
  | grex --ungrex

…and you’ll get out something that looks like this:

<mystery episode="Go Away Ghost Ship" id="014" location="Harbor">
  <suspect>Ghost of Redbeard</suspect>
  <clues>
    <clue location="Ghost Ship">Dry ice in storeroom</clue>
    <clue location="Ghost Ship">Wire-operated sword</clue>
    <clue location="Hidden Cave">Paper pirate hat</clue>
  </clues>
  <culprit>
    <name>C.L. Magnus</name>
    <motive>Insurance fraud on own ships</motive>
  </culprit>
</mystery>
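To see why this works, it can help to look at the intermediate lines before they reach --ungrex. The sketch below feeds just the two affected lines through the same sed substitutions; rewrite_paths is a hypothetical helper name used only for illustration.

```shell
# Apply the two path rewrites from the pipeline above.
# rewrite_paths is a hypothetical helper name for this sketch.
rewrite_paths() {
  sed 's:/culprit/:/culprit/name/:' | sed 's:/motive/:/culprit/motive/:'
}

# Feed in only the two lines that the substitutions affect.
printf '%s\n' \
  '/mystery/culprit/text() = C.L. Magnus' \
  '/mystery/motive/text() = Insurance fraud on own ships' \
  | rewrite_paths
# prints:
# /mystery/culprit/name/text() = C.L. Magnus
# /mystery/culprit/motive/text() = Insurance fraud on own ships
```

Only the paths change; the values are untouched. Both lines now sit under /mystery/culprit, which is what lets them end up inside one culprit element in the output shown above.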

The grex format also lets you merge documents by concatenating them. To add a new clue to the case file, just append some lines and ungrex:

$ cat <<EOF > new_clue.grex
/mystery/clues/clue[4]/@location = Magnus Apartment
/mystery/clues/clue[4]/text() = Bear-skin rug
EOF

$ grex mystery.xml | cat - new_clue.grex | grex --ungrex
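This post doesn’t say what --ungrex does when the same path appears more than once, so before merging it may be worth checking for collisions. Here is a minimal sketch, assuming the " = " separator shown in the examples above; dup_paths is a hypothetical helper, not part of grex.

```shell
# Print every path that occurs more than once on stdin.
# dup_paths is a hypothetical helper name for this sketch.
dup_paths() {
  # Drop the " = value" part, then report paths that repeat.
  sed 's/ = .*//' | sort | uniq -d
}

# Simulate accidentally appending the same text() line twice.
printf '%s\n' \
  '/mystery/clues/clue[4]/@location = Magnus Apartment' \
  '/mystery/clues/clue[4]/text() = Bear-skin rug' \
  '/mystery/clues/clue[4]/text() = Bear-skin rug' \
  | dup_paths
# prints:
# /mystery/clues/clue[4]/text()
```

If the check prints nothing, every path in the combined input is unique.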

I wrote grex for the same reason I wrote xpath-cli: I work with XML files a lot and wanted simpler ways to inspect and modify them from the comfort of my shell. I’ve used tools like jq and gron for a long time when working with JSON, and now with xpath and grex I can do a lot of the same workflows with XML.

The code is available on GitHub. If you want to try grex, you can build it from source:

git clone https://github.com/jake-low/grex
cd grex
cargo install --path .

If you encounter any ghosts (er, bugs), please open an issue.


  1. Well, mostly reversible; comments and CDATA are dropped when round-tripping.