UDT User Overview

Q. In sysadmin English, what is UDT?

A. UDT serves the same purpose as XML (system-independent structured data exchange), but it feels a lot more like a tar archive.

Q. Why not use XML?

A. UDT has a compact binary encoding and doesn't suffer from the bloat of XML. The data in UDT is strongly typed, so you can get valuable information about each element, such as its type name, encoding, and whatever other metadata the writer included. No benchmarks have been run yet (only the Java library is available so far), but once a C implementation is written, I expect UDT to outperform XML.
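
To give a rough feel for the size difference between tagged text and a packed binary layout, here is a small Python sketch. It does not use UDT's actual encoding: the record, the field names, and the byte layout are all made up for illustration. A real UDT stream also carries type information and metadata, so its sizes would differ from the numbers this prints.

```python
import struct

# Hypothetical record (not a real UDT type): a user ID, a load value, a name.
uid, load, name = 1000, 0.75, b"alice"

# Text markup: every field is wrapped in an opening and a closing tag.
as_xml = b"<user><uid>1000</uid><load>0.75</load><name>alice</name></user>"

# Assumed binary layout: 4-byte int, 8-byte double, 1-byte name length,
# then the raw name bytes. "<" means little-endian with no padding.
as_binary = struct.pack("<idB", uid, load, len(name)) + name

if __name__ == "__main__":
    print("xml bytes:   ", len(as_xml))
    print("binary bytes:", len(as_binary))
```

The gap comes almost entirely from the repeated tag names, which is the kind of bloat the answer above refers to.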

Q. Is it equivalent to XML?

A. Not really, but it is equally flexible. UDT could be represented as XML (with a significant amount of bloat), and XML could be represented in UDT (possibly with a smaller file as a result). However, the two are built from different parts. UDT data is composed of Types, Data elements, and Metadata. The types are somewhat high-level, like those you would find in modern theoretical programming languages. XML data is composed of elements, text nodes, attributes, and comments. XML has no type system unless you add a schema, and XML Schemas are not similar to the UDT Type system.

Q. So it isn't text... How do I use it?

A. As with other binary formats used on UNIX, you will need tools to work with it. For viewing, I have written "udt2text" and "udt2html", which perform a one-way conversion from UDT to a readable form. For editing, I was originally planning some sort of terminal-based editor, such as a tree-based vim-style editor or a plugin for emacs. However, my current solution is to use a small Java-ish scripting language to perform edits. Much like with the "awk" tool, you write a small script and pass it to the editor along with a UDT stream on STDIN, and it writes the resulting tree to STDOUT.

Q. I don't use XML much anyway. Why should I care?

A. UDT could also replace the good old ASCII text format some day. UNIX commands like 'ls', 'who', 'df', 'ps', and just about every other command currently output text on stdout, when what they really ought to do is emit tables of data. Traditional text cutting and parsing of this output is usable, but it would be easier if the data were in a more structured format. For instance, to get the file name and file date from an 'ls -l' listing, you have to tell the cut command to split each line on whitespace and pull out the columns you are interested in. This is usually complicated by things like spaces in filenames and locale-dependent date formats. If the data never left its binary format and were already separated into fields, this task would be trivial. A new shell that naturally understood UDT, automatically converted UDT streams to text when they were sent to a terminal, and had udted's scripting available as part of the command syntax would be an extremely powerful tool.
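
The 'ls -l' problem described above can be sketched in a few lines of Python. The listing line is a hand-written sample, and `FileRecord` is a hypothetical stand-in for a record that arrives already split into typed fields. It is not UDT's actual data model, only an illustration of why pre-separated fields make the task trivial.

```python
from dataclasses import dataclass
from datetime import datetime

# Hand-written sample of an `ls -l` line; the file name contains a space.
LS_LINE = "-rw-r--r-- 1 alice staff 1024 Jan  5 09:30 my notes.txt"


def name_by_whitespace(line: str) -> str:
    """Naive text cutting: treat the 9th whitespace-separated column as the name."""
    return line.split()[8]


@dataclass
class FileRecord:
    """Hypothetical structured record (an assumption, not a UDT type)."""
    name: str
    size: int
    modified: datetime


# The same file as structured data: no splitting, no date parsing.
RECORD = FileRecord(name="my notes.txt", size=1024,
                    modified=datetime(2024, 1, 5, 9, 30))

if __name__ == "__main__":
    print(repr(name_by_whitespace(LS_LINE)))  # the name is cut at the space
    print(repr(RECORD.name))                  # the full name survives
    print(RECORD.modified.isoformat())        # the date needs no locale guessing
```

The text version also had to guess the year, because the sample listing does not include one. The structured record carries a complete timestamp.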