So, what is it?

The "Universal data tree" (UDT) project is the combination of a file format / communication protocol, a library that reads it, and tools that use the library to perform common tasks. The file format is designed to be flexible in representing data of just about any kind, the library should allow programmers to easily read and write this format, and the tools are meant to encourage people to use this format for all their data storage needs by making it easy to see/modify/use the data inside.

Who is this for?

UDT is for programmers, mainly. In a broader sense, it might be for users, in the same way that HTML is used by the general population, or in the way that Windows’ regedit.exe is for users. I also hope that in time, UNIX systems will adopt it as a replacement for text data, such as printing 'ls' output in UDT and adding UDT support to shells so that they can display it automagically.

Why should I care?

Well, assuming you are the intended audience, you have probably needed to store data in a file and then read it back out. If you are any kind of application developer, you have needed to do this on a large scale in probably every project you have ever worked on. Some simple examples are storing "last window location" settings, writing a list of macros into the settings file, or reading a slew of user preferences from their personal config file. If you have developed anything that runs exclusively on Windows, you may have discovered how addictively convenient it is to stuff this junk into the registry rather than invent yet another file format or write boring, tedious code to smash binary data into a text format like INI or XML (and then get it back out). I have gotten rather attached to the registry, but its three main weaknesses are that

But doesn’t XML do this already?

Yes, but slowly, inefficiently, and crudely. XML is great as long as your data is almost exclusively text with little formatting. Storing general data types, like integers, floating point numbers, image data, sound data, or complex data types with named members like "struct Rectangle { int top, bottom, left, right; };" can be downright painful. In memory on a 32-bit system, that struct would occupy 16 bytes. The following string of XML takes 57 bytes:

    <rectangle top="23" bottom="800" left="100" right="1024">

On top of that, if you have to embed it within another XML document without having it interpreted as part of that document (like I did in writing this page), it requires a significant amount of escaping, bringing the total up to 103 characters. Since UDT allows me to define custom types and then use them in a binary fashion, that rect would use only 17 bytes in a UDT file. UDT has no escaping requirements, so embedding UDT data costs no effort at all.
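For anyone who wants to check the arithmetic, here is a small Python sketch that reproduces the three sizes quoted above. It assumes 32-bit ints packed little-endian (the byte order is my assumption and does not change the size) and uses the standard `html.escape` to stand in for "escaping for embedding". It does not try to reproduce the UDT encoding itself, so the 17-byte figure is not checked here.

```python
import html
import struct

# The XML element quoted in the text.
xml = '<rectangle top="23" bottom="800" left="100" right="1024">'

# struct Rectangle { int top, bottom, left, right; } as four 32-bit ints.
# Little-endian is an assumption; big-endian would give the same size.
packed = struct.pack("<4i", 23, 800, 100, 1024)

# Escape for embedding inside another XML/HTML document:
# "<" becomes "&lt;", ">" becomes "&gt;", and each '"' becomes "&quot;".
escaped = html.escape(xml, quote=True)

sizes = {
    "binary struct": len(packed),
    "XML": len(xml.encode("ascii")),
    "escaped XML": len(escaped),
}

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
# binary struct: 16 bytes
# XML: 57 bytes
# escaped XML: 103 bytes
```

The 46 extra characters in the escaped form come from the eight double quotes (five extra characters each) plus the two angle brackets (three extra each).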

But Java has serialization, why not just use that?

Java serialization is exactly what I wanted, except that it dies if you change your classes in the slightest way, and it is extremely reliant on the Java language. Any general-purpose use of serialization would require the same version of the class files and a Java interpreter on each end of a communication link. This is totally unacceptable for general use. Essentially, UDT is like having the important size, type, and structure data from a class file embedded in the data stream. Also, it is completely unrelated to Java, so there aren’t any licenses to run into, other than the LGPL :-)
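To make "structure data embedded in the data stream" a bit more concrete, here is a toy Python sketch. It is emphatically not the UDT wire format (the protocol documentation covers that); the layout, `encode_record`, and `decode_record` are all hypothetical names invented for illustration. The point is only that a reader with no class definition of its own can still decode the record, because the field names and types travel with the data.

```python
import struct

def encode_record(type_name, fields, values):
    """Hypothetical layout: 2-byte descriptor length, descriptor, payload.

    fields is a list of (name, struct format character) pairs.
    """
    descriptor = type_name + ":" + ",".join(f"{n}={c}" for n, c in fields)
    desc_bytes = descriptor.encode("ascii")
    payload = struct.pack("<" + "".join(c for _, c in fields), *values)
    return struct.pack("<H", len(desc_bytes)) + desc_bytes + payload

def decode_record(data):
    """Decode using only the descriptor found in the stream itself."""
    (desc_len,) = struct.unpack_from("<H", data, 0)
    descriptor = data[2:2 + desc_len].decode("ascii")
    type_name, field_spec = descriptor.split(":", 1)
    fields = [f.split("=") for f in field_spec.split(",")]
    fmt = "<" + "".join(c for _, c in fields)
    values = struct.unpack_from(fmt, data, 2 + desc_len)
    return type_name, dict(zip((n for n, _ in fields), values))

blob = encode_record(
    "Rectangle",
    [("top", "i"), ("bottom", "i"), ("left", "i"), ("right", "i")],
    (23, 800, 100, 1024),
)
name, rect = decode_record(blob)
print(name, rect)
```

This toy repeats the descriptor with every record, which is wasteful; a real format would presumably define a type once and refer to it afterwards.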

Well, nobody likes binary protocols because you can’t edit them with vi/vim/emacs/notepad

Heaven forbid that we ever leave the era of line-based ASCII data formats. I mean really, people, parsing through a sequence of characters looking for the newline is most certainly less efficient than knowing the length of the line in advance. I think it’s time we moved to some new tools that enable editing of binary data trees. If the right people tackle this project, we could end up with treemax, or tvim, or something. Also, for those of you who are nervous about losing things like sed or cat or cut, I should mention that I’m already planning equivalents to those commands in the standard set of UDT tools. And for the graphically inclined, I’m planning to do the visual tree editor in Java before the others (since the Java lib will be done first). I might make a native Windows version later, once the Delphi library is finished (and then probably port it to Kylix).
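The newline-versus-length argument can be sketched in a few lines of Python. The 4-byte big-endian length prefix here is an assumption picked for the demo, not the UDT framing, and both reader functions are hypothetical. The line reader has to inspect every byte until it finds the terminator; the length-prefixed reader learns the size up front and grabs the payload in one read.

```python
import io
import struct

def read_line_delimited(stream):
    """Read one record by checking each byte for the newline terminator."""
    out = bytearray()
    while True:
        b = stream.read(1)
        if not b or b == b"\n":
            return bytes(out)
        out += b

def read_length_prefixed(stream):
    """Read one record whose size is given by a 4-byte big-endian prefix."""
    (length,) = struct.unpack(">I", stream.read(4))
    return stream.read(length)

# A payload that happens to contain a newline byte.
record = b"top=23 bottom=800\nleft=100"

text_stream = io.BytesIO(record + b"\n")
framed_stream = io.BytesIO(struct.pack(">I", len(record)) + record)

line = read_line_delimited(text_stream)       # stops at the embedded newline
framed = read_length_prefixed(framed_stream)  # returns the whole payload

print(line)
print(framed)
```

Besides skipping the byte-by-byte scan, the length-prefixed version carries the embedded newline through untouched, where the line-based reader cuts the record short unless the payload is escaped first.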

Ok, you’ve got my attention, so what are the features?

Thought you’d never ask :-) So, for starters,

So how about some details on the file format?

All the details of the file format can be found in the protocol documentation. I’ll warn you though, it’s dry stuff. Skip to the examples section if you want to see the bytes dance.