Copying and Metadata Redux, or Apple's Folly

This week, I’ve taken a very close look at how various Mac utilities deal with the plethora of metadata available in Mac OS X. I looked at file archiving and compression programs and at backup and synchronization software, and I released a set of test files so that other people can verify my tests and run their own.

In addition to what I’ve published, I’ve corresponded with various closed and open source developers to learn more about what’s going on under the hood, and have come to a few conclusions which I think are worth sharing.

Transferring and backing up

As of this writing, I am aware of only three utilities that will successfully maintain all of your files’ information when copying: the Finder, SuperDuper!, and ChronoSync. No other utility that I’ve tested will maintain all your metadata, extended attributes, and resource forks when copying or transferring your files!

Furthermore, when it comes to compressing and archiving your documents on foreign file systems, your options become even more limited. Only one encoding format will keep your file intact when decoded, and that’s the Interarchy Backup Format. All other compression and archiving schemes, including many created specifically for the Mac, lose some measure of your files’ metadata.

The good news is that the Interarchy Backup Format is free and open-source. If you want your own copy (and don’t want to build it from the source code), just download a demo of Interarchy. You’ll find it inside the application’s bundle at Interarchy/Contents/Resources/FileConverters/Backup, and you can move it wherever you like to use it for your other files.

I’d be very pleased to see some aspiring programmers give that format a little extra love and turn it into a more robust tool than it is today.

It is also worth noting that there are no real backup programs that can restore your files reliably and also track rolling archives, versions, etc. All we have are sync utilities.

Metadata is hard to keep track of, but vitally important to save

All of this metadata is very well hidden from the user. Aside from certain Finder flags, dates, and Spotlight Comments, the user can go about their day blithely unaware of the resource forks, BSD flags, extended attributes, and complex permission systems working under the surface.
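To make those hidden layers a little more concrete, here is a minimal Python sketch that gathers whatever a given platform will admit to for a single file. It rests on several assumptions: describe_metadata is a name I made up for illustration, st_birthtime and st_flags are only present on macOS and the BSDs, os.listxattr is only provided on some platforms (notably Linux), and the ..namedfork/rsrc path is the macOS convention for reaching a resource fork. Finder comments and type/creator codes are not covered at all.

```python
import os
import stat
import tempfile


def describe_metadata(path):
    """Collect the metadata layers this platform exposes for one file."""
    st = os.lstat(path)
    info = {
        "permissions": stat.filemode(st.st_mode),
        "modified": st.st_mtime,
        # Creation date and BSD flags exist on macOS and the BSDs;
        # elsewhere these fields are missing and reported as None.
        "created": getattr(st, "st_birthtime", None),
        "bsd_flags": getattr(st, "st_flags", None),
    }

    # Extended attribute names, where the platform's os module offers them.
    info["xattr_names"] = None
    if hasattr(os, "listxattr"):
        try:
            info["xattr_names"] = os.listxattr(path, follow_symlinks=False)
        except OSError:
            pass  # the file system does not support extended attributes

    # Assumption: on macOS the resource fork is reachable through this
    # special path. On other systems the lookup fails and we record None.
    try:
        fork = os.path.join(path, "..namedfork", "rsrc")
        info["resource_fork_bytes"] = os.path.getsize(fork)
    except OSError:
        info["resource_fork_bytes"] = None

    return info


# Usage: inspect a throwaway file and print each layer that was found.
with tempfile.NamedTemporaryFile() as sample:
    for layer, value in describe_metadata(sample.name).items():
        print(f"{layer}: {value}")
```

Running the same function against a file before and after a copy is a crude but handy way to see which layers a tool dropped.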

However, when some of that data is lost, the results can range from highly confusing to catastrophic, especially for the active geek!

If you are accustomed to using smart folders, tagging your files with third-party utilities, or using Spotlight Comments to further refine your searches and smart folder hierarchies, you will find that nearly everything you do to your files from the command line will break your delicate system. Even less-geeky users will find thumbnail previews missing from some of their files if the resource fork is lost, or will be unable to find that article they wrote two years ago (because it now says it was created last week).
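The date problem is easy to reproduce. The sketch below is a Python illustration, not a test of any of the utilities discussed here: it backdates a file by two years and copies it twice, once with shutil.copy (data and permission bits only) and once with shutil.copy2 (which also tries to carry over timestamps). It tracks the modification date rather than the creation date, because that is the one Python can set portably; copy_and_report and the file names are made up for the example.

```python
import os
import shutil
import tempfile
import time


def copy_and_report(src, dst_dir):
    """Copy src twice and return the modification time of each copy."""
    plain = shutil.copy(src, os.path.join(dst_dir, "plain.txt"))
    careful = shutil.copy2(src, os.path.join(dst_dir, "careful.txt"))
    return os.stat(plain).st_mtime, os.stat(careful).st_mtime


with tempfile.TemporaryDirectory() as work:
    article = os.path.join(work, "article.txt")
    with open(article, "w") as handle:
        handle.write("written two years ago\n")

    # Backdate the file so that it looks two years old.
    two_years_ago = time.time() - 2 * 365 * 24 * 3600
    os.utime(article, (two_years_ago, two_years_ago))

    original_mtime = os.stat(article).st_mtime
    plain_mtime, careful_mtime = copy_and_report(article, work)

    print("original:", time.ctime(original_mtime))
    print("copy    :", time.ctime(plain_mtime))
    print("copy2   :", time.ctime(careful_mtime))
```

The plain copy is stamped with the time of the copy, which is exactly how a two-year-old article ends up looking like last week’s. Even the more careful call says nothing about resource forks or comments.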

And third parties are already making heavy use of extended attributes to add an extra level of polish and ease of use to their applications. I’m sure we aren’t far off from seeing “real” information stored in those attributes, rather than just system-specific metadata.

Apple really screwed up, bad

The worst part is that Apple built an OS with this rich metadata layer, and they’re also responsible in large part for how fragile and therefore useless it is.

First off, there are just too many ways to store this sort of information: old-style resource forks, new-style resource forks, invisible .DS_Store files, extended attributes, type/creator codes, and more. Apple really needs to pick one scheme for metadata and stick to it. (My choice would be extended attributes, as they’re by far the most flexible.)

Secondly, Apple needs to provide developers with file manipulation APIs that just plain work. As it is, not a single API or command-line utility provided by Apple will successfully copy a file and keep its metadata intact. Apple brags that a whole host of command-line programs have been updated to support resource forks and other HFS+ goodies, but in truth, none of them work as well as a drag & drop copy in the Finder. And comments… don’t get me started! Apple hasn’t provided an API to manage comments at all!

So those few developers who put in the effort to be respectful of our data and move or copy it with precision are forced to roll their own file copying algorithms. These algorithms are slower than a normal copy, are necessarily idiosyncratic, and have to be updated whenever Apple adds features and functionality to the operating system.
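To give a feel for why these home-grown routines end up slow and idiosyncratic, here is a rough Python sketch of the general shape: one pass per metadata layer, each with its own platform caveats. This is not how the Finder, SuperDuper!, or ChronoSync actually work; careful_copy is a hypothetical name, the extended attribute pass only runs where Python’s os module provides os.listxattr (notably Linux, where shutil.copystat may already have done the same job), and resource forks and comments are left out entirely.

```python
import os
import shutil
import tempfile


def careful_copy(src, dst):
    """Copy src to dst in separate passes; return the passes that were skipped."""
    skipped = []

    # Pass 1: the file's contents (the data fork only).
    shutil.copyfile(src, dst)

    # Pass 2: permission bits, timestamps, and BSD flags where supported.
    shutil.copystat(src, dst)

    # Pass 3: extended attributes, copied one name at a time.
    if hasattr(os, "listxattr"):
        try:
            for name in os.listxattr(src):
                os.setxattr(dst, name, os.getxattr(src, name))
        except OSError:
            skipped.append("xattrs")  # unsupported or not permitted here
    else:
        skipped.append("xattrs")  # this platform's os module lacks the calls

    return skipped


# Usage: copy a throwaway file and report what could not be carried over.
with tempfile.TemporaryDirectory() as work:
    source = os.path.join(work, "source.txt")
    with open(source, "w") as handle:
        handle.write("hello\n")
    print("skipped passes:", careful_copy(source, os.path.join(work, "dest.txt")))
```

Every layer added to the operating system means another pass, another set of failure cases, and another round of updates for each developer who went down this road.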

Conclusion

In the end, I’m putting this all on Apple’s doorstep (not that they’re listening to their customers on these points – rsync has been broken to the point of uselessness for the past 3 or 4 point updates of Mac OS X). They need to set the standard and give us all the tools to reliably move our files and benefit from the functionality this metadata provides. It’s unreasonable to force users to choose among tools based on which one is most respectful of their files. This should be easy, and it isn’t.

Until then, I hope this guide is informative as you look to the right tools to do what you need.

Written on March 16, 2007