
Backup doesn't get better than CrashPlan 3

I wrote some time ago about metadata and the state of metadata management with existing file copying and backup utilities. At that time, ChronoSync and SuperDuper were the only games in town for automated backup and copying -- and neither is a proper backup/archiving program.

However, the lovely folks at CrashPlan have released a new version that gets through the metadata test suite with flying colors. In fact, its only failure against Nate Gray's Backup Bouncer suite was that it did not preserve extended attributes on symbolic links. (I'm fairly certain nothing currently stores metadata on symlinks, but just in case, I've submitted a support request with CrashPlan.)

I've been using CrashPlan (and the online "CrashPlan Central" service -- similar to Carbonite/Mozy/BackBlaze/etc.) for a few weeks now, and I like it a lot. It manages backups from computer to computer (even over the Internet!), to hard disks or folders, to network drives, and to the aforementioned online service. It does a great job and provides block-level, de-duplicated backups, so your years of archives take only a minuscule amount of additional space. (Unlike, say, Time Machine, which keeps an extra copy of a file each time you change it -- hence the trouble with VMware images and Entourage's mail archive.) It also keeps differentials going back forever, has "real time backup" for constant backups (more or less like Time Machine), encrypts the bejeezus out of everything (yes, you can have your own private encryption keys and passwords without sharing ANYTHING with CrashPlan), and (thank goodness!) automatically checks the fidelity of the backup. In other words, enterprise-grade backup for $25/year. (Or free if you don't mind missing a few features and can put up with ads.)
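To make the "block-level, de-duplicated" idea concrete, here is a minimal sketch in Python. It is not CrashPlan's actual algorithm (I don't know how theirs works internally); the fixed block size, the `BlockStore` class, and its method names are all hypothetical, chosen only to show why changing one part of a big file doesn't cost a whole new copy.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


class BlockStore:
    """Toy content-addressed store: each unique block is kept only once."""

    def __init__(self):
        self.blocks = {}  # sha256 hex digest -> block bytes

    def backup(self, data: bytes) -> list:
        """Store data block by block and return the list of block digests."""
        manifest = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # A block that is already stored takes no additional space.
            self.blocks.setdefault(digest, block)
            manifest.append(digest)
        return manifest

    def restore(self, manifest: list) -> bytes:
        """Rebuild a file version from its list of block digests."""
        return b"".join(self.blocks[digest] for digest in manifest)


store = BlockStore()
version1 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
version2 = b"A" * BLOCK_SIZE + b"X" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # middle block edited

manifest1 = store.backup(version1)
manifest2 = store.backup(version2)
print(len(store.blocks))  # unique blocks stored across both versions
```

Both versions can be restored in full, but the second backup only adds the one block that changed. A scheme that copies the whole file on every change would have stored six blocks here instead of four.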

So now I have to give my nod to a single backup tool to use for anything other than imaging. CrashPlan cannot make a bootable clone of a drive, nor can it handle a "bare metal" restore. For that, you'll want something like SuperDuper, Carbon Copy Cloner or (my favorite) ChronoSync. CrashPlan also cannot back up to optical media or drive sets, just logical drives. Given the price of drives and the size of files, I don't see this as a problem. Internet backup is quickly taking the place of shuttling around optical media.

Copying and Metadata Redux, or Apple's Folly

This week, I’ve taken a very close look at how various Mac utilities deal with the plethora of metadata available in MacOS X. I took a look at file archiving and compression programs, backup and synchronization software, and also released a set of test files for other people to verify my own tests and run their own.

In addition to what I’ve published, I’ve corresponded with various closed and open source developers to learn more about what’s going on under the hood, and have come to a few conclusions which I think are worth sharing.

Transferring and backing up

As of the writing of this article, I am aware of only three utilities that will successfully maintain all your files’ information upon copy: the Finder, SuperDuper!, and ChronoSync. No other utility that I’ve tested will maintain all your metadata, extended attributes, and resource forks when copying or transferring your files!

Furthermore, when it comes to compressing and archiving your documents on foreign file systems, your options become even more limited. Only one encoding format will keep your file intact when decoded, and that’s the Interarchy Backup Format. All other compression and archiving schemes, including many created specifically for the Mac, lose some measure of your files’ metadata.

The good news is that the Interarchy Backup Format is free and open-source. If you want your own copy (and don’t want to build it from the source code), just download a demo of Interarchy. You’ll find the converter inside the application’s bundle at Interarchy/Contents/Resources/FileConverters/Backup, and you can move it wherever you like to use it on your other files.

I’d be very pleased to see some aspiring programmers give that format a little extra love and turn it into a more robust tool than it is today.

It is also worth noting that there are no real backup programs that can restore your files reliably and also track rolling archives, versions, etc. All we have are sync utilities.

Metadata is hard to keep track of, but vitally important to save

All of this metadata is very well hidden from the user. Aside from certain Finder flags, dates, and Spotlight Comments, the user can go about their day blithely unaware of the resource forks, BSD flags, extended attributes, and complex permission systems working under the surface.

However, when some of that data is lost, the results can range from highly confusing to catastrophic, especially for the active geek!

If you are accustomed to using smart folders, tagging your files with third party utilities, using the Spotlight Comments to further refine your searches and smart folder hierarchies, you will find that nearly everything you do to your files from the command-line will break your delicate system. Even less-geeky users will find thumbnail previews missing from some of their files if the resource fork is lost, or will be unable to find that article they wrote two years ago (because it now says it was created last week).
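The "created last week" problem is easy to reproduce. The sketch below uses Python's standard library and the modification date as a stand-in, since the standard library has no portable way to set a creation date. The file names and the two-year age are made up for the example.

```python
import os
import shutil
import tempfile
import time

TWO_YEARS = 2 * 365 * 24 * 60 * 60  # seconds

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "article.txt")
with open(original, "w") as f:
    f.write("written two years ago\n")

# Pretend the article was last touched two years ago.
old_time = time.time() - TWO_YEARS
os.utime(original, (old_time, old_time))

plain_copy = os.path.join(workdir, "plain.txt")
careful_copy = os.path.join(workdir, "careful.txt")
shutil.copyfile(original, plain_copy)  # copies the contents only
shutil.copy2(original, careful_copy)   # also tries to copy timestamps and permissions


def age_in_days(path):
    """Days since the file's modification date."""
    return (time.time() - os.stat(path).st_mtime) / 86400


print(round(age_in_days(plain_copy)), round(age_in_days(careful_copy)))
```

The plain copy looks brand new, while the careful copy keeps its age. Note that even `shutil.copy2` only attempts to preserve metadata; Python's own documentation warns that resource forks and type/creator codes are not carried over on the Mac, which is exactly the class of problem described here.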

And third parties are already making heavy use of extended attributes to add an extra level of polish and ease of use to their applications. I’m sure we aren’t far off from seeing “real” information stored in those attributes, rather than just system-specific metadata.

Apple really screwed up, bad

The worst part is that Apple built an OS with this rich metadata layer, and they’re also responsible in large part for how fragile and therefore useless it is.

First off, there are just too many ways to store this sort of information. Old-style resource forks, new-style resource forks, invisible .DS_Store files, extended attributes, type/creator codes, and more. Apple really needs to pick one scheme for metadata and stick to it. (My choice would be Extended Attributes, as they’re by far the most flexible.)

Secondly, Apple needs to provide developers with file-manipulation APIs that just plain work. As it is, not a single API or command-line utility provided by Apple will successfully copy a file and keep its metadata intact. Apple brags that a whole host of command-line programs have been updated to support resource forks and other HFS+ goodies, but in truth, none of them work as well as a drag & drop copy in the Finder. And comments… don’t get me started! Apple hasn’t even provided an API to manage comments at all!

So those few developers who put in the effort to be respectful of our data and move/copy it with precision are forced to roll their own file copying algorithms. These algorithms are slower than a normal copy, and are necessarily idiosyncratic and have to be updated as Apple adds features and functionality to the operating system.

Conclusion

In the end, I’m putting this all on Apple’s doorstep (not that they’re listening to their customers on these points – rsync has been broken to the point of uselessness for the past 3 or 4 point updates of MacOS X). They need to set the standard and give us all the tools to reliably move our files and benefit from the functionality this metadata provides. It’s unreasonable to force users to choose among tools based on which one is most respectful of their files. This should be easy, and it isn’t.

Until then, I hope this guide is informative as you look to the right tools to do what you need.

File copying/synchronization software and your metadata (and data!)

Following up on my earlier test of Mac archiving software, I decided to test some popular file copying/synchronization software to see which of these programs kept metadata and other Mac/HFS+ attributes intact. Rather than do a comprehensive test, I tried some popular utilities which seem to cover the general breadth of the software and which are particularly popular or prevalent. I also wanted to catch programs which had been updated since this article was written a year ago.

If you want detail on other utilities, I recommend reading the article linked above, or doing your own tests if you have the time. (And please let us know what you find out!)

The Utilities

The Finder: I just did a simple copy from one disk image to another. Drag and drop. For what it’s worth, an AppleScript copy via The Finder has the exact same results.

Disk Utility: Disk Utility does a pretty good job when making an image from a device, but I was more curious about how it would do making an image from a folder. So I simply chose “New Image From Folder…” and let ‘er rip.

cp: The venerable copy command on the command line. Apple’s updated it to respect resource forks and other metadata.

hard link: A hard link makes a file literally exist in two places in the directory tree, where neither entry is a pointer or alias to the other. It can be created with the ln command or the GNU cp command (part of the GNU coreutils).
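For anyone unfamiliar with hard links, this small Python sketch shows what "exist in two places" means in practice. `os.link` does the same job as `ln`; the file names are made up for the example.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "report.txt")
second = os.path.join(workdir, "report-link.txt")

with open(first, "w") as f:
    f.write("draft 1\n")

# Equivalent to `ln report.txt report-link.txt`: a second name for the same file.
os.link(first, second)

# Append through the second name...
with open(second, "a") as f:
    f.write("draft 2\n")

# ...and read the change back through the first name.
with open(first) as f:
    contents_via_first = f.read()

same_inode = os.stat(first).st_ino == os.stat(second).st_ino
link_count = os.stat(first).st_nlink
print(same_inode, link_count)
```

Because both names point at the same inode, nothing is actually copied, which is why a hard link is the only technique in these tests that can keep the inode number.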

rsync: Rsync is a great and lightning fast synchronization utility that’s at the core of many command-line backup systems. Apple’s updated it to support resource forks and whatnot, but unfortunately, they broke it in the process. I ran a patched version installed via Fink as the other version pretty much doesn’t work at all.

ditto: This is Apple’s answer to rsync, as best as I can tell, and it’s a nice way to quickly duplicate folders and files. It’s been resource-fork-aware from the start.

psync: psync is a Perl-based utility that does file synchronization similar to rsync. A few graphical clients (such as Deja Vu) use psync on the back end.

CCC 3.0 b5: Carbon Copy Cloner is a great utility for cloning your hard drive. The 3.0 branch is brand new, so I thought I’d give it a try and see how it did.

SuperDuper!: Shirt-Pocket Software’s utility is a fave among many folks who like having a bootable backup. While somewhat inflexible in how it works (it’s pretty much geared toward being a whole-drive duplicator), it’s easy to use and very reliable.

ChronoSync: ChronoSync is probably the most configurable and powerful graphical synchronization utility available for the Mac. Its users rave about it, so I figured I’d include it in my tests.

Result of tests of file copy/synchronization software

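| Tool | inode number | permissions | ACL | BSD flags | resource fork | extended attributes | type | creator | creation date | modification date | lock | stationery | invisible | label | comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Finder copy | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| Disk Utility Image | N | Y | N | N | Y | N | Y | Y | N | Y | N | Y | Y | Y | Y |
| cp -r | N | N | N | N | Y | Y | Y | Y | N | N | N | Y | Y | Y | Y |
| hard link | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | N | Y | Y | Y | N |
| rsync -aE | N | Y | N | N | Y | Y | Y | Y | N | Y | N | Y | Y | Y | Y |
| ditto | N | Y | N | N | Y | N | Y | Y | N | Y | N | Y | Y | Y | Y |
| psync | N | N | N | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| CCC 3.0 b5 | N | Y | N | N | Y | N | Y | Y | N | Y | N | Y | Y | Y | Y |
| SuperDuper! | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| ChronoSync | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |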

Notes:

  • Finder Copies: Comments are maintained even when the invisible .DS_Store file is not copied, unlike most other techniques
  • Disk Utility Image: Attempting to build an image from a folder on a volume with ACLs enabled will fail. As a result, these tests were carried out on a folder without ACLs.
  • hard link: Files with the uchg flag/locked files could not be linked and were skipped
  • rsync: Attributes in bold were only successfully copied from a volume with ACLs disabled. With ACLs enabled, these attributes do not copy.
  • psync: Psync didn’t copy extended attributes, but it did embed the file’s type/creator information into a new extended attribute. Very strange behavior.
  • SuperDuper!: SuperDuper! maintained the “arch” BSD flag on the clone. While this is not technically correct, it is thorough, and may be desirable if you want a precise clone.
  • ChronoSync: ChronoSync has an option to maintain Finder comments even when you aren’t copying invisible .DS_Store files

Conclusions

Clearly, The Finder is your only free and bulletproof solution to copying files. Every command-line option, despite Apple’s efforts to make them compatible with all the fancy Mac metadata, has serious failings.

If you’re willing to spend a little money, SuperDuper! or ChronoSync is a good option. Carbon Copy Cloner comes close, but even with its lower price (it’s donationware), SuperDuper! does a much better job and is quite reasonably priced.
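If you want to slice the results yourself, here is the results table transcribed into a small Python structure. The Y/N strings are copied from the table in column order; the `lost` helper and the variable names are just my own illustration.

```python
ATTRIBUTES = [
    "inode number", "permissions", "ACL", "BSD flags", "resource fork",
    "extended attributes", "type", "creator", "creation date",
    "modification date", "lock", "stationery", "invisible", "label", "comments",
]

# One Y/N flag per attribute, in the same order as ATTRIBUTES.
RESULTS = {
    "Finder copy":        "NYYYYYYYYYYYYYY",
    "Disk Utility Image": "NYNNYNYYNYNYYYY",
    "cp -r":              "NNNNYYYYNNNYYYY",
    "hard link":          "YYYYYYYYYYNYYYN",
    "rsync -aE":          "NYNNYYYYNYNYYYY",
    "ditto":              "NYNNYNYYNYNYYYY",
    "psync":              "NNNYYNYYYYYYYYY",
    "CCC 3.0 b5":         "NYNNYNYYNYNYYYY",
    "SuperDuper!":        "NYYYYYYYYYYYYYY",
    "ChronoSync":         "NYYYYYYYYYYYYYY",
}


def lost(tool):
    """Return the attributes a tool failed to preserve in these tests."""
    return [name for name, flag in zip(ATTRIBUTES, RESULTS[tool]) if flag == "N"]


# A new inode number is expected for any true copy, so ignore that column.
faithful = [tool for tool in RESULTS if lost(tool) == ["inode number"]]
print(faithful)
print(lost("ditto"))
```

The `faithful` list comes out as the Finder, SuperDuper!, and ChronoSync, which matches the conclusion above, and `lost("ditto")` shows at a glance what a given command-line tool drops.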

Ensuring trouble-free backups from your Mac to not-a-Mac

My current project is to enable network backups of my Mac and my wife’s PC over the internet, so that we have an off-site backup of last resort, should our house burn down, fall over, and then sink into the swamp.

Anyone who’s used a Mac for a long time knows that transferring Mac-native files over the internet is fraught with peril. You risk losing type and creator codes and resource forks, as well as a number of other forms of metadata introduced with MacOS X. So my first step was to determine how I could safely encode my files so that they could make the trip to a foreign server (which would either be a Linux-type box on my web host, or an Amazon S3 account) and then back again, with the file intact for recovery.

A few months ago, the Plasticsfuture blog ran an excellent article comparing the capabilities of darn near every backup and restore program on the Mac. The results were disheartening: Only SuperDuper! could precisely back up and restore a file (although, in my own testing, I found that ChronoSync was updated and can now handle the job).

However, my needs are somewhat different. A network backup requires that I only update changed files, and furthermore, I won’t be accessing them on a filesystem. So SuperDuper’s use of a disk image to perform network backups is straight out. So, too, is ChronoSync, since it can only back up to a network filesystem.

No, I need something that can encode individual files, ideally via a script or some other automated method. Furthermore, I’m mostly just concerned about archiving documents (I don’t want to pay for online storage of my whole hard drive!), so if permissions, ACLs (which I don’t use), or BSD flags are munged, that’s all right. This is truly a last-resort backup, so if I can get resource forks and extended attributes to back up, I’m pretty happy.

Now, according to Apple, a number of command-line utilities have been updated to deal correctly with extended attributes and resource forks – these include tar, cpio, ditto, and zip. There are also a handful of third-party archivers/compressors out there which also claim Mac compatibility such as the x7z (a Mac version of the 7-zip compression algorithm) and StuffIt compression programs, the xar archiver, and Interarchy’s “backup” format. (Note: Interarchy’s backup page is currently 404’ing, but suffice to say it’s an open format that attempts to encode files in such a way as to store all of Apple’s fancy metadata.)

So I figured I’d do what any red-blooded geek would do, and test all these programs to see if they did what they claimed to do. My test was pretty simple: I took a text file, assigned a Finder label and some comments, and added a resource fork and some custom extended attributes. If all these things made it through intact, I’d consider the tool good enough for my purposes.
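The shape of that test (snapshot a file, push it through an archiver, restore it, compare) can be sketched in a few lines of Python. This is not the procedure I actually used, which was done by hand with the Finder and the command line. It checks only what Python's standard library can see: contents, permissions, modification date, and (on platforms where `os.listxattr` exists, which does not include the Mac) extended attributes. Finder labels, comments, and resource forks are out of its reach. Python's `zipfile` module stands in for "some archiver", and the helper names are my own.

```python
import hashlib
import os
import tempfile
import time
import zipfile


def snapshot(path):
    """Collect the metadata we would like to survive a round trip."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    info = {
        "contents": digest,
        "permissions": st.st_mode & 0o7777,
        "modification date": int(st.st_mtime),
    }
    # os.listxattr is only available on some platforms (e.g. Linux);
    # elsewhere extended attributes are simply not checked by this sketch.
    if hasattr(os, "listxattr"):
        try:
            info["extended attributes"] = {
                name: os.getxattr(path, name) for name in os.listxattr(path)
            }
        except OSError:
            pass  # filesystem without extended attribute support
    return info


def lost_fields(before, after):
    """Names of the fields that differ between two snapshots."""
    return sorted(key for key in before if before[key] != after.get(key))


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "note.txt")
with open(source, "w") as f:
    f.write("test file\n")
last_year = time.time() - 365 * 86400
os.utime(source, (last_year, last_year))

# Round trip: archive the file, then restore it somewhere else.
archive = os.path.join(workdir, "note.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.write(source, arcname="note.txt")
restore_dir = os.path.join(workdir, "restored")
with zipfile.ZipFile(archive) as zf:
    zf.extract("note.txt", restore_dir)

lost = lost_fields(snapshot(source), snapshot(os.path.join(restore_dir, "note.txt")))
print(lost)
```

With this particular stand-in, the contents survive but the modification date does not, because `zipfile` does not reset timestamps when it extracts. Swapping in a different archiver only means replacing the round-trip step in the middle.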

The results were interesting. First off, every program I tested successfully maintained the resource fork, which is really the most important part of the file to keep around. Additionally, every program except for tar managed to keep the Finder label. As for extended attributes, only tar, the Interarchy backup format and cpio successfully kept the custom extended attributes. This is especially baffling given that the resource fork in MacOS X 10.4 is nothing more than an extended attribute!

The most troubling thing for me was that none of the programs I tested managed to maintain the “Spotlight Comments” I’d added. I frequently use these comments as a way to tag my files for Spotlight searching, so their loss is somewhat problematic. It turns out that these Spotlight Comments are stored in the invisible .DS_Store file in the same folder as the file I was backing up. So provided I backed up (and restored) the whole folder, that wouldn’t be a problem. Still, it would be nice to see the archivers handle comments on their own.

Update: I hadn’t originally tested it, since Apple hadn’t listed it on their OS X pages as a utility that had been updated to work with resource forks, but some online discussions led me to believe that the pax archiver had also been updated. Indeed, it has, and it successfully maintains resource forks, extended attributes, and Finder labels, just like cpio. It does, however, seem to have bugs when used on systems with ACLs enabled, and like everything else, it loses Finder comments. I have updated the discussion, below, accordingly.

So this leaves me with three solid options: The Interarchy Backup format, pax, and cpio archives. The latter two archive formats are fully compatible with other unix-like systems and can even be expanded using the graphical BOMArchiveHelper on the Mac, so they may be the best choice. Both archivers permit me to compress files in the archive, which is a nice bonus for network archiving. Of the two, pax has some nice advantages, including a larger file size limit on archives and some interesting command-line tricks including the ability to write out to different archive types. Cpio, on the other hand, seems to work on systems with ACLs enabled.
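As an aside, the pax archive format has a general mechanism for carrying extra per-file key/value data in "extended headers", and it can be combined with compression. The sketch below uses Python's `tarfile` module to show that mechanism. It is not how Apple's pax or cpio store Mac metadata (I haven't verified what they do internally), and the `example.label` and `example.comment` keys are hypothetical names I made up.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "note.txt")
with open(source, "w") as f:
    f.write("offsite backup test\n")

archive = os.path.join(workdir, "backup.tar.gz")

# Hypothetical attribute names; a real tool would read these from the file.
custom = {"example.label": "red", "example.comment": "tax documents"}

# Write a gzip-compressed archive in pax format with the extra headers attached.
with tarfile.open(archive, "w:gz", format=tarfile.PAX_FORMAT) as tar:
    member = tar.gettarinfo(source, arcname="note.txt")
    member.pax_headers = dict(custom)
    with open(source, "rb") as f:
        tar.addfile(member, f)

# Read the archive back and recover both the contents and the extra headers.
with tarfile.open(archive, "r:gz") as tar:
    restored = tar.getmember("note.txt")
    restored_headers = dict(restored.pax_headers)
    restored_text = tar.extractfile(restored).read().decode()

print({key: restored_headers[key] for key in custom})
```

The restored headers may contain other keys that `tarfile` adds on its own (a high-precision modification time, for example), so the sketch only looks up the keys it wrote.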

However, I suspect (although I haven’t tested this) that the Interarchy Backup format captures more Mac-specific metadata, since it was designed with exactly that goal in mind, and Apple’s programs are, well, less than entirely consistent in supporting these filesystem features. I do wish Apple could at least provide tools that were consistent and reliable.

It is worthwhile to note that none of the graphical Mac compression utilities managed to maintain extended attributes, not even Apple’s custom zip archiver (which is the same tool used when you archive files in the Finder) nor StuffIt, which has long been a Mac-friendly standby compression program. This leaves Mac users with essentially no easy options for compressing files prior to emailing them or posting them to an FTP site other than disk images (which aren’t cross-platform).

I’m disappointed that the xar archiver, which was designed precisely to handle metadata schemes on different systems, didn’t perform better. It is still a work in progress, so I left some bug reports, and I remain hopeful they will properly support extended attributes and comments in future releases. Since xar can also handle compression as well as encryption, it would be a fantastic solution for off-site backups.

Soon I’ll post how my backup system develops and works over the long-term. If you have any questions or comments, please feel free to sound off, below!
