I spent two-and-a-bit years as Polonsky Digital Preservation Technical Fellow at Cambridge University Library (CUL). My longer-term background (and current job) is in software engineering and systems development.
So, this article starts to explore Digital Preservation from more of a software engineering perspective. Note: I don't know many other software engineers who ever really think about Digital Preservation much - software engineering is a very 'forward-looking' profession (often recklessly so).
I'd actually like to begin by putting hands-on software engineering to one side for a minute. One of the most critical differences about Digital Stuff is that it has a different form of value to Real Stuff. So, if you have a giant, environmentally controlled barn full of precious Real Stuff, even if nobody ever goes to see any of it, you've always got the value of the stuff inside to fall back on. And of course you'd never sell any of it blah blah etc, but the accumulated Real Stuff is still an asset on the positive side of your ledger, one that tends to increase in value constantly, enabling you to offset your management costs. This was always painfully obvious at CUL where, for example, I used to have 1-1s with my manager while sitting in the middle of the actual books from Montaigne's Library.
Of course, with Digital Stuff you don't have any of that... Possibly, maybe with Non-Fungible Tokens, but really, given that the first para of the Wikipedia page about them says: "... access to any copy of the original file, however, is not restricted to the buyer of the NFT", no, not even with those. (I shall perhaps delve into the story of NFTs at a future juncture, and I'll get onto the topic of 'originality' below). So Digital Stuff requires completely different thinking about value, and in particular, value to who, exactly?
Perhaps the most useful and important thing I read when I was CUL's Digital Preservation Technical Specialist was the Sustainable Economics for a Digital Planet report. This makes the important point that, to keep Digital Stuff in play, it's vital to define those with a stake in your stuff, and then align their needs, in particular the needs of those stakeholders that use the Stuff, and those that pay for it to remain usable. In other words:
It should always be obvious that your Stuff matters to the people your leadership most want to cater for. Or as it says in the conclusion of the report:
Sustainable preservation strategies will find ways to turn the uncertainties of time and resources into opportunities for flexibility, adjustments in response to changing priorities, and redirection of resources where they are most needed.
If it's going to survive, your Stuff needs to be flexible, not pickled.
The "Digital Objects" concept underpins Digital Preservation thinking. But "Digital Objects" like files, folders, or even chunks of metadata, are only metaphors. The chances of preserving Digital Stuff improve if we address the underlying realities of Computer Systems better.
There's a great documentary about the project to get the Flying Scotsman running again, where the historian James Baldwin explains that, to preserve an old engine, you have to: "… strip it down to every nut, bolt and rivet". Digital Stuff is always one part of a piece of engineering, so we're actually in the business of keeping old engines running here. Genuine knowledge of how it all works keeps it safe. The best way to understand how it works is to tinker with it, break it to pieces, and put it back together again. (Note: you're not really breaking it into pieces. It isn't an actual thing).
But... we don't actually have a proper concept of what "Pieces of Digital Stuff" actually are yet. Liquid / fluid metaphors (e.g.: "streams", "Data Lakes" etc) have been popular over the last couple of decades but they don't really get to the bottom of it, either (more below).
Data is fundamentally useless: it's to the digital world what sand is to construction. Processing sand by melting it into glass or mixing it with cement makes it useful, but until then it just lies around in dirty heaps, getting in the way and costing money and effort to move around. The same with data: without the ability to process it into information, it's just annoying dirt.
Some things about information, though, are:
Every instance of your Stuff getting used adds to the Stuff. If you keep adding to it, more people care that it should be preserved, and it's the fact people care that will preserve it. Moving away from thinking about preserving "Digital Objects" and towards closer-to-the-digital-reality metaphors like "keeping streams going" or "enabling assemblages of linked concepts to form" ought to make this easier, once we can get such metaphors under better control.
In case it's not clear yet: using Digital Stuff preserves it. And that means the actual stuff, not a lo-fi copy you made when the 'original' was squirrelled away on some expensive redundant storage somewhere, never to be touched again. That's not to say that backups shouldn't be taken (and tested) – of course they should – but then that's just prudent, professional Systems Engineering.
If the above has given you the impression that I know any of the answers here then I heartily apologise. Anyone can sit on the sidelines and comment about how we're doing it wrong by treating "Digital Objects" as if they're digital versions of the paper that was mashed through Winston Churchill's typewriter. (And I met plenty of people in the Digital Preservation business who were as worried by the metaphor as I, of course).
The thing is, though, that I've been using the terrible-but-still-better-than-anything-else-I-can-think-of term "Digital Stuff" throughout all the above. And there's the rub: to actually preserve Digital Stuff properly, we need a better definition of what it is. Only then will we be able to develop systems that match how people use it and the value it brings, now and for long into the future.
Files, folders and metadata really don't cut it. Nor do even vaguer terms such as 'content' or 'assets', which still imply coherent, whole 'things' with beginnings, middles and ends. A "piece of Digital Stuff" is something more like a technologically-assisted exploration of ideas or concepts, which weaves its way in and out of real-world contexts as time passes. (You can have that one for free - though it feels like it needs a bit of work...)
Fluid metaphors such as streams, lakes, etc get us a bit closer but they're still not right, because they don't take the "snapshot" aspect of working with Digital Stuff into account very well (i.e. the idea that from time to time a human being will grasp the Stuff and do something with it; something that hopefully changes it in some way and adds to its story):
"It's kind of an unconstrained stream that could potentially keep running and running, but a stream that crystallises every so often and then forks into multiple new streams at that point. Then those streams morph into new streams that eventually might coalesce back into one tiny, trickling stream again. And then that stream crystallises again and..."
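For the software engineers in the room, here's one very rough way that description might be sketched in code. It is only an illustration, not a proposed model: the names Snapshot, fork, coalesce and history, and the example labels, are all made up for this sketch. It treats each "crystallisation" as a node that remembers which earlier streams fed into it, which is about as far as the metaphor currently stretches.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """A 'crystallised' moment: someone grasped the Stuff and did something with it."""

    label: str  # hypothetical description of what was done
    parents: list[Snapshot] = field(default_factory=list)

    def fork(self, *labels: str) -> list[Snapshot]:
        """Start one new stream per label, each flowing on from this snapshot."""
        return [Snapshot(label, parents=[self]) for label in labels]


def coalesce(label: str, *streams: Snapshot) -> Snapshot:
    """Merge several streams back into a single snapshot."""
    return Snapshot(label, parents=list(streams))


def history(snapshot: Snapshot) -> list[str]:
    """Every label that contributed to this snapshot, oldest first, no repeats."""
    seen: list[str] = []

    def visit(node: Snapshot) -> None:
        # Walk back up through the parents first so older labels come out first.
        for parent in node.parents:
            visit(parent)
        if node.label not in seen:
            seen.append(node.label)

    visit(snapshot)
    return seen


# Invented example: one stream crystallises, forks in two, then coalesces.
original = Snapshot("original scan")
transcript, translation = original.fork("transcription", "translation")
edition = coalesce("annotated edition", transcript, translation)
print(history(edition))
```

Every use adds a node rather than overwriting one, so the story of the Stuff accumulates. Whether a graph of snapshots is really the right shape for the domain is exactly the open question.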
Getting to a decent, solid definition for Digital Stuff is going to be a recurring theme here, because once we have accurately modelled the domain we're working in, we can really get going.
© 2021 - Dave Gerrard