Archiving Meets AI

When interpretation feels like recovery, how do we keep history from changing?

Mar 25, 2026

The archiving world is well-versed in the quality spectrum, but generative AI introduces a new axis entirely.

When I digitize a century-old movie for a client and export it to video, I have taken it from Preservation to Restoration. The images come back to life: shareable, cleaned up, and handled respectfully while lighting and focus are optimized.

Edward Hall in 1912 (colorized by Gemini)

But sometimes I see a still frame or negative that is SO damn good that it needs to be upscaled and sharpened, maybe even colorized as a freebie for the client. At some point in that seductive process, we shift from Restoration to Interpretation. I may invent a plausible dress color and skin tone, even if the underlying image is mostly unchanged. Some “enhancement” tools demand a delicate touch, but with AI the results can be remarkably convincing.

The 1912 photo above shows one of my Quaker ancestors near Philly; it spent the past century as a B&W print. The colorization is amazingly good, but the lighting is too modern, with controlled interior fill that makes it feel contemporary… and it is somehow too sharp for early color film. He has a gorgeous green suit, which was a thing at the time, but I think a staid 1912 Quaker gentleman would have worn his black suit for the rare event of a visiting photographer. And what about next year’s tools? They will probably get it close to perfect.

Interpretation feels like recovery. This is the kind of thing that a family loves, but we mutate history in the process.

It gets crazier. Continuing along this axis away from source truth, we move into unfamiliar territory. I’ve used ancient images to generate video clips of my own bicycle, a beloved cockatiel, a jazz singer in 1940, mom on the beach with her kid, and people vamping for the camera. My clients know these are just extrapolations, though some of the results are amazingly plausible. Here I was in January 1984, pedaling the Winnebiko through Palatka, Florida, based on an 8x10 newspaper photo:

Now, this is pure magic (and technically accurate, including the behavior of my crossover drive and the wheel details), but that is not Palatka, nor was there ever such a tracking shot. This is the fourth level of our PRIG spectrum (Preservation, Restoration, Interpretation, Generation)… Generation.

The problem is that 50 years from now, someone finding this will quite reasonably believe it’s a real video from 1984. No metadata is reliable enough: EXIF fields and those Google-style JSON sidecars can be stripped or become technologically obsolete. Like Grandma blowing a kiss in 1926 or the adorable bird landing on dad’s knee in the sixties, these are inventions based on real imagery.

That is where we find ourselves technologically, and it’s a hoot, but as an archivist it makes me cringe a bit. Over time, a colorized still becomes “a photo of my great-grandmother” instead of “a frame from a 1926 16mm film digitized by Steven Roberts, gently plasticized by Topaz and colorized by Gemini.” (Grandma never had a pink sweater, but two generations hence, nobody will even question it.)

What to do about it

An artifact gets mistaken for the original when its interpretive and generative layers disappear. Watermarks get cropped, files are renamed, EXIF is easy to strip, sidecars get lost in the weeds of a Google Takeout export, blockchain records detach from a screen-cap, containers are transcoded. Lightroom libraries become confusing technological dinosaurs, their files sprawling across some hard disk inherited from grandpa without the catalog that organized them, waiting to be “digitized” by some future counterpart of mine with a lab full of retro-gizmology, chewing through the ancient boxes every family still carries.

We need a way to embed provenance so deeply into the artifact that it travels with it, which is why the blockchain/NFT crowd was sort of on the right track even if the implementation was chaotic and confusing. More to the point, we need redundancy to reinforce actual provenance, as no single method is likely to survive. I would propose a mix of four layers:

Embedded, fragile but convenient - EXIF/XMP metadata, captions, etc. These survive casual use (until the format goes obsolete).
Visible, annoying but durable - corner tags like the Gemini mark in the photo of Edward up there, opening slates in video, faint persistent text overlays. These survive screenshots and re-exports.
Intrinsic, resilient but slightly lossy - steganographic watermarking, frequency-domain embedding, or AI-detectable signatures. These should survive compression, resizing, and editing.
Contextual, rich but dependent on cooperation - archiving with an associated narrative, as I am doing with Bionode, or other human-readable provenance. These are easily defeated, but carry maximum depth as long as human and machine caretakers are on the case.
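Redundancy only helps if the layers agree with each other. As a minimal sketch (Python, standard library only), here is one way a digitizer might keep a single provenance record and derive both the visible label and the embedded metadata text from it. Everything here is hypothetical: the `PRIG` and `Provenance` names, the field layout, and the method names are illustrative, not any standard. The sketch does not actually write EXIF/XMP or perform steganographic watermarking; it only produces the text those layers would carry.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import IntEnum


class PRIG(IntEnum):
    """Hypothetical encoding of the PRIG spectrum: higher = farther from the source."""
    PRESERVATION = 1
    RESTORATION = 2
    INTERPRETATION = 3
    GENERATION = 4


@dataclass
class Provenance:
    """Hypothetical provenance record; field names are illustrative, not a standard."""
    source: str                                     # the real original and who made it
    level: PRIG                                     # how far this derivative is from it
    tools: list[str] = field(default_factory=list)  # software that altered it

    def visible_label(self) -> str:
        """Short text for the visible layer: a corner tag, opening slate, or chyron."""
        label = f"{self.level.name.title()}: {self.source}"
        if self.tools:
            label += " via " + ", ".join(self.tools)
        return label

    def embedded_json(self) -> str:
        """Machine-readable copy of the same facts for an XMP field or JSON sidecar."""
        record = asdict(self)
        record["level"] = self.level.name  # store the name, not a bare integer
        return json.dumps(record, sort_keys=True)

    def needs_warning(self) -> bool:
        """Interpretation and Generation add content that was never in the source."""
        return self.level >= PRIG.INTERPRETATION


# Usage: one record feeds every layer, so the labels start out consistent.
clip = Provenance(
    source="1984 photo of Steven K. Roberts by John Delzell, Palatka Daily News",
    level=PRIG.GENERATION,
    tools=["Nano Banana"],
)
print(clip.visible_label())
print(clip.embedded_json())
```

Storing the level as an ordered enum makes it easy to flag anything at Interpretation or beyond, which is exactly the material most likely to be mistaken for an original.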

Otherwise, artifacts will get mistaken for originals. As interpretive and generative layers become commonplace, they stop being noticed. Labels decay and context is lost. Interpretation becomes truth.

The PRIG spectrum should become part of the archiving vocabulary, if only as a way to remind us of the distance of a piece from its origin:

Preservation
Restoration
Interpretation
Generation

This taxonomy is orthogonal to the classic Preservation/Mezzanine/Access labeling that archivists use to describe the original source reference, the high-quality but inconvenient intermediates like RAW or FILM files, and the “exports” that are convenient for routine use and distribution (JPEG, MP3, MP4, etc.).

I hardly need to point out that the vulnerability implied by the PRIG spectrum, with generated material passing for preserved material, is already being abused, even though my own brushes with the issue are innocent attempts to delight clients. There is a very dark side: moving images of people who were never filmed, voices that were never recorded, and words the subject never said. We are not used to this as a culture, and the impact via social media is insidious.

In the meantime, my dear fellow archivists and digitizers, you can do your part to slow cultural distortion by resisting the temptation to deliver creative spin-offs without obvious multilayer labeling. Speaking of which, I need to drop that Palatka clip into an editor and clutter it with a chyron identifying it as a Nano Banana video generation based on a 1984 photo of Steven K. Roberts by John Delzell of the Palatka Daily News.

Likewise for this 35mm slide of me in the BEHEMOTH bicycle helmet, taken for Bicycling magazine in 1991 by Mel Lindstrom in Palo Alto and animated with a Meta widget. Honest, I just smiled into the camera, marveling at his knowledge of light. I never did this:

The Digitizing Report
