Looking for best approaches to storing updatable/editable records with revision history

Hi! As the title says, I’m trying to understand the best approaches to storing updatable/editable records with revision history. There was some discussion about this under my post on Bluesky: @uncenter.dev on Bluesky.

The best example of this use case is a record where being able to compare its contents to previous versions matters. For example, GitHub Gists are basic text files with simple metadata and (primarily) text content. I was hoping to see what Tangled does for Tangled Strings (their alternative to GitHub Gists), but if I’m not mistaken, Tangled Strings in their current form simply overwrite the content field and don’t store diffs.

I’m curious to hear what approaches people have considered or put into practice on this matter!

We want to do this in Roomy for wiki pages. I think I have a pretty good idea for a way to do it but I haven’t tried it yet.

The idea is to have your main record, which you always update with the latest version, and separate revision records.

Every time you edit a record, you create a reverse patch and store it in a revision record. The patch itself can be a JSON Patch (RFC 6902). I think we could adapt the JSON Patch spec to work with CBOR records fairly easily.
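A minimal sketch of the reverse-patch step, assuming records are plain JSON objects and that patches only ever replace whole top-level fields. It implements a small subset of JSON Patch (`add`, `remove`, `replace`) with RFC 6901 pointer escaping; the `com.example.gist` type, the helper names, and the JSON.stringify equality check are assumptions, and CBOR-specific values like CIDs or bytes are not handled.

```typescript
// Minimal subset of RFC 6902 JSON Patch, limited to top-level fields.
type Op =
  | { op: "add" | "replace"; path: string; value: unknown }
  | { op: "remove"; path: string };
type Rec = Record<string, unknown>;

// RFC 6901 JSON Pointer escaping for a single path segment.
const toPointer = (key: string): string => "/" + key.replace(/~/g, "~0").replace(/\//g, "~1");
const fromPointer = (path: string): string => path.slice(1).replace(/~1/g, "/").replace(/~0/g, "~");

// Own keys only, so inherited names like "toString" never count as fields.
const has = (rec: Rec, key: string): boolean => Object.prototype.hasOwnProperty.call(rec, key);
// Naive structural equality; assumes plain JSON values with stable key order.
const same = (a: unknown, b: unknown): boolean => JSON.stringify(a) === JSON.stringify(b);

// Build the reverse patch: the ops that turn `next` back into `prev`.
function reversePatch(prev: Rec, next: Rec): Op[] {
  const ops: Op[] = [];
  for (const key of Object.keys(prev)) {
    if (!has(next, key)) ops.push({ op: "add", path: toPointer(key), value: prev[key] });
    else if (!same(prev[key], next[key])) ops.push({ op: "replace", path: toPointer(key), value: prev[key] });
  }
  for (const key of Object.keys(next)) {
    if (!has(prev, key)) ops.push({ op: "remove", path: toPointer(key) });
  }
  return ops;
}

// Apply top-level ops to a copy of the record.
function applyPatch(rec: Rec, ops: Op[]): Rec {
  const out: Rec = { ...rec };
  for (const o of ops) {
    const key = fromPointer(o.path);
    if (o.op === "remove") delete out[key];
    else out[key] = o.value;
  }
  return out;
}

// Usage: the edit changes `content` and adds `lang`.
const v1 = { $type: "com.example.gist", title: "notes", content: "hello" };
const v2 = { $type: "com.example.gist", title: "notes", content: "hello world", lang: "md" };
const revisionPatch = reversePatch(v1, v2);
console.log(revisionPatch.map((o) => `${o.op} ${o.path}`)); // [ 'replace /content', 'remove /lang' ]
console.log(same(applyPatch(v2, revisionPatch), v1)); // true
```

Note that the whole old `content` string is stored in the `replace` op, which is exactly the field-level granularity trade-off.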

Because we write reverse patches instead of forward patches, we optimize for reconstructing the most recent revisions. We start with the latest version, which can be a completely normal ATProto record that we update as usual on every edit. Applying the reverse patches to it one by one then gives us successively older versions of the record.
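Reading an older version is then a walk from the latest record through the revisions, newest first. A minimal sketch, assuming each hypothetical `Revision` holds a top-level-only JSON Patch and that a record’s revisions have already been fetched and sorted newest first:

```typescript
type Op =
  | { op: "add" | "replace"; path: string; value: unknown }
  | { op: "remove"; path: string };
type Rec = Record<string, unknown>;

// Hypothetical revision record holding the reverse patch for one edit.
interface Revision {
  createdAt: string;
  patch: Op[];
}

// Top-level-only JSON Patch application (with RFC 6901 pointer unescaping).
function applyPatch(rec: Rec, ops: Op[]): Rec {
  const out: Rec = { ...rec };
  for (const o of ops) {
    const key = o.path.slice(1).replace(/~1/g, "/").replace(/~0/g, "~");
    if (o.op === "remove") delete out[key];
    else out[key] = o.value;
  }
  return out;
}

// Walk back from the latest record. `revisions` must be newest-first;
// stepsBack = 0 returns the latest record itself without any patching.
function versionAt(latest: Rec, revisions: Revision[], stepsBack: number): Rec {
  if (stepsBack < 0 || stepsBack > revisions.length) throw new RangeError(`no version ${stepsBack} steps back`);
  let rec = latest;
  for (let i = 0; i < stepsBack; i++) rec = applyPatch(rec, revisions[i].patch);
  return rec;
}

const latest = { title: "Setup guide (v3)" };
const revs: Revision[] = [
  { createdAt: "2025-03-02T10:00:00Z", patch: [{ op: "replace", path: "/title", value: "Setup guide (v2)" }] },
  { createdAt: "2025-03-01T10:00:00Z", patch: [{ op: "replace", path: "/title", value: "Setup guide (v1)" }] },
];
console.log(versionAt(latest, revs, 0).title); // Setup guide (v3)
console.log(versionAt(latest, revs, 2).title); // Setup guide (v1)
```

The cost of reaching a version grows with how far back it is, which is the trade-off that favors recent history.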

The great part about this is that the latest record can be read and understood by all AppViews just like normal, with no need to understand the fact that it has revisions.

Hypothetically, the strategy can also be applied to any ATProto record regardless of the lexicon. The caveat is that the granularity is only at the field level: if you have a big markdown doc in a single lexicon field, you replace the whole thing on every revision.

For Roomy, though, we were planning on using Leaflet’s block-based lexicon, so updates would be at the block / paragraph level, which is probably sufficient for most needs.
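Moving the patch to block granularity can be sketched by treating a document as an array of blocks addressed with JSON Pointer array indices. This assumes blocks are plain strings compared position by position; Leaflet’s actual block lexicon is richer, and inserting a block near the top would shift every later index and show up as many replaces. The `/blocks` path and the function names are hypothetical.

```typescript
type Op =
  | { op: "add" | "replace"; path: string; value: string }
  | { op: "remove"; path: string };

// Reverse patch for an array of blocks: ops that turn `next` back into `prev`.
function reverseBlockPatch(prev: string[], next: string[]): Op[] {
  const ops: Op[] = [];
  const common = Math.min(prev.length, next.length);
  for (let i = 0; i < common; i++) {
    if (prev[i] !== next[i]) ops.push({ op: "replace", path: `/blocks/${i}`, value: prev[i] });
  }
  // Drop blocks the edit appended, highest index first so indices stay valid.
  for (let i = next.length - 1; i >= prev.length; i--) ops.push({ op: "remove", path: `/blocks/${i}` });
  // Restore blocks the edit removed from the end, in ascending order.
  for (let i = next.length; i < prev.length; i++) ops.push({ op: "add", path: `/blocks/${i}`, value: prev[i] });
  return ops;
}

// Apply ops in order with JSON Patch array semantics (add inserts, remove splices out).
function applyBlockPatch(blocks: string[], ops: Op[]): string[] {
  const out = [...blocks];
  for (const o of ops) {
    const i = Number(o.path.split("/")[2]);
    if (o.op === "remove") out.splice(i, 1);
    else if (o.op === "add") out.splice(i, 0, o.value);
    else out[i] = o.value;
  }
  return out;
}

const prevBlocks = ["# Setup", "Install the CLI.", "Run it."];
const nextBlocks = ["# Setup", "Install the CLI with npm.", "Run it.", "See also: FAQ."];
const blockOps = reverseBlockPatch(prevBlocks, nextBlocks);
console.log(blockOps.length); // 2 (restore block 1, drop block 3)
console.log(applyBlockPatch(nextBlocks, blockOps).join(" | ")); // # Setup | Install the CLI. | Run it.
```

Only the edited paragraph is stored in the revision, not the rest of the document.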

Maybe you could also extend JSON Patch with a Myers-diff patch mode for patching large strings without having to specify the entire string.
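Something in that direction can be sketched with a hypothetical `splice` op that carries only the changed span of a string field. For brevity this computes a single span by trimming the common prefix and suffix; it is not Myers’ algorithm, which would find a minimal edit script and emit several smaller hunks. Indices here are UTF-16 code units, an assumption a real spec would have to pin down, since ATProto records store strings as UTF-8.

```typescript
// Hypothetical patch op (not part of RFC 6902): replace `deleteCount` UTF-16
// code units at `index` in the string at `path` with `insert`.
interface SpliceOp {
  op: "splice";
  path: string;
  index: number;
  deleteCount: number;
  insert: string;
}

// One splice covering everything between the common prefix and common suffix.
function spliceDiff(path: string, from: string, to: string): SpliceOp {
  const max = Math.min(from.length, to.length);
  let start = 0;
  while (start < max && from[start] === to[start]) start++;
  let end = 0;
  // Bound the suffix so it never overlaps the prefix in either string.
  while (end < max - start && from[from.length - 1 - end] === to[to.length - 1 - end]) end++;
  return { op: "splice", path, index: start, deleteCount: from.length - start - end, insert: to.slice(start, to.length - end) };
}

function applySplice(text: string, op: SpliceOp): string {
  return text.slice(0, op.index) + op.insert + text.slice(op.index + op.deleteCount);
}

// Reverse patch for an edit: diff from the new text back to the old one.
const oldText = "The quick brown fox jumps over the lazy dog.";
const newText = "The quick red fox jumps over the lazy dog.";
const back = spliceDiff("/content", newText, oldText);
console.log(back.index, back.deleteCount, back.insert); // 10 3 brown
console.log(applySplice(newText, back) === oldText); // true
```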


Reverse patches really feel like a nice solution. I like that they optimize for finding recent revisions and always give instant access to the latest version. I’m not sure whether I’m missing any caveats, so I’m curious whether anybody else has thoughts on it.

There are still things to figure out, like what NSID to use for revisions. Do revisions for each record type get their own NSID? Like app.bsky.feed.post.revision?

Or do we just make a lexicon.community.revision lexicon that can be used for tracking reverse-patch-based revisions to any ATProto record? That seems like a pretty interesting option, but it also feels like revisions should be co-located based on the type of record they patch.

At the same time, if an app wants to track revisions of your Bluesky post, it can’t just use an app.bsky.feed.post.revision record, because Bluesky owns the app.bsky namespace. So maybe a universal revision type makes sense, and app makers can define a specific revision NSID if they want to support revisions as a standard feature of their data.
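If the universal-revision route is taken, the record could look roughly like this. It is a hypothetical TypeScript shape, not a published lexicon: `subject` borrows the com.atproto.repo.strongRef pattern (AT-URI plus CID), while the other fields, the helper names, the placeholder CID, and the newest-first ordering by client-supplied `createdAt` are assumptions. Because `subject.uri` names the target record, one revision collection can serve any lexicon, and an app can still filter by the target’s collection NSID.

```typescript
// Mirrors com.atproto.repo.strongRef: an AT-URI plus the CID of one record version.
interface StrongRef { uri: string; cid: string }
type Op =
  | { op: "add" | "replace"; path: string; value: unknown }
  | { op: "remove"; path: string };

// Hypothetical generic revision record; the NSID is the one suggested in this thread.
interface RevisionRecord {
  $type: "lexicon.community.revision";
  subject: StrongRef; // the record version this reverse patch applies to
  patch: Op[];        // reverse patch producing the previous version
  createdAt: string;  // ISO 8601 datetime
}

// at://<did>/<collection>/<rkey> -> <collection>
function collectionOf(atUri: string): string {
  const parts = atUri.replace(/^at:\/\//, "").split("/");
  if (parts.length !== 3) throw new Error(`not a record AT-URI: ${atUri}`);
  return parts[1];
}

// All revisions of one record, newest first (trusting client-supplied createdAt).
function revisionsFor(uri: string, all: RevisionRecord[]): RevisionRecord[] {
  return all
    .filter((r) => r.subject.uri === uri)
    .sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt));
}

const postUri = "at://did:plc:example/app.bsky.feed.post/3kexample";
const mk = (uri: string, createdAt: string): RevisionRecord => ({
  $type: "lexicon.community.revision",
  subject: { uri, cid: "placeholder-cid" },
  patch: [],
  createdAt,
});
const allRevs = [
  mk(postUri, "2025-01-01T00:00:00Z"),
  mk("at://did:plc:example/com.example.gist/3kother", "2025-01-03T00:00:00Z"),
  mk(postUri, "2025-01-02T00:00:00Z"),
];
console.log(collectionOf(postUri)); // app.bsky.feed.post
console.log(revisionsFor(postUri, allRevs).map((r) => r.createdAt)); // [ '2025-01-02T00:00:00Z', '2025-01-01T00:00:00Z' ]
```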

Hey! I’ve created lichen.wiki, where wikis are represented as a collection of markdown notes.

I use forward diffs in a chain. Every edit is its own immutable wiki.lichen.noteRevision record holding a forward diff (via diff-match-patch) plus a parentRevision field. The first revision has parentRevision null, so there’s no special-casing of “creation” vs “edit”.

The note record itself (wiki.lichen.note) carries only stable identity: a permanent slug, a mutable display title, a wiki ref. There’s no “latest” record: the current content doesn’t live anywhere as a single record, it’s reconstructed by replaying the chain, and our appview materializes and caches that server-side.
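A rough sketch of the replay step, for illustration only and not lichen.wiki’s actual code. It keeps the `parentRevision` pointer described above but swaps diff-match-patch for a toy single-splice diff, and it assumes the chain is strictly linear: two revisions sharing a parent are reported as a fork rather than merged. The revision URIs, record keys, and the `replay` helper are hypothetical.

```typescript
// Stand-in forward diff: replace `deleteCount` chars at `index` with `insert`.
interface Splice { index: number; deleteCount: number; insert: string }

// Simplified, hypothetical revision shape (only parentRevision comes from the post).
interface NoteRevision {
  uri: string;                   // AT-URI of this revision record, in its author's repo
  parentRevision: string | null; // AT-URI of the previous revision; null for the first
  diff: Splice;                  // forward diff from the parent's text
}

// Replay the chain from the root. Assumes a single linear chain; a fork
// (two revisions with the same parent) is reported rather than resolved.
function replay(revisions: NoteRevision[]): string {
  const childOf = new Map<string | null, NoteRevision>();
  for (const r of revisions) {
    if (childOf.has(r.parentRevision)) throw new Error(`fork after ${r.parentRevision}`);
    childOf.set(r.parentRevision, r);
  }
  let text = "";
  let applied = 0;
  let node = childOf.get(null);
  while (node) {
    text = text.slice(0, node.diff.index) + node.diff.insert + text.slice(node.diff.index + node.diff.deleteCount);
    applied++;
    node = childOf.get(node.uri);
  }
  if (applied !== revisions.length) throw new Error("broken chain: some revisions are unreachable");
  return text;
}

// Revisions from two authors' repos, passed in arbitrary order.
const r1: NoteRevision = { uri: "at://did:plc:alice/wiki.lichen.noteRevision/1", parentRevision: null, diff: { index: 0, deleteCount: 0, insert: "Lichens are fungi." } };
const r2: NoteRevision = { uri: "at://did:plc:bob/wiki.lichen.noteRevision/2", parentRevision: r1.uri, diff: { index: 17, deleteCount: 0, insert: " and algae" } };
const r3: NoteRevision = { uri: "at://did:plc:alice/wiki.lichen.noteRevision/3", parentRevision: r2.uri, diff: { index: 12, deleteCount: 5, insert: "symbiotic fungi" } };
console.log(replay([r3, r1, r2])); // Lichens are symbiotic fungi and algae.
```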

The reason I went this way is multi-author editing. Each contributor writes their own revision record to their own PDS, so nobody needs write access to a record they don’t own and there’s no contention on a shared head; the appview stitches the per-author chains into a linear history.

The appview becomes the long-term archive. It accumulates history that may no longer exist on PDSes (deleted accounts, offline PDSes, GC’d blobs) and remains the system of record over time, even though the chain can be reconstructed from live contributor PDSes. I’m currently building a full reconstruction script so a wiki can be recreated directly from the source edits.

One problem with the reverse-patch design: a mutable latest record means a single writer (or a buggy client) can clobber history, and you’re trusting whoever holds write access. With separate immutable revision records owned by their authors, “revert” is just “append a new revision.” The flip side is that if a contributor deletes one of their edits from their PDS, you lose the ability to rebuild that stretch from source (though the appview still has it archived).
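Under toy assumptions (a single-splice diff standing in for diff-match-patch, hypothetical URIs and helper names), a revert in the append-only model is just another revision whose parent is the current head and whose diff restores the older text:

```typescript
interface Splice { index: number; deleteCount: number; insert: string }
interface NoteRevision { uri: string; parentRevision: string | null; diff: Splice }

// Single-splice diff via common prefix/suffix trimming (a stand-in for diff-match-patch).
function spliceDiff(from: string, to: string): Splice {
  const max = Math.min(from.length, to.length);
  let start = 0;
  while (start < max && from[start] === to[start]) start++;
  let end = 0;
  while (end < max - start && from[from.length - 1 - end] === to[to.length - 1 - end]) end++;
  return { index: start, deleteCount: from.length - start - end, insert: to.slice(start, to.length - end) };
}

const applySplice = (text: string, d: Splice): string =>
  text.slice(0, d.index) + d.insert + text.slice(d.index + d.deleteCount);

// A revert is an ordinary new revision on top of the current head whose diff
// restores older text; no existing revision record is edited or deleted.
function revertRevision(head: NoteRevision, currentText: string, targetText: string, newUri: string): NoteRevision {
  return { uri: newUri, parentRevision: head.uri, diff: spliceDiff(currentText, targetText) };
}

const head: NoteRevision = {
  uri: "at://did:plc:carol/wiki.lichen.noteRevision/3",
  parentRevision: "at://did:plc:bob/wiki.lichen.noteRevision/2",
  diff: { index: 12, deleteCount: 5, insert: "symbiotic fungi" },
};
const revert = revertRevision(
  head,
  "Lichens are symbiotic fungi and algae.",
  "Lichens are fungi and algae.",
  "at://did:plc:alice/wiki.lichen.noteRevision/4",
);
console.log(revert.parentRevision === head.uri); // true
console.log(applySplice("Lichens are symbiotic fungi and algae.", revert.diff)); // Lichens are fungi and algae.
```

Because the revert lives in the reverting author’s own repo, the reverted edit stays in the history instead of being overwritten.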
