Looking for best approaches to storing updatable/editable records with revision history

Hi, as the title says, I am trying to understand the best approaches to storing updatable/editable records with revision history. There was some discussion about this under my post on Bluesky: @uncenter.dev on Bluesky.

The best example of this use case is a record where being able to compare the contents to previous versions is relevant. For example, GitHub Gists are basic text files with simple metadata and (primarily) text content. I was hoping to see what Tangled does for Tangled Strings (their alternative to GitHub Gists), but if I am not mistaken it appears that Tangled Strings in their current form simply have the content field overwritten and don’t store diffs.

I’m curious to hear what approaches people have considered or put into practice on this matter!

3 Likes

We want to do this in Roomy for wiki pages. I think I have a pretty good idea for a way to do it but I haven’t tried it yet.

The idea is to have your main record, which you always update with the latest version, and to have separate revision records.

Every time you edit a record, you create a reverse patch (a patch that turns the new version back into the previous one) and store that in the revision record. The patch itself can be a JSON Patch. I think we could probably adapt the JSON Patch spec to work with CBOR records easily.

Because we write reverse patches instead of forward patches, we optimize for reading the most recent revisions. We start with the latest version, which can be a completely normal ATProto record that we just update as usual every time there’s a new edit. Applying the reverse patches to the latest version one by one, newest first, gives us successively older versions of the record.
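To make the reverse-patch idea concrete, here is a minimal Python sketch. It assumes patches are JSON Patch-style operations (add / remove / replace) limited to top-level fields, and that records are plain dicts. The names reverse_patch, apply_patch, and history are made up for illustration and are not part of any ATProto or Roomy API.

```python
import copy


def _escape(key: str) -> str:
    # JSON Pointer escaping: "~" becomes "~0", "/" becomes "~1".
    return key.replace("~", "~0").replace("/", "~1")


def _unescape(token: str) -> str:
    return token.replace("~1", "/").replace("~0", "~")


def reverse_patch(old: dict, new: dict) -> list:
    """Build ops that turn `new` back into `old` (top-level fields only)."""
    ops = []
    for key in new:
        if key not in old:
            # Field was added by the edit, so going back means removing it.
            ops.append({"op": "remove", "path": "/" + _escape(key)})
        elif new[key] != old[key]:
            # Field was changed, so going back restores the whole old value.
            ops.append({"op": "replace", "path": "/" + _escape(key),
                        "value": copy.deepcopy(old[key])})
    for key in old:
        if key not in new:
            # Field was deleted by the edit, so going back re-adds it.
            ops.append({"op": "add", "path": "/" + _escape(key),
                        "value": copy.deepcopy(old[key])})
    return ops


def apply_patch(record: dict, ops: list) -> dict:
    """Apply top-level add / remove / replace ops, returning a new dict."""
    result = copy.deepcopy(record)
    for op in ops:
        key = _unescape(op["path"][1:])
        if op["op"] == "remove":
            del result[key]
        else:
            result[key] = copy.deepcopy(op["value"])
    return result


def history(latest: dict, revisions: list):
    """Yield the latest version, then successively older ones.

    `revisions` holds reverse patches, oldest edit first.
    """
    current = latest
    yield current
    for ops in reversed(revisions):
        current = apply_patch(current, ops)
        yield current


# Three versions of a hypothetical wiki page record.
v1 = {"title": "Home", "body": "Hello"}
v2 = {"title": "Home", "body": "Hello, world", "tags": ["wiki"]}
v3 = {"title": "Welcome", "body": "Hello, world", "tags": ["wiki"]}

# Each edit stores one reverse patch; the main record is simply v3.
revisions = [reverse_patch(v1, v2), reverse_patch(v2, v3)]
versions = list(history(v3, revisions))
```

Reading the latest version costs nothing extra, and reaching a version n edits back costs n patch applications, which is the trade-off described above.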

The great part about this is that the latest record can be read and understood by all AppViews just like normal, with no need to understand the fact that it has revisions.

The strategy, hypothetically, can also be applied to any ATProto record regardless of the lexicon. The caveat there is that the granularity is only at the field level. That is, if you have a big markdown doc in a single lexicon field, you are replacing the whole thing every revision.

For Roomy, though, we were planning on using Leaflet’s block-based lexicon, so updates would be at the block / paragraph level, which is probably sufficient for most needs.

Maybe you could also extend the JSON Patch format with a Myers diff algorithm patch mode that could be used to patch large strings without having to specify the entire string.
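As a rough sketch of what a string-level patch mode could look like, the Python below stores only the text that differs between two versions of a large string. Two assumptions to flag: it uses the standard library's difflib.SequenceMatcher as a stand-in (difflib does not implement the Myers algorithm), and the ("copy", start, end) / ("text", value) op format is invented here, not part of JSON Patch.

```python
from difflib import SequenceMatcher


def reverse_text_patch(old: str, new: str) -> list:
    """Ops that rebuild `old` from `new`, storing only the differing text."""
    ops = []
    # a=new, b=old: the opcodes describe how to turn `new` into `old`.
    matcher = SequenceMatcher(a=new, b=old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged span: reference it by position in `new`.
            ops.append(("copy", i1, i2))
        elif j1 != j2:
            # Changed or removed-by-the-edit span: store the old text itself.
            ops.append(("text", old[j1:j2]))
        # Text that exists only in `new` needs no op; it is simply skipped.
    return ops


def apply_text_patch(new: str, ops: list) -> str:
    """Rebuild the older string from the newer one plus the ops."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(new[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


old_body = "The quick brown fox"
new_body = "The quick red fox jumps"
text_ops = reverse_text_patch(old_body, new_body)
restored = apply_text_patch(new_body, text_ops)
```

A real patch mode would swap in an actual Myers implementation and define how these ops nest inside a field-level patch.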


Reverse patches really feel like a nice solution. I like that they optimize for finding recent revisions and always provide instant access to the latest version. I’m not sure if I’m missing any caveats, but I’m curious if anybody else has any thoughts on it.

There are still things to figure out, like what NSID to use for revisions. Do revisions for each record type get their own NSID? Like app.bsky.feed.post.revision?

Or do we just make a lexicon.community.revision lexicon that can be used for tracking reverse-patch-based revisions to any ATProto record? That seems like a pretty interesting option, but it also feels like revisions should be co-located based on the type of record they patch.

At the same time, if an app wants to track revisions of your Bluesky post, it can’t just use an app.bsky.feed.post.revision record, because Bluesky owns the app.bsky namespace. So maybe having a universal revision type makes sense, and app-makers can define a specific revision NSID if they want to support revisions as a standard feature of their data.
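For discussion, here is a purely hypothetical shape for a universal revision record, written as a Python dict builder. None of this is an existing lexicon: the field names subject, patch, prev, and createdAt are assumptions, the NSID is just the one floated above, and the AT URIs are placeholders.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical NSID taken from the idea above; no such lexicon exists yet.
REVISION_NSID = "lexicon.community.revision"


def make_revision(subject_uri: str, patch: list, prev_uri: Optional[str] = None) -> dict:
    """Build a revision record that can point at a record of any type."""
    record = {
        "$type": REVISION_NSID,
        "subject": subject_uri,  # AT URI of the record being revised
        "patch": patch,          # reverse patch back to the previous version
        "createdAt": datetime.now(timezone.utc).isoformat(),
    }
    if prev_uri is not None:
        # Optional link to the next-older revision, forming a chain.
        record["prev"] = prev_uri
    return record


revision = make_revision(
    "at://did:example:alice/app.bsky.feed.post/examplekey",
    [{"op": "replace", "path": "/text", "value": "original post text"}],
)
```

Because the subject is just a URI, the same record type could track edits to a record in a namespace the app does not control.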
