Exploring CRDT Lexicons

I’ve (alongside many others, I’m sure) been doing CRDT work on atproto, and I think we’ll need some standard for how to represent CRDT data as lexicons.

@chris.pardy.family has written a leaflet exploring what such a record could look like and is looking for feedback and thoughts: CRDT's on ATProto

I think it’s a good start. It doesn’t tie us to any particular CRDT, though it doesn’t guarantee interop between different CRDT types, which seems fine to me.

I’m thinking the ops themselves should maybe also be a lexicon? That would let this extend to other kinds of applications beyond rich text.

Interested in further thoughts! 🙂


One thing I would add to the discussion is that op compression is really important for size. If you store each operation in its own record, or even store the ops individually in an array inside a single record, you will most likely end up with a size explosion for per-character CRDTs.
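To make the size concern concrete, here is a rough sketch comparing one JSON object per character against a single run for the same typing session. The `InsertOp` and `InsertRun` shapes and the `compressRuns` helper are made up for illustration; this is a naive run-length scheme, not the encoding Automerge or Loro actually use.

```typescript
// Hypothetical per-character insert op; field names are illustrative only.
interface InsertOp {
  actor: string; // id of the replica that made the edit
  seq: number;   // per-actor sequence number
  pos: number;   // insertion index (real CRDTs reference a prior op id instead)
  char: string;  // the single inserted character
}

// A run of consecutive inserts by one actor, with the shared metadata stored once.
interface InsertRun {
  actor: string;
  startSeq: number;
  startPos: number;
  length: number; // number of ops folded into this run
  text: string;
}

// Simulate someone typing `text` from the start of an empty document.
function typeText(actor: string, text: string): InsertOp[] {
  return Array.from(text).map((char, i) => ({ actor, seq: i, pos: i, char }));
}

// Fold ops into runs whenever actor, sequence number, and position all continue
// directly from the previous op.
function compressRuns(ops: InsertOp[]): InsertRun[] {
  const runs: InsertRun[] = [];
  for (const op of ops) {
    const last = runs[runs.length - 1];
    if (
      last &&
      last.actor === op.actor &&
      op.seq === last.startSeq + last.length &&
      op.pos === last.startPos + last.length
    ) {
      last.text += op.char;
      last.length += 1;
    } else {
      runs.push({
        actor: op.actor,
        startSeq: op.seq,
        startPos: op.pos,
        length: 1,
        text: op.char,
      });
    }
  }
  return runs;
}

const sampleText = "hello world, this is one typing session";
const ops = typeText("did:example:alice", sampleText);
const runs = compressRuns(ops);

// Compare the serialized sizes (in UTF-16 code units of JSON text).
const perOpSize = JSON.stringify(ops).length;
const runSize = JSON.stringify(runs).length;
console.log({ opCount: ops.length, runCount: runs.length, perOpSize, runSize });
```

The per-op form repeats the actor id and counters for every character, while the run form stores them once, so the gap widens as the typing session gets longer.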

CRDTs like Automerge and Loro have ways of storing ranges of operations without using up a ton of space, but they rely on CRDT-specific encodings for those ranges.

So you would probably want a record that holds a batch of ops, plus maybe some metadata about them, but stores the ops themselves in the CRDT-specific compressed encoding, so that you still benefit from the efficient format.
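As a rough sketch of that idea, a record could keep a small, CRDT-agnostic envelope around an opaque payload. Everything here is an assumption for illustration: the `com.example.crdt.opBatch` NSID, the field names, and the `payloadsFor` helper are hypothetical and not taken from the leaflet or any existing lexicon.

```typescript
// Hypothetical record shape: readable metadata plus an opaque, CRDT-specific payload.
interface OpBatchRecord {
  $type: "com.example.crdt.opBatch"; // made-up NSID, for illustration only
  format: string;      // which CRDT encoding the payload uses, e.g. "automerge" or "loro"
  doc: string;         // identifier of the document these ops belong to
  opCount: number;     // how many ops the payload contains
  payload: Uint8Array; // compressed bytes, only meaningful to the named CRDT library
  createdAt: string;   // ISO 8601 timestamp
}

// Pick out the payloads a client can actually decode for a given document.
// Batches in other formats are skipped rather than misread.
function payloadsFor(
  records: OpBatchRecord[],
  doc: string,
  format: string,
): Uint8Array[] {
  return records
    .filter((r) => r.doc === doc && r.format === format)
    .sort((a, b) => a.createdAt.localeCompare(b.createdAt))
    .map((r) => r.payload);
}

const batches: OpBatchRecord[] = [
  {
    $type: "com.example.crdt.opBatch",
    format: "loro",
    doc: "doc-1",
    opCount: 120,
    payload: new Uint8Array([1, 2, 3]), // placeholder bytes, not a real encoding
    createdAt: "2025-01-01T00:00:00.000Z",
  },
  {
    $type: "com.example.crdt.opBatch",
    format: "automerge",
    doc: "doc-1",
    opCount: 40,
    payload: new Uint8Array([9, 9]), // placeholder bytes, not a real encoding
    createdAt: "2025-01-02T00:00:00.000Z",
  },
];

// A Loro-based client would hand only these payloads to its CRDT library.
const loroPayloads = payloadsFor(batches, "doc-1", "loro");
console.log(loroPayloads.length);
```

The envelope stays readable by any client, while decoding the payload is left to whichever CRDT library the `format` field names.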


Also, I would look into Sedimentree by the Automerge folks, which describes an efficient mechanism for syncing commit ranges in a CRDT-agnostic way while still allowing for efficient compression.
