Personal Private Data sooner rather than later?

iame.li · October 29, 2025, 7:21pm

Haha, my post looks insane in the Discourse preview without the spacing. Here's a more readable version:

I guess my take is that keeping personal-private data as close to public repo semantics as possible minimizes protocol-complexity risk. If we do something radically different for personal-private and something radically different again for shared-private, then you've got three sets of things to memorize. But if the personal-private repo and the public repo are as similar as possible, that burden is much smaller. The basic CRUD operations for public records are:

com.atproto.repo.createRecord
com.atproto.repo.deleteRecord
com.atproto.repo.getRecord
com.atproto.repo.listRecords
com.atproto.repo.putRecord

Personal-private's versions could take the form com.atproto.privateRepo.createRecord or com.atproto.repo.createPrivateRecord; I don't have a strong preference between the two.
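To make the "mirror the public semantics" idea concrete, here is a minimal sketch, assuming neither private namespace exists yet (both NSID schemes are the hypothetical ones proposed above). `privateNsid` derives the private name for each public CRUD method under either scheme, and `InMemoryPrivateRepo` is a hypothetical stand-in for a PDS's private record store whose methods line up one-to-one with the public calls. It is simplified: `rkey` is always caller-supplied, and there is no validation, `swapCommit`, or pagination.

```typescript
// The public CRUD methods, as listed above.
const PUBLIC_CRUD = [
  "com.atproto.repo.createRecord",
  "com.atproto.repo.deleteRecord",
  "com.atproto.repo.getRecord",
  "com.atproto.repo.listRecords",
  "com.atproto.repo.putRecord",
] as const;

// Hypothetical: the two naming schemes floated in the post.
type NamingScheme = "privateRepo" | "privateSuffix";

function privateNsid(publicNsid: string, scheme: NamingScheme): string {
  const parts = publicNsid.split(".");
  const method = parts[parts.length - 1]; // e.g. "createRecord"
  if (scheme === "privateRepo") {
    return `com.atproto.privateRepo.${method}`;
  }
  // "createRecord" -> "createPrivateRecord", "listRecords" -> "listPrivateRecords"
  return `com.atproto.repo.${method.replace("Record", "PrivateRecord")}`;
}

interface RecordEntry {
  uri: string;
  value: unknown;
}

// Hypothetical in-memory private store, shaped like the public CRUD calls.
class InMemoryPrivateRepo {
  private readonly did: string;
  private readonly records = new Map<string, unknown>(); // "collection/rkey" -> record

  constructor(did: string) {
    this.did = did;
  }

  private uri(key: string): string {
    return `at://${this.did}/${key}`;
  }

  // Roughly like createRecord with an explicit rkey: refuses to overwrite.
  createRecord(collection: string, rkey: string, value: unknown): RecordEntry {
    const key = `${collection}/${rkey}`;
    if (this.records.has(key)) throw new Error(`record already exists: ${key}`);
    this.records.set(key, value);
    return { uri: this.uri(key), value };
  }

  // Roughly like putRecord: creates or overwrites.
  putRecord(collection: string, rkey: string, value: unknown): RecordEntry {
    const key = `${collection}/${rkey}`;
    this.records.set(key, value);
    return { uri: this.uri(key), value };
  }

  getRecord(collection: string, rkey: string): RecordEntry | undefined {
    const key = `${collection}/${rkey}`;
    return this.records.has(key) ? { uri: this.uri(key), value: this.records.get(key) } : undefined;
  }

  // Deleting a missing record is a no-op in this sketch.
  deleteRecord(collection: string, rkey: string): void {
    this.records.delete(`${collection}/${rkey}`);
  }

  // Records in one collection, sorted by rkey.
  listRecords(collection: string): RecordEntry[] {
    const prefix = `${collection}/`;
    return Array.from(this.records.keys())
      .filter((key) => key.startsWith(prefix))
      .sort()
      .map((key) => ({ uri: this.uri(key), value: this.records.get(key) }));
  }
}
```

Whichever naming scheme wins, the point is that the method set and argument shapes stay identical, so a client library could route public and private calls through the same code path.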

Biggest open question/unsolved problem to me: is there a personal-private firehose from the PDS that lets authorized applications stay up to date on your personal-private data?

If no: that keeps things simple, but it will probably mean applications polling the PDS to find out about changes, which I assume we want to avoid.
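A sketch of why polling is unattractive, assuming a private `listRecords` returns `uri` and `cid` per record the way the public one does: the client has to re-list every collection it cares about on each poll and diff the result against its previous snapshot, paying for a full listing even when nothing changed. `diffSnapshots` and the `Snapshot` type are hypothetical names for illustration.

```typescript
// A snapshot maps record URI -> record CID, built from one full listing.
type Snapshot = Map<string, string>;

interface SnapshotDiff {
  created: string[];
  updated: string[];
  deleted: string[];
}

// Compare two full listings taken on consecutive polls.
function diffSnapshots(prev: Snapshot, next: Snapshot): SnapshotDiff {
  const created: string[] = [];
  const updated: string[] = [];
  const deleted: string[] = [];
  next.forEach((cid, uri) => {
    const old = prev.get(uri);
    if (old === undefined) created.push(uri);
    else if (old !== cid) updated.push(uri); // same URI, new content hash
  });
  prev.forEach((_cid, uri) => {
    if (!next.has(uri)) deleted.push(uri);
  });
  return { created, updated, deleted };
}
```

The cost scales with the total number of records on every poll, not with the number of changes, which is exactly what a firehose avoids.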

If yes: what protocol does this personal-private firehose use? The public firehose uses CBOR Merkle proofs, but if a syncing application only has partial access to your repo… I'm not 100% sure whether that breaks the sync semantics entirely.^[1]

An alternative would be to serve this firehose in Jetstream format, which dodges the MST sync stuff, but then we've introduced more protocol complexity by canonizing Jetstream's format instead of leaving it as an optional layer on top. You also inherit the usual Jetstream downside: it's harder to be 100% confident that you're in sync if you (or the PDS) dropped some messages for whatever reason.
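One concrete way to see the in-sync difference, as a simplified sketch: public firehose `#commit` events carry the commit's `rev` plus `since` (the rev of the previous commit emitted for that repo), so a consumer can notice a missed commit per repo. Jetstream commit events carry `rev` but no `since`, so the most a consumer can check is that revs never go backwards. The event interfaces below keep only the fields this check needs, and `RevTracker` is a hypothetical helper, not part of any atproto library.

```typescript
// Simplified event shapes: only the fields this check needs.
interface FirehoseCommit {
  repo: string;         // DID of the repo that changed
  rev: string;          // revision (TID) of this commit
  since: string | null; // rev of the previous commit emitted for this repo
}

interface JetstreamCommit {
  did: string;
  time_us: number;
  commit: { rev: string };
}

type Verdict = "ok" | "gap" | "out-of-order";

class RevTracker {
  private readonly lastRev = new Map<string, string>(); // DID -> last rev seen

  // Firehose: `since` must equal the last rev we saw, otherwise a commit was missed.
  onFirehose(evt: FirehoseCommit): Verdict {
    const last = this.lastRev.get(evt.repo);
    this.lastRev.set(evt.repo, evt.rev);
    if (last === undefined) return "ok"; // first event for this repo: nothing to compare
    return evt.since === last ? "ok" : "gap";
  }

  // Jetstream: no `since`. TIDs sort lexicographically in time order, so we can
  // only reject revs that go backwards; a dropped event in between goes unnoticed.
  onJetstream(evt: JetstreamCommit): Verdict {
    const last = this.lastRev.get(evt.did);
    if (last !== undefined && evt.commit.rev < last) return "out-of-order";
    this.lastRev.set(evt.did, evt.commit.rev);
    return "ok";
  }
}
```

Feeding both trackers the same stream with one commit missing, the firehose tracker reports `"gap"` while the Jetstream tracker still reports `"ok"`.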

TL;DR: Private sync is complicated, so maybe we launch with just the CRUD operations and add a firehose later.

[1] My intuition is that you could maybe do a successful sync with only the intermediate structure of the tree, excluding records the consuming application lacks access to, but I suspect doing so would leak a bunch of metadata about the repo. At bare minimum, every consuming application would know whenever you changed anything in the repo, because the root would change. And they could probably make a decent guess at which collections were modified by observing which part of the tree changed.
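The footnote's metadata-leak intuition can be illustrated with a deliberately simplified toy: a two-level Merkle tree (records, then one hash per collection, then a root). This is NOT atproto's MST, which is a key-sorted, probabilistically balanced tree; as far as I understand, MST nodes carry record keys themselves, so exposing them would leak at least as much as this toy. A consumer authorized for one collection receives opaque hashes for the others so it can still verify the root, and comparing two versions tells it that something changed and which collection it was. Collection names like `app.example.health` are hypothetical.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

// Toy tree level 1: one hash per collection over its sorted "rkey=value" lines.
function collectionHashes(records: Map<string, string>): Map<string, string> {
  const byCollection = new Map<string, string[]>();
  records.forEach((value, key) => {
    const slash = key.lastIndexOf("/");
    const collection = key.slice(0, slash);
    const lines = byCollection.get(collection) ?? [];
    lines.push(`${key.slice(slash + 1)}=${value}`);
    byCollection.set(collection, lines);
  });
  const hashes = new Map<string, string>();
  byCollection.forEach((lines, collection) => {
    hashes.set(collection, sha256(lines.sort().join("\n")));
  });
  return hashes;
}

// Toy tree level 2: root over the sorted "collection:hash" lines.
function rootHash(hashes: Map<string, string>): string {
  const lines = Array.from(hashes.entries()).map(([c, h]) => `${c}:${h}`);
  return sha256(lines.sort().join("\n"));
}

// What a consumer allowed to read only `allowed` learns by comparing two versions.
function observedChanges(
  before: Map<string, string>,
  after: Map<string, string>,
  allowed: string,
): { rootChanged: boolean; changedOpaque: string[] } {
  const hb = collectionHashes(before);
  const ha = collectionHashes(after);
  const names = new Set(Array.from(hb.keys()).concat(Array.from(ha.keys())));
  const changedOpaque: string[] = [];
  names.forEach((c) => {
    if (c !== allowed && hb.get(c) !== ha.get(c)) changedOpaque.push(c);
  });
  return { rootChanged: rootHash(hb) !== rootHash(ha), changedOpaque: changedOpaque.sort() };
}
```

For example, if only a record in `app.example.health` changes, a consumer authorized for `app.example.notes` sees `rootChanged: true` and `changedOpaque: ["app.example.health"]` without ever seeing the record itself.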