Personal Private Data sooner rather than later?

@iame.li suggests that we do this sooner:

This is @pfrazee.com’s wording

And yeah, this immediately unblocks these use cases, including the private bookmarks that Bluesky has deployed and is likely using to experiment with this pattern!

Personal-private data is the only thing that I think we’ve run into the need for in Roomy. We would love to be able to store archives of a user’s chat data on their PDS.

I think this would be really useful for user agency. We already store backups of server data on S3, but this would allow the user to have a self-sovereign backup on their own PDS that may include data from private chat spaces.

I have some personal-private ideas I’d like to try out.

I’m wondering, though: how likely is it that the solutions for other kinds of private data will have implications for personal-private?

Also, if it turns out we get personal-private wrong, how hard would it be to recover? I guess it’s more of a general question about how breaking changes to the protocol would be handled, and what the downsides are.

Nothing will be standardized until people try stuff.

Right now, any personal-private stuff (like bsky bookmarks) is nonstandard.

Writing up what you plan to do and writing some code to try it out helps everyone.

FWIW I was just having this conversation and I hope to personally get my head back into this in December.

Haha my post looks insane in the Discourse preview without the spacing. More readable:

I guess my take is that keeping it as close to the public repo semantics as possible minimizes protocol-complexity risk. If we do something radically different for personal-private and something radically different again for shared-private, then we’ve got three sets of things to memorize. But if the personal-private repo and the public repo are as similar as possible, there’s much less new to learn. The basic CRUD operations for public records are:

com.atproto.repo.createRecord
com.atproto.repo.deleteRecord
com.atproto.repo.getRecord
com.atproto.repo.listRecords
com.atproto.repo.putRecord

Personal-private’s versions could be of the form com.atproto.privateRepo.createRecord or com.atproto.repo.createPrivateRecord; I don’t have a strong preference there.
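To make the two naming options concrete, here is a minimal sketch that derives both candidate forms from the public method names listed above. Neither form exists in the protocol today; the helper names `private_namespace_form` and `private_method_form` are made up for illustration.

```python
# The public CRUD methods listed above.
PUBLIC_METHODS = [
    "com.atproto.repo.createRecord",
    "com.atproto.repo.deleteRecord",
    "com.atproto.repo.getRecord",
    "com.atproto.repo.listRecords",
    "com.atproto.repo.putRecord",
]


def private_namespace_form(nsid: str) -> str:
    """Hypothetical option A: move the method into a 'privateRepo' namespace.

    com.atproto.repo.createRecord -> com.atproto.privateRepo.createRecord
    """
    namespace, method = nsid.rsplit(".", 1)
    prefix, last = namespace.rsplit(".", 1)
    return f"{prefix}.private{last[0].upper()}{last[1:]}.{method}"


def private_method_form(nsid: str) -> str:
    """Hypothetical option B: keep the namespace, rename the method.

    com.atproto.repo.createRecord -> com.atproto.repo.createPrivateRecord
    """
    namespace, method = nsid.rsplit(".", 1)
    idx = method.index("Record")  # raises ValueError if the method has no 'Record'
    return f"{namespace}.{method[:idx]}Private{method[idx:]}"


if __name__ == "__main__":
    for nsid in PUBLIC_METHODS:
        print(nsid, "|", private_namespace_form(nsid), "|", private_method_form(nsid))
```

Either way the mapping from public to private is mechanical, which is the point: one set of semantics to memorize, with a predictable rename.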

Biggest open question/unsolved problem to me: is there a personal-private firehose off of the PDS to enable authorized applications to keep up-to-date on your personal-private data?

If no: that keeps things simple, but it will probably mean applications polling the PDS to find out about changes, which I assume we want to avoid.

If yes: What protocol does this personal-private firehose use? The public firehose uses CBOR Merkle proofs but if a syncing application only has partial access to your repo… I’m not actually 100% sure if that breaks the sync semantics entirely or not.[1]

An alternative would be to host this firehose in Jetstream format, which dodges the MST sync stuff. But then we’ve introduced more protocol complexity by canonizing Jetstream’s syntax instead of having it be an optional thing on top. And you have the usual Jetstream downside: it’s harder to be 100% confident that you’re in sync if you (or the PDS) dropped some Jetstream messages for whatever reason.

TLDR: Private sync is complicated, so maybe we launch this thing with just the CRUD operations and then do a firehose later.
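For a sense of what CRUD-only means for a consuming application, here is a rough sketch of the polling approach: list the records, remember each record key and CID, and diff against the previous listing. The snapshot shape (a dict of record key to CID) and the function name `diff_snapshots` are assumptions for illustration, not part of any spec.

```python
from typing import Dict, List, NamedTuple


class Changes(NamedTuple):
    created: List[str]
    updated: List[str]
    deleted: List[str]


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> Changes:
    """Compare two {record key: CID} snapshots taken from successive polls.

    A key that is new was created, a key whose CID differs was updated,
    and a key that disappeared was deleted.
    """
    created = sorted(k for k in current if k not in previous)
    updated = sorted(k for k in current if k in previous and previous[k] != current[k])
    deleted = sorted(k for k in previous if k not in current)
    return Changes(created, updated, deleted)
```

Every poll has to re-list the whole collection to catch deletes, which is the cost a firehose would remove.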


  1. My intuition is that you can maybe successfully do a sync with only the intermediate structure of the tree, excluding records that the consuming application lacks access to, but in doing so you’d leak a bunch of metadata about the repo? At a bare minimum, all consuming applications would be aware any time you changed something in the repo, because the root would change. And they could probably take a decent guess as to which collections were modified by observing which part of the tree changed.
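The leak described in this footnote can be shown with a toy model. This is not the real MST: it is a simplified two-level hash tree (records hashed into one digest per collection, collection digests hashed into a root), and the collection names in the usage note are made up. It only illustrates that someone who sees the digests but none of the records can still tell that something changed, and roughly where.

```python
import hashlib
from typing import Dict, List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def collection_digests(repo: Dict[str, Dict[str, bytes]]) -> Dict[str, bytes]:
    """Hash every collection's records down to a single digest per collection."""
    digests = {}
    for collection, records in repo.items():
        leaves = b"".join(
            _h(rkey.encode() + b"\x00" + value)
            for rkey, value in sorted(records.items())
        )
        digests[collection] = _h(leaves)
    return digests


def root_digest(digests: Dict[str, bytes]) -> bytes:
    """Combine the per-collection digests into one root digest."""
    return _h(b"".join(name.encode() + b"\x00" + d for name, d in sorted(digests.items())))


def changed_collections(before: Dict[str, bytes], after: Dict[str, bytes]) -> List[str]:
    """What an observer with digests only (no record contents) can infer."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))
```

For example, adding one record to a hypothetical com.example.bookmark collection changes the root, and `changed_collections` names that collection even though the observer never sees the record itself.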
