[Proposal] Shared private Data for social posting

After some experimentation with Appviews and researching the PDS reference, I’ve written up a proposal on how we at Northsky intend to handle shared private data with a PDS/Appview to create a form of community-only posts. The core of it is a lexicon mirroring app.bsky which, when used, stores the records/blobs separately from the public data. An appview then uses that data for hydrating feeds, where it is exposed only within configuration boundaries defined by domains within the lexicon.

Please give the post a read, feedback is welcome :slight_smile:

6 Likes

There was some good discussion between @evelyn.northsky.team and @ianopolous.bsky.social in the bsky comments, so for reference here’s Ian’s proposal:

2 Likes

Thanks for sharing @evelyn.northsky.team

What we talked about in Montreal is that if instead of making this strictly PDS based (where a user has to be on a particular PDS for this to work) we might use a new service endpoint record, so that a user hosted on selfhosted.social for example, could have a stratos endpoint in their DID doc as well as their public one.

This would mean that a user wouldn’t have to move a PDS but could have a “private data” PDS separately.

And, that separate apps could provision “private data PDS endpoints” rather than PDS being linked per app.

Anyway - this is an approach, not a solution, again with trade offs. Thanks for sharing Evelyn!

5 Likes

I didn’t specifically consider the DID approach for defining a stratos endpoint. At the early research stage I had considered having stratos as a separate service, but wasn’t certain how well it could scale once we began to operate multiple PDSes and needed to work out how to shard across multiple stratos endpoints.

The all-in-one PDS ends up being the most convenient, as we simplify the auth/access flows, and for our use case it works. I do think in theory we could make it possible to run separately (in a similar manner to how the appview ref is three in one).

1 Like

I’m not pushing back on all-in-one, I’m pointing out if you want adoption without people having to migrate, this could be an option.

Specifically, using the DID doc means that account A can be on bsky.social, but have private data on Stratos.

And, for apps, it means that if someone goes to log in and doesn’t have a stratos endpoint, they can have one added.

This is the kind of pattern we need to aim for if we don’t want to end up with an “island” architecture where small groups of users are stranded on particular flavours of a PDS.

4 Likes

seconding a sidecar private PDS endpoint attached to the DID document as a good approach in the general case (in an implementation of private data I did which never saw the light of day, this was the approach I took), though obviously it imposes additional constraints on a first mover building this out.

I think my main concern with using sidecars where users can be using different endpoints is how to handle the indexing side, as it requires that I have the ability to fetch the records. I suppose you could attach metadata that can be exposed through a negotiated trust (e.g. OAuth) and have it indexed.

That’s the primary trade-off with an E2EE approach, as we can’t fully index the content and search it, and more importantly we can’t moderate it (or scan for illegal content) unless you accept that the appview is a permanent party.

1 Like

Yes, the appview is a permanent party, but also is responsible for deleting if requested.

PDS hosts and Appview hosts are considered trusted.

It also means someone can build alternate appviews.

I think what you’re aiming for is “private microblogging”. Various other trade offs around who / if anyone can see metadata.

BTW, I don’t call this a sidecar @nonbinary.computer and I think we should reserve that for “sidecar records”, but rather it’s registering a new private data service endpoint.

So, something like

  "service": [
    {
      "id": "#atproto_pds",
      "type": "AtprotoPersonalDataServer",
      "serviceEndpoint": "https://morel.us-east.host.bsky.network"
    },
    {
      "id": "#atproto_stratos",
      "type": "AtprotoStratosPrivateData",
      "serviceEndpoint": "https://stratos.northsky.team"
    }
  ]
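
As a rough sketch of how a client or AppView might look up such an endpoint, assuming the `#atproto_stratos` id and `AtprotoStratosPrivateData` type from the example above (neither is an established ATProto service type). The `find_service_endpoint` helper and the `did:plc:example` placeholder DID are hypothetical:

```python
from typing import Optional

# Example DID document fragment. The Stratos entry and the DID are illustrative only.
DID_DOC = {
    "id": "did:plc:example",
    "service": [
        {
            "id": "#atproto_pds",
            "type": "AtprotoPersonalDataServer",
            "serviceEndpoint": "https://morel.us-east.host.bsky.network",
        },
        {
            "id": "#atproto_stratos",
            "type": "AtprotoStratosPrivateData",
            "serviceEndpoint": "https://stratos.northsky.team",
        },
    ],
}


def find_service_endpoint(did_doc: dict, service_id: str) -> Optional[str]:
    """Return the serviceEndpoint whose id matches service_id, or None if absent."""
    did = did_doc.get("id", "")
    for service in did_doc.get("service", []):
        sid = service.get("id", "")
        # Service ids may be a bare fragment ("#x") or the full form ("did:...#x").
        if sid == service_id or sid == did + service_id:
            return service.get("serviceEndpoint")
    return None


endpoint = find_service_endpoint(DID_DOC, "#atproto_stratos")
print(endpoint or "no private data endpoint; an app could offer to add one")
```

The `None` case is where an app could offer to provision an endpoint for the user at login.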
4 Likes

fair cop on terminology there.

Private microblogging is the right term to go for. I’m just overall a bit skeptical on how useful E2EE is for it as there’s so many inspection points it renders it useless.

Off the top of my head:

  1. Automated moderation/labeler needs to be able to inspect records and blobs
  2. Manual moderation needs to review records and blobs
    1. Could be adjusted so that a user report triggers sending a pointer to the record, and moderators are effectively users on the platform who view it in real time, but this requires that stratos allow that degree of access (e.g. rather than PDS exposure, it’s a group of DIDs)
  3. Indexing for hydrating feeds
    1. Can be metadata pointing users to records to fetch on view but then we sacrifice edge caching and search
1 Like

Yep, I wouldn’t call it E2EE if the app view, moderation etc are end points.

3 Likes

A pair of devs independently arrived at similar conclusions as this proposal.

2 Likes

Hey! I’m the person who wrote the proposal up above ^^^. I wanted to drop in and leave a few thoughts.

First, I’m extremely satisfied at the fact that we over at furryli.st and Northsky came to what are broadly extremely similar solutions to our common problem. I absolutely agree that shadow lexicons are the way forward for private/semi-private community data in the Atmosphere.

I do have a few thoughts regarding the specifics of the proposed architecture here, though it’s mostly regarding the initial prioritization of “true” data privacy over system portability and data sovereignty. I would like to propose that simple obfuscation can provide significant additional benefits for community health, resilience, and portability while still nonetheless fulfilling the role of a bulwark against malicious actors.

(Before I get deep into this, I want to make clear that I am not a programmer/computer scientist by trade. My professional background is in biological systems/regulatory networks. This has certainly helped me understand distributed systems better, but it’s very possible I’ll miss key technical details that make a big difference. Please do correct me if this is the case!)


My main concern is with the initial proposal’s use of a PDS as the shared relationship. Of course, as the proposal points out, such a boundary definition is not set in stone. But it is my opinion that (with the exception of extreme cases where privacy is absolutely paramount for immediate user safety) a PDS boundary is not the best solution, and, moreover, that consensus towards an alternate boundary system with DID collections/lists could have significant benefits for interoperability, portability, and UX.

A few issues come to mind with the PDS gating architecture (some of which have already been mentioned):

  • PDS gating introduces the PDS as a single point of failure for the community. It lacks true portability, which, in my opinion, is a crucial aspect of the longevity of a community within the protocol. Although it’s obviously not 1:1, such a structure is reminiscent of Mastodon/ActivityPub servers, where accounts are at the mercy of the good will and competence of the server host. Loss of the PDS could lead to a catastrophic loss of community integrity and history.
    • Related to this: asking users to move to a PDS in order to interact with a community is a significant ask for users. It is essentially a request to entrust a user’s account hosting to a third party that the user might or might not consider trustworthy. This is not necessarily a deal-breaker, especially since account backups are technically possible, but I think it would rightfully give many users pause for newer and/or smaller projects.
    • If I understand @bmann.ca’s suggestion correctly, this could potentially be mitigated by using a record to define an arbitrary number of new service endpoints. However, this would still result in records being siloed off within individual PDSes.
  • PDS gating creates a significant moderation load for the PDS host. Most significantly, it potentially removes the first line of defense against revolting content such as gore/CSAM. As I understand it, all content within user repositories in Bluesky PDSes is moderated by Bluesky, regardless of whether it’s published as an app.bsky lexicon or not (since it’s content still hosted on their servers). I think that for the long-term sustainability and mental health of the moderation team, it’s in their best interests to offload at least initial moderation screening to a third party if possible, and perhaps to create an override system for bad calls.
    • Since I understand that moving away from Bluesky moderation can be a big motivator for private community spaces in ATProto, an alternative might be implementing Hive content moderation tools before serving posts to users. However, I’m not entirely sure on the specifics of how that would work.
  • A monolithic PDS-centric design limits the number of communities a user can join. Imagine a user who is Black, queer, a furry, and a game developer. Each of these identities could plausibly have a need for pseudo-private community spaces. Yet, if all of these communities are bounded through PDS gating, it forces users to choose: which identity do you value the most? I don’t think this choice should be necessary.

If the disadvantages I’ve listed are indeed accurate, I think that exploring other alternatives within the solution space is worth our time.


One such alternative solution @evelyn.northsky.team pointed out is a simple collection of DIDs. This is essentially what furryli.st is: a collection of DIDs based on @furryli.st’s follow list which can be used dynamically in any way we see fit. This collection is manually curated by screening users that request to join the list using an internal set of rules and basic visual screening to approve or reject requests. We internally call this a “curated cluster”.

I’ve described our proposed design in my leaflet post, but I’ll summarize it here. Our proposed design is two-pronged:

  1. To create a shadow lexicon mirroring app.bsky or any arbitrary social app, creating a boundary between records meant to be kept within the cluster and records meant to be broadcast to the wider network.
  2. To use a curated cluster’s DID collection (such as furryli.st) to exclude shadow lexicon records from users who are not part of the cluster.
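
A minimal sketch of how the two prongs might combine at serving time. The `example.shadow.` namespace, the record shape, and `visible_records` are all hypothetical, and this version assumes the gate applies to both the author and the viewer of a shadow record:

```python
SHADOW_PREFIX = "example.shadow."  # hypothetical shadow-lexicon namespace


def visible_records(records: list[dict], viewer_did: str, cluster: set[str]) -> list[dict]:
    """Pass public records through; gate shadow records on cluster membership."""
    visible = []
    for record in records:
        if not record["collection"].startswith(SHADOW_PREFIX):
            visible.append(record)  # public records are unaffected
        elif viewer_did in cluster and record["author"] in cluster:
            visible.append(record)  # shadow records stay inside the cluster
    return visible


cluster = {"did:plc:alice", "did:plc:bob"}  # the curated DID collection
records = [
    {"author": "did:plc:alice", "collection": "app.bsky.feed.post", "text": "public"},
    {"author": "did:plc:alice", "collection": "example.shadow.feed.post", "text": "cluster only"},
    {"author": "did:plc:mallory", "collection": "example.shadow.feed.post", "text": "outsider"},
]

print([r["text"] for r in visible_records(records, "did:plc:bob", cluster)])
print([r["text"] for r in visible_records(records, "did:plc:carol", cluster)])
```

Note that this only controls what a cooperating client or AppView serves; the records themselves remain readable in the repo.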

Using curated DID collections could potentially solve the issues I described:

  • There is no personal investment into the community ecosystem through PDS migration. Users are able to remain on Bluesky’s PDS, or migrate to their own, without losing access to whichever communities they are a part of.
  • Bluesky T&S can be used to lower moderation load. While this may not be ideal for some communities, for others who do not have the manpower to consistently maintain active moderation systems, this could make the operation of these community spaces much less burdensome. Cluster operators could act more as bouncers, which we’ve found to be an easier and more forgiving role.
  • Users are not limited in the number of curated clusters they can join. A user can be a member of a Black cluster, a queer cluster, a furry cluster, and a game dev cluster, without needing to sacrifice any particular community or identity.

This design can also open up a world of new possibilities that I would describe as Atmospheric:

  • Curated clusters are portable: in theory, anyone could make a carbon copy of furryli.st and run their own cluster using the open-source tools we’ve built. In the event we go rogue, users could switch to infrastructure run by a separate party. Since the vast majority of them would likely still be members of the carbon copy, the experience would be broadly identical, and, since the design is PDS-agnostic, no records would be lost.
    • This might require an alternate design from the “boundary” property initially proposed by Evelyn. If the boundary DID collection is baked into records, alternate clusters and apps might need to use workarounds to serve records using the previous cluster boundary to new users.
  • The portability of the clusters could encourage users to create peripheral infrastructure (e.g. feeds and labelers) custom-made for their respective communities, which could be promoted and even integrated into the user experience by community runners if they benefit the cluster. Different services could even create different experiences while still gating using the same curated cluster. This opens up all sorts of possibilities for tailor-made experiences for a particular community.
  • Given that the design is only semi-private, there could be exciting opportunities for interoperability between different communities. If communities organize using a similar schema, users could, rather than accessing separate apps for separate clusters, use a single, portable client to interact with multiple clusters at once, and allow them to switch between different social “lenses” depending on the cluster they’d like to view/interact with (Perhaps letting cluster operator accounts define parameters for the social experience using a standardized record).
    • These “lenses” don’t have to be limited to showing the shadow-lexicon either. They could also filter app.bsky posts to show public posts from members of the cluster (or even boolean functions with different clusters). This would leverage curated clusters and the already existing, popular feeds using the app.bsky lexicon to narrow the field of view through which the user experiences the network.
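
The “boolean functions with different clusters” idea maps naturally onto set operations over DID collections. A small sketch, with made-up DIDs and a hypothetical `apply_lens` helper:

```python
# Two curated clusters, each just a set of member DIDs (illustrative values).
furry = {"did:plc:a", "did:plc:b"}
gamedev = {"did:plc:b", "did:plc:c"}

posts = [
    {"author": "did:plc:a", "text": "fursuit photos"},
    {"author": "did:plc:b", "text": "furry game devlog"},
    {"author": "did:plc:c", "text": "shader tips"},
]


def apply_lens(posts: list[dict], members: set[str]) -> list[str]:
    """Keep only posts whose author is in the given member set."""
    return [p["text"] for p in posts if p["author"] in members]


furry_and_gamedev = apply_lens(posts, furry & gamedev)  # members of both clusters
furry_or_gamedev = apply_lens(posts, furry | gamedev)   # members of either cluster
furry_not_gamedev = apply_lens(posts, furry - gamedev)  # furry but not gamedev

print(furry_and_gamedev, furry_or_gamedev, furry_not_gamedev)
```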

These benefits I’ve listed, of course, ignore the elephant in the room regarding this proposed collection-centric design: since records are not confined to users within a PDS, the records are not fully private. Although they would be obfuscated by not being served by major clients, and a collection boundary would largely prevent bad actors from participating in discussions where they don’t belong, records using the shadow lexicon would still be publicly findable and accessible just like any other record. This is, indeed, the biggest advantage of Evelyn’s proposed design, and strictly meets this WG’s goal of designing a scheme for private records within the AT Protocol.

I did, however, want to bring up the benefits of our alternate collection-centric design which, while not fully meeting the goals of this WG, nonetheless approximates a similar result while opening up several possibilities for a more modular, portable, resilient, and overall atmospheric ecosystem design.

As I mentioned before, I don’t think collection-gating would be suitable for every community. I think Evelyn’s proposal is especially valuable for users at especially high risk of persecution or harassment: political dissidents, persecuted minorities, private collectives and organizations. However, I do want to at least posit the question: At what point do the benefits of a semi-public yet interoperable design outweigh the drawbacks of a truly private yet monolithic design?

I think that there are several kinds of communities that could happily reap the benefits I listed of a semi-private interoperable design while experiencing very few of the potential negatives from lack of true data privacy. This is not limited just to the Black community, furries, game devs, queer people, etc. This could create curated, gated community spaces for, say, universities, research communities, fandoms, really anything you could make a subreddit about. Yet, unlike subreddits, which are monolithically owned by moderators and site admins, communities would be a commons within the ATmosphere that can be tailored to each community’s individual needs and wants, without the risk of centralization, data loss, or a single point of failure.

Obviously, it would be important to let users know, in no uncertain terms, that posts using the shadow-lexicon are not truly private. It would be a mistake to give users a false sense of security. But it’s my sincere belief that many communities would be okay with this, so long as the community space they interact with is truly gated from outsiders and malicious actors.

Edits: Wording/formatting

3 Likes

The important nuance I want to point out is that when we’re referring to a PDS, we’re really talking about the community, which at this point in time is a single PDS. For all intents and purposes on ATproto, the community exists on one or more PDSes operated by the community “manager”, which in this case is Northsky. The other current option is Blacksky, with the key difference that they have a membership layer that existed prior to the PDS and is still in use (see the private posts implementation Rudy previewed recently), and there’s no vetting to join (aside from the obvious).

It doesn’t really serve the conversation to focus so much on the PDS aspect; we’re starting there because Northsky is an invite-gated community that exists on our PDS. Since writing the post I threw together an implementation that was part of a PDS, determined it’s not scalable and not a very good solution for the Atmosphere, and have pivoted to a dedicated Stratos service. Having a user’s DID document contain their service endpoint introduces a high barrier to entry, since updating it requires signing, which for most users is handled by the PDS and thus involves email verification. The other option is that joining the service adds the endpoint to the collection, where it can be referenced.

Details are being hashed out, but we arrived at the same approach as Smoke Signals, which Rudy is working off of: a public record that points to a private record, with the appview hydrating the record for feeds. From there the question is more “How do users enroll in this” and less “What type of boundaries are there”, as it becomes about how auto-enrollment can be achieved, and that’s where you can look at using a standard list or PDS as the “source”.
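
To illustrate the public-pointer-to-private-record shape, here is a hedged sketch of the hydration step. The field names (`ref`, `content`), the in-memory store, and the `hydrate` function are assumptions for illustration, not the Stratos or Smoke Signals design:

```python
from typing import Optional

# Private records held by the service, keyed by an opaque id (hypothetical store).
PRIVATE_STORE = {
    "rec1": {"text": "community-only post", "author": "did:plc:alice"},
}
# DIDs enrolled in the service, e.g. sourced from a list or a PDS.
ENROLLED = {"did:plc:alice", "did:plc:bob"}


def hydrate(pointer: dict, viewer_did: str) -> Optional[dict]:
    """Resolve a public pointer record to its private content for an enrolled viewer."""
    if viewer_did not in ENROLLED:
        return None  # outsiders only ever see the bare pointer
    private = PRIVATE_STORE.get(pointer["ref"])
    if private is None:
        return None  # private record deleted or unknown
    return {**pointer, "content": private}


pointer = {"uri": "at://did:plc:alice/example.pointer/1", "ref": "rec1"}
print(hydrate(pointer, "did:plc:bob"))
print(hydrate(pointer, "did:plc:mallory"))
```

With this shape, the enrollment question becomes how `ENROLLED` gets populated.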

4 Likes

My solution is a hybrid

  1. largely implemented in the PDS as a separate table from the public records
  2. uses all the same auth/oauth/permission set stuff built in
  3. leverages a permission database (spicedb) to handle the keeping and querying of access (see the zanzibar paper)

This supports

  • permissioned access ranging from fully public, fully private, and anything in between
  • community is assigned an account / did, same PDS code, backward compat with the public network
  • community user content can live in the community repo or the user repo, up to the app / community
    • many use-cases require no migration to a “permissioned” PDS (PPDS) for existing accounts, normal migration does work if one wants to
    • the default setup I have been using is that user content goes into the community repo, this simplifies a ton of things for POCs
    • for content that goes into the user repo (1) public would work (2) private requires both on a PPDS, hybrid can also work here
    • the (I think Nick’s) proposal for dual records for linking is likely very helpful here
  • If self-hosters are using docker-compose (what the official PDS project provides), there is no extra work on their end, the usual update methods can migrate for them
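
For readers unfamiliar with the Zanzibar model, the following is a toy in-memory sketch of relation tuples and a permission check. It is not the SpiceDB API or schema language; the tuple layout, the `*` wildcard, and the `check` function are simplifications made up for illustration:

```python
# Relation tuples in the Zanzibar style: (object, relation, subject).
# A subject of the form "object#relation" is a userset, i.e. "everyone with that relation".
TUPLES = {
    ("community:northsky", "member", "did:plc:alice"),
    ("post:1", "viewer", "community:northsky#member"),  # visible to community members
    ("post:2", "viewer", "did:plc:bob"),                # visible to a single user
    ("post:3", "viewer", "*"),                          # hypothetical wildcard: fully public
}


def check(obj: str, relation: str, subject: str) -> bool:
    """Return True if subject has relation on obj, following userset indirection."""
    for tuple_obj, tuple_rel, tuple_subject in TUPLES:
        if tuple_obj != obj or tuple_rel != relation:
            continue
        if tuple_subject == subject or tuple_subject == "*":
            return True
        if "#" in tuple_subject:
            parent_obj, parent_rel = tuple_subject.split("#", 1)
            if check(parent_obj, parent_rel, subject):
                return True
    return False


print(check("post:1", "viewer", "did:plc:alice"))
print(check("post:1", "viewer", "did:plc:bob"))
```

The three `post` tuples correspond to the "anything in between", "fully private", and "fully public" ends of the range above.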
1 Like

Great discussion!

For context: I’m building Barazo, a forum AppView on ATProto (lexicon thread).

Forum private sections have different requirements from private social posting that stress-test this proposal a bit:

  • Visibility is community-scoped, not friend-group-scoped. The community admin decides who sees what based on roles and membership, not the post author.
  • Ideally the content remains searchable and browseable within the community (categories, tags, thread navigation). E2EE is off the table for this use case (I assume?)
  • It still needs to be moderatable. Private doesn’t mean unmoderated.

This makes me lean toward @bmann.ca’s sidecar endpoint approach. If Barazo required users on a specific PDS to access private sections, that breaks portable identity. A bsky.social user should be able to join a private forum section without migrating.

Gap: for public data, the relay/firehose handles distribution to AppViews. For private sidecar data, there’s no equivalent. The AppView would need to maintain per-user auth and actively subscribe to each member’s sidecar endpoint. That works for small communities but gets expensive at scale, and it’s a fundamentally different indexing pattern than what ATProto AppViews are built around today.

Not a sidecar :slight_smile: - multiple service endpoints in the DID Doc

3 Likes

Thx Boris, noted! :folded_hands: So with service endpoints in the DID Doc the discovery side is basically solved, since the AppView just resolves member DIDs it already knows about. Check :grinning_face_with_smiling_eyes:

The delivery side seems further along than I thought, I found Dave Nash’s RFC proposing metadata notifications through the firehose (so not the actual content), then the AppView fetches directly from the PDS with OAuth. That would work for private forum sections :flexed_biceps:

My sense is that most people are largely against broadcasting metadata for private content over the firehose or otherwise making the trail public.

Hi everyone, I wanted to jump back in here. I think I have something that speaks to the open questions in this thread.

@evelyn.northsky.team , I think your pivot from a PDS-based to a dedicated service makes a lot of sense, and the public-pointing-to-private solution is the right shape I believe. However, I’d like to propose a variation of that pattern from the proposal I’m in the process of writing.

For the basic needed background, my proposal essentially separates the question of “Who belongs” (i.e. vetting for Northsky, furryli.st, Blacksky) and “What community spaces exist”. The connective tissue of this system is a credential, a persistent, self-authenticating, portable proof of membership that lives in the user’s repo.

I don’t fundamentally disagree that a service provider has to be trusted to make private data work. It can’t be E2EE if the AppView is the endpoint within the trust boundary (And it must be, if it can read, moderate, and index).

What I’ve been wondering, then, is: If the service can read anything anyways, then can encryption buy you anything? I think it can – it can mean that private data can stay in the member’s own repo.

Consider the following structure:

  1. When a member presents their credential to a service (proving they hold a valid credential from the issuer), the member and service establish a shared secret key. This becomes the key the member uses to scope records to the service (I’ll call this the member’s “service key”).
  2. When a member posts to the service, their client encrypts the content with their personal service key before writing it to the repo as a record. To anyone else (including the PDS), this record appears as just an encrypted string.

The service can derive every member’s personal service key from their credential (or public key). Therefore, it can decrypt, index, moderate, and serve all content intended for it. But, crucially, no member holds a key that can decrypt another member’s content, just their own. Members are forced to read community content through the service, which is the only party that can decrypt and serve it. If a service removes a user, they lose access to all other members’ content immediately and cryptographically. The only content they retain is their own posts, which they wrote.
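
One possible way to get per-member keys that the service can always re-derive is a key derivation function over a service-held secret. The sketch below hand-rolls HKDF-SHA256 (RFC 5869) from the standard library; the master secret, the salt label, and the use of a credential id as the `info` input are assumptions, and the actual encryption of records (which would need an AEAD cipher from a vetted library) is left out:

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF extract-and-expand (RFC 5869) using HMAC-SHA256."""
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Held only by the service. Demo value; a real deployment would use a random secret.
SERVICE_SECRET = b"demo-only-master-secret"


def member_service_key(credential_id: str) -> bytes:
    """Derive a 32-byte key scoped to one member's credential (hypothetical scheme)."""
    return hkdf_sha256(SERVICE_SECRET, salt=b"service-key-v1", info=credential_id.encode())


alice_key = member_service_key("at://did:plc:alice/example.credential/1")
bob_key = member_service_key("at://did:plc:bob/example.credential/1")
print(alice_key.hex())
print(alice_key != bob_key)
```

Under this scheme the service stores one secret and hands each member only their own derived key, so no member can compute another member’s key.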

This is essentially the same public-pointer-to-private-record pattern that Stratos and Rudy are working with for their trust models. However, the users retain full sovereignty over their data through ownership of their encrypted records. Simultaneously, it’s the service, and only the service, that can draw from their repos, decrypt the records, index them, and present them as a unified whole.

This can address a few problems:

With encrypted records in member repos, the firehose can work as-is by carrying encrypted payloads that no one can read. The community AppView subscribes to the firehose, filters for records from credentialed members, decrypts with each member’s derived service key, and indexes, without any per-user auth or sidecar endpoints needed.

Encrypted records on the firehose will always show that someone posted something. However, if all private community content uses a single generic record type, the firehose can’t distinguish which community a record belongs to. The surface area for metadata harvesting is shrunk to “This user posted something private, somewhere”.
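
A sketch of the indexing loop this implies, under several assumptions: a single generic collection name (`example.private.record`, hypothetical), a flat event shape, and a `decrypt` callable that returns `None` when a payload was not encrypted for this service. Because the record type is generic, the service only learns whether a record belongs to it by attempting decryption. The `fake_decrypt` stand-in below is not encryption at all; it only makes the example runnable:

```python
from typing import Callable, Optional

PRIVATE_COLLECTION = "example.private.record"  # hypothetical generic record type
Decrypt = Callable[[bytes, bytes], Optional[str]]


def index_private_events(
    events: list[dict], member_keys: dict[str, bytes], decrypt: Decrypt
) -> list[dict]:
    """Filter firehose events to credentialed members and keep what decrypts."""
    indexed = []
    for event in events:
        if event["collection"] != PRIVATE_COLLECTION:
            continue  # ordinary public record
        key = member_keys.get(event["did"])
        if key is None:
            continue  # author holds no credential with this service
        text = decrypt(key, event["payload"])
        if text is not None:  # None: meant for some other service
            indexed.append({"did": event["did"], "text": text})
    return indexed


def fake_decrypt(key: bytes, payload: bytes) -> Optional[str]:
    """Stand-in for authenticated decryption. NOT real crypto: it checks a key prefix."""
    return payload[len(key):].decode() if payload.startswith(key) else None


member_keys = {"did:plc:alice": b"k-alice:"}
events = [
    {"did": "did:plc:alice", "collection": PRIVATE_COLLECTION, "payload": b"k-alice:hello cluster"},
    {"did": "did:plc:alice", "collection": PRIVATE_COLLECTION, "payload": b"k-other:another service"},
    {"did": "did:plc:mallory", "collection": PRIVATE_COLLECTION, "payload": b"k-alice:spoofed"},
    {"did": "did:plc:alice", "collection": "app.bsky.feed.post", "payload": b"public"},
]
print(index_private_events(events, member_keys, fake_decrypt))
```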

Credentials can resolve this conflation. The community isn’t hosted anywhere. It’s just the set of credential holders, wherever they happen to be. Members can stay on whichever PDS they wish, since the community boundary is defined by who holds a credential, rather than who’s on which server.

Instead of using the PDS as the source, enrollment can look something like this: An AppView that wants to serve Northsky users simply checks “Does this user hold a credential from Northsky? Are they in good standing with me and/or Northsky?” If yes, they receive a service key. This means no DID document changes are required and no PDS cooperation is needed other than record publishing. The credential is just a record, and standing can be maintained internally or queried from the issuer.

This model gives the same privacy guarantees as service-custodied storage, with the added benefit that the data itself lives in the user’s repo. In the event of service failure, a successor service that the community trusts can receive the service keys from users migrating and restore access to community content by decrypting and indexing each user’s records as they trickle in.

There is additional client-side complexity, of course, mainly in credential presentation, key derivation, and encryption/decryption. But I still think the benefits make this worth it. The credential model combined with a per-member-key model makes the question of data sovereignty vs. service custody moot. And, because the data is not held unilaterally by services, it’s ultimately the user and the community as a whole who decide who they trust with their private data.

1 Like