[Proposal] Shared private data for social posting

After some experimentation with Appviews and research into the PDS reference implementation, I’ve written up a proposal for how we at Northsky intend to handle shared private data with a PDS/Appview to create a form of community-only posts. The core of it is a lexicon mirroring app.bsky. Records and blobs written with it are stored separately from the public data, and an appview then uses them to hydrate feeds, exposing data only within configured boundaries defined by domains within the lexicon.
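As a rough illustration of the core idea, here is a minimal TypeScript sketch of a shadow-lexicon post record that carries boundary domains, plus the check an appview might run before including it in a hydrated feed. The NSID `example.stratos.feed.post`, the `boundary.values` field, and the `viewerDomains` input are hypothetical placeholders, not the published lexicon.

```typescript
// Hypothetical shadow-lexicon record mirroring app.bsky.feed.post.
// The NSID and the `boundary` field are placeholders, not a published lexicon.
interface PrivatePostRecord {
  $type: "example.stratos.feed.post";
  text: string;
  createdAt: string;
  boundary: { values: string[] }; // domains allowed to see this record
}

// A record is visible if the viewer belongs to at least one boundary domain.
// How the appview learns a viewer's domains (enrollment, membership) is assumed.
function isVisibleTo(record: PrivatePostRecord, viewerDomains: ReadonlySet<string>): boolean {
  return record.boundary.values.some((domain) => viewerDomains.has(domain));
}

// Filter records fetched from the private store before hydrating a feed.
function hydrateFeed(
  records: PrivatePostRecord[],
  viewerDomains: ReadonlySet<string>,
): PrivatePostRecord[] {
  return records.filter((record) => isVisibleTo(record, viewerDomains));
}

const posts: PrivatePostRecord[] = [
  {
    $type: "example.stratos.feed.post",
    text: "members only",
    createdAt: "2025-01-01T00:00:00Z",
    boundary: { values: ["northsky.team"] },
  },
  {
    $type: "example.stratos.feed.post",
    text: "another community",
    createdAt: "2025-01-02T00:00:00Z",
    boundary: { values: ["other.example"] },
  },
];

// A viewer in the northsky.team domain sees only the first post.
console.log(hydrateFeed(posts, new Set(["northsky.team"])).map((p) => p.text));
```

Keeping the boundary check separate from storage means the same filter could run whether the private records live in a PDS table or a dedicated service.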

Please give the post a read, feedback is welcome :slight_smile:


There was some good discussion between @evelyn.northsky.team and @ianopolous.bsky.social in the bsky comments, so for reference here’s Ian’s proposal:


Thanks for sharing, @evelyn.northsky.team!

What we talked about in Montreal is that, instead of making this strictly PDS-based (where a user has to be on a particular PDS for this to work), we might use a new service endpoint record, so that a user hosted on selfhosted.social, for example, could have a stratos endpoint in their DID doc as well as their public one.

This would mean that a user wouldn’t have to move a PDS but could have a “private data” PDS separately.

It would also mean that separate apps could provision “private data PDS endpoints”, rather than a PDS being linked to each app.

Anyway, this is an approach, not a solution, again with trade-offs. Thanks for sharing, Evelyn!


I didn’t specifically consider the DID approach for defining a stratos endpoint. At the early research stage I had considered running stratos as a separate service, but I wasn’t certain how well it would scale once we began operating multiple PDSes, since we would need to work out how to shard across multiple stratos endpoints.

The all-in-one PDS ends up being the most convenient, as it simplifies the auth/access flows, and for our use case it works. I do think that in theory we could make it possible to run separately (in a similar manner to how the appview reference is three-in-one).


I’m not pushing back on all-in-one, I’m pointing out if you want adoption without people having to migrate, this could be an option.

Specifically, using the DID doc means that account A can be on bsky.social but have private data on Stratos.

And, for apps, it means that if someone goes to log in and doesn’t have a stratos endpoint, they can have one added.

This is the kind of pattern we need to aim for if we don’t want to end up with an “island” architecture where small groups of users are stranded on particular flavours of a PDS.


seconding a sidecar private PDS endpoint attached to the DID document as a good approach in the general case (in an implementation of private data I did which never saw the light of day, this was the approach I took), though obviously it imposes additional constraints on a first mover building this out.

I think my main concern with using sidecars, where users can be on different endpoints, is how to handle the indexing side, as it requires that I be able to fetch the records. I suppose you could attach metadata that is exposed through a negotiated trust (e.g. OAuth) and have that indexed.

That’s the primary trade-off with an E2EE approach: we can’t fully index the content and search it, and, more importantly, we can’t moderate it (or scan for illegal content) unless you accept that the appview is a permanent party.

Yes, the appview is a permanent party, but also is responsible for deleting if requested.

PDS hosts and Appview hosts are considered trusted.

It also means someone can build alternate appviews.

I think what you’re aiming for is “private microblogging”. There are various other trade-offs around who, if anyone, can see metadata.

BTW, @nonbinary.computer, I wouldn’t call this a sidecar; I think we should reserve that term for “sidecar records”. Rather, it’s registering a new private data service endpoint.

So, something like

  "service": [
    {
      "id": "#atproto_pds",
      "type": "AtprotoPersonalDataServer",
      "serviceEndpoint": "https://morel.us-east.host.bsky.network"
    },
    {
      "id": "#atproto_stratos",
      "type": "AtprotoStratosPrivateData",
      "serviceEndpoint": "https://stratos.northsky.team"
    }
  ]
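To show how a client or appview might consume such an entry, here is a small TypeScript sketch that looks up both endpoints from a DID document shaped like the fragment above. The `#atproto_stratos` id and endpoints come from the example; the DID `did:plc:example` and the helper name `findServiceEndpoint` are hypothetical. An app could treat a missing entry as the cue to offer provisioning at login.

```typescript
// Minimal subset of a DID document needed for service lookup.
interface DidService {
  id: string;
  type: string;
  serviceEndpoint: string;
}

interface DidDocument {
  id: string;
  service?: DidService[];
}

// Look up a service by fragment, accepting either a bare "#fragment" id
// or a fully qualified "<did>#fragment" id.
function findServiceEndpoint(doc: DidDocument, fragment: string): string | undefined {
  const entry = (doc.service ?? []).find(
    (s) => s.id === fragment || s.id === `${doc.id}${fragment}`,
  );
  return entry?.serviceEndpoint;
}

// Hypothetical DID; service entries copied from the fragment above.
const exampleDoc: DidDocument = {
  id: "did:plc:example",
  service: [
    {
      id: "#atproto_pds",
      type: "AtprotoPersonalDataServer",
      serviceEndpoint: "https://morel.us-east.host.bsky.network",
    },
    {
      id: "#atproto_stratos",
      type: "AtprotoStratosPrivateData",
      serviceEndpoint: "https://stratos.northsky.team",
    },
  ],
};

const publicPds = findServiceEndpoint(exampleDoc, "#atproto_pds");
const privateData = findServiceEndpoint(exampleDoc, "#atproto_stratos");
if (privateData === undefined) {
  // No private data endpoint yet: an app could offer to provision one.
  console.log("no private data endpoint registered");
} else {
  console.log(`public: ${publicPds}, private: ${privateData}`);
}
```

Because the public PDS entry is untouched, the account keeps working normally with clients that ignore the extra service.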

fair cop on terminology there.

Private microblogging is the right term to go for. I’m just a bit skeptical overall about how useful E2EE is for it, as there are so many inspection points that they render it useless.

Off the top of my head:

  1. Automated moderation/labelers need to be able to inspect records and blobs
  2. Manual moderation needs to review records and blobs
    1. Could be adjusted so that a user report triggers sending a pointer to the record, and moderators are effectively users on the platform who view it in real time, but this requires that stratos allow that degree of access (e.g. rather than PDS-based exposure, it’s a group of DIDs)
  3. Indexing for hydrating feeds
    1. Could be metadata pointing users to records to fetch on view, but then we sacrifice edge caching and search
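The adjusted flow for manual moderation can be sketched as an access rule in which a report contributes only a pointer and a designated group of moderator DIDs gets read access to reported records. This is a minimal TypeScript illustration under assumed names (`Report`, `AccessContext`, `canRead`, and the example URIs are hypothetical), not how Stratos actually implements it.

```typescript
// Hypothetical model: a report carries only a pointer (AT URI) to the record.
interface Report {
  subjectUri: string;
  open: boolean;
}

interface AccessContext {
  memberDids: ReadonlySet<string>;    // accounts inside the community boundary
  moderatorDids: ReadonlySet<string>; // the group of DIDs acting as moderators
  reports: Report[];
}

// Members can read any record in the boundary; moderators who are not members
// can read only records that an open report points at.
function canRead(uri: string, viewerDid: string, ctx: AccessContext): boolean {
  if (ctx.memberDids.has(viewerDid)) return true;
  if (!ctx.moderatorDids.has(viewerDid)) return false;
  return ctx.reports.some((r) => r.open && r.subjectUri === uri);
}

const reportedUri = "at://did:plc:author/example.stratos.feed.post/aaa";
const unreportedUri = "at://did:plc:author/example.stratos.feed.post/bbb";

const ctx: AccessContext = {
  memberDids: new Set(["did:plc:alice"]),
  moderatorDids: new Set(["did:plc:mod"]),
  reports: [{ subjectUri: reportedUri, open: true }],
};

console.log(canRead(reportedUri, "did:plc:mod", ctx));   // true
console.log(canRead(unreportedUri, "did:plc:mod", ctx)); // false
```

Scoping moderator access to reported pointers keeps the "group of DIDs" from becoming a blanket read grant over the whole community.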

Yep, I wouldn’t call it E2EE if the appview, moderation, etc. are endpoints.


A pair of devs independently arrived at conclusions similar to this proposal’s.


Hey! I’m the person who wrote the proposal up above ^^^. I wanted to drop in and leave a few thoughts.

First, I’m extremely pleased that we over at furryli.st and Northsky came to broadly similar solutions to our common problem. I absolutely agree that shadow lexicons are the way forward for private/semi-private community data in the Atmosphere.

I do have a few thoughts regarding the specifics of the proposed architecture here, though it’s mostly regarding the initial prioritization of “true” data privacy over system portability and data sovereignty. I would like to propose that simple obfuscation can provide significant additional benefits for community health, resilience, and portability while still nonetheless fulfilling the role of a bulwark against malicious actors.

(Before I get deep into this, I want to make clear that I am not a programmer/computer scientist by trade. My professional background is in biological systems/regulatory networks. This has certainly helped me understand distributed systems better, but it’s very possible I’ll miss key technical details that make a big difference. Please do correct me if this is the case!)


My main concern is with the initial proposal’s use of a PDS as the boundary of the shared relationship. Of course, as the proposal points out, such a boundary definition is not set in stone. But it is my opinion that (with the exception of extreme cases where privacy is absolutely paramount for immediate user safety) a PDS boundary is not the best solution, and, moreover, that consensus towards an alternate boundary system with DID collections/lists could have significant benefits for interoperability, portability, and UX.

A few issues come to mind with the PDS gating architecture (some of which have already been mentioned):

  • PDS gating introduces the PDS as a single point of failure for the community. It lacks true portability, which, in my opinion, is a crucial aspect of the longevity of a community within the protocol. Although it’s obviously not 1:1, such a structure is reminiscent of Mastodon/ActivityPub servers, where accounts are at the mercy of the good will and competence of the server host. Loss of the PDS could lead to a catastrophic loss of community integrity and history.
    • Related to this: asking users to move to a PDS in order to interact with a community is a significant ask. It is essentially a request to entrust a user’s account hosting to a third party that the user might or might not consider trustworthy. This is not necessarily a deal-breaker, especially since account backups are technically possible, but I think it would rightfully give many users pause when it comes to newer and/or smaller projects.
    • If I understand @bmann.ca’s suggestion correctly, this could potentially be mitigated by using a record to define an arbitrary number of new service endpoints. However, this would still result in records being siloed off within individual PDSes.
  • PDS gating creates a significant moderation load for the PDS host. Most significantly, it potentially removes the first line of defense against revolting content such as gore/CSAM. As I understand it, all content within user repositories on Bluesky PDSes is moderated by Bluesky, regardless of whether it’s published using an app.bsky lexicon or not (since it’s still content hosted on their servers). I think that, for the long-term sustainability and mental health of the moderation team, it’s in their best interests to offload at least initial moderation screening to a third party if possible, perhaps with an override system for bad calls.
    • Since I understand that moving away from Bluesky moderation can be a big motivator for private community spaces in ATProto, an alternative might be implementing Hive content moderation tools before serving posts to users. However, I’m not entirely sure on the specifics of how that would work.
  • A monolithic PDS-centric design limits the number of communities a user can join. Imagine a user who is Black, queer, a furry, and a game developer. Each of these identities could plausibly have a need for pseudo-private community spaces. Yet, if all of these communities are bounded through PDS gating, it forces users to choose: which identity do you value the most? I don’t think this choice should be necessary.

If the disadvantages I’ve listed are indeed accurate, I think that exploring other alternatives within the solution space is worth our time.


One such alternative solution @evelyn.northsky.team pointed out is a simple collection of DIDs. This is essentially what furryli.st is: a collection of DIDs based on @furryli.st’s follow list which can be used dynamically in any way we see fit. This collection is manually curated by screening users that request to join the list using an internal set of rules and basic visual screening to approve or reject requests. We internally call this a “curated cluster”.

I’ve described our proposed design in my leaflet post, but I’ll summarize it here. Our proposed design is two-pronged:

  1. To create a shadow lexicon mirroring app.bsky or any arbitrary social app, creating a boundary between records meant to be kept within the cluster and records meant to be broadcast to the wider network.
  2. To use a curated cluster’s DID collection (such as furryli.st) to exclude shadow lexicon records from users who are not part of the cluster.
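The two prongs can be sketched together: shadow-collection records are kept out of the general feed, and the cluster’s DID set decides who sees them. In this minimal TypeScript sketch the cluster set is assumed to have already been derived from the curator account’s follow list, and `example.cluster.feed.post` is a hypothetical NSID. Note that this only controls what a cooperating client or appview serves; the records themselves remain publicly fetchable.

```typescript
interface FeedItem {
  authorDid: string;
  collection: string; // NSID of the record's collection
  text: string;
}

const PUBLIC_COLLECTION = "app.bsky.feed.post";
const SHADOW_COLLECTION = "example.cluster.feed.post"; // hypothetical shadow NSID

// `clusterDids` is assumed to be derived already from the curator account's
// follow list; fetching and caching that list is out of scope here.
function visibleItems(
  items: FeedItem[],
  viewerDid: string,
  clusterDids: ReadonlySet<string>,
): FeedItem[] {
  const viewerIsMember = clusterDids.has(viewerDid);
  return items.filter((item) => {
    if (item.collection === PUBLIC_COLLECTION) return true; // public posts pass through
    if (item.collection !== SHADOW_COLLECTION) return false; // ignore other collections
    // Shadow records are shown only to members, and only if a member wrote them.
    return viewerIsMember && clusterDids.has(item.authorDid);
  });
}

const cluster = new Set(["did:plc:alice", "did:plc:bob"]);
const items: FeedItem[] = [
  { authorDid: "did:plc:alice", collection: SHADOW_COLLECTION, text: "cluster-only" },
  { authorDid: "did:plc:mallory", collection: SHADOW_COLLECTION, text: "outsider shadow post" },
  { authorDid: "did:plc:carol", collection: PUBLIC_COLLECTION, text: "public post" },
];

console.log(visibleItems(items, "did:plc:bob", cluster).map((i) => i.text));
console.log(visibleItems(items, "did:plc:carol", cluster).map((i) => i.text));
```

Checking the author as well as the viewer is what keeps non-members from injecting shadow records into the cluster’s discussions.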

Using curated DID collections could potentially solve the issues I described:

  • There is no personal investment into the community ecosystem through PDS migration. Users are able to remain on Bluesky’s PDS, or migrate to their own, without losing access to whichever communities they are a part of.
  • Bluesky T&S can be used to lower moderation load. While this may not be ideal for some communities, for others who do not have the manpower to consistently maintain active moderation systems, this could make the operation of these community spaces much less burdensome. Cluster operators could act more as bouncers, which we’ve found to be an easier and more forgiving role.
  • Users are not limited in the number of curated clusters they can join. A user can be a member of a Black cluster, a queer cluster, a furry cluster, and a game dev cluster, without needing to sacrifice any particular community or identity.

This design can also open up a world of new possibilities that I would describe as Atmospheric:

  • Curated clusters are portable: in theory, anyone could make a carbon copy of furryli.st and run their own cluster using the open-source tools we’ve built. In the event we go rogue, users could switch to infrastructure run by a separate party. Since the vast majority of them would likely still be members of the carbon copy, the experience would be broadly identical, and, since the design is PDS-agnostic, no records would be lost.
    • This might require an alternate design from the “boundary” property initially proposed by Evelyn. If the boundary DID collection is baked into records, alternate clusters and apps might need to use workarounds to serve records using the previous cluster boundary to new users.
  • The portability of the clusters could encourage users to create peripheral infrastructure (e.g. feeds and labelers) custom-made for their respective communities, which could be promoted and even integrated into the user experience by community runners if they benefit the cluster. Different services could even create different experiences while still gating using the same curated cluster. This opens up all sorts of possibilities for tailor-made experiences for a particular community.
  • Given that the design is only semi-private, there could be exciting opportunities for interoperability between different communities. If communities organize using a similar schema, users could, rather than accessing separate apps for separate clusters, use a single, portable client to interact with multiple clusters at once, and allow them to switch between different social “lenses” depending on the cluster they’d like to view/interact with (Perhaps letting cluster operator accounts define parameters for the social experience using a standardized record).
    • These “lenses” don’t have to be limited to showing the shadow-lexicon either. They could also filter app.bsky posts to show public posts from members of the cluster (or even boolean functions with different clusters). This would leverage curated clusters and the already existing, popular feeds using the app.bsky lexicon to narrow the field of view through which the user experiences the network.
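The “boolean functions with different clusters” idea reduces to set algebra over DID collections. Here is a minimal TypeScript sketch in which a lens narrows ordinary public app.bsky posts to authors in a combined set; the cluster contents, DIDs, and function names are hypothetical.

```typescript
type DidSet = ReadonlySet<string>;

// Set algebra over cluster membership lists.
function union(a: DidSet, b: DidSet): Set<string> {
  return new Set(Array.from(a).concat(Array.from(b)));
}

function intersect(a: DidSet, b: DidSet): Set<string> {
  return new Set(Array.from(a).filter((did) => b.has(did)));
}

function difference(a: DidSet, b: DidSet): Set<string> {
  return new Set(Array.from(a).filter((did) => !b.has(did)));
}

interface PublicPost {
  authorDid: string;
  text: string;
}

// A lens narrows ordinary app.bsky posts to authors in the given DID set.
function applyLens(posts: PublicPost[], lens: DidSet): PublicPost[] {
  return posts.filter((post) => lens.has(post.authorDid));
}

// Hypothetical cluster memberships.
const furries = new Set(["did:plc:a", "did:plc:b"]);
const gameDevs = new Set(["did:plc:b", "did:plc:c"]);

const timeline: PublicPost[] = [
  { authorDid: "did:plc:a", text: "fursuit pics" },
  { authorDid: "did:plc:b", text: "my furry game jam entry" },
  { authorDid: "did:plc:c", text: "shader tricks" },
];

// Furries who are also game devs.
console.log(applyLens(timeline, intersect(furries, gameDevs)).map((p) => p.text));
```

Since a lens is just a DID set, any client that can read the cluster lists could compute the same views, which fits the portability argument above.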

These benefits I’ve listed, of course, ignore the elephant in the room regarding this proposed collection-centric design: since records are not confined to users within a PDS, the records are not fully private. Although they would be obfuscated by not being served by major clients, and a collection boundary would largely prevent bad actors from participating in discussions where they don’t belong, records using the shadow lexicon would still be publicly findable and accessible just like any other record. This is, indeed, the biggest advantage of Evelyn’s proposed design, and strictly meets this WG’s goal of designing a scheme for private records within the AT Protocol.

I did, however, want to bring up the benefits of our alternate collection-centric design which, while not fully meeting the goals of this WG, nonetheless approximates a similar result while opening up several possibilities for a more modular, portable, resilient, and overall atmospheric ecosystem design.

As I mentioned before, I don’t think collection-gating would be suitable for every community. I think Evelyn’s proposal is particularly valuable for users at especially high risk of persecution or harassment: political dissidents, persecuted minorities, private collectives and organizations. However, I do want to at least pose the question: at what point do the benefits of a semi-public yet interoperable design outweigh the drawbacks of a truly private yet monolithic design?

I think that there are several kinds of communities that could happily reap the benefits I listed of a semi-private interoperable design while experiencing very few of the potential negatives from lack of true data privacy. This is not limited just to the Black community, furries, game devs, queer people, etc. This could create curated, gated community spaces for, say, universities, research communities, fandoms, really anything you could make a subreddit about. Yet, unlike subreddits, which are monolithically owned by moderators and site admins, communities would be a commons within the ATmosphere that can be tailored to each community’s individual needs and wants, without the risk of centralization, data loss, or a single point of failure.

Obviously, it would be important to let users know, in no uncertain terms, that posts using the shadow-lexicon are not truly private. It would be a mistake to give users a false sense of security. But it’s my sincere belief that many communities would be okay with this, so long as the community space they interact with is truly gated from outsiders and malicious actors.

Edits: Wording/formatting


The important nuance I want to point out is that when we’re referring to a PDS, we’re really talking about the community, which at this point in time is a single PDS. For all intents and purposes on ATProto, the community exists on one or more PDSes operated by the community “manager”, which in this case is Northsky. The other current option is Blacksky, with the key difference that they have a membership layer that existed prior to the PDS and is still in use (see the private posts implementation Rudy previewed recently), and there’s no vetting to join (aside from the obvious).

It doesn’t really serve the conversation to focus so much on the PDS aspect; we’re starting there because Northsky is an invite-gated community that exists on our PDS. Since writing the post, I threw together an implementation that was part of a PDS, determined that it’s neither scalable nor a very good solution for the Atmosphere, and have pivoted to a dedicated Stratos service. Having a user’s DID document contain their service endpoint introduces a high barrier to entry, since updating it requires a signature, which for most users means going through their PDS and thus email verification. The other option is that joining the service adds the endpoint to a collection, where it can be referenced.

Details are still being hashed out, but we arrived at the same approach as Smoke Signals, which Rudy is working off of: a public record that points to a private record, with the appview hydrating the record for feeds. From there, the question is more “How do users enroll in this?” and less “What type of boundaries are there?”, since it then becomes about how auto-enrollment can be achieved, and that’s where you can look at using a standard list or a PDS as the “source”.
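The pointer pattern can be sketched as follows. This is a minimal TypeScript illustration assuming an in-memory private index keyed by URI; the type names, fields, and URIs are hypothetical and are not the actual Smoke Signals or Stratos schemas.

```typescript
// Public stub record: published normally, holds only a pointer to private content.
interface PublicStub {
  uri: string;        // URI of the public stub
  privateUri: string; // pointer to the record held by the private data service
}

// Private record as held in the appview's private index.
interface PrivateRecord {
  text: string;
  allowedDids: ReadonlySet<string>;
}

interface HydratedPost {
  uri: string;
  text: string | null; // null when the viewer is not allowed to see the body
}

// Resolve each stub's pointer and fill in the body only for allowed viewers.
function hydrate(
  stubs: PublicStub[],
  privateIndex: ReadonlyMap<string, PrivateRecord>,
  viewerDid: string,
): HydratedPost[] {
  return stubs.map((stub) => {
    const record = privateIndex.get(stub.privateUri);
    const text = record !== undefined && record.allowedDids.has(viewerDid) ? record.text : null;
    return { uri: stub.uri, text };
  });
}

const privateIndex = new Map<string, PrivateRecord>([
  [
    "at://did:plc:alice/example.private.post/1",
    { text: "hello members", allowedDids: new Set(["did:plc:alice", "did:plc:bob"]) },
  ],
]);

const stubs: PublicStub[] = [
  {
    uri: "at://did:plc:alice/example.public.stub/1",
    privateUri: "at://did:plc:alice/example.private.post/1",
  },
];

console.log(hydrate(stubs, privateIndex, "did:plc:bob"));
console.log(hydrate(stubs, privateIndex, "did:plc:eve"));
```

Whether an unauthorized viewer should see an empty placeholder (as here) or nothing at all is a design choice the sketch leaves open; the enrollment question then reduces to how `allowedDids` gets populated.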


My solution is a hybrid:

  1. largely implemented in the PDS as a separate table from the public records
  2. uses all the same auth/oauth/permission set stuff built in
  3. leverages a permission database (SpiceDB) to handle storing and querying access (see the Zanzibar paper)

This supports

  • permissioned access ranging from fully public, fully private, and anything in between
  • community is assigned an account / did, same PDS code, backward compat with the public network
  • community user content can live in the community repo or the user repo, up to the app / community
    • many use-cases require no migration to a “permissioned” PDS (PPDS) for existing accounts, normal migration does work if one wants to
    • the default setup I have been using is that user content goes into the community repo, this simplifies a ton of things for POCs
    • for content that goes into the user repo (1) public would work (2) private requires both on a PPDS, hybrid can also work here
    • the (I think Nick’s) proposal for dual records for linking is likely very helpful here
  • If self-hosters are using docker-compose (what the official PDS project provides), there is no extra work on their end, the usual update methods can migrate for them
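To make the Zanzibar-style model concrete, here is a tiny in-memory TypeScript sketch of relationship tuples and a recursive check, covering the range from fully public to community-only listed above. It is a simplified illustration of the concept, not SpiceDB’s API or schema language; the object names and the bare `"*"` wildcard convention are assumptions.

```typescript
// Zanzibar-style relationship tuple, read as object#relation@subject.
// A subject is a DID, "*" (anyone; a simplification), or a userset
// such as "community:northsky#member".
interface Tuple {
  object: string;
  relation: string;
  subject: string;
}

// Does `did` have `relation` on `object`? Usersets are followed recursively;
// `visited` stops cycles in the relationship graph.
function check(
  tuples: Tuple[],
  object: string,
  relation: string,
  did: string,
  visited: Set<string> = new Set<string>(),
): boolean {
  const key = `${object}#${relation}`;
  if (visited.has(key)) return false;
  visited.add(key);
  for (const t of tuples) {
    if (t.object !== object || t.relation !== relation) continue;
    if (t.subject === did || t.subject === "*") return true;
    const hash = t.subject.indexOf("#");
    if (hash !== -1 && check(tuples, t.subject.slice(0, hash), t.subject.slice(hash + 1), did, visited)) {
      return true;
    }
  }
  return false;
}

const tuples: Tuple[] = [
  { object: "community:northsky", relation: "member", subject: "did:plc:alice" },
  { object: "post:members-only", relation: "viewer", subject: "community:northsky#member" },
  { object: "post:public", relation: "viewer", subject: "*" },
  { object: "post:private", relation: "viewer", subject: "did:plc:bob" },
];

console.log(check(tuples, "post:members-only", "viewer", "did:plc:alice")); // true
console.log(check(tuples, "post:members-only", "viewer", "did:plc:bob"));   // false
```

Storing access as tuples rather than per-record ACLs is what lets a single community membership change cascade to every record that references the community.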