This is a forum thread for giving feedback on the early permissioned data protocol proposal that I recently posted.
From the PR:
0016 Permissioned Data
This is a draft proposal, not the final specification. Details, terminology, and behaviors are all likely to change.
Introduction
The [AT Protocol][ATPROTO] is a protocol for public broadcast data. Users publish records into a repository on their PDS, and applications crawl those repositories to build views. Authority rests in the DID that publishes a record, and records are signed, redistributable, and universally addressable.
This document specifies a permissioned data protocol for data that is not public, data with an access perimeter. It runs alongside the public protocol and serves modalities such as:
- Personal data: bookmarks, mutes, drafts
- Gated content: paid newsletters, subscriber-only posts
- Socially shared: private posts, stories
- Groups: private forums, communities, group chats
The permissioned protocol shares the abstract shape of public atproto. It retains identity-based authority, per-user repositories, lexicon-typed records, and the general flow of applications crawling PDSes to build views. However, it is a distinct protocol rather than an extension of the public one. It has its own repository format, sync mechanism, addressing scheme, and resolution path. Public atproto is built for public broadcast (signed, archival, rebroadcastable) while the permissioned protocol is built for party-to-party transmission within an access boundary.
This protocol provides access control, not confidentiality. It is not end-to-end encrypted. Services (both PDSes and authorized applications) can read the data they handle, which is required for server-side features such as search, indexing, notifications, aggregation, and moderation. E2EE is a separate concern that may be layered on top by an application and is out of scope in this proposal.
Relationship to public atproto
| | Public atproto | Permissioned protocol |
|---|---|---|
| Unit of data | Record in a repo | Record in a permissioned repo |
| Repo scope | One repo per user | One permissioned repo per (user, space) |
| Record authority | User DID | User DID |
| URI authority | User DID | Space authority DID |
| Commit | Merkle Search Tree root | LtHash set-hash digest |
| Signature | Rebroadcastable, archival | Deniable on rebroadcast |
| Addressing | at:// URI | ats:// URI |
| Access | Public | Gated by space credential |
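
The commit row above mentions an LtHash set-hash digest. This excerpt doesn't give the actual parameters, so the following is only a rough sketch of the general idea behind an additive set hash in that spirit: each record is expanded into a vector of lanes, and vectors are added or subtracted lane by lane, so the digest doesn't depend on insertion order and can be updated incrementally. The lane count, the use of SHAKE-256 as the expansion function, and the names `SetHash` and `_expand` are all assumptions for illustration, not anything taken from the proposal.

```python
import hashlib
import struct
from typing import List

LANES = 1024        # assumed number of 16-bit lanes; not taken from the proposal
MOD = 1 << 16       # each lane wraps around modulo 2^16


def _expand(item: bytes) -> List[int]:
    """Expand one item into LANES 16-bit integers (SHAKE-256 is a stand-in XOF)."""
    raw = hashlib.shake_256(item).digest(LANES * 2)
    return list(struct.unpack(f"<{LANES}H", raw))


class SetHash:
    """Hypothetical additive set hash: order-independent and incrementally updatable."""

    def __init__(self) -> None:
        self.state = [0] * LANES

    def add(self, item: bytes) -> None:
        # Lane-wise addition; commutative, so insertion order doesn't matter.
        self.state = [(a + b) % MOD for a, b in zip(self.state, _expand(item))]

    def remove(self, item: bytes) -> None:
        # Lane-wise subtraction undoes a previous add of the same item.
        self.state = [(a - b) % MOD for a, b in zip(self.state, _expand(item))]

    def digest(self) -> str:
        # Compress the lane vector into a short hex digest for comparison.
        packed = struct.pack(f"<{LANES}H", *self.state)
        return hashlib.sha256(packed).hexdigest()
```

Two repos holding the same set of records end up with the same digest regardless of write order, and deleting a record is a subtraction rather than a tree rebuild, which is the property that makes this kind of commit cheap to maintain.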
Terminology
- Space: an authorization and sync boundary for a set of permissioned records, identified by an (authority, type, skey) triple.
- Permissioned repo: one user's records within one space, with a cryptographic commit, hosted on that user's PDS.
- Repo host: a service that stores and serves users' permissioned repos.
- Space host: a service that answers for a space as a whole, issuing credentials, enumerating writers, and routing notifications.
- Space authority: the DID at the root of a space, which resolves to the space host and the key material for issuing credentials.
- Space credential: a token issued by the space authority that grants read access to a space.
- Delegation token: a token issued by a user's PDS that an application exchanges with a space authority for a space credential.
- Client attestation: a token signed by an application's own client authentication key, proving the application's identity to a space authority. Required only when a space gates on app identity.
- Syncer: an application that keeps its own copy of a space in sync by pulling from repo hosts.
A PDS fulfills both the roles of a repo host and a space host. However, these roles are discussed separately because they do not necessarily need to be filled by a PDS. A permissioned repo or a space may be hosted by any service that implements the required APIs.
Spaces
A space is an authorization and sync boundary representing a shared social context. A space may include many different types of records from many users. The space does not colocate records. Instead, each user stores their own records for a given space in a permissioned repo on their own repo host. A space is the union of these per-user repos across the network: an application presenting a space pulls each member's repo from its host, assembles the view, and applies access control to requesting users.
Each space is identified by three values:
- authority: a DID, the root of authority for the space
- type: an NSID describing the modality of the space
- space key (skey): a string distinguishing spaces of the same type under the same authority
Reading or syncing a space requires a space credential signed by the declared signing key of the space authority. The space authority decides whether to issue one based on the requesting user and client application. The protocol does not define how that decision is made and carries no member list (see Access Control). Spaces scale from a single user's personal data (e.g. bookmarks) to communities of millions of users.
Addressing
A permissioned record is addressed by an ats:// URI of six components:
ats://{spaceDid}/{spaceType}/{skey}/{authorDid}/{collection}/{rkey}
| Component | Type | Description |
|---|---|---|
| spaceDid | DID | Space authority DID |
| spaceType | NSID | Space type |
| skey | string | Space key |
| authorDid | DID | DID of the record's author |
| collection | NSID | Record collection |
| rkey | string | Record key |
All six URI segments are necessary to identify a permissioned record. The first three components may be used to reference a space:
Space: ats://{spaceDid}/{spaceType}/{skey}
Record: ats://{spaceDid}/{spaceType}/{skey}/{authorDid}/{collection}/{rkey}
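
As a rough illustration of the two URI forms above, here is a minimal parsing sketch. It assumes no component contains a `/` (the excerpt doesn't say whether an skey or rkey may), and the names `AtsUri` and `parse_ats_uri` are hypothetical, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional

SCHEME = "ats://"


@dataclass(frozen=True)
class AtsUri:
    # The first three components identify the space.
    space_did: str
    space_type: str
    skey: str
    # The last three are only present on record URIs.
    author_did: Optional[str] = None
    collection: Optional[str] = None
    rkey: Optional[str] = None

    @property
    def is_record(self) -> bool:
        return self.rkey is not None

    @property
    def space_uri(self) -> str:
        """The three-component URI of the space this URI belongs to."""
        return f"{SCHEME}{self.space_did}/{self.space_type}/{self.skey}"


def parse_ats_uri(uri: str) -> AtsUri:
    """Parse a space URI (3 components) or a record URI (6 components)."""
    if not uri.startswith(SCHEME):
        raise ValueError(f"not an ats:// URI: {uri!r}")
    parts = uri[len(SCHEME):].split("/")
    # Assumption: components never contain '/', so only 3 or 6 parts are valid.
    if len(parts) not in (3, 6) or any(p == "" for p in parts):
        raise ValueError(f"expected 3 or 6 non-empty components: {uri!r}")
    return AtsUri(*parts)
```

For example, parsing `ats://did:plc:space123/com.example.forum/main/did:plc:alice/com.example.post/3k2a` (made-up identifiers) gives a record URI whose `space_uri` is `ats://did:plc:space123/com.example.forum/main`.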
Space authority
A space's authority is the DID at the root of the space and the issuer of its credentials. It may be a user's own DID, as for personal data such as bookmarks or mutes, or it may be a dedicated DID, which lets a shared space transfer between users independently of any individual account.
The authority DID MUST expose two entries in its DID document:
- a verification method with id #atproto_space: the public key used to verify the space's credentials
- a service entry with id #atproto_space_host: the endpoint of the space host
Both values MAY resolve to the same values as #atproto and #atproto_pds from the public data protocol.
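
To make the two required entries concrete, here is a small sketch of pulling them out of an already-resolved DID document. It assumes the key is carried in `publicKeyMultibase`, as in public atproto DID documents, and that entry ids may appear either as a bare fragment or prefixed with the DID. The function names are hypothetical and no network resolution is shown.

```python
from typing import Any, Dict, List, Optional


def _find_by_fragment(
    entries: Optional[List[Dict[str, Any]]], did: str, fragment: str
) -> Optional[Dict[str, Any]]:
    """Find an entry whose id is either '#fragment' or '<did>#fragment'."""
    for entry in entries or []:
        if entry.get("id") in (fragment, did + fragment):
            return entry
    return None


def resolve_space_authority(did_doc: Dict[str, Any]) -> Dict[str, str]:
    """Extract the space signing key and space host endpoint from a DID document."""
    did = did_doc["id"]
    key = _find_by_fragment(did_doc.get("verificationMethod"), did, "#atproto_space")
    host = _find_by_fragment(did_doc.get("service"), did, "#atproto_space_host")
    if key is None or host is None:
        # Both entries are required for a DID to act as a space authority.
        raise ValueError(f"{did} does not expose both space authority entries")
    return {
        "signing_key": key["publicKeyMultibase"],  # assumed key encoding
        "space_host": host["serviceEndpoint"],
    }
```

A syncer would presumably do this lookup before contacting the space host or checking a credential's signature.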
Space type
A space's type is an NSID that names its modality and resolves to a space declaration. It identifies the kind of data a space holds before any network resolution, much as a collection NSID does in public atproto. Because a type names a concrete modality, every space is some specific kind of space rather than a generic container.
The type is also the OAuth consent boundary. Access is granted to a user by type, e.g. "access to your AtmoBoards forums" (see OAuth scopes).
Snipped because too long, read the full PR
Frankly beautiful, and I don't say that often.
This covers all uses I have in mind myself so far quite nicely and, more importantly, appears to be a user-safe design that doesn't leak attestations.
The simplespace feature requirement for PDSs is an important access-to-technology measure, so I'm very glad you included that here. (Its listMembers endpoint is private to the authority of the space, right?)
I think keeping that simple and not including authority ownership in the API there is probably fine, or at least I think client-only apps can still manage a form of transferable space authority ownership by storing credentials privately under this model. (Though, I suspect the owned-authority OAuth login can't be fully automated in a normal browser with just this. Hm… But I think that's a separate problem.)
In this line:
Space permissions can also be bundled, unlike with more user-friendly verbiage, into a permission set.
Is "unlike" a typo for "usually"?
Yes it is, thanks for flagging!
Yes, as described, it is. Although if a space expresses the members as records in the space, then it will be viewable by anyone with read access to the space.
…which would be the only thing in that space (e.g. a space of type member records), is that correct?
Awesome!
Since the proposal right now says terminology is still likely to change, I'd like to flag a potential naming issue early.
In everyday usage, "space" almost always refers to a place you're in. However, the PD primitive is a permissioning boundary, a far more abstract concept. That conflict between a "place" and a "boundary" is, in my opinion, strong enough to count as a misnomer: it mismodels the concept for anyone meeting it for the first time.
I think that @zicklag.dev's initial Arbiter design leaflet illustrates this well, especially considering that this seems to be where the group-permissioning layer is heading:
From now on, I'm going to use the terms "group", "role", and "space" somewhat interchangeably, based on how a specific space is meant to be used, even though they're all just spaces.
If a role (a set of people holding a permission) is a "space" (colloquially, a place), then the word has lost its intuitive meaning; readers have to actively fight their intuition to follow the architecture.
I'm usually not one for nomenclature wars, but I do think that this one is particularly important: it's a core concept that all of PD hangs from, and the cost of the mismodeling lands squarely on newcomers, who are a key part of the adoption boundary.
I wouldn't be against a wholesale rename if that becomes consensus, but I think a fair compromise might be adopting "Permission Space" as the canonical term, with a capitalized "Space" as shorthand once context is set. This way, we could use the full "Permission Space" on first reference and anywhere the ambiguity is a real problem, and use the shorthand elsewhere. Compound terms already supply the needed context and could stay as they are ("Space credential," "Space host," "Space authority," "Space type," etc.), but the standalone "Space" in prose, which does the most damage, could be disambiguated by using its fully qualified form.
I know it's a small thing against a draft this early, but I do think naming is load-bearing for legibility and adoption, especially for a primitive this central, so I wanted to get it on the record.
I started a sidebar topic to test our appetite for further terminology bikeshedding: Terminology of permissioned data
My main thoughts are that access control is actually naturally stateful and data driven. "Does this user have permission to do this?" is a stateful question asked at call time somewhere in the call stack, and the permission itself is data.
I also don't think terminology concerns are necessarily bike shedding. Counterintuitive terms are pedagogically expensive, naming things well does matter, and it's harder to change the names of things when the protocol is already in place.
Another thought: the space owner's PDS is a single point of failure for the whole space. That PDS must stay online to keep delivering tokens, or else the tokens eventually expire and everyone loses access. I would like some distributed or redundant authority.
I think that can be done with any of the methods used to make distributed / redundant web services.
You can have the DNS for your space host contain multiple IP addresses, use a reverse proxy that routes to multiple backends, you can use a distributed database for membership lists if you need it, etc.
I don't think that needs to be built into the protocol, because that is all stuff available to us through DNS, HTTP, etc.
For some projects it's stateful and based on a member list that is stored on the space host, but in other projects it will be "stateless" and based on the result of a call to a 3rd party API like GitHub (imagine allowing access to the space to any contributor to a GitHub repo with a keytrace link to their ATProto account).
The proposal allows different space hosts to implement the credential check however they need to.
In these "stateless" use-cases it's actually very easy to make it highly available / redundant because the service itself can be made out of as many servers as you want running around the world and the rest of the reliability is built on top of GitHub's API.
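
A rough sketch of that distinction, assuming the space host treats the credential check as a pluggable function from a requester DID to yes/no. Everything here is hypothetical: the names are made up, and `is_contributor` just stands in for whatever 3rd party lookup a host would actually call (no real GitHub API is used).

```python
from typing import Callable, Iterable, Set

# A policy answers one question: may this DID be issued a credential?
AccessPolicy = Callable[[str], bool]


def member_list_policy(members: Iterable[str]) -> AccessPolicy:
    """'Stateful' flavor: the space host keeps its own member list."""
    allowed: Set[str] = set(members)
    return lambda did: did in allowed


def external_lookup_policy(is_contributor: Callable[[str], bool]) -> AccessPolicy:
    """'Stateless' flavor: defer to an external source of truth on every request.

    `is_contributor` is a placeholder for a call to a 3rd party API; the host
    itself stores nothing, so any number of replicas can answer identically.
    """
    return lambda did: is_contributor(did)


def should_issue_credential(policy: AccessPolicy, requester_did: str) -> bool:
    # The protocol doesn't define this decision; the host plugs in its own policy.
    return policy(requester_did)
```

Either flavor sits behind the same check, which is why the choice doesn't need to show up in the protocol itself.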
Fair point about a single DID having redundancy behind it.
Also, fair point that sometimes the permission will come from a 3rd party API. I still think a layer between this protocol and applications would be useful, but I suppose lexicons are already that layer, in principle. I think I am then interested in lexicons working the same for both public and permissioned data, which they should because it would be very unlike atproto if they didn't.
Yes Lexicons work the same for public & permissioned data.
I actually expect 3 layers (at least for the use case of communities/groups): the protocol, a cross-app community governance/structure standard, and then application Lexicons.
We're discussing what an atmospheric community is over here: What is an atmospheric group/community?
And I shared some early thoughts around some data modeling questions in this blog post: Modeling communities on permissioned data - Daniel's Leaflets
I took some high-level notes while giving PR review feedback, I'm going to dump them here publicly.
- There have been some mixed feelings about the design process (eg, bsky team driving a proposal), and some projects are already building/operating alternative architectures. But as a general vibe there seems to be growing ecosystem consensus around this branch of development.
- Nothing is set in stone until we have multiple interoperating implementations and a real application built around it. This includes both small details and the overall architecture.
- App services having possession of multiple space credentials from multiple accounts for the same space does feel weird, and will lead to weird/arbitrary implementation decisions (as was raised early on by divy). I don't think it is a blocking issue, but I hope we can bring structure to this. Eg, provide simple implementation patterns/guidelines.
- This design enables interoperation, and retains some of the principles from public data. But I think the centralizing forces and anti-competitive behaviors are different for spaces (vs public "shared heap"), and we don't have as deep an analysis or narrative around that yet.
- Could use a careful security and privacy review from somebody with fresh eyes.
- The moderation and anti-abuse story is not very fleshed out yet.
  - may want generic space blob pre-scanning at the PDS?
  - relationship between spaces, app services, and labelers needs to be worked out
  - human moderator access to reported content needs to be worked out
  - rate limits and resource quotas will be needed
Here is a sketch set of "design goals" and "architecture properties" that have come up through the process:
- scales to millions of readers (eg, for newsletter use case)
- personal data (eg, preferences) works with multiple devices, client apps, and app services
- the authority for space records is rooted in the account's DID (keys in DID document), not the account's PDS hostname. eg, hijacking the PDS DNS should not allow an attacker to publish fake data. this is the same as public data repos.
- newly authorized services can "backfill" an entire space (all space records from all members). this is critical for competition and credible exit from service providers.
- support for multiple concurrent client apps, and services (eg, moderation services), in the same space. but at the same time, may want the space authority to be able to limit or exclude client apps and services.
- ability to migrate spaces between hosts and services
- an account's individual permissioned data can be bulk exported, and migrates between PDS hosts the same as public data
- services can verify the authenticity of data they fetch from PDS hosts; but leaked data is refutable (not strongly authenticated at rest). end users would mostly trust their client app software and app services to have verified authenticity, but in most cases should be able to re-verify individual records by re-fetching (eg, using goat, unless the space has locked down client app list)