tl;dr: non-AtprotoPersonalDataServer services might host records, so let’s introduce the following AT-URI syntax, with an optional service field, to reference data hosted outside the PDS.
at://tangled_knot@did:plc:user/...
     ^^^^^^^^^^^^              ^^^
     |                         path can be any format
     |                         (NSID/rkey preferred, but not enforced)
     service name
     (defaults to "atproto_pds" when omitted)
In this proposal, I’m going to explain why we need cross-service records and propose an AT-URI syntax to reference them universally. There are many more ideas around cross-service records (e.g. do we want a universal way to sync them?), but those would be a bit off-topic here.
Context
While redesigning Tangled for true decentralization, I found that the PDS falls short in several ways. It doesn’t allow collaborative data or private data, and its data format is limited too. Storing everything in the PDS doesn’t seem like a good idea.
The prior discussions the-case-for-universal-login-and-off-protocol-services and media-pds-service already pointed out similar problems pretty well. The PDS can’t match all use cases. I’ve also often seen non-PDS data servers discussed as a way to solve private/shared records.
I concluded that we eventually have to store some data in services other than the PDS (I mean specifically the AtprotoPersonalDataServer-typed service under the #atproto_pds name). Then we want a way to reference that off-protocol data from PDS records and vice versa. There is data which the user should own, and there is data which a group (e.g. repo collaborators) should own. Because the current PDS spec doesn’t support group-owned data (and I suppose it never will), that group-owned data should be stored in a non-PDS service: a service which doesn’t have the AtprotoPersonalDataServer type and is assigned to the DID under a different name (just like the #atproto_labeler service).
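To make the "service assigned to the DID under a different name" idea concrete, here is a minimal sketch of looking up a service entry by name in a DID document. The "#atproto_pds" entry follows the usual atproto shape; the "#tangled_knot" entry, its "TangledKnot" type, and both endpoints are made up for illustration, and find_service_endpoint is a hypothetical helper, not part of any SDK.

```python
from typing import Optional

# Hypothetical DID document fragment. Only the "#atproto_pds" entry follows
# an existing atproto convention; the "#tangled_knot" entry, its type, and
# both endpoints are invented for this sketch.
did_document = {
    "id": "did:plc:example",
    "service": [
        {
            "id": "#atproto_pds",
            "type": "AtprotoPersonalDataServer",
            "serviceEndpoint": "https://pds.example.com",
        },
        {
            "id": "#tangled_knot",
            "type": "TangledKnot",
            "serviceEndpoint": "https://knot.example.com",
        },
    ],
}


def find_service_endpoint(doc: dict, service_name: str) -> Optional[str]:
    """Return the endpoint of the service whose id fragment is service_name."""
    for entry in doc.get("service", []):
        # Service ids may be written as "#name" or as "<did>#name".
        _, sep, fragment = entry.get("id", "").rpartition("#")
        if sep and fragment == service_name:
            return entry.get("serviceEndpoint")
    return None
```

With this document, find_service_endpoint(did_document, "tangled_knot") returns the knot endpoint, and an unknown service name returns None.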
Use cases for cross-service records
This section explains why we cannot use the existing PDS for all user-owned data, and tries to cover the possible examples. I’m spelling them out because at first glance they seem possible to implement within the existing atproto spec, but they actually aren’t. You can safely skip this part if you want.
1. Group-owned data
The PDS can’t serve group-owned data because it doesn’t have the ACLs needed to allow collaborative edits. Someone could make a custom PDS for that, or put a lightweight proxy around it for the extra logic, but that’s not a correct AtprotoPersonalDataServer implementation. We shouldn’t force users onto custom PDS implementations just to use a service like Tangled. Even worse, those custom implementations might not be compatible with other Atmosphere services. This is closer to the Fediverse’s approach, where it is OK because the instance represents both the data storage and the app serving that data.
2. Record with revision history / 3. Auto-updating records
Tangled specifically needs these. We want the full revision history of issue records. We want to create workflow runs without requiring an explicit trigger from the user, and to update their status automatically. Even though these are representable in JSON/CBOR, we cannot use the PDS here.
4. JSON projection of non-JSON data
e.g. git commits, git trees, workflow status
This covers cases where the actual data is in a different format but is projected as JSON. Technically these are just auto-updating records.
5. External blob that requires its own way to fetch
e.g. workflow log stream, video stream, really large data
This is what media-pds-service addresses.
Honestly I’m not sure we strictly need AT-URI referencing for these. Still, those streams should be owned by the user, so their unique ID eventually looks like an AT-URI:
- where is this data stored (service)
- who owns the data (identity)
- which kind of data (collection)
- exactly which data from this user (record key)
6. Private/E2EE data..?
I’m not trying to solve private/E2EE data here; I’m more focused on collaborative/non-JSON data. That said, allowing cross-service records seems like good groundwork for a future private/E2EE data implementation.
Cross-service referencing
So, if data is stored across multiple services, it is reasonable to have a universal way to reference it instead of custom identifier specs all over the place. We usually use AT-URIs for data in the PDS (the exception being blobs, but they are attached to records), so let’s extend the current AT-URI syntax to include the name of the service where the data is stored.
As a reference,
Full AT-URI syntax:
"at://" AUTHORITY [ PATH ] [ "?" QUERY ] [ "#" FRAGMENT ]
Current blessed AT-URI syntax:
"at://" AUTHORITY [ "/" COLLECTION [ "/" RKEY ] ]
We could append ?service={service_name} to the blessed AT-URI syntax, which would still be valid under the current full AT-URI syntax. But honestly I think using the userinfo field of the URI makes more sense:
"at://" [ SERVICE "@" ] AUTHORITY [ PATH ]
examples:
at://tangled_knot@did:plc:example/org.tangled.pull/<tid>
at://tangled_knot@did:plc:example/commit/<commit-hash>
The path can be in any format. NSID/rkey is preferred, but it is fine not to follow it when needed. When we cannot define the data schema in a lexicon, using an NSID doesn’t mean much.
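The proposed grammar can be sketched as a small parser. This is only an illustration under a few assumptions: it handles the "at://" [ SERVICE "@" ] AUTHORITY [ PATH ] form and ignores query and fragment parts, it relies on DIDs and handles not containing "@", and the names ServiceAtUri, parse_service_at_uri and format_service_at_uri are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_SERVICE = "atproto_pds"  # per the proposal: used when SERVICE is omitted


@dataclass(frozen=True)
class ServiceAtUri:
    service: str    # service name, e.g. "tangled_knot"
    authority: str  # DID or handle that owns the data
    path: str       # everything after the authority, without the leading "/"


def parse_service_at_uri(uri: str) -> ServiceAtUri:
    """Parse "at://" [ SERVICE "@" ] AUTHORITY [ PATH ] (proposed syntax)."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri!r}")
    rest = uri[len(prefix):]
    # The authority section ends at the first "/"; the path is free-form.
    authority_part, _, path = rest.partition("/")
    # Assumes DIDs and handles contain no "@", so the first "@" ends the service name.
    service, sep, authority = authority_part.partition("@")
    if not sep:
        service, authority = DEFAULT_SERVICE, authority_part
    if not service or not authority:
        raise ValueError(f"missing service or authority: {uri!r}")
    return ServiceAtUri(service, authority, path)


def format_service_at_uri(parsed: ServiceAtUri) -> str:
    """Inverse of parse_service_at_uri; omits the default service name."""
    prefix = "" if parsed.service == DEFAULT_SERVICE else f"{parsed.service}@"
    path = f"/{parsed.path}" if parsed.path else ""
    return f"at://{prefix}{parsed.authority}{path}"
```

Because the service defaults to "atproto_pds", existing AT-URIs parse unchanged, and formatting a parsed PDS reference gives back the URI without a service prefix.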
Fetching off-protocol records
While #atproto_pds records are fetched with com.atproto.repo.getRecord or com.atproto.sync.getRepo, these off-protocol records might not be compatible with all of those XRPC methods. For example, we simply cannot call repo.getRecord for non-JSON data, and sync.getRepo won’t work when the underlying MST structure differs in order to support revision history.
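As a rough sketch of that split: a reference to the default service can be turned into a com.atproto.repo.getRecord request, while any other service has no generic fetch path. The helper names and example endpoints here are hypothetical, and returning None for non-PDS services is just one way to express "this service needs its own method".

```python
from typing import Optional
from urllib.parse import urlencode


def get_record_url(pds_endpoint: str, repo: str, collection: str, rkey: str) -> str:
    """Build the GET URL for com.atproto.repo.getRecord on a PDS."""
    query = urlencode({"repo": repo, "collection": collection, "rkey": rkey})
    return f"{pds_endpoint.rstrip('/')}/xrpc/com.atproto.repo.getRecord?{query}"


def fetch_url_for(service: str, endpoint: str, authority: str, path: str) -> Optional[str]:
    """Return a getRecord URL for PDS records, or None when no generic method applies."""
    if service != "atproto_pds":
        # Assumption: non-PDS services define their own fetch methods,
        # so there is nothing generic to build here.
        return None
    parts = path.split("/")
    if len(parts) != 2 or not all(parts):
        return None  # not a COLLECTION/RKEY path
    collection, rkey = parts
    return get_record_url(endpoint, authority, collection, rkey)
```

A caller would resolve the service endpoint first, then fall back to a service-specific client whenever fetch_url_for returns None.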
I’m not super sure about the fetching/syncing part. I think it’s better to allow custom methods for maximum flexibility, but obviously that forces each non-PDS service to build its own similar sync protocol. As that’s what Tangled/Streamplace/Germ/Roomy are already doing, I guess it’s fine..? Even if we don’t reuse the existing common XRPC methods, having a universal identifier spec for user-owned data would still be valuable.
Plz share what you think about this concept!