[Proposal] AT-URI for cross-service records

tl;dr: services other than AtprotoPersonalDataServer might host records, so let’s introduce the following AT-URI syntax, with an optional service field, to reference that non-PDS-hosted data.

at://tangled_knot@did:plc:user/...
     ^^^^^^^^^^^^              ^^^
     |                         path can be any format
     |                         (NSID/rkey preferred, but not enforced)
     service name
     (defaults to "atproto_pds" when omitted)

In this proposal, I’m going to explain why we need cross-service records and propose an AT-URI syntax to reference them universally. There are many more ideas around cross-service records (e.g. do we want a universal way to sync them?), but those would be a bit off-topic here.

Context

While redesigning Tangled for true decentralization, I found that the PDS falls short in several ways. It doesn’t allow collaborative data or private data, and its data format is limited too. Storing everything in the PDS doesn’t seem like a good idea.

The prior discussions the-case-for-universal-login-and-off-protocol-services and media-pds-service already pointed out similar problems pretty well. The PDS can’t fit every use case. I’ve also often seen a non-PDS data server discussed as a way to solve private/shared records.

I found that we will eventually have to store some data in services other than the PDS (specifically, I mean the AtprotoPersonalDataServer-typed service registered under the #atproto_pds name). We then want a way to reference that off-protocol data from PDS records and vice versa. There is data that a user should own, and there is data that a group (repo collaborators) should own. Because the current PDS spec doesn’t support group-owned data (and I suppose it never will), group-owned data should be stored in a non-PDS service: a service which doesn’t have the AtprotoPersonalDataServer type and is assigned to the DID in a different namespace (just like the #atproto_label service).
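To make "a service assigned to the DID under a different name" concrete, here is a minimal sketch of looking up a service endpoint in a DID document by its fragment id. The #atproto_pds entry follows the shape atproto already uses; the #tangled_knot entry, its TangledKnot type, and both example.com endpoints are made up for illustration and are not part of any spec.

```python
from typing import Optional

def find_service_endpoint(did_doc: dict, service_name: str = "atproto_pds") -> Optional[str]:
    """Return the serviceEndpoint of the entry whose id ends with '#<service_name>'."""
    wanted = "#" + service_name
    for entry in did_doc.get("service", []):
        # The id may be a bare fragment ("#atproto_pds") or a full DID URL
        # ("did:plc:example#atproto_pds"); both end with the fragment.
        if entry.get("id", "").endswith(wanted):
            return entry.get("serviceEndpoint")
    return None

# Hypothetical DID document; only the #atproto_pds entry mirrors existing atproto usage.
example_doc = {
    "id": "did:plc:example",
    "service": [
        {"id": "#atproto_pds", "type": "AtprotoPersonalDataServer",
         "serviceEndpoint": "https://pds.example.com"},
        {"id": "#tangled_knot", "type": "TangledKnot",  # assumed name and type
         "serviceEndpoint": "https://knot.example.com"},
    ],
}
```

With this lookup, the service name in a reference only has to match a fragment in the owner’s DID document; nothing about the hosting service needs to be known in advance.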

Use-case of cross-service records

This section explains why we cannot use the existing PDS for all user-owned data, and tries to cover the possible examples. I’m explaining this here because these cases seem possible to implement within the existing atproto spec at first glance, but actually aren’t. You can safely skip this part if you want.

1. Group-owned data

The PDS can’t serve group-owned data because it doesn’t have the ACLs needed to allow collaborative edits. Someone could make a custom PDS for that, or put a lightweight proxy around it for the extra logic, but that’s not a correct AtprotoPersonalDataServer implementation. We shouldn’t force users onto custom PDS implementations just to use a service like Tangled. Even worse, those custom implementations might not be compatible with other Atmosphere services. This is closer to the Fediverse’s approach. In the Fediverse it is OK, because the instance represents both the data storage and the app serving that data.

2. Record with revision history / 3. Auto-updating records

Tangled specifically needs these. We want the full revision history of issue records. We want to create workflow runs without an explicit trigger from the user and automatically update their status. Even though these are representable in JSON/CBOR, we cannot use the PDS here.

4. JSON projection of non-JSON data

e.g. git commits, git trees, workflow status

This is when the actual data is in a different format but is projected as JSON. Technically these are just auto-updating records.

5. External blob that requires its own way to fetch

e.g. workflow log stream, video stream, really large data
This is what media-pds-service addresses.

Honestly I’m not sure if we strictly need AT-URI referencing for them. Still, those streams should be owned by the user, so their unique ID ends up looking like an AT-URI:

  • where is this data stored (service)
  • who owns the data (identity)
  • which kind of data (collection)
  • exactly which data from this user (record key)

6. Private/E2EE data..?

I’m not trying to solve private/E2EE data here; I’m more focused on collaborative/non-JSON data. Still, allowing cross-service records seems like good groundwork for a future private/E2EE data implementation.

Cross-service referencing

So, if data is stored across multiple services, it is reasonable to have a universal way to reference it instead of custom identifier specs all over the place. We usually use AT-URIs for data in the PDS (blobs being the exception, but they are attached to records), so let’s extend the current AT-URI syntax to include the name of the service where the data is stored.

For reference, the full AT-URI syntax:

"at://" AUTHORITY [ PATH ] [ "?" QUERY ] [ "#" FRAGMENT ]

Current blessed AT-URI syntax:

"at://" AUTHORITY [ "/" COLLECTION [ "/" RKEY ] ]

We could append ?service={service_name} to the blessed AT-URI syntax, which still fits within the current full AT-URI syntax. But honestly I think using the userinfo field of the URI makes more sense:

"at://" [ SERVICE "@" ] AUTHORITY [ PATH ]

examples:

at://tangled_knot@did:plc:example/org.tangled.pull/<tid>

at://tangled_knot@did:plc:example/commit/<commit-hash>

The path can be any format. NSID/rkey is preferred, but it is fine not to follow it if needed. When we cannot define the data schema in a lexicon, using an NSID doesn’t mean much.
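A minimal sketch of how a parser for the proposed userinfo form could look, assuming the rules stated above: the service defaults to "atproto_pds" when omitted, and the path is kept opaque instead of being forced into NSID/rkey. The function and type names are mine, and rejecting a lone @ is an assumption carried over from how the current spec treats it.

```python
from typing import NamedTuple

DEFAULT_SERVICE = "atproto_pds"

class CrossServiceAtUri(NamedTuple):
    service: str    # DID document service name, without the leading "#"
    authority: str  # DID or handle that owns the data
    path: str       # opaque path, including the leading "/" (may be empty)

def parse_cross_service_at_uri(uri: str) -> CrossServiceAtUri:
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError("not an AT-URI: " + uri)
    # Everything up to the first "/" is the (optionally service-qualified) authority.
    authority_part, slash, path = uri[len(prefix):].partition("/")
    service, at_sign, authority = authority_part.rpartition("@")
    if at_sign and not service:
        raise ValueError("empty service name before '@'")
    if not authority:
        raise ValueError("missing authority")
    return CrossServiceAtUri(service or DEFAULT_SERVICE, authority, slash + path)
```

For example, parse_cross_service_at_uri("at://tangled_knot@did:plc:example/commit/abc123") gives the service "tangled_knot", the authority "did:plc:example" and the path "/commit/abc123", while a plain at://did:plc:example/... URI falls back to "atproto_pds".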

Fetching off-protocol records

While #atproto_pds records are fetched with com.atproto.repo.getRecord or com.atproto.sync.getRepo, these off-protocol records might not be compatible with all of those XRPC methods. For example, we simply cannot call repo.getRecord for non-JSON data, and sync.getRepo won’t work when the underlying MST structure is different in order to support revision history.
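For comparison, this is roughly what fetching a regular #atproto_pds record looks like today: the three parts of a blessed AT-URI map directly onto the repo, collection and rkey parameters of com.atproto.repo.getRecord. The helper name and the pds.example.com endpoint are placeholders; a non-PDS service would need its own method wherever this mapping doesn’t apply.

```python
from urllib.parse import urlencode

def build_get_record_url(pds_endpoint: str, repo: str, collection: str, rkey: str) -> str:
    """Build the XRPC query URL for com.atproto.repo.getRecord on a PDS."""
    query = urlencode({"repo": repo, "collection": collection, "rkey": rkey})
    return pds_endpoint.rstrip("/") + "/xrpc/com.atproto.repo.getRecord?" + query

# Placeholder endpoint and record key, for illustration only.
url = build_get_record_url(
    "https://pds.example.com", "did:plc:example", "app.bsky.feed.post", "3k"
)
```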

I’m not super sure about the fetching/syncing part. I think it’s better to allow custom methods for maximum flexibility, but obviously that forces each non-PDS service to build its own similar sync protocol. As that’s what Tangled/Streamplace/Germ/Roomy are doing, I guess it’s fine..? Even if we don’t reuse the existing common XRPC methods, having a universal identifier spec for user-owned data would still be valuable.

Please share what you think about this concept!

4 Likes

Great write-up and thank you for the insight. The artificial limitations imposed by a single PDS per user are not practical, for many of the reasons you’ve stated above. We can’t expect a single PDS to scale to the services that will be invented in the coming years. Even if a single PDS could evolve to provide these future innovations, an app could only capture the existing user base if ALL PDS providers adopted those innovations simultaneously, and within a reasonable timeframe. That, of course, will never happen.

We’re seeing people looking for workarounds for what the URI spec already successfully accomplishes. I mentioned the need for the AT-URI to come into compliance with the URI standards in this discussion:

Adding a query parameter to the AT-URI would be compliant with URI and DID Document standards. You reference a DID service with the query ?service={service_name}. That is what we’ve adopted for NorthStar Social to be aligned with existing standards.
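As a rough sketch of the query-parameter form, a consumer could split the blessed part of the URI from the service name like this. The helper name and the "atproto_pds" default are assumptions for illustration, not something NorthStar Social or the spec defines.

```python
from typing import Tuple
from urllib.parse import parse_qs

def split_service_query(uri: str, default: str = "atproto_pds") -> Tuple[str, str]:
    """Return (uri_without_query, service_name) for an AT-URI with ?service=..."""
    without_fragment, _, _fragment = uri.partition("#")
    base, _, query = without_fragment.partition("?")
    values = parse_qs(query).get("service", [])
    # Fall back to the default service when no ?service= parameter is present.
    return base, (values[0] if values else default)
```

The base part is still a blessed AT-URI, so it can be handed unchanged to a parser that only knows AUTHORITY/COLLECTION/RKEY.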

This addition of the DID document service to the AT-URI authority component would further deviate from URI (and DID document) standards. The @ is already reserved for user information (username:password) with specific implementations in URI parsers. I’d rather see the URI standard adopt DIDs in the authority with the emergence of a “DNS” for DID lookup (something greater than the PLC directory). This makes more sense since the authority can easily adopt the did:xxx:abcd.. in place of hostname/ip:port without much headache for parsers to adopt universally. Two colons vs one colon. A URI compliant workaround without such a standard would be encoding the DID colons in the AT-URI.
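To illustrate the parser concern and the colon-encoding workaround, here is a small sketch using Python’s urllib.parse as a stand-in for a generic URI parser. The helper names are made up, and other parsers may behave differently with an unencoded DID in the authority.

```python
from urllib.parse import quote, unquote, urlsplit

def encode_did_authority(did: str) -> str:
    """Percent-encode the colons so the DID fits a generic URI host component."""
    return quote(did, safe="")

def decode_did_authority(encoded: str) -> str:
    return unquote(encoded)

# A generic parser treats the first colon as the host/port separator,
# so the DID is cut off after "did".
naive_hostname = urlsplit("at://did:plc:example/org.tangled.pull/abc").hostname

# With the colons encoded, the whole DID survives as the authority.
encoded_uri = "at://" + encode_did_authority("did:plc:example") + "/org.tangled.pull/abc"
encoded_netloc = urlsplit(encoded_uri).netloc
```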

1 Like

Thank you for the detailed feedback! I wasn’t aware that there had already been an attempt, good to know!

This addition of the DID document service to the AT-URI authority component would further deviate from URI (and DID document) standards. The @ is already reserved for user information (username:password) with specific implementations in URI parsers.

I thought it would be fine and reasonable to use the reserved user information part to represent service name.

Generic URI Compliance

  • Userinfo: not currently supported, but reserved for future use. a lone @ character preceding a handle is not valid

source

Because the authority part already represents the user and we won’t have sub-users, I think it makes sense to read it as “service Foo at user did:xxx:bar”. Yes, I’m proposing to extend the current AT-URI syntax. All existing AT-URI parsers only expect the blessed form and ignore the query part anyway, so both the {service}@ prefix and the ?service={service} suffix would be a breaking change for them. I haven’t seen a real use case of the full AT-URI syntax yet, so I thought it would be fine to update the exact spec now.

Though this is just a stylistic choice so ?service={service} is also completely fine. I just found out using userinfo is possible & imo better looking.