Bringing back at://

The biggest complaint that I’ve heard about the permissioned data protocol design is that it feels like a totally separate thing from the existing public broadcast protocol. While I believe many of the things that differentiate it (spaces, the repo data structure, the signature algo) are critical to the overall design, using a separate uri scheme is not.

I’ve been wrestling with this one for a while now. But I’m currently leaning fairly strongly toward keeping the same at:// uri scheme as the public broadcast protocol.

My biggest problem is honestly narrative/branding. This might sound like a silly thing to focus on in protocol design, but I actually think it’s quite important and is where the worries about it feeling like a separate thing are coming from. Atproto, or “the atmosphere”, is supposed to be “the universal data network”. But in introducing a new URI scheme we’re kinda splintering the data space. Practically, this might not be the case (many things overlap & apps can easily use data from both protocols). But it still just doesn’t feel quite right to me. The at-uri is one of the Big Ideas of atproto. I think it’d be really unfortunate to dilute that idea.

I hope that by using the same scheme, it makes it clear that these two data protocols are ultimately part of the same system.

On a more practical note, uniting under one uri format makes more sense in the context of lexicons. I don’t think you want to have to redefine every lexicon for permissioned vs public use cases. And when you want one kind in particular, differentiating between possible sub formats like at-uri-public and at-uri-space (both subsets of the more generally used at-uri) seems more straightforward to me than defining a new space-uri format and having to fall back to uri when you want either.

So without further ado, my new proposal for the permissioned data uri structure is the following:

at://{spaceDid}/space/{spaceType}/{skey}/{authorDid}/{coll}/{rkey}

You’ll note it looks basically the same as the previous proposal except that there is one new segment, the literal “space”, right after the space authority. This is clearly differentiable from the current common first path segment in at-uris, which is an NSID and thus always has at least two “.”s in it. This pattern also opens up the possibility of identifying other atproto resources through at-uris, including labels and blobs.
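To make the structure concrete, here is a minimal Python sketch of how a parser might split the full record form of the proposed URI into its parts. The names `SpaceRecordUri` and `parse_space_record_uri` are hypothetical, the sketch assumes no segment contains a `/`, and it does no validation of DIDs, NSIDs, or keys. It is an illustration of the shape, not a reference implementation.

```python
from dataclasses import dataclass

AT_PREFIX = "at://"


@dataclass(frozen=True)
class SpaceRecordUri:
    """Parts of at://{spaceDid}/space/{spaceType}/{skey}/{authorDid}/{coll}/{rkey}."""
    space_did: str
    space_type: str
    skey: str
    author_did: str
    coll: str
    rkey: str


def parse_space_record_uri(uri: str) -> SpaceRecordUri:
    """Split a permissioned-data record URI into its six named parts.

    Hypothetical helper: only the full seven-segment form is accepted,
    and the individual segments are not validated.
    """
    if not uri.startswith(AT_PREFIX):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(AT_PREFIX):].split("/")
    # Authority + literal "space" + five more segments, none of them empty.
    if len(parts) != 7 or "" in parts:
        raise ValueError(f"expected 7 non-empty segments: {uri!r}")
    space_did, marker, space_type, skey, author_did, coll, rkey = parts
    if marker != "space":
        raise ValueError(f"first path segment is not the literal 'space': {uri!r}")
    return SpaceRecordUri(space_did, space_type, skey, author_did, coll, rkey)
```

For example, with made-up identifiers, `parse_space_record_uri("at://did:example:space/space/com.example.space/main/did:example:alice/com.example.coll/abc123")` yields `did:example:space` as the space authority and `did:example:alice` as the author.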

This proposal goes against my previous argument in diary 5.

There, I gave three reasons for using a separate scheme:

    1. The resolution mechanism is pretty different for permissioned vs public data. Since URI resolution is informed by the scheme, we think it makes sense that the scheme is different.
    2. There are light security implications to the different URI formats. Permissioned data URIs should essentially never be viewed outside of the perimeter of their space. Bumping into one of these in the wild should look & feel different from a public URI.
    3. There is now a working group for AT Protocol at the IETF. One part of the charter is describing a URI scheme for the protocol. As we’re not (yet!) specifying the permissioned data protocol at the IETF, we want to avoid mixing up URI semantics with the working group’s work.

My current argument against the first is that at:// does not actually specify a sync mechanism. It identifies a piece of content in a data space. You could receive an atproto record on a thumbdrive and it may still be verified as “authentic”. In this sense, the resolution mechanism isn’t the most important thing. While the sync mechanisms vary between the public & permissioned protocols, the data space remains the same, and thus they should share a uri scheme.

I think the security implications are handled by the fact that permissioned data uris are much longer. That is a much simpler visual distinction than an extra letter in the scheme. I think this was always the weakest point, and I think the uri scheme is the wrong place to litigate this.

And on the IETF working group for atproto, I’ve had a couple of conversations with folks with much more experience at the IETF than I have that suggest this isn’t a big thing we should worry about. The IETF process, in some sense, is designed to handle exactly this. Companies do things with technologies in ecosystems, and as patterns emerge, they get specified through consensus at the IETF. There is some risk that the working group specifies a URI that breaks our permissioned data URIs. However, that risk already exists for public data as well.

So let me know your thoughts!

In the meantime, I think I’ll probably update the proposal. Even if it’s in the proposal, it’s not set in stone yet. The proposal is just meant to capture the most up-to-date thoughts of where our heads are at.

A single URI scheme makes sense to me, and the literal space can never be an NSID so you’re not conflicting there. There may be an issue with outdated at-uri validation choking on new space at-uris, but I don’t think there’s a way through that, other than maybe defining constraints on at-uris (e.g., this field accepts at-uris within this collection, or this at-uri must be a space at-uri).

I have a strong preference for separate URI schemes, but if the proposal does go forward with a single scheme, I would not recommend inserting an extra keyword to differentiate public and permissioned URIs; the number of path segments should be the differentiator. That being said, utilizing a reserved pseudo-NSID for blob references is something I can get behind.

I thought space:// felt pretty “different” but I could’ve lived with it.

Seems to me like ats:// is a great choice though.

  • Makes it easy to know if software supports at:// (public) or ats:// (spaces)
  • Feels similar to http:// vs. https://
  • Nice and short, looks cool

The at:// with a special path /space/ segment seems a little hacky? Maybe at least define a general reserved space in there like /.well_known/space/ or similar?

I still like ats:// though.

Great work guys!

I think one URI format that covers resources handled by two separate protocols is a very bad idea because the purpose of a URI is to indicate an object, and the protocol mechanism to reach it.

You write “I think the security implications are handled by the fact that permissioned data uris are much longer”, and any path that addresses security concerns with “the URI we have to be concerned about is visibly much longer” should be reconsidered. The history of URL usage across the web (and especially in email) is a history of bad visibility and user education leading to security issues.

I’m not dismissive of the social reasons to overload at:// , but I think the social human issues against overloading it are much stronger.

I’m generally in favor of this, provided that at-uri is treated as a first-class type.

I hope that keeping the existing at-uri format as the legacy-compatible base format will help reduce confusion during the transition. It should also help avoid a proliferation of string subformats in the future.

I don’t have a strong opinion on adding the extra segment, but if anything, I’m slightly against it. It may be a bit more robust than relying only on the number of path segments, but it still feels somewhat ad hoc to me.

I recall the discussion of space:// as an option, and ats:// was a good compromise on branding since we wanted to keep it similar to at:// while also making it clear this was not the public open data model that the rest of the protocol abides by. I found the reasons laid out in your diary to be pretty strong justification. The first alone is honestly sufficient to convince me, as otherwise you need to build some resolution logic into services working on the protocol so they know how to treat this new special thing. I’m also partial to ats:// as it means we can build future things on it that respect the same model.

On the other hand, using at:// does make lexicon development easier. That is something I struggled with when working on the northsky permissioned data model, as I was effectively forking every lexicon I wanted to use privately, so it is a thing that we should address.

To me it seems like the underlying question is whether the approach is to treat permissioned data as a new protocol, or if there’s actually only one protocol and the original publicly-focused AT protocol is being reworked to handle permissioned data as well. If it’s a new protocol it should have a separate prefix, and as @evelyn.northsky.team points out, the lexicon design needs to be adapted so that lexicons can be cross-protocol. If it’s just a single protocol then it makes sense to have a single prefix … but from Daniel’s writing, that’s not how things were approached.

Abstractly, unless the approach considers a more fundamental rework (not just extension) of the protocol, I prefer a two-protocol approach; trying to staple permissioned data on after the fact means that it’s going to be very hard to overcome the history of a protocol that was primarily designed for public usage. But, I can also see the pragmatic arguments for trying to unify the two.

I’ve appreciated each diary entry and the reflections afterwards. Standardizing on at:// seems like the logical conclusion. I suspect spaces would benefit directly from the current IETF AT-URI effort instead of launching a separate and parallel effort.

I’m curious why /space/ and not /com.atproto.space/? Maybe an enhancement to lexicons for defining their own syntax that follows the NSID. There is query, procedure, and record, so why not something to define a custom and extensible handler?

Likewise, blobs could be defined under such a lexicon enhancement under com.atproto.blobs where /{cid} could be defined OR some other third party app could define their own media handler lexicon, such as com.example.stream with /{cid}[/{rendition}?bytes={x}-{y}. Flexibility over hard coding?

I look at stuff like this from a perspective of “how hard is it going to be if we change our minds later?” and, in this early phase where the entire approach could potentially need to be rethought after practical implementation, I think using a separate, experimental URL protocol-namespace like ats:// that doesn’t pollute the at:// protocol-namespace is the way to go. It seems to me that there’s less risk in starting with something like ats://, and then potentially integrating it later as a “sub-protocol” special case of at://, versus trying to “un-mix that yogurt” if you decide later it’d be better to give it its own special protocol and there’s already this ambiguously-cased double-meaning version of at:// strings out there.

In other words, if it turns out that there’s a smooth way Spaces can fit into at:// down the line, transitioning toward that can be figured out once it’s ready; whereas if you realize you need to go back to the drawing board (on whatever point, for whatever reason), and now there’s a whole class of at:// URLs from this brief experimental moment in history that don’t obey the dotted-NSID constraint at the second level in the wild, oop, now every at:// parsing implementation needs to handle the misstep of that legacy special case, forever. It’s easier to recognize the special case in the implementation before jumping into the logic fork for protocol handling than after.

Putting my vote in for one scheme (at://). I think permission spaces are a part of the at protocol, and the proposal is sufficient to identify space resources without too much trouble.

Thanks for all the feedback!

Wanted to chime in on two things in particular that I’ve seen come up a few times:

The reason for the extra segment is so that you can differentiate between a public record ref & a space reference. For instance:

Record ref: at://did:ex:ample/com.example.coll/blah/

Space ref: at://did:ex:ample/com.example.space/blah/

Without the extra segment, there is no way to tell what is on the other side of these URIs (is it a record or a space?) without first doing a network request to resolve the relevant Lexicon. Space refs turn out to be very common in the code & I think it’s pretty important to avoid this sort of confusion.
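The point can be sketched in a few lines of Python. The function `kind_of` below is hypothetical and looks only at the first path segment, using the rule of thumb from the original post that an NSID always contains at least two dots. The third URI is my reading of how the space ref above would look with the literal segment added; treat it as an assumption rather than settled syntax.

```python
AT_PREFIX = "at://"


def kind_of(uri: str) -> str:
    """Classify an at-uri from its first path segment alone, with no network request.

    Hypothetical helper. Returns "space" for the literal segment, "public" for
    anything that looks like an NSID (two or more dots), else "unknown".
    """
    if not uri.startswith(AT_PREFIX):
        return "unknown"
    parts = uri[len(AT_PREFIX):].split("/")
    if len(parts) < 2:
        return "unknown"  # authority only, no path
    first = parts[1]
    if first == "space":
        return "space"
    if first.count(".") >= 2:
        return "public"
    return "unknown"


record_ref = "at://did:ex:ample/com.example.coll/blah/"
unmarked_space_ref = "at://did:ex:ample/com.example.space/blah/"
marked_space_ref = "at://did:ex:ample/space/com.example.space/blah/"

# Without the literal segment, both refs have the same shape, so a purely
# syntactic check cannot tell them apart.
print(kind_of(record_ref), kind_of(unmarked_space_ref))
# With the literal segment, the space ref is identifiable from the string alone.
print(kind_of(marked_space_ref))
```

Both unmarked URIs fall into the same bucket, which is the confusion the extra segment is meant to avoid.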

Not trying to be cheeky, but this also depends on what you mean by “protocol”. Even though we commonly refer to atproto as a “protocol”, I actually think it’s a framework composed of several distinct protocols (DID, repo, sync, OAuth, etc) + a schema language (Lexicon).

In that sense, this is a distinct protocol. It is a new way of storing & syncing data that is distinct from the public protocol. However it shares all the other foundations (DIDs, OAuth, Lexicon, etc) with the public broadcast protocol. In that sense, it’s a part of the same framework which we refer to as “AT Protocol”.

I know this reads as annoying semantics, but it’s kinda the thrust of my point. I really don’t want this to be viewed as a separate, new thing, different from atproto. It’s part of atproto! Despite being a distinct data/sync protocol.
