Licensing terms for data in the Atmosphere

knowtheory.net · January 29, 2026, 10:05pm

I’d like to raise a fun and interesting topic: intellectual property and content licensing.

The data that flies around the Atmosphere is all visible and public. We’re at a stage where services in the Atmosphere are beginning to grow, and we’re starting to see lexicons and data that express creative works longer than microblog posts. The Standard.site lexicons are a great example.

But these creations lead to further questions. Who has permission to republish data that’s posted to the Atmosphere? Should we be making licenses to publish data clearer?

Should lexicons like Standard.site include licensing fields? Should we be recommending particular licenses for publishing to open social ecosystems?

Additionally, who in the Free Culture ecosystem should we be tagging into this conversation?

baldemo.to · January 29, 2026, 10:28pm

There was a proposal regarding user intent flags à la robots.txt that could feasibly be extended to more granular permission sets. I’m not entirely sure what the current status of this proposal is, though.

github.com/bluesky-social/proposals · 0008-user-intents/README.md (branch: main)

0008: User Intents for Data Reuse
=================================

*Join the [Github](https://github.com/bluesky-social/atproto/discussions/3617) discussion here.*

*Some of the proposals we publish are nearly complete, and represent specification drafts. Others are sharing early work for feedback, and communicate a general direction more than the specifics. This proposal is one of the latter. We are interested in aligning both technical mechanisms and policy language with other emerging efforts, and are sharing this proposal as a first step.*

This draft proposal describes how atproto accounts (eg, Bluesky users) could declare "intents" (aka, preferences) about certain categories of reuse of their public content. The mechanism and expectations are similar to `robots.txt` files on the web: a machine-readable format that good actors are expected to abide by, which carries ethical weight but is not legally enforceable. In particular, there is not a burden on hosting providers (eg, atproto PDS or Relay operators) to "enforce" these preferences by investigating scrapers or blocking requests for user data.

Intents would be limited to specific reuse categories, not free-form declarations or complex parameterized access control lists. The initial categories described here include:

* generative AI
* protocol bridging
* bulk datasets
* public archiving and preservation

These categories are individually defined and discussed below. Public atproto data is still public: this is not a mechanism for users to declare their accounts "private". Virtually all existing infrastructure and applications would continue to operate for all accounts in the network.

Mechanically, user intent declarations would be records in public atproto repositories. Applications (eg, the Bluesky App) would allow configuration of intents in a settings dialog. Intent preferences would be tri-state: explicitly allow, explicitly disallow, or undefined. Design options around mechanisms are described in a later section.

This file has been truncated.
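To make the tri-state semantics concrete, here is a minimal Python sketch of how a consuming service might read an intent record. It assumes a hypothetical record shape of `{category: True | False}`, and the category keys (`generativeAi`, `protocolBridging`, `bulkDataset`, `publicArchiving`) are illustrative placeholders, not the proposal's actual schema.

```python
from enum import Enum
from typing import Mapping, Optional

class Intent(Enum):
    """Tri-state user intent: explicitly allow, explicitly disallow, or undefined."""
    ALLOW = "allow"
    DISALLOW = "disallow"
    UNDEFINED = "undefined"

# The four reuse categories from the proposal; these key names are hypothetical.
CATEGORIES = ("generativeAi", "protocolBridging", "bulkDataset", "publicArchiving")

def resolve_intent(record: Optional[Mapping[str, Optional[bool]]], category: str) -> Intent:
    """Read one category's intent from a hypothetical intent record.

    Assumed record shape: {category: True | False}. A missing record, a missing
    key, or None all mean the user has not expressed a preference.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown reuse category: {category!r}")
    if record is None:
        return Intent.UNDEFINED
    value = record.get(category)
    if value is True:
        return Intent.ALLOW
    if value is False:
        return Intent.DISALLOW
    return Intent.UNDEFINED

prefs = {"generativeAi": False, "publicArchiving": True}
print(resolve_intent(prefs, "generativeAi"))      # Intent.DISALLOW
print(resolve_intent(prefs, "protocolBridging"))  # Intent.UNDEFINED
```

Keeping "undefined" distinct from "disallow" means each consumer has to choose its own default for accounts that never set a preference, rather than the code silently treating silence as either consent or refusal.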

erlend.sh · January 29, 2026, 10:32pm

Would be nice to have this incorporated:

RSL enables multiple publishers or content owners to reference a shared set of licensing terms, such as open source or Creative Commons license frameworks, through the <standard> element. By pointing to a common license URL, websites can declare that their content is governed by the same standard agreement. When a client application (e.g., an AI company) accepts the terms of a standard license, it gains access to all content covered by RSL licenses that reference that shared URL through their <standard> element.

In addition to supporting established frameworks, the ability to easily define and implement shared licensing terms gives content owners the ability to unify their voices and negotiate collective licensing agreements with AI companies and other applications that require access to large collections of digital content, including websites, books, videos, and proprietary datasets.

They don’t make it very clear, but there’s an option to express ‘AI (training) prohibited’, which is what I’d want for my content. I would have liked a slightly more nuanced prohibition that makes an exception for Digital Commons usage in research and academia, though.

bnewbold.net · January 29, 2026, 10:54pm

I can’t remember if I’ve put this out publicly, but when designing lexicons, I think media objects (blobs) in particular should usually have license and attribution metadata attached.

Two metadata fields:

* `licenseUri`: string, format=uri, optional. A standard/normalized URL (or short URI) for common licenses, or a link to a more bespoke license/policy page.
* `attribution`: string, optional. A more free-form field for listing a name or source of the media; it can also be a URI that links back to the original source.

These fields would go alongside things like alt-text. If there are multiple pieces of media in a record, it should be possible to attach different metadata to each (same as with alt-text).
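As a rough illustration of per-item metadata, here is a Python sketch of a media object that carries `alt`, `licenseUri`, and `attribution` side by side, plus a loose validity check. The `MediaItem` type, the `image` key, and the helper functions are hypothetical; only the two field names and their optional/format=uri constraints come from the post above.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse

@dataclass
class MediaItem:
    """One media object in a record, with its own metadata (hypothetical shape)."""
    blob_ref: str                      # placeholder for a real blob reference
    alt: str = ""                      # alt-text, as existing embeds already carry
    license_uri: Optional[str] = None  # `licenseUri`: optional, format=uri
    attribution: Optional[str] = None  # `attribution`: optional, free-form name/source/URI

def looks_like_uri(value: str) -> bool:
    """Loose check: a URI needs at least a scheme such as https: or at:."""
    return bool(urlparse(value).scheme)

def validate_media(items: List[MediaItem]) -> List[str]:
    """Return human-readable problems; an empty list means the metadata looks OK."""
    problems = []
    for i, item in enumerate(items):
        if item.license_uri is not None and not looks_like_uri(item.license_uri):
            problems.append(f"item {i}: licenseUri is not a URI: {item.license_uri!r}")
        if item.attribution is not None and not item.attribution.strip():
            problems.append(f"item {i}: attribution is empty")
    return problems

def to_record(item: MediaItem) -> dict:
    """Serialize with camelCase keys, omitting optional fields that are unset."""
    out = {"image": item.blob_ref, "alt": item.alt}
    if item.license_uri is not None:
        out["licenseUri"] = item.license_uri
    if item.attribution is not None:
        out["attribution"] = item.attribution
    return out

gallery = [
    MediaItem("blob-1", "A heron", "https://creativecommons.org/licenses/by/4.0/", "Jane Doe"),
    MediaItem("blob-2", "A pond", "CC-BY-4.0"),  # bare ID with no scheme: gets flagged
]
print(validate_media(gallery))
```

Because the metadata lives on each item rather than on the record, the two images in `gallery` can carry different licenses and attributions, just as they carry different alt-text.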

bnewbold.net · January 29, 2026, 11:00pm

I wrote that proposal and still think it is the best way forward for public atproto records/repos in the general case. Media files (blobs) maybe need more granular metadata (see my sibling post in this discussion).

I registered a domain and created a demo for this system a while back: https://demo.user-intents.org/

I’d love to collaborate with folks on getting that project shipped and adopted in the ecosystem. A bunch of AI and monetization debates have made it hard to get a more generic system out; focusing on the non-AI parts might help move it forward.

lu.is · January 29, 2026, 11:57pm

I have a lot of thoughts and very little time right now. But some quick bullet points:

* All of this is on uncertain legal ground right now, which makes it hard to give legal advice and harder still to give creative, optimistic legal advice.
* I have talked to @bnewbold.net, and his “less is more” instincts are fundamentally sound here. It may make sense for those two simple fields to be a recommended baseline for new lexicons; it is imperfect, but it’s hard to go wrong starting with those two fields.
* To the extent lexicon fields can be typed (I don’t know whether they can), it may be helpful to specify that the license field prefers SPDX license identifiers, since “it’s just a URL” often results in unparseable messes as people slap slightly mangled copies of licenses on their own web pages.
* For some lexicons, I suspect it may be worth specifying which fields are actually being licensed, since you might want (e.g.) identity to be tightly held but content to be broadly licensed.
* RSL is interesting, but it is early, and you probably don’t want to lock in a particular standard this early in the game. On the flip side, I bet they’d see outreach from this community as an interesting and welcome test case.
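To show what preferring SPDX identifiers could buy in practice, here is a minimal Python sketch that maps a few common forms of a license value (a bare SPDX ID in any case, or a Creative Commons URL with or without `legalcode`/`deed` suffixes) onto one canonical SPDX identifier. The function name and the four-entry table are illustrative assumptions; a real implementation would load the full SPDX license list.

```python
from typing import Optional
from urllib.parse import urlparse

# Small hand-picked sample mapping Creative Commons URL paths to SPDX
# identifiers; a real implementation would load the full SPDX license list.
SPDX_BY_CC_PATH = {
    "/licenses/by/4.0": "CC-BY-4.0",
    "/licenses/by-sa/4.0": "CC-BY-SA-4.0",
    "/licenses/by-nc/4.0": "CC-BY-NC-4.0",
    "/publicdomain/zero/1.0": "CC0-1.0",
}
# Lowercased lookup so "cc-by-4.0" and "CC-BY-4.0" resolve to the same ID.
SPDX_BY_LOWER = {spdx_id.lower(): spdx_id for spdx_id in SPDX_BY_CC_PATH.values()}

def normalize_license(value: str) -> Optional[str]:
    """Return an SPDX identifier for `value`, or None if it isn't recognized."""
    value = value.strip()
    if value.lower() in SPDX_BY_LOWER:
        return SPDX_BY_LOWER[value.lower()]
    parsed = urlparse(value)
    if parsed.netloc.lower() not in ("creativecommons.org", "www.creativecommons.org"):
        return None  # not in our sample; the caller decides how to handle it
    # Drop trailing segments such as "legalcode", "legalcode.en", or "deed.en",
    # which point at different renderings of the same license.
    segments = [s for s in parsed.path.split("/") if s]
    while segments and segments[-1].startswith(("legalcode", "deed")):
        segments.pop()
    return SPDX_BY_CC_PATH.get("/" + "/".join(segments))

print(normalize_license("http://creativecommons.org/licenses/by-sa/4.0/legalcode"))
print(normalize_license("https://example.com/my-license"))
```

Returning `None` for anything unrecognized, rather than guessing, keeps bespoke license pages visible as a separate case instead of silently folding them into a standard license.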

Hope these are some helpful notes. Will try to check back in again soon.

[For those of you who don’t know me, I’m a Licensing Guy - former Open Source Initiative board, counsel to Mozilla and WMF, currently Creative Commons board]

bumblefudge.com · February 2, 2026, 4:45pm

AI-pref is still in a limbo/derailment state at the IETF, as best I can tell, so I would recommend waiting to see where that data model lands and not making something out of alignment with it. And yeah, doing it at the lexicon level probably makes sense.

I have been ruminating on how an ISCC field in a lexicon would work for content whose author(s) want its authorship, licensing, and/or AI prefs to be “findable” (via vector/perceptual hashing) across edits and metadata stripping. It seems to be the same metadata pattern, and there too, metadata at the lexicon/MASL level makes the most sense, IMHO.

robin.berjon.com · February 11, 2026, 11:32am

Conversely: AI Prefs is in limbo, which gives us the opportunity to lead?

User intents (non-ideal name but sound idea IMHO) with some tweaks as mentioned by @lu.is and taking into account some RSL ideas (without the XML — we know that it won’t be deployed in a conformant way and this’ll be RSS all over again) could be Good Enough™ to ship something and drive adoption I’d think?

blaine.bsky.social · February 11, 2026, 2:53pm

This is good. I don’t know how this could/should work, but a mutual license, kind of like copyleft, for atproto/decentralized data could be very interesting.

“I’ll license mine if you license yours”

erlend.sh · February 15, 2026, 9:34am

Useful framing of what we’re grappling with:

Broadly speaking, two distinct visions are emerging for how societies might respond to this moment.

Abundance and redistribution

In this scenario, outlined in Open Future’s Beyond AI and Copyright white paper, societies accept that digital information is fundamentally abundant and stop trying to recreate scarcity where none naturally exists.
(…)
Openness remains the default. Public-domain works stay public. Access is managed for sustainability, not weaponized to extract rents. AI becomes a layer that expands access to knowledge while reinforcing—rather than hollowing out—the institutions that produce, curate, and contextualize it.

Scarcity as a market strategy

The second scenario also starts from the recognition that the traffic-based web economy is breaking down as search gives way to AI-mediated access. But it is being advanced primarily by a different set of actors—large infrastructure providers, content delivery networks, and publisher coalitions—whose response is to re-engineer scarcity at the point of access. This vision has been articulated most explicitly by Cloudflare CEO Matthew Prince in a Stratechery interview.

In this vision, the collapse of the old bargain is treated not as a reason to rethink how value is redistributed across the information ecosystem, but as an opportunity to turn information into a tradable input once again.

jackvalinsky.com · February 24, 2026, 12:07am

On the ATProto Touchers’ Discord, I started a conversation by asking about the legal issues of reusing another project’s lexicons. This matters because many atproto projects share and reuse lexicons, and that is a strength of the ecosystem (e.g., standard.site). The consensus was that it will probably never matter, because most projects are small, and if someone does sue, there’s an argument for fair use. Note that fair use isn’t automatic; it has to be argued in court. The more immediate worry usually isn’t getting sued but potential legal harassment (e.g., a scary letter from a lawyer).

Most importantly: people should license their lexicons in a way that promotes this sharing and remixing. Otherwise, it breaks the mental model and ethos of atproto and the Atmosphere.

erlend.sh · April 5, 2026, 12:28pm

From @ngerakines.me

GitHub: lexicon-community/lexicon Pull Request #72, “Add community.lexicon.preference.ai lexicon” by ngerakines

I’d like a way to say ‘can only be used in training for non-commercial purposes’. Maybe the above supports it, but that’s unclear to me.

What @blaine.bsky.social has proposed makes that affordance and many others explicit by way of a descriptive rather than prescriptive approach:

surfdude29.ispost.ing · April 7, 2026, 2:26pm

Given that the new gallery embed type lexicon is being designed, could those metadata fields please be included in that?

bmann.ca · April 7, 2026, 5:14pm

Discussion here too