Hey everyone,
Picking up from a few weeks ago, I created a PR that I’d like comments on. I’m specifically looking for who would be interested in using this lexicon to interop:
Blacksky and I would like to push this through as a working group and are committed to adopting this in our app in an interoperable way.
I’d like to be involved in that. I took a first shot at a lexicon for identifying as automated but that’s largely useless without some way for users to communicate their own needs. I’d love to see this handled at a level that is not focused on Bluesky-based lexicons.
Hey, would like to be involved for Eurosky too. Cheers!
I’d like to be involved as well! Would be helpful for Charcoal.
I’d like to participate in the discussion. Liccium would implement the lexicon if it makes sense. I also left a comment on GitHub:
See previous discussion here
I see in Sebastian’s GitHub comment that the issue of where to express licensing and/or AI terms has come up. I believe account-level is a bit of a fig leaf because the “unit of measure” for the web is still (and realistically will be for decades) the HTTP URL, not the at:// URI, and it is a “polite software” strategy to politely ask all webviews to properly crawl to, expand, and apply account-level metadata onto each display of an at:// URI…
the more extreme web3 position would be that if AI preferences and licensing terms aren’t signed by the private key of a public key in the DID Doc, you’re just inviting bad actors to impolitely ignore or strip that metadata, and you should keep these declarations not just in each URL but in each piece of verifiable data full stop. that’s where i’m increasingly convinced ActivityPub needs to go as Client-to-Server inches its way to production, and I would suggest it here as well.
Read through your comment. I don’t have any objections really. To bridge the gap between “we should” and “this is how”, can you provide some examples of what specifically you’d like changed or incorporated?
I am working on getting an opt-out feature delivered in my current development sprint.
I am not personally seeing any objection to adopting community.lexicon.preference.ai – from what I’m seeing folks either want to include other preference options or to expand the preferences to other record types beyond the account level.
That does not seem to preclude moving forward with this as-is which would serve my short term needs personally and then these other use cases could be addressed in ongoing conversations or follow on efforts.
Is anyone opposed to merging the lexicon as is?
Great call – the tradition is basically “does anyone have ‘over my dead body’ objections?” otherwise keep things moving.
First, I think there are some issues with the proposal from a legal perspective: the language is very mixed as to whether it is just a suggestion about what a user prefers, or whether it is actually trying to create some sort of legally enforceable set of constraints, or to allow users to give affirmative permission for their data to be used if a law were to require that.
Saying they are “user preferences” to me implies that it’s simply what the user prefers and is not even trying to be legally binding in any way.
However, the phrasing “These schemas allow users to declare how their public data may be used by external consumers” seems much stronger and implies that it is legally binding.
I am not a lawyer, so I can’t really offer any advice on whether this sort of thing could actually place enforceable legal obligations on external data consumers (and if it can, it almost certainly depends on the country). However, if that is the goal, it seems pretty important to have a lawyer review this and add what is likely to be many pages of additional legal statements.
For example, when it comes to defaults or situations where a user has indicated some preferences, but not all, it is possible that there is a meaningful legal distinction in some jurisdictions. Perhaps users explicitly opting out confers additional legal penalties when violated, or legal use of the data requires users to take some affirmative action to consent beyond just accepting the defaults. It’s possible the inability to distinguish between those could create problems.
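To make the default-versus-explicit distinction concrete, here is a minimal sketch of a tri-state reading of a preference. Everything in it is an assumption for illustration: the `training` key, the `Choice` names, and the helper functions are hypothetical and are not taken from the proposed lexicon.

```python
from enum import Enum
from typing import Optional


class Choice(Enum):
    ALLOW = "allow"   # user explicitly opted in
    DENY = "deny"     # user explicitly opted out
    UNSET = "unset"   # user never expressed a preference


def read_choice(record: Optional[dict], key: str) -> Choice:
    """Map a hypothetical preference record to a tri-state choice.

    A missing record or a missing key is reported as UNSET rather than
    being silently folded into ALLOW or DENY.
    """
    if record is None or key not in record:
        return Choice.UNSET
    return Choice.ALLOW if record[key] else Choice.DENY


def may_use(choice: Choice, default_allows: bool) -> bool:
    """Apply a consumer-chosen default only when the user set nothing."""
    if choice is Choice.UNSET:
        return default_allows
    return choice is Choice.ALLOW
```

A consumer that keeps `UNSET` separate can later show which uses rested on an explicit choice and which rested only on a default, which is the distinction that might matter legally.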
If this is just a sort of for-fun set of suggestions that no one is even pretending might have any legal implications for the use of the data, then I think the language should make that more explicit so that users understand this isn’t the basis for them to sue anyone.
Secondly, from the perspective of an AI builder who is trying in good faith to comply with these preferences, there is a lack of clarity that makes it extremely difficult. Most terms of service call for granting a license to the data in perpetuity to make things easy. That probably doesn’t work for this, but not having any listed time frame potentially makes it impossible to comply with. Most notably, what happens when a user changes their preferences to opt out of something that they had previously opted into?
For example, it would help to be clear that a current user preference for their data to be used to train an AI model confers a license to train the model for 3 years from the date of access, and then the right to use the model in perpetuity after it has been trained. Shorter times obviously react more quickly to changing user preferences, but if they are too short they make certain applications, such as publishing research with an accompanying data set for someone else to replicate the results, much tougher.
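As a rough illustration of how a good-faith consumer could apply a fixed training window, here is a minimal sketch. The 3-year figure is only the example from this post, approximated as 3 × 365 days; the function name and parameters are hypothetical, and nothing here reflects what the lexicon actually specifies.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the "3 years from the date of access" example, approximated in days.
TRAINING_WINDOW = timedelta(days=3 * 365)


def training_permitted(
    accessed_at: datetime,
    opted_in_at_access: bool,
    now: datetime,
) -> bool:
    """Return True if training on this data is still inside the window.

    The preference that counts is the one in force when the data was
    accessed; a later opt-out stops new access but, under this sketch,
    does not shorten a window that has already started.
    """
    if not opted_in_at_access:
        return False
    return now - accessed_at <= TRAINING_WINDOW


accessed = datetime(2024, 1, 1, tzinfo=timezone.utc)
still_ok = training_permitted(accessed, True, datetime(2025, 1, 1, tzinfo=timezone.utc))
expired = training_permitted(accessed, True, datetime(2028, 1, 1, tzinfo=timezone.utc))
```

Whether a later opt-out should cut the window short is exactly the open question here; the sketch picks one answer only to show that a stated time frame makes the rule checkable.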
There are plenty of bad actors who will ignore the user preferences no matter what, so if there is any worthwhile goal of this, it should be to allow someone to build something with AI that they can affirmatively say is done completely with the consent of the users who generated the data. Right now, I think it still falls short of that.
Rereading this after posting my comment also makes me realize that in many legal contexts, it may not be the text of the lexicon documentation that matters, but the text that was shown to the user when they indicate their preferences. If every site that allows someone to change the data for this lexicon uses different language, it starts to really complicate things.
At minimum, it seems like the documentation for the lexicon needs to include the text to be shown to users when they indicate their preferences. But if that language is ever updated, since it would be shown on a network of different websites that no one can force to update, the lexicon probably should store some indication of which version of the text the user was shown, if not the entirety of the text itself.
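One way to record which wording a user saw, short of storing the full text, is to keep a version label plus a hash of the exact text. The sketch below assumes that approach; the field names `allowTraining`, `consentTextVersion`, and `consentTextSha256` are hypothetical and are not part of the proposed lexicon.

```python
import hashlib


def consent_text_digest(text: str) -> str:
    """SHA-256 hex digest of the exact text shown to the user."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_preference_record(
    allow_training: bool, shown_text: str, text_version: str
) -> dict:
    """Build a hypothetical preference record that pins the consent wording."""
    return {
        "allowTraining": allow_training,
        "consentTextVersion": text_version,
        "consentTextSha256": consent_text_digest(shown_text),
    }


def matches_shown_text(record: dict, candidate_text: str) -> bool:
    """Check whether a candidate wording is the one the user was shown."""
    return record["consentTextSha256"] == consent_text_digest(candidate_text)


record = build_preference_record(
    allow_training=False,
    shown_text="Example wording shown by one app.",
    text_version="v1",
)
```

A hash only proves which text was shown if the text itself is published somewhere; storing the full text in the record avoids that dependency at the cost of larger records.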
Thanks everyone. This has been merged and is now live!