Towards Sustainable Moderation for 3rd Party ATProto Development

I had the chance to chat with Ronen and Wesley from semble.so at the ATProto Boston meetup last week. They mentioned to me that a topic of discussion among non-Bluesky ATProto developers in Montreal was a growing recognition that they would need to start doing moderation on their services sooner rather than later.

I was very quickly able to demonstrate that labeling third party lexicons was fairly simple. That said, I think tooling is the easy part—between larger organizations like Roost and Zentropi providing tools for moderation, and independent efforts like what I’ve built with Skywatch Automod being reasonably easy to adopt, I don’t think tooling is going to be the challenge.
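For concreteness, here is a rough sketch of what that looks like: a label is just a com.atproto.label.defs#label object whose uri can point at any record, including one from a custom lexicon. The collection, DIDs, CID, and label value below are made up for illustration; in practice you would emit this through your labeler (e.g., Ozone or Skywatch Automod).

```typescript
// Minimal sketch of a label targeting a record from a third-party lexicon.
// The collection (so.semble.note), DIDs, CID, and label value are hypothetical;
// the object shape follows com.atproto.label.defs#label.
const label = {
  // DID of the labeler service emitting the label
  src: "did:plc:examplelabeler",
  // Subject: an at:// URI can point at any record, not just app.bsky.* ones
  uri: "at://did:plc:someuser/so.semble.note/3kexamplerkey",
  // Optional CID pinning the specific version of the record
  cid: "bafyreiexamplecid",
  // The label value, defined by the labeler's own policies
  val: "spam",
  // Timestamp when the label was created
  cts: new Date().toISOString(),
};

console.log(JSON.stringify(label, null, 2));
```

The point is that nothing in the label format cares which lexicon the subject record uses, which is why the tooling side is the easy part.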

We know that content moderation is difficult to do at scale, and I’m guessing that many third-party services would prefer not to build their own tooling and in-house T&S teams to handle it, at least at first. That raises the question: what does Moderation-as-a-Service, or moderation as a pooled common resource, look like, and perhaps more importantly, how do we sustain it?

Some initial questions that come to mind are:

  • What form does this take: is it a pro-competitive commons (a la Eurosky’s CoCoMo) or a contracted service?
  • What is its purview: do folks want something that covers just the worst of the worst, i.e., ensuring that CSAM and illegal content is quickly uncovered and removed, or something more full-stack and customizable? The former is simpler to stand up, and the business model for the latter probably looks different.

These questions will inform what the entity looks like. I’d love to hear the thoughts of others—I’m sure folks will have other questions for us to consider and ideas of their own to bring forward.


I think it depends on your role in the network: if you’re a PDS host, you probably want to scan for CSAM and malware, since distributing either could get you into legal trouble.

If you’re an AppView, it will really depend on the content types you support, but many apps will at least need CSAM hash & match scanning on profile banners and maybe avatars (depending on size). Some applications may need detection of spam or harmful URLs (e.g., link aggregators like frontpage). Video platforms will probably need hash & match plus content ID and audio recognition, so that music rights can be correctly paid for any commercial music.
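To make the hash & match piece concrete, here is a rough sketch of the workflow: fetch the image blob via com.atproto.sync.getBlob and compare its hash against a list of known-bad hashes. Real deployments use perceptual hashing (PDQ, PhotoDNA) against vetted industry hash lists rather than plain SHA-256, and the PDS URL, DID, blob CID, and hash set below are placeholders.

```typescript
import { createHash } from "node:crypto";

// Hypothetical set of known-bad SHA-256 hashes; real systems use perceptual
// hashes (PDQ, PhotoDNA) supplied through vetted industry programmes.
const knownBadHashes = new Set<string>([
  // "e3b0c44298fc1c149afbf4c8996fb92427ae41e464..."
]);

// Fetch a blob from a PDS via com.atproto.sync.getBlob and check its hash.
async function checkBlob(pdsUrl: string, did: string, cid: string): Promise<boolean> {
  const url =
    `${pdsUrl}/xrpc/com.atproto.sync.getBlob` +
    `?did=${encodeURIComponent(did)}&cid=${encodeURIComponent(cid)}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`getBlob failed: ${res.status}`);
  const bytes = Buffer.from(await res.arrayBuffer());
  const digest = createHash("sha256").update(bytes).digest("hex");
  return knownBadHashes.has(digest); // true = matched a known-bad hash
}

// Example usage (all values are placeholders):
checkBlob("https://example-pds.example", "did:plc:someuser", "bafkreiexampleblobcid")
  .then((matched) => console.log(matched ? "match: escalate for review" : "no match"))
  .catch(console.error);
```

Even in this toy form you can see why it gets expensive: every banner/avatar blob has to be fetched and hashed, and the real perceptual-hash lists come with access and legal constraints.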

Then you get to the more general community reporting and defining moderation policies, and that’s something that takes a lot of work.

Besides that, you probably also need detection of spam and other clearly malicious abuse.


I’d probably also say that for non-Bluesky projects, looking at what they’re doing with Osprey is particularly interesting: bluesky-social/osprey-atproto on GitHub (https://github.com/bluesky-social/osprey-atproto), a set of packages, UDFs, and rules to use Osprey with ATProto.

That can provide good tooling for detecting and acting on patterns within AT Protocol data.
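For a sense of what that kind of rule engine consumes: at its simplest it’s a Jetstream/firehose subscriber running rules over incoming records. The sketch below subscribes to one of Bluesky’s public Jetstream instances and flags posts matching a toy pattern; Osprey replaces the hard-coded rule with its UDF/rule framework. The collection filter and rule are illustrative only.

```typescript
// Minimal sketch: subscribe to Jetstream and flag records whose text matches
// a pattern. The collection filter and spam rule are illustrative only.
import WebSocket from "ws";

const JETSTREAM =
  "wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post";

// Toy "rule": flag obvious crypto-giveaway spam phrasing.
const spamPattern = /free\s+(crypto|nft)\s+giveaway/i;

const ws = new WebSocket(JETSTREAM);

ws.on("message", (data: Buffer) => {
  const event = JSON.parse(data.toString());
  if (event.kind !== "commit" || event.commit?.operation !== "create") return;

  const text: string | undefined = event.commit?.record?.text;
  if (text && spamPattern.test(text)) {
    // In a real system this would enqueue a report or emit a label.
    console.log(`flagged at://${event.did}/${event.commit.collection}/${event.commit.rkey}`);
  }
});

ws.on("error", (err) => console.error("jetstream error", err));
```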

@thisismissem.social I agree re: Osprey; this is why I don’t think the focus should be on tools. Tools are a solved, or easily solvable, problem. The issue is that most third-party developers are not going to want to figure the tools out and stand up their own moderation infrastructure, at least at first. So what can we build that essentially lets them outsource that to another entity?


Thanks for kicking off this discussion.

I think for starters what’s needed is more documentation on what has to be set up, so it can be done at all.

That is, documenting and explaining a bit more of what Skywatch Automod can do. Can you add some links to the source and a brief overview, too?

Images and video are the big expensive ones.

But starting with profile names, and really any text record content, would be amazing.

Or are you thinking you would want to stand up one big instance to handle things? This feels compatible with Slices.
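On the “any text record content” point: since records are just JSON, a generic automod pass can walk whatever lexicon it encounters and collect the string fields for scanning, without knowing the schema up front. A rough sketch; the record, field names, and rule are purely illustrative.

```typescript
// Sketch: collect every string field from an arbitrary record so a generic
// text rule (keyword list, regex, classifier) can run over any lexicon.
function collectStrings(value: unknown, out: string[] = []): string[] {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value && typeof value === "object") {
    for (const v of Object.values(value)) collectStrings(v, out);
  }
  return out;
}

// Example: a hypothetical third-party record with text in unfamiliar fields.
const record = {
  $type: "so.semble.note", // made-up collection for illustration
  title: "Check out this totally legit giveaway",
  body: "free crypto giveaway, click here",
  tags: ["giveaway"],
};

const blocklist = /free\s+crypto\s+giveaway/i;
const hit = collectStrings(record).some((s) => blocklist.test(s));
console.log(hit ? "flag for review" : "ok");
```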

To me, investing in trust & safety is just a requirement for building a social application. It comes with the territory.

Maybe we could create some sort of guide that helps people figure out the tools they need to deploy for their given application?

There are pretty limited options here due to the expense of hash & match (it carries some complex legal risks and involves bandwidth/CPU/GPU-bound operations).

There’s maybe something that could be done to offer malware scanning for PDS admins, but it would need to cost money to build and maintain.

Other than that you get more into figuring out policies and how to apply them and build a moderation team, and that’s just hard work that’s part of shipping a social app.

@bmann.ca: I had a busy week at work, but I see your call for more thorough documentation. I’ll take advantage of some downtime this week to do that and talk a bit more about capabilities.

Appreciate you!

We put time in when we can. A lot of things happening make this all feel like a sprint, but it really is a marathon. Pace yourself, get to it when you can!