Proposing Botwatch: Trust-Based Community Bot Detection

Is this account a bot? This question’s importance is only growing in a world of LLMs and state-backed influence campaigns. Proposed here is Botwatch, a community-based system for personalized bot detection.

In Botwatch, users publish records indicating whether they think others are bots and records indicating trust in other users’ scores. By analyzing this network, we can create useful signals to help users distinguish between bots and humans. Such a signal would consider your trust relations and output a personalized estimated bot score for a target user. There’s an example heuristic at the end of this proposal, but you don’t need to read it to know how such a signal should behave. If all the people you trust agree that someone is a bot or a human, it should agree. If the people you trust have mixed opinions, the signal should reflect that uncertainty. Naturally, misplaced trust will produce inaccurate results. The hope, though, is that with sufficient scores and well-placed trust, these heuristics will correlate with the truth.

Bot Scores

Users can publish their perspectives on whether other users are bots. They score other users on a scale of -1 to 1 (or, equivalently, -100% to 100%). Users don’t need to input these numbers; an app might present this as a slider or as a few options.

 𝐵     Meaning
 1     Certainly a bot
 0.5   Probably a bot
 0     Uncertain
-0.5   Probably not a bot
-1     Certainly not a bot
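As a concrete illustration, here’s what a published bot score might look like as data. The record shape and field names below are my assumptions for illustration, not a finalized atproto lexicon.

```python
from dataclasses import dataclass

# Illustrative only: the record shape and field names are assumptions,
# not a finalized atproto lexicon.
@dataclass
class BotScore:
    subject: str     # identity being scored, e.g. a DID
    score: float     # -1.0 (certainly human) to 1.0 (certainly a bot)
    created_at: str  # ISO 8601 timestamp

    def __post_init__(self):
        if not -1.0 <= self.score <= 1.0:
            raise ValueError("score must be in [-1, 1]")

# Alice met Bob in person, so she publishes a score of -1 for him.
alice_scores_bob = BotScore(
    subject="did:plc:example-bob",
    score=-1.0,
    created_at="2026-03-26T00:00:00Z",
)
```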

Here are a few examples:

  • Alice meets Bob at a convention. They exchange information and become certain that each other’s accounts represent a real person. They each publish a score of -1 for the other.
  • Charlie encounters @void.comind.network, a bot. Void doesn’t pretend to be human. He gives Void a score of 1.
  • A company offers verification services. It will score you -0.75 if you upload your government ID and pay a small fee.
  • Dave sees a suspicious account online. It’s posting some bad takes and used an em-dash once. Dave rates it 0.7.
  • Ethan gets in a heated argument with a teammate in an online game. He calls them a bot and scores them 1 (lol).
  • Somewhere in Russia, a nefarious programmer spins up a small army of LLM-powered accounts. The bots rate each other -1, lying to the world about their nature.

Clearly, not all scores are created equal. A naïve average of all bot scores for a given account is a terrible measure, highly vulnerable to coordinated attack. How can we uncover useful signals from this information?
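To make that vulnerability concrete, here’s a tiny numerical sketch of a coordinated (Sybil) attack on the naive average; the numbers are purely illustrative.

```python
# Illustrative numbers: one honest user scores a suspected bot 1.0
# (certainly a bot), but 50 sock-puppet accounts each score it -1.0.
honest_scores = [1.0]
sybil_scores = [-1.0] * 50
all_scores = honest_scores + sybil_scores

# The naive average lets the sock puppets win the vote outright:
# the result lands close to -1.0, i.e. "certainly not a bot".
naive_average = sum(all_scores) / len(all_scores)
```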

Trust Scores

An unknown user’s input may mean little, but Twitter’s verification was once meaningful. The difference is trust. My trust in a person or organization gives their voice meaning to me. Furthermore, trust in someone can extend, in part, to whoever they trust. These trust relationships form a trust web, informing whether a given bot score is worth considering.

Concretely, users can publish trust scores for each other.

 𝑇     Meaning
 1     Max trust
 0.5   Trust
 0     No trust
-0.5   Distrust
-1     Max distrust

Real trust is hardly one-dimensional, but these trust scores can serve as a useful approximation of trust when it comes to verifying humans, detecting bots, and trusting the worthy.

In Practice

How will this system be built? What will it look like?

Foundations

This system will be built on atproto, allowing for user-owned data and a diverse ecosystem of algorithms and experiences. This will prevent user lock-in and disincentivize service-level abuses. After all, systems controlled by a single entity hardly engender trust. Bluesky’s custom labelers and feeds represent ways this system could enhance existing social media experiences, while its ‘following’ relationships are a possible starting point for trust. Users might choose an algorithm that includes virtual bot or trust scores for those they follow.

User Experience

Before this trust web is integrated into existing social media experiences, it needs an interface of its own. This application must allow users to explore the network and update their scores for others. They should be able to view estimated bot scores, ideally with user control over the algorithm used to calculate them. Critically, they need to be able to explore that calculation and discover the source of unexpected outputs. When a user sees an estimated bot score they know to be incorrect, they need to be able to examine the trust web to determine the source of the error and react accordingly. They would likely reduce their trust in the culprit or suggest that they correct their mistake.

Scoring atproto identities is a natural place to start, but there’s no reason to stop there. Users could also score Instagram, YouTube, or TikTok accounts. While the initial use case may only be the app and Bluesky labels/feeds, this system could be used to support future app experiences and moderation.

A Parting Thought

Scores are speech, and speech can be abusive. Surely some will use these scores as an avenue for attack. It remains to be seen whether this trust web will support genuine connection between humans. Should it succeed, this sort of trust model could help answer more than “is this account a bot?”.

Botwatch will launch on March 26th, 2026.

Appendix - Example Heuristic

Here’s an algorithm that works the way we expect trust to: it considers what those we trust have said to estimate whether other users are bots. This is hardly the only reasonable choice for such an algorithm; users should be able to choose whichever method they see as best.

  1. Choose a max_depth. We will only consider scores from users at most that many steps away on the trust graph.
  2. Choose a method for combining scores, weighted according to your trust in its source. The result should be -1 ≤ x ≤ 1. Here’s a simple one.
    1. Weighted Average
      1. Ignore scores from sources you don’t trust (𝑇 ≤ 0)
      2. Multiply each score by your trust in its creator and sum the values.
      3. Divide by the sum of the trust scores.
  3. If you’ve published a bot score for the target directly, that’s it.
  4. Collect each bot score for the target account.
  5. Compute your trust in the creator of each score.
    1. If max_depth == 0, your trust in them is 0.
    2. If you’ve published a trust score in them directly, that’s it.
    3. Consider each account that directly published trust in them.
    4. Use this method to compute your trust in that account with max_depth = max_depth - 1.
    5. Use your chosen method to weigh the trust scores and obtain a result.
  6. Use your chosen method to weigh the bot scores and obtain a result.
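The steps above can be sketched in code. This is a minimal illustration, not a definitive implementation: the in-memory dictionaries standing in for published records, and all the function names, are my assumptions.

```python
# Minimal sketch of the heuristic above. The data shapes are assumptions:
# bot_scores[author][target] and trust_scores[author][target] hold
# published scores in [-1, 1].

def weighted_average(pairs):
    """Step 2: combine (trust, score) pairs, ignoring sources
    you don't trust (T <= 0)."""
    trusted = [(t, s) for t, s in pairs if t > 0]
    if not trusted:
        return 0.0  # no trusted sources: stay uncertain
    total_trust = sum(t for t, _ in trusted)
    return sum(t * s for t, s in trusted) / total_trust

def trust_in(me, them, trust_scores, max_depth):
    """Step 5: recursively estimate my trust in another account."""
    if max_depth == 0:
        return 0.0
    direct = trust_scores.get(me, {}).get(them)
    if direct is not None:
        return direct  # a directly published trust score wins
    # Weigh the trust scores others have published about them,
    # discounted by my (recursive) trust in each publisher.
    pairs = [
        (trust_in(me, author, trust_scores, max_depth - 1), targets[them])
        for author, targets in trust_scores.items()
        if author != me and them in targets
    ]
    return weighted_average(pairs)

def estimated_bot_score(me, target, bot_scores, trust_scores, max_depth=3):
    """Steps 3-6: my personalized estimate of whether target is a bot."""
    direct = bot_scores.get(me, {}).get(target)
    if direct is not None:
        return direct  # my own published score overrides everything
    pairs = [
        (trust_in(me, author, trust_scores, max_depth), scores[target])
        for author, scores in bot_scores.items()
        if author != me and target in scores
    ]
    return weighted_average(pairs)
```

For example, if Alice fully trusts Bob (𝑇 = 1) and distrusts Mallory (𝑇 = -1), Bob’s score of 1 for a suspected bot carries through while Mallory’s lying -1 is ignored entirely.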
5 Likes

Here’s this proposal on leaflet as well. Comment where you please!

Interesting proposal, Paul! I highly appreciate the effort to tackle this in a decentralized way on AT Protocol rather than relying on centralized moderation.

A couple of questions that came to mind:

Account rotation: If a bot operator gets flagged, the rational move is to discard that account and create fresh ones. Each new account starts with no scores. Doesn’t that make bot scores (score = 1 in your example) mostly a signal for the operator to rotate rather than an actual deterrent? Bots can just create 1000 new accounts as soon as they get a score that indicates they are a bot (as this is public info).

Labeling fatigue: The system depends on people manually evaluating accounts and keeping at it. But bot operators can automate account creation at scale, indefinitely. How to resolve that asymmetry? Honest participants have finite time and attention, the adversary… unlimited. How do you see this playing out without people burning out?

I’ve been thinking about similar problems for professional identity on AT Protocol (sifa.id) and we also discussed Trust Infrastructure on ATproto and Building a trust & reputation clearinghouse for atproto niche networks before.

For Sifa and Barazo I’m making a different bet: rather than detecting bad accounts after the fact, make credible identity something that builds up passively from real activity over time. New accounts just have no reputation, and the cost sits with the attacker: faking a believable track record takes sustained effort that’s hard to automate. The trust-web part of your proposal (weighting by source credibility) is the piece I find most interesting, and I think it could work well alongside activity-based approaches. In my docs I called this the “Google PageRank but for online accounts”.

How are you thinking about the cold-start problem on both sides: new legitimate users vs. new bot accounts? I find this one tricky either way, legitimate human users that are new to the network of course still need to have a good chance to build trust on the network, even if everyone else already has 20 years of history.

3 Likes

Very reasonable questions! Thanks for asking them.

Account Rotation: Indeed, we should assume an arbitrarily large number of bots will appear daily. The beauty of the [-1, 1] scoring is that vouching for someone as human and denouncing a bot are two sides of the same coin. You might choose to only view posts from identities with an estimated bot score less than -0.1, for example. A new user would need to get a non-bot score from an existing trusted user to get visibility. Personally, I’d love to have a quick video call with someone and express some confidence in their humanity afterward. We could have a nice thread/space for newcomers to connect with willing regulars. Commercial or community services could similarly verify users, maybe for a small fee.
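A tiny sketch of that visibility rule (the threshold, data shapes, and function names are illustrative assumptions):

```python
# Illustrative sketch: hide posts unless the viewer's chosen heuristic
# leans toward the author being human. The -0.1 threshold is an example.

def filter_feed(posts, estimate, threshold=-0.1):
    """Keep posts whose authors' estimated bot scores fall below the
    visibility threshold."""
    return [post for post in posts if estimate(post["author"]) < threshold]

scores = {"did:plc:friend": -0.9, "did:plc:unknown": 0.0}
posts = [{"author": "did:plc:friend"}, {"author": "did:plc:unknown"}]
visible = filter_feed(posts, lambda did: scores[did])
# The unknown account (score 0.0) stays hidden until someone vouches for it.
```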

Labeling Fatigue: You’re very right that normal users will only flag a few bot accounts at most; I expect to mostly publish negative (non-bot) scores for the people I know are real. It’s institutions and algorithms that can handle bot detection at scale. Bluesky’s existing labels are a piece of the puzzle: I’m imagining that I could configure my algorithm to consider the “no break greater than 2 hrs yesterday” label as, say, a 0.5 bot score published by a virtual identity that I trust highly. Similarly, I imagine a place for the LLM text detection tools in this system. They can be applied broadly to generate bot scores automatically and we can decide how trustworthy/accurate they are. Personally I wouldn’t give them too much weight for fear of false positives, but such a detection system could easily be a useful, if weak, signal.
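The label-as-virtual-score idea might look something like this sketch. The label names, weights, and the virtual source identifier are all illustrative assumptions, not real Bluesky labels.

```python
# Illustrative sketch: map moderation labels to bot scores published by
# a virtual identity the viewer trusts. Label names and weights are
# assumptions, not real Bluesky labels.
LABEL_TO_BOT_SCORE = {
    "no-break-gt-2hrs": 0.5,    # posting nonstop suggests automation
    "llm-text-detected": 0.3,   # weak signal: false positives are likely
}

def virtual_scores(labels):
    """Turn an account's labels into (virtual_source, bot_score) pairs
    that a scoring algorithm can weigh like any other published score."""
    return [
        ("virtual:labeler", LABEL_TO_BOT_SCORE[label])
        for label in labels
        if label in LABEL_TO_BOT_SCORE
    ]
```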

1 Like

How do you verify that the person you are seeing is human and not AI generated deepfake?

1 Like

Indeed, you can’t be sure. I’m going to reserve my certainty (-100%) for people I’ve met in person, but -80% feels appropriate for someone I’ve had a video call with. Naturally, if I learn that person was a deepfake, I’d need to update my score for them and probably would score people I meet over video a little lower in the future.

Just a passing thought, but voting on confidence that someone is human might be a slightly more positive spin than voting on confidence that someone is a bot.

Just sounds not as cool for me to get a -100% score than a 100% score. :smiley:

1 Like

That was my initial thought as well, I’d agree. I’d written the proposal as you’d suggest, but then I considered the converse: instead of calling people bots, we’d be calling them not human. In a fresh session, Claude suggested that Elon Musk isn’t human.

Mistakenly calling someone a bot is funny. Mistakenly dehumanizing someone… isn’t funny at all.

1 Like

Hmm, yeah, interesting point. :+1:

I have a bad-bots list that is mostly porn bots – being able to score by list would be an easy mechanism for me to contribute. I might score my list 0.8 to 0.9 by default.

Whether it’s that method or not, I would think about how you want to take in rankings, especially from within the regular bsky interface.

e.g. the reporting interface has a text field and numbers could be input there, if you exposed this as a labeler.

Not suggesting you do this or that it is even a good interface, just to think about how you want people to contribute in bulk or individually.

I would say that “bad bot” is also useful. I would list Comind as a bot, but a good one, or an agentic one, or whatever.

2 Likes

I agree here. But then I wonder if that would be represented by whatever tools end up being used to mark misbehaving humans in addition to the bot flag. :thinking:

Sure, but you’re requiring me to do work twice. If I have to choose between bot identifying and bad identifying, I’ll just use tools that do bad identifying.

2 Likes

Good point. Which makes me think: if we’re doing manual bot detection, as opposed to some automated / statistical mechanism, then maybe we focus on good / bad instead of human / bot.

Being that it’s more broadly useful if we are going to start building up manually labeled trust networks.

I suppose it’s also more complicated and subjective.

1 Like

Hah, the full vision is a trust web capable of producing signals to help answer ANY question, not just “is {user} a bot?”.

“Is {identity} good?” is arguably the most important question, but it… seems like a challenging place to start, to say the least.

I agree that ‘bad bot’ is more useful for moderation; my thought was to test this trust web concept on a simpler question before graduating to the trickier ones. That said, I could see it being worth it if it gets more people participating.

2 Likes

Claude suggested that Elon Musk isn’t human.

I wouldn’t be surprised if he uses Grok to post most of his propaganda.

I wonder if matadisco could serve as useful tooling here https://matadisco.org/
Ipfs announcement post of matadisco: @ipfs.tech on Bluesky

This is an interesting thought. I was thinking I’d need my bespoke app or Bluesky’s support to provide users a scoring interface. Less friction is clearly better for scoring (ideally, that might be “…” → score account); Bluesky’s reporting interface could be a boon for adoption. However, those reports are kinda off-protocol, right? If I understand correctly, they’re just sent to the service and don’t live in the user’s PDS. I suppose Botwatch could create the record in the user’s PDS immediately if they have an active OAuth session and save it as a draft for them otherwise, but that’s definitely not ideal. “Oh, I haven’t logged into Botwatch in 2 weeks, so my scores haven’t been published and I had no idea”

Yeah, it’s not ideal, it’s just another avenue that’s right in the app already.