Proposing Botwatch: Trust-Based Community Bot Detection

Is this account a bot? This question’s importance is only growing in a world of LLMs and state-backed influence campaigns. Proposed here is Botwatch, a community-based system for personalized bot detection.

In Botwatch, users publish records indicating whether they think others are bots and records indicating trust in other users’ scores. By analyzing this network, we can create useful signals to help users distinguish between bots and humans. Such a signal would consider your trust relations and output a personalized estimated bot score for a target user. There’s an example heuristic at the end of this proposal, but you don’t need to read it to know how such a signal should behave. If all the people you trust agree that someone is a bot or a human, it should agree. If the people you trust have mixed opinions, the signal should reflect that uncertainty. Naturally, misplaced trust will produce inaccurate results. The hope, though, is that with sufficient scores and well-placed trust, these heuristics will correlate with the truth.

Bot Scores

Users can publish their perspectives on whether other users are bots. They score other users on a scale of -1 to 1 (or, equivalently, -100% to 100%). Users don’t need to input these numbers; an app might present this as a slider or as a few options.

 𝐵     Meaning
 1     Certainly a bot
 0.5   Probably a bot
 0     Uncertain
-0.5   Probably not a bot
-1     Certainly not a bot
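As a concrete illustration, here’s what a published bot score might look like as data. The record shape and field names below are my assumptions for illustration, not a finalized atproto lexicon.

```python
from dataclasses import dataclass

# Illustrative only: the record shape and field names are assumptions,
# not a finalized atproto lexicon.
@dataclass
class BotScore:
    subject: str     # identity being scored, e.g. a DID
    score: float     # -1.0 (certainly human) to 1.0 (certainly a bot)
    created_at: str  # ISO 8601 timestamp

    def __post_init__(self):
        if not -1.0 <= self.score <= 1.0:
            raise ValueError("score must be in [-1, 1]")

# Alice met Bob in person, so she publishes a score of -1 for him.
alice_scores_bob = BotScore(
    subject="did:plc:example-bob",
    score=-1.0,
    created_at="2026-03-26T00:00:00Z",
)
```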

Here are a few examples:

  • Alice meets Bob at a convention. They exchange information and become certain that each other’s accounts represent a real person. They each publish a score of -1 for the other.
  • Charlie encounters @void.comind.network, a bot. Void doesn’t pretend to be human. He gives Void a score of 1.
  • A company offers verification services. It will score you -0.75 if you upload your government ID and pay a small fee.
  • Dave sees a suspicious account online. It’s posting some bad takes and used an em-dash once. Dave rates it 0.7.
  • Ethan gets in a heated argument with a teammate in an online game. He calls them a bot and scores them 1 (lol).
  • Somewhere in Russia, a nefarious programmer spins up a small army of LLM-powered accounts. The bots rate each other -1, lying to the world about their nature.

Clearly, not all scores are created equal. A naïve average of all bot scores for a given account is a terrible measure, highly vulnerable to coordinated attack. How can we uncover useful signals from this information?
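To make that vulnerability concrete, here’s a tiny numerical sketch of a coordinated (Sybil) attack on the naive average; the numbers are purely illustrative.

```python
# Illustrative numbers: one honest user scores a suspected bot 1.0
# (certainly a bot), but 50 sock-puppet accounts each score it -1.0.
honest_scores = [1.0]
sybil_scores = [-1.0] * 50
all_scores = honest_scores + sybil_scores

# The naive average lets the sock puppets win the vote outright:
# the result lands close to -1.0, i.e. "certainly not a bot".
naive_average = sum(all_scores) / len(all_scores)
```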

Trust Scores

An unknown user’s input may mean little, but Twitter’s verification was once meaningful. The difference is trust. My trust in a person or organization gives their voice meaning to me. Furthermore, trust in someone can extend, in part, to whoever they trust. These trust relationships form a trust web, informing whether a given bot score is worth considering.

Concretely, users can publish trust scores for each other.

 𝑇     Meaning
 1     Max trust
 0.5   Trust
 0     No trust
-0.5   Distrust
-1     Max distrust

Real trust is hardly one-dimensional, but these trust scores can serve as a useful approximation of trust when it comes to verifying humans, detecting bots, and trusting the worthy.

In Practice

How will this system be built? What will it look like?

Foundations

This system will be built on atproto, allowing for user-owned data and a diverse ecosystem of algorithms and experiences. This will prevent user lock-in and disincentivize service-level abuses. After all, systems controlled by a single entity hardly engender trust. Bluesky’s custom labelers and feeds represent ways this system could enhance existing social media experiences, while its ‘following’ relationships are a possible starting point for trust. Users might choose an algorithm that includes virtual bot or trust scores for those they follow.

User Experience

Before this trust web is integrated into existing social media experiences, it needs an interface of its own. This application must allow users to explore the network and update their scores for others. They should be able to view estimated bot scores, ideally with user control over the algorithm used to calculate them. Critically, they need to be able to explore that calculation and discover the source of unexpected outputs. When a user sees an estimated bot score they know to be incorrect, they need to be able to examine the trust web to determine the source of the error and react accordingly. They would likely reduce their trust in the culprit or suggest that they correct their mistake.

Scoring atproto identities is a natural place to start, but there’s no reason to stop there. Users could also score Instagram, YouTube, or TikTok accounts. While the initial use case may only be the app and Bluesky labels/feeds, this system could be used to support future app experiences and moderation.

A Parting Thought

Scores are speech, and speech can be abusive. Surely some will use these scores as an avenue for attack. It remains to be seen whether this trust web will support genuine connection between humans. Should it succeed, this sort of trust model could help answer more than “is this account a bot?”.

Botwatch will launch on March 26th, 2026.

Appendix - Example Heuristic

Here’s an algorithm that works the way we expect trust to: it considers what those we trust have said to estimate whether other users are bots. This is hardly the only reasonable choice for such an algorithm; users should be able to choose whichever method they see as best.

  1. Choose a max_depth. We will only consider scores from users at most that many steps away on the trust graph.
  2. Choose a method for combining scores, weighted according to your trust in its source. The result should be -1 ≤ x ≤ 1. Here’s a simple one.
    1. Weighted Average
      1. Ignore scores from sources you don’t trust (𝑇 ≤ 0)
      2. Multiply each score by your trust in its creator and sum the values.
      3. Divide by the sum of the trust scores.
  3. If you’ve published a bot score for the target directly, that’s it.
  4. Collect each bot score for the target account.
  5. Compute your trust in the creator of each score.
    1. If max_depth == 0, your trust in them is 0.
    2. If you’ve published a trust score in them directly, that’s it.
    3. Consider each account that directly published trust in them.
    4. Use this method to compute your trust in that account with max_depth = max_depth - 1.
    5. Use your chosen method to weigh the trust scores and obtain a result.
  6. Use your chosen method to weigh the bot scores and obtain a result.
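The steps above can be sketched in code. This is a minimal illustration, not a definitive implementation: the in-memory dictionaries standing in for published records, and all the function names, are my assumptions.

```python
# Minimal sketch of the heuristic above. The data shapes are assumptions:
# bot_scores[author][target] and trust_scores[author][target] hold
# published scores in [-1, 1].

def weighted_average(pairs):
    """Step 2: combine (trust, score) pairs, ignoring sources
    you don't trust (T <= 0)."""
    trusted = [(t, s) for t, s in pairs if t > 0]
    if not trusted:
        return 0.0  # no trusted sources: stay uncertain
    total_trust = sum(t for t, _ in trusted)
    return sum(t * s for t, s in trusted) / total_trust

def trust_in(me, them, trust_scores, max_depth):
    """Step 5: recursively estimate my trust in another account."""
    if max_depth == 0:
        return 0.0
    direct = trust_scores.get(me, {}).get(them)
    if direct is not None:
        return direct  # a directly published trust score wins
    # Weigh the trust scores others have published about them,
    # discounted by my (recursive) trust in each publisher.
    pairs = [
        (trust_in(me, author, trust_scores, max_depth - 1), targets[them])
        for author, targets in trust_scores.items()
        if author != me and them in targets
    ]
    return weighted_average(pairs)

def estimated_bot_score(me, target, bot_scores, trust_scores, max_depth=3):
    """Steps 3-6: my personalized estimate of whether target is a bot."""
    direct = bot_scores.get(me, {}).get(target)
    if direct is not None:
        return direct  # my own published score overrides everything
    pairs = [
        (trust_in(me, author, trust_scores, max_depth), scores[target])
        for author, scores in bot_scores.items()
        if author != me and target in scores
    ]
    return weighted_average(pairs)
```

For example, if Alice fully trusts Bob (𝑇 = 1) and distrusts Mallory (𝑇 = -1), Bob’s score of 1 for a suspected bot carries through while Mallory’s lying -1 is ignored entirely.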
5 Likes

Here’s this proposal on leaflet as well. Comment where you please!

Interesting proposal, Paul! I highly appreciate the effort to tackle this in a decentralized way on AT Protocol rather than relying on centralized moderation.

A couple of questions that came to mind:

Account rotation: If a bot operator gets flagged, the rational move is to discard that account and create fresh ones. Each new account starts with no scores. Doesn’t that make bot scores (score = 1 in your example) mostly a signal for the operator to rotate rather than an actual deterrent? Bots can just create 1000 new accounts as soon as they get a score that indicates they are a bot (as this is public info).

Labeling fatigue: The system depends on people manually evaluating accounts and keeping at it. But bot operators can automate account creation at scale, indefinitely. How to resolve that asymmetry? Honest participants have finite time and attention, the adversary… unlimited. How do you see this playing out without people burning out?

I’ve been thinking about similar problems for professional identity on AT Protocol (sifa.id) and we also discussed Trust Infrastructure on ATproto and Building a trust & reputation clearinghouse for atproto niche networks before.

For Sifa and Barazo I’m making a different bet: rather than detecting bad accounts after the fact, make credible identity something that builds up passively from real activity over time. New accounts just have no reputation, and the cost sits with the attacker: faking a believable track record takes sustained effort that’s hard to automate. The trust-web part of your proposal (weighting by source credibility) is the piece I find most interesting, and I think it could work well alongside activity-based approaches. In my docs I called this the “Google PageRank but for online accounts”.

How are you thinking about the cold-start problem on both sides: new legitimate users vs. new bot accounts? I find this one tricky either way, legitimate human users that are new to the network of course still need to have a good chance to build trust on the network, even if everyone else already has 20 years of history.

3 Likes

Very reasonable questions! Thanks for asking them.

Account Rotation: Indeed, we should assume an arbitrarily large number of bots will appear daily. The beauty of the [-1, 1] scoring is that vouching for someone as human and denouncing a bot are two sides of the same coin. You might choose to only view posts from identities with an estimated bot score less than -0.1, for example. A new user would need to get a non-bot score from an existing trusted user to get visibility. Personally, I’d love to have a quick video call with someone and express some confidence in their humanity afterward. We could have a nice thread/space for newcomers to connect with willing regulars. Commercial or community services could similarly verify users, maybe for a small fee.
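A tiny sketch of that visibility rule (the threshold, data shapes, and function names are illustrative assumptions):

```python
# Illustrative sketch: hide posts unless the viewer's chosen heuristic
# leans toward the author being human. The -0.1 threshold is an example.

def filter_feed(posts, estimate, threshold=-0.1):
    """Keep posts whose authors' estimated bot scores fall below the
    visibility threshold."""
    return [post for post in posts if estimate(post["author"]) < threshold]

scores = {"did:plc:friend": -0.9, "did:plc:unknown": 0.0}
posts = [{"author": "did:plc:friend"}, {"author": "did:plc:unknown"}]
visible = filter_feed(posts, lambda did: scores[did])
# The unknown account (score 0.0) stays hidden until someone vouches for it.
```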

Labeling Fatigue: You’re very right that normal users will only flag a few bot accounts at most; I expect to mostly publish negative (non-bot) scores for the people I know are real. It’s institutions and algorithms that can handle bot detection at scale. Bluesky’s existing labels are a piece of the puzzle: I’m imagining that I could configure my algorithm to consider the “no break greater than 2 hrs yesterday” label as, say, a 0.5 bot score published by a virtual identity that I trust highly. Similarly, I imagine a place for the LLM text detection tools in this system. They can be applied broadly to generate bot scores automatically and we can decide how trustworthy/accurate they are. Personally I wouldn’t give them too much weight for fear of false positives, but such a detection system could easily be a useful, if weak, signal.
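The label-as-virtual-score idea might look something like this sketch. The label names, weights, and the virtual source identifier are all illustrative assumptions, not real Bluesky labels.

```python
# Illustrative sketch: map moderation labels to bot scores published by
# a virtual identity the viewer trusts. Label names and weights are
# assumptions, not real Bluesky labels.
LABEL_TO_BOT_SCORE = {
    "no-break-gt-2hrs": 0.5,    # posting nonstop suggests automation
    "llm-text-detected": 0.3,   # weak signal: false positives are likely
}

def virtual_scores(labels):
    """Turn an account's labels into (virtual_source, bot_score) pairs
    that a scoring algorithm can weigh like any other published score."""
    return [
        ("virtual:labeler", LABEL_TO_BOT_SCORE[label])
        for label in labels
        if label in LABEL_TO_BOT_SCORE
    ]
```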

1 Like

How do you verify that the person you are seeing is human and not AI generated deepfake?

1 Like

Indeed, you can’t be sure. I’m going to reserve my certainty (-100%) for people I’ve met in person, but -80% feels appropriate for someone I’ve had a video call with. Naturally, if I learn that person was a deepfake, I’d need to update my score for them and probably would score people I meet over video a little lower in the future.

Just a passing thought, but voting on confidence that someone is human might be a slightly more positive spin than voting on confidence that someone is a bot.

Just sounds not as cool for me to get a -100% score than a 100% score. :smiley:

1 Like

That was my initial thought as well, I’d agree. I’d written the proposal as you’d suggest, but then I considered the converse: instead of calling people bots, we’d be calling them not human. In a fresh session, Claude suggested that Elon Musk isn’t human.

Mistakenly calling someone a bot is funny. Mistakenly dehumanizing someone… isn’t funny at all.

1 Like

Hmm, yeah, interesting point. :+1:

I have a bad-bots list that is mostly porn bots – being able to score by list would be an easy mechanism for me to contribute. I might score my list 0.8 to 0.9 by default.

Whether it’s that method or not, I would think about how you want to take in rankings, especially from within the regular bsky interface.

e.g. the reporting interface has a text field and numbers could be input there, if you exposed this as a labeler.

Not suggesting you do this or that it is even a good interface, just to think about how you want people to contribute in bulk or individually.

I would say that “bad bot” is also useful. I would list Comind as a bot, but a good one, or an agentic one, or whatever.

2 Likes

I agree here. But then I wonder if that would be represented by whatever tools end up being used to mark misbehaving humans in addition to the bot flag. :thinking:

Sure, but you’re requiring me to do work twice. If I have to choose between bot identifying and bad identifying, I’ll just use tools that do bad identifying.

2 Likes

Good point. Which makes me think: if we’re doing manual bot detection, as opposed to some automated / statistical mechanism, then maybe we focus on good / bad instead of human / bot.

Being that it’s more broadly useful if we are going to start building up manually labeled trust networks.

I suppose it’s also more complicated and subjective.

1 Like

Hah, the full vision is a trust web capable of producing signals to help answer ANY question, not just “is {user} a bot?”.

“Is {identity} good?” is arguably the most important question, but it… seems like a challenging place to start, to say the least.

I agree that ‘bad bot’ is more useful for moderation; my thought was to test this trust web concept on a simpler question before graduating to the trickier ones. That said, I could see it being worth it if it gets more people participating.

2 Likes

Claude suggested that Elon Musk isn’t human.

I wouldn’t be surprised if he uses Grok to post most of his propaganda.

I wonder if matadisco could serve as useful tooling here https://matadisco.org/
Ipfs announcement post of matadisco: @ipfs.tech on Bluesky

This is an interesting thought. I was thinking I’d need my bespoke app or Bluesky’s support to provide users a scoring interface. Less friction is clearly better for scoring (ideally, that might be “…” → score account); Bluesky’s reporting interface could be a boon for adoption. However, those reports are kinda off-protocol, right? If I understand correctly, they’re just sent to the service and don’t live in the user’s PDS. I suppose Botwatch could create the record in the user’s PDS immediately if they have an active OAuth session and save it as a draft for them otherwise, but that’s definitely not ideal. “Oh, I haven’t logged into Botwatch in 2 weeks, so my scores haven’t been published and I had no idea”

Yeah, it’s not ideal, it’s just another avenue that’s right in the app already.