A collection of all lexicons/collections would be great

and I cannot be the first person who has mused about all the cool things you could do with that, right?

However, neither Google nor a number of LLMs came up with anything useful, so I figured I'd better ask here and get some pointers :slight_smile:

If not, I will continue collecting ATmosphere collections and would be happy to share with anybody interested … Let me know.

4 Likes

I think there are at least two projects that are indexing all of the lexicons.

I can’t ever remember their names to find them again. :cry:

Ah, here’s one:

I think I’ve heard of one other one.

5 Likes

lexidex is a proof-of-concept that I built. it isn’t polished, and the current deployment is downright broken.

We are cooking on a replacement which will reliably index all published lexicons in the network. It will function both as a reference for known schemas (e.g., replacing the rendered docs on docs.bsky.app) and as a mechanism to discover published lexicons. This work is pretty far along, and I expect it to ship in the next couple of months.

Some other cool projects in the ecosystem look at data published with a given collection/NSID. I think the most popular of these is https://ufos.microcosm.blue/, which shows trending lexicons (based on how many accounts/records are using them), and shows example records with that data (which is very helpful to complement the formal schemas).

7 Likes

A list of the lexicons used by atproto apps.
Not sure why the preview here says it’s private; it’s public.

2 Likes

for a prescriptive list that contains agreed upon shared lexicons there’s GitHub - lexicon-community/awesome-lexicons: A collection of awesome lexicons

for scraping what records actually get posted, there’s microcosm ufos (as previously mentioned). a similar project is https://atpdashboard.usounds.work/

mary’s scraper afaict monitors all officially published schemas (which are records in the com.atproto.lexicon.schema collection), independent of usage stats: GitHub - mary-ext/atproto-lexicon-scraping: Git scraping of AT Protocol lexicon schemas

That awesome list is pretty dated (I started it but never built momentum advertising it), and I should sunset it - the @pipup.social list and UFOs are good examples of where to look.

I expect Quickslice and bsky’s new lexicon discovery to be the modern, live data sources for this.

Thanks for all the feedback. Appreciate it.

Just to keep you posted and show how useful your suggestions were:

I have since looked into a number of the suggestions. ufo-microcosm and atpdashboard look like “large lists” (the other ones were even below the number of collections I have collected so far, or contained extended data beyond what I am looking for). What I found about PipUp.social looked like really nifty/powerful writing tools, but no lexicon collection … so I must have missed something?

I copied the list from atpdashboard, used the API from ufo-microcosm, and found some striking differences (2.653 vs. 3.776 entries). So I will have to look into the details of how these collections came to be.
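Comparing the two sources comes down to a set difference over collection NSIDs. A minimal TypeScript sketch (the NSIDs below are placeholders; in practice the arrays would come from the atpdashboard export and the ufo-microcosm API response):

```typescript
// Compare two lists of collection NSIDs and report what each source is missing.
function diffCollections(
  a: string[],
  b: string[],
): { onlyInA: string[]; onlyInB: string[] } {
  const setA = new Set(a);
  const setB = new Set(b);
  return {
    onlyInA: a.filter((nsid) => !setB.has(nsid)),
    onlyInB: b.filter((nsid) => !setA.has(nsid)),
  };
}

// Example with made-up inputs standing in for the two downloaded lists:
const fromDashboard = ["app.bsky.feed.post", "app.bsky.feed.like"];
const fromMicrocosm = ["app.bsky.feed.post", "com.example.custom.record"];
const diff = diffCollections(fromDashboard, fromMicrocosm);
console.log(diff.onlyInA); // ["app.bsky.feed.like"]
console.log(diff.onlyInB); // ["com.example.custom.record"]
```

Printing the two "only in" lists should make it obvious whether the 1.000+ entry gap is a few big namespaces or lots of one-off collections.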

So I’ll continue working on this :slight_smile: .

This GitHub repo may be helpful. One of the scripts there generates a list of lexicons seen in the atmosphere (there’s a spreadsheet, but it’s not up to date).

It does use microcosm.blue, so if there’s a problem with that lexicon collection, it may also be present here. I would love to know if that’s the case.

Thanks for the pointer. This uses the same API as I did in my Swift tool.

So it is bound to return the same lexicon list :slight_smile:

I’ll try and confirm tomorrow (now is not the time to figure how to run TypeScript on my Mac, it’s bedtime :yawning_face::sleeping_face::zzz:).

Confirmed: after the usual struggle with TypeScript (ESM vs. CJS is a b§$%&), both the TypeScript-based download-collections.ts and my Swift tool currently (just before [2025-12-18 14:31:43]) return 3816 collections :slight_smile:

Thanks for the pointer :slight_smile:

Ohh, just to be clear: it’s not just the number of collections; the actual 3.816 collection names are also identical in both cases :slight_smile: