lexidex is a proof-of-concept I built. It isn’t polished, and the current deployment is downright broken.
We are cooking up a replacement that will reliably index all published lexicons in the network. It will serve both as a reference for known schemas (e.g., replacing the rendered docs on docs.bsky.app) and as a mechanism for discovering published lexicons. This work is pretty far along, and I expect it to ship in the next couple of months.
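For a sense of what discovering published lexicons involves at the protocol level: as we read the atproto Lexicon spec, each schema is published as a `com.atproto.lexicon.schema` record, and the publishing account is located through a DNS TXT record (`did=...`) at `_lexicon.` followed by the NSID's authority in normal domain order. The sketch below only derives that DNS name from an NSID. `lexiconDnsName` is a hypothetical helper, not part of any SDK, and it does only a coarse shape check rather than full NSID validation.

```typescript
// Hypothetical helper (not part of any atproto SDK): derive the DNS name that,
// per our reading of the Lexicon spec, holds a TXT record "did=<DID>" for the
// account publishing the schema identified by `nsid`.
function lexiconDnsName(nsid: string): string {
  const segments = nsid.split(".");
  // NSIDs have at least three segments, e.g. "app.bsky.feed.post".
  // This is only a coarse shape check, not full NSID syntax validation.
  if (segments.length < 3 || segments.some((s) => s.length === 0)) {
    throw new Error(`not a valid NSID: ${nsid}`);
  }
  // Drop the final "name" segment and reverse the rest into domain order.
  // DNS names are case-insensitive, so normalize to lowercase.
  const authority = segments.slice(0, -1).reverse().join(".").toLowerCase();
  return `_lexicon.${authority}`;
}

console.log(lexiconDnsName("app.bsky.feed.post")); // prints _lexicon.feed.bsky.app
```

A resolver would then look up that name, for example with Node's `dns.promises.resolveTxt`, read the `did=` value, and fetch the schema record from that account's repository.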
Some other cool projects in the ecosystem look at data published with a given collection/NSID. I think the most popular of these is https://ufos.microcosm.blue/, which shows trending lexicons (based on how many accounts/records are using them) along with example records for each (which is very helpful to complement the formal schemas).
For scraping what records actually get posted, there’s microcosm UFOs (as previously mentioned). A similar project is https://atpdashboard.usounds.work/
That awesome list is pretty dated (I started it but never promoted it enough for it to gain momentum), and I should sunset it; the @pipup.social list and UFOs are good examples of where to look.
I expect Quickslice and bsky’s new lexicon discovery to be the modern, live data sources for this.
Just to keep you posted and show how useful your suggestions were:
I have since looked into a number of the suggestions and found that ufo-microcosm and atpdashboard look like the “large lists” (the other ones were below even the number of collections I have gathered so far, or contained extended data beyond what I am looking for). What I found on PipUp.social looked like really nifty, powerful writing tools, but no lexicon collection… So did I miss something?
I copied the list from atpdashboard, used the API from ufo-microcosm, and found some striking differences (2,653 vs. 3,776 entries). So I will have to dig into the details of how each of these collections came to be.
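One way to pin down where the two counts diverge is to compare the lists as sets rather than by length. A minimal sketch, assuming both lists have already been loaded as arrays of NSID strings (fetching from atpdashboard or the UFOs API is left out); `diffCollections` and `toNsidSet` are hypothetical names, and `com.example.status` is a placeholder NSID.

```typescript
// Hypothetical helper: compare two lists of collection NSIDs as sets.
interface ListDiff {
  onlyInA: string[];
  onlyInB: string[];
  inBoth: number;
}

// Trim whitespace and drop blank lines; duplicates collapse in the Set.
// Casing is kept as-is, since the final NSID name segment is case-sensitive.
function toNsidSet(list: string[]): Set<string> {
  return new Set(list.map((s) => s.trim()).filter((s) => s.length > 0));
}

function diffCollections(a: string[], b: string[]): ListDiff {
  const setA = toNsidSet(a);
  const setB = toNsidSet(b);
  // Sorted output makes the differences easy to eyeball or paste elsewhere.
  const onlyInA = Array.from(setA).filter((n) => !setB.has(n)).sort();
  const onlyInB = Array.from(setB).filter((n) => !setA.has(n)).sort();
  return { onlyInA, onlyInB, inBoth: setA.size - onlyInA.length };
}

const result = diffCollections(
  ["app.bsky.feed.post", "app.bsky.feed.like", " app.bsky.feed.like"],
  ["app.bsky.feed.post", "com.example.status"],
);
console.log(result.onlyInA, result.onlyInB, result.inBoth);
```

Trimming and de-duplicating first matters because a copied list with stray whitespace or repeated rows can inflate a raw line count; whatever remains in `onlyInA` and `onlyInB` is the real disagreement to investigate. The same comparison also shows whether two tools that report the same count actually return the same collections.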
This GitHub repo may be helpful. One of the scripts here generates a list of lexicons seen in the atmosphere (there’s a spreadsheet, but it’s not up to date).
It does use microcosm.blue, so if there’s a problem with that lexicon collection, it may show up here too. Would love to know if that’s the case.
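With a generated list of lexicons like this, a per-namespace summary can make it easier to see which projects contribute the most collections. A hypothetical sketch, assuming the list is an array of NSID strings; "namespace" here simply means the NSID minus its final name segment, and `countByNamespace` is an illustrative name, not something from the repo.

```typescript
// Hypothetical post-processing step: count collections per namespace prefix
// (the NSID minus its final name segment), e.g. "app.bsky.feed".
function countByNamespace(nsids: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const raw of nsids) {
    const nsid = raw.trim();
    const lastDot = nsid.lastIndexOf(".");
    if (lastDot <= 0) continue; // skip entries without a namespace part
    const prefix = nsid.slice(0, lastDot);
    counts.set(prefix, (counts.get(prefix) ?? 0) + 1);
  }
  return counts;
}

// Print namespaces with the most collections first, tab-separated.
const perNamespace = countByNamespace([
  "app.bsky.feed.post",
  "app.bsky.feed.like",
  "app.bsky.graph.follow",
]);
const sorted = Array.from(perNamespace.entries()).sort((x, y) => y[1] - x[1]);
for (const [ns, n] of sorted) console.log(`${ns}\t${n}`);
```

Tab-separated output pastes cleanly into a spreadsheet, which may suit the kind of overview mentioned above.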
Confirmed: after the usual struggle with TypeScript (ESM vs. CJS is a b§$%&), both the TypeScript-based download-collections.ts and my Swift tool currently (just before [2025-12-18 14:31:43]) return 3,816 collections.