Incidentally, I work for an ASO (App Store Optimization) company, so I can confirm that store taxonomies are quite a challenge.
We maintain a database of all categories used in both the iOS App Store and Google Play, and our data science team built a finer classification layer (“appDNA”) based on large-scale app metadata. It groups apps using underlying signals and tends to cluster apps by actual user intent much better than the stores' own categories do.
One consistent takeaway from this work: trying to define an exhaustive, fixed taxonomy indeed doesn’t scale.
So with that in mind, I don’t think the goal of this lexicon should be to enumerate all possible types and domains. That will quickly become either too complex or too generic.
Instead, it might be more effective to bake into this lexicon a smaller set of signals (and/or “features”, as in atproto.garden) that describe what an app does and how it behaves.
From those signals, different consumers can derive their own groupings depending on their needs:
- user-facing directories → “Photos apps”, “Messaging apps”
- developer views → “feed builders”, “repo tools”
- analytics / discovery → clustering based on usage patterns
Rather than hardcoding categories, each app could expose things like:
- capabilities (posting, publishing, messaging, etc.)
- protocol interactions (feeds, repos, firehose, identity, etc.)
- surface type (client, service, tool, etc.)
Then “categories” become a view, not the source of truth. We could even have an optional tech_stack entry listing the technologies the app is built with (useful for finding developers).
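To make the "categories as a view" idea concrete, here is a rough TypeScript sketch. Everything in it is hypothetical: the field names, the allowed values, the example apps, and the view definitions are my own illustration, not a proposed lexicon. The only point is that each consumer defines its groupings as predicates over the same low-level signals.

```typescript
// Hypothetical signal vocabulary, taken from the lists above (not an agreed lexicon).
type Capability = "posting" | "publishing" | "messaging";
type ProtocolInteraction = "feeds" | "repos" | "firehose" | "identity";
type SurfaceType = "client" | "service" | "tool";

// What an app record might expose. Note there is no "category" field.
interface AppSignals {
  name: string;
  capabilities: Capability[];
  protocolInteractions: ProtocolInteraction[];
  surfaceType: SurfaceType;
  techStack?: string[]; // optional, illustrative only
}

// A view is a label plus a predicate over signals; each consumer owns its views.
interface View {
  label: string;
  matches: (app: AppSignals) => boolean;
}

// Views a user-facing directory might define.
const userFacingViews: View[] = [
  {
    label: "Messaging apps",
    matches: (a) => a.surfaceType === "client" && a.capabilities.indexOf("messaging") !== -1,
  },
];

// Views a developer-oriented directory might define over the same data.
const developerViews: View[] = [
  {
    label: "feed builders",
    matches: (a) => a.surfaceType !== "client" && a.protocolInteractions.indexOf("feeds") !== -1,
  },
  {
    label: "repo tools",
    matches: (a) => a.surfaceType === "tool" && a.protocolInteractions.indexOf("repos") !== -1,
  },
];

// Derive groupings: for each view, collect the names of the apps that match it.
function groupByViews(apps: AppSignals[], views: View[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const view of views) {
    groups[view.label] = apps.filter(view.matches).map((a) => a.name);
  }
  return groups;
}

// Made-up example apps, not real entries from any dataset.
const apps: AppSignals[] = [
  { name: "ExampleChat", capabilities: ["messaging", "posting"], protocolInteractions: ["identity", "repos"], surfaceType: "client" },
  { name: "ExampleFeedGen", capabilities: ["publishing"], protocolInteractions: ["feeds", "firehose"], surfaceType: "service" },
  { name: "ExampleRepoInspector", capabilities: [], protocolInteractions: ["repos", "identity"], surfaceType: "tool", techStack: ["TypeScript"] },
];
```

Calling `groupByViews(apps, developerViews)` and `groupByViews(apps, userFacingViews)` gives two different groupings of the same records, and an app can appear in several groups or in none without the record itself changing.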
That said, I don’t think this contradicts having a minimal set of high-level fields like Type or Domain. Those are useful as human-readable entry points, but they shouldn’t be the foundation of the lexicon. The foundation should be these lower-level signals, from which higher-level groupings can be derived.
I’m very interested in collaborating on a minimal version of this.
I can also share my current dataset (~75 apps) to test different approaches against something real.
Also +1 on the “garden” framing — it fits well with the idea of something that grows organically rather than being strictly defined upfront.
One last thought: we could also analyze what schema.org's SoftwareApplication type can teach us.