The core tension being “how many resources are generic PDS hosts expected to provide to random users” @bnewbold.net on Bluesky
Let’s imagine a streamer accumulates thousands of hours of Streamplace VODs. Where should they be hosted? Do they belong in the streamer’s PDS, or somewhere else? Once permissioned data drops, atproto becomes appealing for transferring a much larger class of potentially much larger data. The question of PDS storage limits seems unavoidable.
Naturally, Bluesky and other large providers can’t offer unlimited free storage. I assumed we’d end up with a variety of offerings at various prices: free tiers with limited usage and paid tiers with higher caps. In all honesty, I guessed this was Bluesky’s eventual monetization plan.
However, I seem to have been mistaken. Bryan Newbold proposed an S3-compatible sidecar service for file storage, noting that PDS hosting/switching is cheaper and easier if repos are small. While there are definitely upsides to storing some things off-protocol, I love the idea of storing all my personal data in my PDS. “Google Drive without Google lock-in” has been one of my favorite examples of what atproto could become. Maybe AetherOS + PDS-backed files?
Do such storage-heavy applications belong on atproto?
@iame.li is planning to build “the best PDS”, with the performance, storage, and fast disk needed to support heavy streamers. It will be a paid service, and heavy streamer accounts will move there.
Yes, I too love the concept of “all my stuff on a PDS” … and that thread from @bnewbold.net was very much thinking about what that looks like operationally, because blobs on disk are not a file system.
So, some lexicon whose records hold pointers to individual objects stored on S3 is likely a low-cost and operationally stable way to build something like this.
It’s “all in your PDS”, but the content is stored in another system that is tuned for sync / fetch / upload / download of larger / more files.
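To make the pointer idea a bit more concrete, here is a minimal sketch of what such a record might carry. The collection name `com.example.file.pointer`, the field names, and the helper functions are all hypothetical and not an existing lexicon; the only assumption is that the record stores a location, a SHA-256 hash, and a size, so a client can check the bytes it fetched from the object store against what the repo says.

```python
import hashlib
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FilePointer:
    """Hypothetical record body pointing at an object stored outside the PDS."""
    url: str        # where the bytes live, e.g. an S3-compatible endpoint
    sha256: str     # hex digest of the content
    size: int       # content length in bytes
    mime_type: str


def make_pointer(url: str, data: bytes, mime_type: str) -> FilePointer:
    """Build a pointer for content that was uploaded to `url`."""
    digest = hashlib.sha256(data).hexdigest()
    return FilePointer(url=url, sha256=digest, size=len(data), mime_type=mime_type)


def verify(pointer: FilePointer, data: bytes) -> bool:
    """Check that fetched bytes match the size and hash recorded in the repo."""
    if len(data) != pointer.size:
        return False
    return hashlib.sha256(data).hexdigest() == pointer.sha256


def to_record(pointer: FilePointer) -> dict:
    """Serialize as a record; the $type value here is made up for illustration."""
    return {"$type": "com.example.file.pointer", **asdict(pointer)}
```

The record stays small no matter how large the object is, which is what keeps the repo cheap to host and move. The trade-off is that the bytes themselves live wherever `url` points, not in the repo.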
Someone is going to have to take a crack at actually designing such a thing and what trade-offs it makes before we can reason about its costs or scaling possibilities.
This is an issue I’ve had to noodle on regarding the future of Cartridge as well. At some point I want to get into sales and distribution for video games. Developers upload their game binaries to a permissioned space; when a user purchases the game, they get access to the permissioned space and can download/install the binary.
Some game binaries are 200+ GB. If Blizzard/Activision decided to start selling their games via this method, they’d need hundreds of terabytes of storage (at least) to store binaries, archival versions, platform forks, etc. That’s just for one large studio.
I haven’t thought much beyond the general concept, but I’m in the same boat as @iame.li. I think the solution is going to be offering our own PDS with paid tiers that give customers access to larger storage.
Alternatively, companies can host their own PDS with as much storage as they want. I believe my gamesgamesgamesgames.games repo is currently among the largest repos on the network in terms of blob storage, hosting more than 300GB of blobs right now.
It’s definitely doable, and I believe it’s a reasonable use case for the network.
This business model describes Steam. Unless you have a direct line to Blizzard/x you do not have these large clients.
It makes more sense to focus on the long-tail of indie developers publishing their apps for their price (+ you do not have to worry about large binaries early on).
Would love to see Cartridge be a replacement for Steam! So game devs can go direct to consumer, without some payment processor that can censor them.
hosting more than 300GB of blobs right now.
damn, are these game assets? and if so, can other people use these as is? and what about copyright?
I don’t think the PDS storing some data’s content hash counts as that data being “on the PDS”. If I heard and believed that some data was “on my PDS”, I might be in for a nasty surprise when my PDS backup doesn’t contain it.
Now, I think we can agree that both off-PDS and on-PDS storage have their uses. Keeping hashes of non-PDS data is definitely meaningful, while Bluesky demonstrates that PDS blobs are an appropriate way to store some data. On-PDS storage is simple and intuitive; off-PDS storage trades complexity for potential benefits.
Now, I want to focus on the user experience. Let’s assume @iame.li is offering paid PDS hosting, directly writing MP4 files to those PDSs. What happens when a user on a Bluesky PDS tries streaming? How much VOD data can they write to their PDS? Is Streamplace responsible for explaining the limitations of the user’s PDS and selling them a better one? What are those limitations? Just the blob size limit? What if we break the files into many smaller blobs? We get suspended for abuse, presumably, but where’s the line?
Uncertainty is toxic to investment, whether it’s money or dev effort. PDS limits should be explicit. We need clear boundaries; the standard shouldn’t be PDS services that are “unlimited” until you cross some fuzzy line. Ideally, users on Bluesky PDSs would be able to try out PDS-storage-heavy applications and see a prompt to upgrade or migrate when they hit a cap. It might be “no fun to [tie the] amount of available storage to an offer to join a PDS”, but it’s just the reality. Let’s accept that and focus on making the UX great.
Apologies, I feel I’ve communicated poorly on this. I didn’t mean to focus on Streamplace except as an example of a storage-heavy application.
My ask: PDSs should serve a page explaining their usage policies. Apps can then direct users to that page when a write to their PDS fails due to a usage limit. What that page contains will depend on the business model; some might explain a hard limit and encourage users to migrate, while others might offer paid tiers with higher caps. In either case, a tool to help the user see what’s taking up space and make room would be a great inclusion.
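Here is a rough sketch of how an app might handle that flow. To be clear, nothing here is an existing API: the error name `StorageLimitExceeded`, the `policyUrl` field, and the `/usage-policy` fallback path are all assumptions I made up for illustration. The point is only that the app needs one machine-readable signal plus a URL to send the user to.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical error name; not something any PDS is known to return today.
LIMIT_ERROR = "StorageLimitExceeded"


@dataclass
class WriteResult:
    ok: bool
    message: str
    policy_url: Optional[str] = None


def interpret_upload_response(status: int, body: dict, pds_host: str) -> WriteResult:
    """Turn a PDS write response into something the app can show the user."""
    if 200 <= status < 300:
        return WriteResult(ok=True, message="Upload succeeded.")

    if body.get("error") == LIMIT_ERROR:
        # Assumed: the PDS may include a policyUrl in the error body.
        # Otherwise fall back to a conventional path (also hypothetical).
        url = body.get("policyUrl") or f"https://{pds_host}/usage-policy"
        return WriteResult(
            ok=False,
            message=f"Your PDS ({pds_host}) is out of storage. See {url} for your options.",
            policy_url=url,
        )

    # Any other failure: surface the server's message without guessing at the cause.
    return WriteResult(ok=False, message=f"Upload failed: {body.get('message', 'unknown error')}")
```

With something like this, the app never has to know the host’s business model. It just detects the limit error and hands the user off to the host’s own page.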
One option would be to do what Bluesky is currently doing: have a general rule against abuse instead of an explicit cap. While the protocol should allow for this, it shouldn’t be the standard. Apps and users alike benefit from clarity.
Imagine a ‘dropbox on atproto’ app that writes the files to PDS blobs. As a user, I risk catching a takedown if I write too much data to my Bluesky PDS. As the developer, I’m unsure whether a given write to their PDS puts my user at risk. Do I make the app only available to people on my PDS? Do I warn users that they risk losing their data? Neither is ideal. That uncertainty would hinder development and uptake of such applications. An expectation of safe PDS writes would allow developers to build confidently rather than defensively.
Seems like the next step is to add support for per-account storage limits to the reference PDS. I’m willing to contribute, but I’d like to hear from the community first. Am I wrong about the benefits of clear PDS storage limits? Would Bluesky and other PDS hosts be willing to adopt clear limits, assuming they’re easy to set?
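For the sake of discussion, this is roughly the accounting I have in mind. It is a standalone sketch, not a patch against the reference PDS or any other implementation; `AccountQuota`, `QuotaExceeded`, and the method names are hypothetical.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when a write would push an account past its storage limit."""


@dataclass
class AccountQuota:
    limit_bytes: int      # hard cap configured by the PDS host
    used_bytes: int = 0   # bytes currently stored for this account

    def remaining(self) -> int:
        return max(self.limit_bytes - self.used_bytes, 0)

    def reserve(self, size: int) -> None:
        """Account for a new blob, or refuse it if it would exceed the cap."""
        if size < 0:
            raise ValueError("size must be non-negative")
        if self.used_bytes + size > self.limit_bytes:
            raise QuotaExceeded(
                f"write of {size} bytes exceeds remaining quota of {self.remaining()} bytes"
            )
        self.used_bytes += size

    def release(self, size: int) -> None:
        """Give space back when a blob is deleted."""
        if size < 0:
            raise ValueError("size must be non-negative")
        self.used_bytes = max(self.used_bytes - size, 0)
```

A host would call `reserve` before accepting a blob and `release` when one is deleted. The check happens before the write, so a rejected upload leaves `used_bytes` unchanged and the account stays in a known state.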
Yes, I think this very much comes down to a particular product or business model, and it would be up to that service to figure it out.
You don’t risk takedown; there are rate limits.
I can also say that the Bluesky TypeScript PDS implementation isn’t particularly open to contributions of features, so you’d be better off seeing if another implementation is interested.