Media PDS / Service

So recently there’s been some discussion about video on demand and how it might work on AT Protocol. I’ve been thinking about this topic a bit, since solving video and larger media files would unlock some interesting new classes of AT Protocol applications.

There are a few different considerations here:

  1. Storing large media objects within your PDS is likely not possible — the PDS implementation distributed by Bluesky PBC has a default 5 MB blob size limit. Whilst that can be increased, it typically isn’t.
  2. For media content like video and audio files, transcoding and metadata cleanup are usually necessary. You don’t want someone accidentally publishing their location without their consent just because the video file had that metadata embedded.
  3. To improve user experience, using a CDN for content delivery optimisation would be ideal.
  4. Taking streaming content and converting it to video recordings and clips requires stitching the individual segments of the stream back together into a video file.
  5. For video content, a publisher may want to upload multiple video tracks with a single audio track, such that video quality can be negotiated but audio remains clear.
  6. When uploading large media files, doing a one-shot upload of a large file can be problematic, especially on unstable connections.

On top of these considerations, we also have the cost factor: who pays for that bandwidth to deliver the video to all your followers? Who pays for the storage and transcoding?

Given all of this, I think we can perhaps take an idea from Tangled and develop a protocol for a sidecar service inspired by their Knots service.

So I’d like to propose a “media service” that can be deployed and handles the above considerations.

Media Service

I’m envisioning a service that can be independently deployed, allows for setting quotas on accounts and handles all the uploading, transcoding, and storage. It should also support the ability to let a user know that payment is required — either to use the service as a whole, or to increase limits beyond freely available limits.

When you upload media to this service, it responds with a signed record that you can then store in your PDS. This record would contain the information on how to request the media for playback/display; metadata like title, description, and poster image; possibly keyframe images, duration, and file size; and content rights, (re)distribution policy, and licensing information.

The signature of that blob can be verified using the verification material from the DID document for that service (so probably a did:web:<service url>). This signature allows relays and AppViews to verify that the blob data is correct without having to subscribe to anything from the media service. (Metadata like title and description may be excluded from the signature — TBD.)

What would the record look like?

{
  "$type": "example.media.lexicon.attachment",
  "server": "media-service.example",
  "creator": "did:<method>:<identifier>",
  "id": "some-unique-identifier",
  "signatures": {
    "$type": "com.example.inlineSignature",
    "signature": {"$bytes": "MzQ2Y2U4ZDNhYmM5NjU0Mzk5NWJmNjJkOGE4..."},
    "key": "did:web:media-service.example#signing1",
    "properties": ["$type", "server", "creator", "id"]
  }
}

The properties field is a proposal for @ngerakines.me’s Attestations specification, which allows creating a signature over just a few properties of the parent object, not the entire object / record.
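To make the selective-signature idea concrete, here is a sketch in TypeScript using Node’s built-in Ed25519 support. The canonicalization (a JSON array of the listed key/value pairs, in order) is purely hypothetical — the Attestations specification would define the real canonical form — and the keys and DIDs are throwaway placeholders.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical canonicalization: serialize only the listed properties, in
// the order given, as a JSON array of [key, value] pairs.
function canonicalize(record: Record<string, unknown>, properties: string[]): Buffer {
  return Buffer.from(JSON.stringify(properties.map((k) => [k, record[k]])), "utf8");
}

// The media service would hold this key and publish the public half in its
// DID document; here we just generate a throwaway Ed25519 pair.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const record: Record<string, unknown> = {
  $type: "example.media.lexicon.attachment",
  server: "media-service.example",
  creator: "did:web:creator.example",
  id: "some-unique-identifier",
  title: "My video", // not covered by the signature, so freely editable
};

const signedProps = ["$type", "server", "creator", "id"];
const signature = sign(null, canonicalize(record, signedProps), privateKey);

// A relay or AppView re-canonicalizes the record and verifies against the
// service's published key, without subscribing to the media service.
const ok = verify(null, canonicalize(record, signedProps), publicKey, signature);
```

The nice property here is that fields outside `properties` (like `title`) can be edited by the user without invalidating the service’s signature, while tampering with any signed field breaks verification.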

This record would be returned by the Media Service upon successful upload, conversion, or clipping, and would be stored in your PDS as a record (or inline, as an attachment to another record).

Media Service APIs

The media service would have a few different APIs:

  • direct upload of a blob (one-shot, works like com.atproto.repo.uploadBlob, but allows larger files)
  • direct upload via resumable upload (large files)
  • streaming, where segments are stored for a stream and then combined to create either a recording of the full stream or a clip from it. For clipping, the full stream does not need to have finished.

To handle large files, we could use the tus protocol.
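As a minimal sketch of the resumable-upload loop, here’s the offset-driven chunking tus is built around, with the transport injected as a callback so the logic is visible in isolation. The `PatchFn` shape and chunk size are assumptions; real tus also involves a creation request and `Tus-Resumable`/`Upload-Offset` headers on each PATCH.

```typescript
// The server reports the committed offset after each PATCH; on reconnect a
// real client would HEAD the upload URL to re-learn that offset.
type PatchFn = (offset: number, chunk: Uint8Array) => Promise<number>;

// Sends `data` in fixed-size chunks starting from `startOffset`, trusting
// the server-reported offset rather than its own bookkeeping — this is what
// makes the upload resumable after a dropped connection.
async function resumableUpload(
  data: Uint8Array,
  patch: PatchFn,
  startOffset = 0,
  chunkSize = 1024 * 1024,
): Promise<number> {
  let offset = startOffset;
  while (offset < data.length) {
    const chunk = data.subarray(offset, offset + chunkSize);
    offset = await patch(offset, chunk); // real tus: PATCH with Upload-Offset
  }
  return offset;
}
```

Because the loop always resumes from the server’s committed offset, an interrupted upload restarts mid-file instead of from zero — exactly the property consideration 6 asks for.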

I think the queries would be:

  • example.media.lexicon.getUploadLimits — returns any account limits, size limits for uploadBlob, quotas, etc.
  • example.media.lexicon.getLinks — returns the CDN URLs to the uploaded attachment
  • example.media.lexicon.getMetadata — returns the metadata for the uploaded attachment
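For illustration, the query responses might carry shapes like the following — every interface and field name here is an assumption, not a published lexicon:

```typescript
// Hypothetical response for example.media.lexicon.getUploadLimits.
interface UploadLimits {
  maxBlobSize: number;      // bytes, for one-shot uploadBlob
  maxUploadSize: number;    // bytes, for resumable uploads
  storageQuota: number;     // total bytes the account may store
  storageUsed: number;      // bytes currently used
  paymentRequired: boolean; // to use the service, or to raise the limits
}

// Hypothetical response for example.media.lexicon.getLinks.
interface MediaLinks {
  playlist?: string; // e.g. an HLS .m3u8 URL on the CDN
  download?: string; // direct file URL, if the distribution policy allows it
  poster?: string;   // poster image URL
}

const exampleLimits: UploadLimits = {
  maxBlobSize: 50 * 1024 * 1024,
  maxUploadSize: 4 * 1024 * 1024 * 1024,
  storageQuota: 10 * 1024 * 1024 * 1024,
  storageUsed: 0,
  paymentRequired: false,
};
```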

I’m envisioning some procedures along the lines of:

  • example.media.lexicon.uploadBlob
  • example.media.lexicon.createUpload — returns the URL to interact with the tus protocol to perform the upload, also returns an ID and processing status for the upload.
  • example.media.lexicon.deleteUpload — allowing the user to delete a finished upload
  • example.media.lexicon.getUpload — returns the status of the upload and if the upload has completed the attachment record.
  • example.media.lexicon.createClip — takes an existing upload and creates a clip of it, if the original upload’s policies allow for it. Similar to the streaming one, but requires the upload be fully completed first.
  • example.media.lexicon.stream.create — used by streaming services to start a stream upload, where they have individual segments to upload; this would likely look like Streamplace’s place.stream.segment
  • example.media.lexicon.stream.uploadSegment
  • example.media.lexicon.stream.getSegments — returns a list of already uploaded segments, similar to Streamplace’s place.stream.live.getSegments
  • example.media.lexicon.stream.createClip — allows creating a video clip from the already uploaded segments. Would return the same response as getUpload overall.
  • example.media.lexicon.stream.finish — used to finish the stream, preventing more segments from being uploaded, and possibly converts the segments into a recording in a single file. Would return the same response as getUpload overall.
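Tying the procedures above together, a client’s upload flow would roughly be: createUpload, transfer the bytes, then poll getUpload until processing settles. A sketch of that polling step (the status values and `UploadState` shape are hypothetical):

```typescript
// Hypothetical status values; "done" carries the signed attachment record.
type Status = "uploading" | "processing" | "done" | "error";
interface UploadState {
  id: string;
  status: Status;
  attachment?: unknown; // the example.media.lexicon.attachment record
}

// Polls getUpload until processing settles; a real client would add a
// sleep/backoff between polls instead of hammering the service.
async function waitForProcessing(
  getUpload: (id: string) => Promise<UploadState>,
  id: string,
  maxPolls = 10,
): Promise<UploadState> {
  for (let i = 0; i < maxPolls; i++) {
    const state = await getUpload(id);
    if (state.status === "done" || state.status === "error") return state;
  }
  throw new Error(`upload ${id} still processing after ${maxPolls} polls`);
}
```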

The APIs that return an upload would likely need error handling for media conversion errors, payment being required, quotas being exceeded, etc.

We may also want an API for associating a poster image with an uploaded video or audio file; the poster would just be uploaded as a blob. We may also want a way to create a multi-track upload, where you can upload multiple video files and audio tracks at once to be combined into a single attachment, such that responsive media can be served without transcoding on demand.

These procedures and queries will need further definition, but I wanted to get the ball rolling here, and share where my thinking was at so far.

Closing thoughts

This style of sidecar service would allow for media processing in ways that we cannot currently do natively on AT Protocol, relieve pressure on PDS hosts from handling large media files, and provide content delivery optimisation.

I’ve already talked this through quite a bit with @iame.li, but I’d love to get other people’s thoughts.

I think we could also support some way to associate an AT Protocol repo with its preferred media services, as well as a way for media services to advertise themselves, such that media-centric applications can discover them via the firehose.

9 Likes

Asked this on Discord before, but got told to post it here too. Also changed one question to better ask what I meant (3) and added a new one (4).

  1. If someone wanted to move where their media is stored, would they just have to modify the record? And to check for sure that it’s the same video, can the consumer just check the signature against what they have stored?

  2. I assume this would need to be codec-aware, and there would have to be some list of codecs that are supported for ingest and fetching?

  3. If a service wanted to allow making a new version of a video, and the prerequisite is that both videos are similar, would that have to be a completely new upload, or could some segments be reused? (I expect that to be impossible if timings are off.)

  4. It being codec-aware makes me a bit concerned. What if some service requires or wants to use a certain codec that is available in a new spec, but some hosts are slow with their updates? Should it care about codecs at all, or just worry about storing segments and leave codec-specific concerns to the services to handle?

1 Like

These are some good questions:

I’d say that moving media should be out of scope for now, as it’s not likely to directly be a protocol operation, though long term it’d be good to have. That said though, it’s important to communicate to creators that they should always retain a copy of their original media.

Moves would require a listMedia query (paginated) for the authenticated DID, and then require some form of copy/export between the media servers — maybe this can be done efficiently directly between them, instead of going through a transfer node?

I think we’d probably need to define a baseline of codecs to support, and then advertise if additional formats are available — both on the ingest side and the CDN side. I’d probably expect mp4/mp3 and ogg ingest, with HLS and MPEG-DASH at the distribution level. It might be unwise to accept wav or flac ingest due to file sizes, same with .mov — that said, I’m certainly no media specialist, so I’d rather experts in that domain specify which formats and codecs are baseline vs. optional.
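The baseline-plus-advertised negotiation could be as simple as the following sketch — the baseline set here is only illustrative, per the caveat above that experts should pick the real list:

```typescript
// Illustrative baseline only; the real list should come from media experts.
const BASELINE_INGEST = ["mp4", "mp3", "ogg"];

// A service advertises extra formats on top of the baseline; the client
// picks the first of its available formats that the service accepts.
function pickIngestFormat(
  clientFormats: string[],
  serviceExtras: string[] = [],
): string | undefined {
  const accepted = new Set([...BASELINE_INGEST, ...serviceExtras]);
  return clientFormats.find((f) => accepted.has(f));
}
```

A client holding only flac would get `undefined` from a baseline-only service and know to transcode locally (or pick another service), while a service advertising flac as an extra would accept it directly.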

I think this could initially just require a full file upload; anything else would likely get messy, with transcoding needed to slice and dice media files at very specific duration offsets. That’s also how it works for YouTube, as far as I know.

Then it’d be a matter of coordinating with people to implement support for specific things, and downgrading if their non-baseline codec is unsupported.

1 Like

Thanks @thisismissem.social for doing the writeup! I think this is a really solid overview of what a sophisticated and feature-rich VOD atproto server could look like. I’ve played around with the term “VPDS” before (the V is for video), though I think it’s an open question whether this kind of media server would be the source of truth for the video content (like a PDS) or just a supplemental set of tools (like Bluesky’s existing video upload infra).

To me, the most interesting and urgent piece of the design: what does the VOD primitive record look like? If you’re looking at a big list of videos on “ATProto YouTube”, what are you looking at? If video-on-atproto developers could come to a rough consensus on this, it opens up the design space a lot; that’d allow backend and frontend developers to get to work on implementations at the same time. From the other thread:

I (and Streamplace) would be interested in doing a few community consensus/work sessions to see if we could throw together something usable for this. Open questions for such a design, off the top of my head:

  1. How is video content uniquely identified? I don’t think “take the CID of the input video” is a very good idea; who wants to keep the input video around after processing? (Streamplace has a very opinionated answer to this based on segmentation but I wouldn’t necessarily want to lock the entire ecosystem into Streamplace segmentation.)
  2. Realistically we’re probably not going to ship fully-atproto-backed infrastructure for this stuff in the next few months, in time for (say) AtmosphereConf. Should our design accommodate a middle ground whereby videos can be backed by a variety of CDN providers and URLs, to accommodate videos on things like Bunny Stream or Vimeo?
  3. What’s the moderation story look like?

If folks are interested I could set up some recurring meetings to start to flesh this stuff out, kind of like a Community Specification process but I’m not sure I’d want to be that formal.

3 Likes

The Tracklist Catalog Server currently in development works very similarly to this and follows a similar API, but we also use content protection and store encrypted segments. Our HLS .m3u8 is dynamically generated to support a (very near future) update where encryption keys are rotated, though we only encrypt once for now and just block access for users who aren’t allowed to view the stream at the moment.

Since the URLs for both the m3u8 playlist and the transport segments are XRPC API routes, we can create an HLS playlist that points to XRPC URLs. In fact, with an at:// URI resolver inside the request handler for an HLS client (like, in our case, hls.js), it’s also possible to have the HLS playlist point to at:// URIs as segments instead of https:// ones. Especially with the work done in WG Private Data, this might be a really easy way to publish .m3u8 playlists that point to on-protocol resources…
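An HLS playlist pointing at XRPC routes is easy to picture with a sketch. The route name `getSegment` and its query parameters are placeholders (the thread only defines a `getSegments` list query), but the playlist structure itself is standard HLS:

```typescript
interface Segment {
  id: string;
  duration: number; // seconds
}

// Emits a VOD-style media playlist whose segment URIs are XRPC routes
// rather than plain file URLs, so the server can gate access per request.
function buildPlaylist(host: string, streamId: string, segments: Segment[]): string {
  // TARGETDURATION must be >= the longest segment, rounded up.
  const target = Math.ceil(Math.max(...segments.map((s) => s.duration)));
  const lines = [
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    `#EXT-X-TARGETDURATION:${target}`,
    "#EXT-X-MEDIA-SEQUENCE:0",
  ];
  for (const s of segments) {
    lines.push(`#EXTINF:${s.duration.toFixed(3)},`);
    lines.push(
      `https://${host}/xrpc/example.media.lexicon.stream.getSegment?stream=${streamId}&segment=${s.id}`,
    );
  }
  lines.push("#EXT-X-ENDLIST");
  return lines.join("\n");
}
```

Because every segment fetch goes through an XRPC handler, the service can check authorization (or rotate encryption keys, as described above) on each request instead of handing out static CDN URLs.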

1 Like

@iame.li I think the example.media.lexicon.attachment record would just be either referenced or embedded within an application-specific type, e.g., example.videoapp.feed.video, as the attachment is just the media, not the attributes and related data like hashtags, description with facets, etc.

An alternative to this media service may just be a “large object storage” service, but I think the transcoding and CDN are probably the most expensive parts to operate; storage tends to be relatively cheap overall. So bundling it all together allows for one place to handle billing for the service.

@iame.li yes, I think so — that’s why there’s the getLinks method, such that we can just do the upload and object-storage part, and then provide the URLs for playback via whatever media-processing CDN.

I’d probably suggest streaming APIs would be lower priority, just because of the reassembly required?

1 Like

@psychedeli.ca do you have a link to more information that you could share on this?

i can’t talk publicly about it just yet (wait about one more month, when we start the beta) but i can talk privately in DMs 🙂

Really excited to see this work – I agree that blob storage and baseline guidance for video/streaming is one of the most underspecified parts of the PDS install template, and writing the going-to-production guide made that even clearer.

3 Likes

Subtitle tracks are an important consideration for accessibility. If an uploaded video file/segment contains subtitles, the distribution service (e.g., HLS transcoder/packager) should detect and package them accordingly for end-user consumption. (Uploading separate out-of-band subtitle tracks is a bit trickier, so it could be punted to a later version.)

1 Like

Yes, agree. Subtitles are important too. I didn’t specifically include them in the first draft concept, but yes, the APIs should support uploading VTT files (or whatever the format is) for subtitles — I’m far from an expert in media streaming/encoding. I’m just trying to get the ball rolling and identify what may or may not be a good idea, or good APIs for us to use.

1 Like

What you’ve written is awesome. I have a lot of experience in media tech (on the product side) and am eager to contribute where I can!

1 Like

Is there anything else (besides subtitles, noted above) that should be considered? This is, after all, intended to be collaborative!

e.g., how do multi-track uploads work on other platforms? What about uploading edits to a video?

As someone who is really interested in where Media over QUIC ends up, would anything from that project tie into this?

https://blog.cloudflare.com/moq/

2 Likes

I would also consider alternate audio tracks for accessibility. An HLS manifest specifies a “primary” audio track, but it can also include other tracks to accommodate alternate languages (and other features, but languages are the most common use case).

Most platforms support it by allowing users to upload multiple audio files per video and then transcode/package everything into HLS on the backend. The Mux docs have a good overview of how theirs works.
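The alternate-audio mechanism described above is worth sketching: in HLS, the multivariant playlist advertises audio renditions with `#EXT-X-MEDIA` entries sharing a group ID, and the video variant references that group. All URIs below are placeholders:

```typescript
interface AudioTrack {
  name: string; // human-readable, e.g. "English"
  lang: string; // BCP 47 language tag
  uri: string;  // rendition playlist URI (placeholder)
  isDefault?: boolean;
}

// Builds a multivariant playlist with one video variant and a group of
// alternate audio renditions the player can switch between without
// changing the video track.
function buildMultivariant(videoUri: string, bandwidth: number, audio: AudioTrack[]): string {
  const lines = ["#EXTM3U"];
  for (const a of audio) {
    lines.push(
      `#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="${a.name}",` +
        `LANGUAGE="${a.lang}",DEFAULT=${a.isDefault ? "YES" : "NO"},URI="${a.uri}"`,
    );
  }
  // The variant stream declares which audio group it pairs with.
  lines.push(`#EXT-X-STREAM-INF:BANDWIDTH=${bandwidth},AUDIO="aud"`);
  lines.push(videoUri);
  return lines.join("\n");
}
```

This is why multi-track uploads can avoid on-demand transcoding: each audio file is packaged once into its own rendition playlist, and the player does the switching client-side.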

When I worked at JW Player, we implemented it in our platform. Things like segment alignment and audio priming were kinda tricky, but overall it was pretty straightforward.

Edits could be done post-upload with EDLs, but that is a whole other level of complexity. In my experience, if they edit at all, most users will do it on their computer or phone and upload the final cut. That said, most platforms support creating simple clips, or trimming videos (for example, cutting the awkward dead air from the start of a recorded live stream).

2 Likes

Hey folks!

Someone pointed this out to me because I help run a company that has been building distributed, user-owned storage for several years, and I thought I might be able to share some lessons learned.

Fundamentally, a media PDS is a special case of a generalized problem: eventually users can’t own and manage all storage themselves, but the “I’ll let someone else handle it” decision usually means giving up lots of control, because usually you’re now locked in with a vendor and at their mercy for data access.

Honestly, video is probably the hardest version of this to get right, because you’re dealing with complex transcoding, stitching/reassembly, and content delivery networks, as @thisismissem.social outlined. Personally, I’m not sure I’d start with the Mt. Everest of data distribution problems – I don’t know if an image PDS is even needed, but it’d be a much easier way to get started building.

Our company has been trying to build a more protocol-based version of storage, essentially a large-scale IPFS storage service that operates across independent storage operators in a vendor-neutral way. The content addressing AT Protocol uses is super useful, in that it means you can move data around and still verify its integrity. The other technology we build on is UCAN, a capability-based authorization system, which enables us to flip the ownership model so the storage service really acts as your own private storage locker (potentially with encryption at rest, if you want privacy).

Our requirements around decentralization of infrastructure and trustlessness may not apply here, so you can probably cut some corners if you don’t mind the media PDS having some level of authority over the data it stores.

My strongest recommendation is to build in layers, treating each layer as a service. You can build incrementally, and a do-one-thing service is easier to deploy than a do-many-things service. Here are the layers we’ve identified:

  1. object storage – people who do storage at rest like to know nothing about data formats or what they’re storing: just object storage, or blobs of bytes, as others mentioned.
  2. metadata and indexing – it’s helpful to have a service that just handles this: what content is tagged with what metadata, and where can I find the underlying data? (Though you can also store the metadata or indexes themselves as blobs in object storage, as long as you have a good caching layer.)
  3. data preparation – how do you shard/hash/transcode data so it’s easy to send to object storage and retrieve later? We do all of that through software libraries on the client machine, but that’s because it’s important to us that clients generate content hashes they can fully trust themselves. I’m guessing you don’t necessarily need that here.
  4. content delivery networks – really just a large set of caching and bandwidth providers. We haven’t gotten that far here – we just use centralized CDN products, as Bluesky PBC does for delivery of assets in their AppView (to my knowledge, anyway). I’ve tried to build a trustless CDN once and it’s hard AF, especially if you want streaming hash verification on video – very, very hard to land in a browser.

One last thing: I can’t recommend checking out UCAN enough – I think Bluesky PBC was planning to use it in early versions of AT Protocol, but it wasn’t ready for primetime then. I think it is now: ucan.xyz

Happy to contribute however I can. Our company’s product is open source, just like AT Protocol, and MIT/Apache2 licensed: Storacha Network · GitHub – there are many repos there and it may be a bit overwhelming, but I’m happy to point to things that might be useful.

5 Likes

Possibly! I don’t know enough about it, but @iame.li mentioned it.

1 Like