So recently there’s been some discussion about video on demand and how that may work for AT Protocol. I’ve been thinking about this a bit, since solving video and larger media files would unlock some interesting new classes of AT Protocol applications.
There are a few different considerations here:
- Storing large media objects in your PDS is likely not possible — the PDS implementation distributed by Bluesky PBC has a default 5 MB blob size limit. Whilst that can be increased, it typically isn’t.
- For media content like video and audio files, transcoding and metadata cleanup are usually necessary. You don’t want someone accidentally posting their location without their consent just because the video file had that metadata embedded.
- To improve user experience, using a CDN for content delivery optimisation would be ideal.
- Taking streaming content and converting it into video recordings and video clips requires stitching the individual segments of the stream back together into a video file.
- For video content, a publisher may want to upload multiple video tracks with a single audio track, such that video quality can be negotiated but audio remains clear.
- When uploading large media files, a one-shot upload can be problematic, especially on unstable connections.
On top of these considerations, we also have the cost factor: who pays for that bandwidth to deliver the video to all your followers? Who pays for the storage and transcoding?
Given all of this, I think we can perhaps take an idea from Tangled and develop a protocol for a sidecar service inspired by their Knots service.
So I’d like to propose a “media service” that can be deployed and handles the above considerations.
Media Service
I’m envisioning a service that can be independently deployed, allows for setting quotas on accounts, and handles all the uploading, transcoding, and storage. It should also be able to let a user know that payment is required — either to use the service at all, or to raise limits beyond what’s freely available.
When you upload media to this service, it responds with a signed record that you can then store in your PDS. This record would contain the information on how to request the media for playback/display; metadata like title, description, and poster image; and possibly keyframe images, duration, filesize, content rights, (re)distribution policy, and licensing information.
The signature on that record can be verified using the verification material from the DID document for that service (so probably a did:web:<service url>). This signature allows relays and appviews to verify that the data is correct without having to subscribe to anything from the media service. (Metadata like title and description may be excluded from the signature; TBD.)
What would the record look like?
{
  "$type": "example.media.lexicon.attachment",
  "server": "media-service.example",
  "creator": "did:<method>:<identifier>",
  "id": "some-unique-identifier",
  "signatures": {
    "$type": "com.example.inlineSignature",
    "signature": {"$bytes": "MzQ2Y2U4ZDNhYmM5NjU0Mzk5NWJmNjJkOGE4..."},
    "key": "did:web:media-service.example#signing1",
    "properties": ["$type", "server", "creator", "id"]
  }
}
The properties field is a proposal for @ngerakines.me’s Attestations specification, which allows creating a signature over just a few properties of the parent object, not the entire object / record.
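To make that concrete, here is a rough TypeScript sketch of how a relay or appview might verify such a record. It assumes the signature covers a canonical dag-cbor encoding of just the listed properties and that the service publishes its key as a Multikey in its did:web document; both of those details would ultimately be pinned down by the Attestations spec, and the helper below is illustrative, not a finished implementation.

```typescript
import { encode } from '@ipld/dag-cbor'           // canonical CBOR, as used elsewhere in atproto
import { verifySignature } from '@atproto/crypto'

// Shape of the example attachment record shown above.
interface MediaAttachment {
  $type: string
  server: string
  creator: string
  id: string
  signatures: {
    $type: string
    signature: { $bytes: string }
    key: string
    properties: string[]
  }
}

// Sketch: verify the attestation without subscribing to anything from the
// media service. Resolves its did:web document, finds the referenced key,
// and checks the signature over only the properties listed in
// `signatures.properties`. The canonicalisation (dag-cbor here) is an assumption.
async function verifyAttachment(record: MediaAttachment): Promise<boolean> {
  const [did] = record.signatures.key.split('#')
  const host = did.replace('did:web:', '')
  const didDoc = await (await fetch(`https://${host}/.well-known/did.json`)).json()

  const method = didDoc.verificationMethod?.find(
    (m: { id: string }) => m.id === record.signatures.key,
  )
  if (!method?.publicKeyMultibase) return false

  // Rebuild the signed subset of the record.
  const subset: Record<string, unknown> = {}
  for (const prop of record.signatures.properties) {
    subset[prop] = (record as unknown as Record<string, unknown>)[prop]
  }
  const data = encode(subset)
  const sig = Uint8Array.from(Buffer.from(record.signatures.signature.$bytes, 'base64'))

  // Assumes the atproto Multikey convention, where prefixing the multibase
  // value with `did:key:` yields a key string verifySignature understands.
  return verifySignature(`did:key:${method.publicKeyMultibase}`, data, sig)
}
```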
This record would be returned by the Media Service upon successful upload, conversion, or clipping, and would be stored in your PDS as a record (or inline as an attachment to another record).
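Storing it could be a plain createRecord call against your PDS. A minimal sketch with @atproto/api, where the collection NSID and record contents are just the example values from this post:

```typescript
import { AtpAgent } from '@atproto/api'

// The signed record returned by the media service (see the example above).
const attachment = {
  $type: 'example.media.lexicon.attachment',
  server: 'media-service.example',
  creator: 'did:plc:example',
  id: 'some-unique-identifier',
  signatures: { /* ...as returned by the service... */ },
}

const agent = new AtpAgent({ service: 'https://pds.example' })
await agent.login({ identifier: 'alice.example', password: 'app-password' })

await agent.com.atproto.repo.createRecord({
  repo: 'alice.example',                          // handle or DID of the repo
  collection: 'example.media.lexicon.attachment', // example NSID, not a real lexicon
  record: attachment,
})
```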
Media Service APIs
The media service would have a few different APIs:
- direct upload of a blob (one-shot; works like com.atproto.repo.uploadBlob, but allows larger files)
- direct upload via resumable upload (for large files)
- streaming, where segments are stored for a stream and then combined to create either a recording of the full stream or a clip from the stream. For clipping, the full stream does not need to have finished.
To handle large files, we could use the tus protocol.
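As a sketch of what that could look like from a client in Node, using the tus-js-client library against a hypothetical tus endpoint on the media service (the endpoint path and auth header are assumptions):

```typescript
import * as tus from 'tus-js-client'
import { createReadStream, statSync } from 'node:fs'

const path = './talk-recording.mp4'
const token = process.env.MEDIA_SERVICE_TOKEN ?? ''

// Sketch: resumable upload of a large video file. If the connection drops,
// tus lets the client resume from the last confirmed offset instead of
// restarting the whole upload.
const upload = new tus.Upload(createReadStream(path), {
  endpoint: 'https://media-service.example/tus', // hypothetical; would come from createUpload below
  uploadSize: statSync(path).size,
  chunkSize: 8 * 1024 * 1024,                    // required when uploading from a Node stream
  retryDelays: [0, 3000, 10000, 30000],
  metadata: { filename: 'talk-recording.mp4', filetype: 'video/mp4' },
  headers: { authorization: `Bearer ${token}` }, // however auth ends up working
  onError: (err) => console.error('upload failed', err),
  onProgress: (sent, total) => console.log(`${sent}/${total} bytes`),
  onSuccess: () => console.log('upload complete'),
})
upload.start()
```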
I think the queries would be:
- example.media.lexicon.getUploadLimits — returns any account limits, size limits for uploadBlob, quotas, etc.
- example.media.lexicon.getLinks — returns the CDN URLs to the uploaded attachment
- example.media.lexicon.getMetadata — returns the metadata for the uploaded attachment
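To give those shapes a bit more definition, here are some illustrative TypeScript interfaces for what the query outputs might contain; every field name here is a guess, not a finished lexicon:

```typescript
// Illustrative output shapes for the queries above. All fields are assumptions.
interface UploadLimitsOutput {
  maxBlobSize: number        // bytes, for the one-shot uploadBlob path
  maxUploadSize: number      // bytes, for resumable uploads
  storageQuota: number       // total bytes the account may store
  storageUsed: number
  paymentRequired: boolean   // limits can only be raised by paying
}

interface LinksOutput {
  id: string                 // the upload identifier from the attachment record
  playlist?: string          // e.g. an HLS/DASH manifest URL on the CDN
  downloads: { url: string; mimeType: string; height?: number }[]
}

interface MetadataOutput {
  id: string
  title?: string
  description?: string
  duration?: number          // seconds
  fileSize?: number          // bytes
  poster?: string            // CDN URL of the poster image
  policy?: { redistribution?: string; license?: string }
}
```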
I’m envisioning some procedures along the lines of:
- example.media.lexicon.uploadBlob
- example.media.lexicon.createUpload — returns the URL to interact with the tus protocol to perform the upload, along with an ID and processing status for the upload.
- example.media.lexicon.deleteUpload — allows the user to delete a finished upload
- example.media.lexicon.getUpload — returns the status of the upload and, if the upload has completed, the attachment record.
- example.media.lexicon.createClip — takes an existing upload and creates a clip of it, if the original upload’s policies allow for it. Similar to the streaming one, but requires the upload to be fully completed first.
- example.media.lexicon.stream.create — used by streaming services to start a stream upload, where they have individual segments to upload; would likely look like place.stream.segment in the Streamplace docs
- example.media.lexicon.stream.uploadSegment
- example.media.lexicon.stream.getSegments — returns a list of already uploaded segments, along the lines of place.stream.live.getSegments in the Streamplace docs
- example.media.lexicon.stream.createClip — allows creating a video clip from the already uploaded segments. Would return the same response as getUpload overall.
- example.media.lexicon.stream.finish — used to finish the stream, preventing more segments from being uploaded, and possibly converts the segments into a recording in a single file. Would return the same response as getUpload overall.
The APIs that return an upload would likely need error handling for media conversion errors, payment being required, quotas being exceeded, etc.
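To show how the streaming procedures might chain together, here is a sketch using plain fetch against the XRPC endpoints named above; the request and response fields (streamId, seq, and so on) are assumptions, and real segment uploads would carry binary data rather than JSON:

```typescript
const base = 'https://media-service.example/xrpc'
const headers = {
  'content-type': 'application/json',
  authorization: `Bearer ${process.env.MEDIA_SERVICE_TOKEN}`,
}

// Tiny helper for calling a procedure and surfacing errors (payment required,
// quota exceeded, conversion failure, ...) as exceptions.
async function proc<T>(nsid: string, body: unknown): Promise<T> {
  const res = await fetch(`${base}/${nsid}`, {
    method: 'POST',
    headers,
    body: JSON.stringify(body),
  })
  if (!res.ok) throw new Error(`${nsid} failed: ${res.status} ${await res.text()}`)
  return (await res.json()) as T
}

// 1. Start a stream upload session.
const { streamId } = await proc<{ streamId: string }>(
  'example.media.lexicon.stream.create',
  { creator: 'did:plc:example' },
)

// 2. Push segments as they are produced (simplified; a real call would POST
//    the segment bytes, not JSON).
await proc('example.media.lexicon.stream.uploadSegment', { streamId, seq: 0 })

// 3. Clip from the segments uploaded so far; the stream does not need to have
//    finished for this.
const clip = await proc('example.media.lexicon.stream.createClip', {
  streamId,
  startSeq: 0,
  endSeq: 0,
})

// 4. End the stream; the service can now assemble the full recording and
//    return the signed attachment record, same as getUpload would.
const recording = await proc('example.media.lexicon.stream.finish', { streamId })
```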
We may also want an API for associating a poster image with an uploaded video or audio file, which would just be uploaded as a blob. We may also want a way to create a multi-track upload, where you can upload multiple video files and audio tracks at once and have them combined into a single attachment, such that responsive media can be served without transcoding on demand.
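A multi-track creation call might take input along these lines; this is entirely hypothetical and only meant to illustrate combining several renditions and a shared audio track into one attachment:

```typescript
// Hypothetical input: multiple video renditions plus one shared audio track,
// combined by the service into a single attachment, plus a poster blob that
// was uploaded separately.
const multiTrackInput = {
  tracks: [
    { kind: 'video', uploadId: 'upload-1080p', height: 1080 },
    { kind: 'video', uploadId: 'upload-720p', height: 720 },
    { kind: 'audio', uploadId: 'upload-audio', language: 'en' },
  ],
  poster: 'poster-blob-ref',
}
```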
These procedures and queries will need further definition, but I wanted to get the ball rolling here and share where my thinking is so far.
Closing thoughts
This style of sidecar service would allow for media processing in ways that we currently cannot do natively on AT Protocol, relieve pressure on PDS hosts from handling large media files, and provide content delivery optimisation.
I’ve already talked this through quite a bit with @iame.li, but I’d love to get other people’s thoughts.
I think we could also support some way to associate an AT Protocol repo with its preferred media services, as well as a way for media services to advertise themselves, such that media-centric applications can discover them via the firehose.
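One way that association could look is a small declaration record published in the repo, something like the following (name and fields made up):

```typescript
// Made-up declaration record an account could publish so media-centric apps
// can discover its preferred media services via the firehose.
const preferredServices = {
  $type: 'example.media.lexicon.preferences',
  services: [
    { server: 'media-service.example', priority: 1 },
    { server: 'backup-media.example', priority: 2 },
  ],
}
```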