The short idea: something like the Wayback Machine, but done by individuals, and stored in the new permissioned data.
This is very much like what Pinboard offered – it would archive / snapshot your bookmark links, including a PDF to capture the entire look and feel of the site. This was a paid feature, and might be a part of a great paid feature for Semble in the future (I don’t speak for them here, just riffing on ideas and encouraging them to look at what people might pay for AND very much experimenting with permissioned data).
So, can we make a community lexicon for archived web content?
`community.lexicon.bookmark.archive` might be a spot for it. Here are some other ideas of what to include:

- The URL field from `community.lexicon.bookmark
- A link to a Wayback Machine entry – meaning, as part of bookmarking, an app can ping WBM to archive, and then fetch the latest link
- A link to a blob, which might be of type PDF, markdown, text, HTML, JSON etc. (or WACZ or other formats, I am not an archiving expert – for my own purposes, PDF and markdown would be most useful)
- Time of archiving
- An optional description, e.g. “snapshot of page after the story about the goose was removed from home page”
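To make the field list above a bit more concrete, here is a rough sketch of what building such a record could look like. Only the NSID comes from this thread; the field names (`url`, `archivedAt`, `waybackUrl`, `snapshot`, `description`) and the rule that a record needs at least a Wayback link or a blob are my assumptions, not an agreed lexicon. The blob shape follows the usual atproto JSON blob reference as far as I know it.

```python
from dataclasses import dataclass
from typing import Optional

# NSID floated in this thread; not a published lexicon.
ARCHIVE_NSID = "community.lexicon.bookmark.archive"


def blob_ref(cid: str, mime_type: str, size: int) -> dict:
    """Build a blob reference for an already uploaded snapshot."""
    return {
        "$type": "blob",
        "ref": {"$link": cid},
        "mimeType": mime_type,  # e.g. "application/pdf" or "text/markdown"
        "size": size,           # size in bytes
    }


@dataclass
class ArchiveRecord:
    """One snapshot of one URL. All field names are hypothetical."""
    url: str                           # the bookmarked URL
    archived_at: str                   # time of archiving, ISO 8601
    wayback_url: Optional[str] = None  # link to a Wayback Machine entry
    snapshot: Optional[dict] = None    # blob reference, see blob_ref()
    description: Optional[str] = None  # optional free-text note

    def to_record(self) -> dict:
        """Serialize to a record dict, leaving out unset optional fields."""
        # Assumption: an archive with neither a link nor a blob is useless.
        if self.wayback_url is None and self.snapshot is None:
            raise ValueError("need a Wayback link, a blob, or both")
        record = {
            "$type": ARCHIVE_NSID,
            "url": self.url,
            "archivedAt": self.archived_at,
        }
        if self.wayback_url is not None:
            record["waybackUrl"] = self.wayback_url
        if self.snapshot is not None:
            record["snapshot"] = self.snapshot
        if self.description is not None:
            record["description"] = self.description
        return record
```

Keeping the Wayback link and the blob both optional means a public pointer-only entry and a private self-contained snapshot can share one record type.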
Wayback Machine is public, and some of these entries could be public, but I imagine permissioned archiving, especially for individuals, would be very valuable and not run into any issues around copyrighted content.
Apps could build all sorts of things on top of an aggregated view of these records – search, a timeline view of web pages that’s much like Wayback Machine, etc.
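As a rough illustration of the timeline idea, an app that has collected archive records from many people could group them by URL and order them by archive time. The `url` and `archivedAt` keys are hypothetical field names, and the sketch assumes every timestamp is an ISO 8601 UTC string in the same format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_timelines(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group archive records by URL, oldest snapshot first."""
    timelines: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        timelines[record["url"]].append(record)
    for snapshots in timelines.values():
        # Same-format ISO 8601 UTC strings sort chronologically as text.
        snapshots.sort(key=lambda r: r["archivedAt"])
    return dict(timelines)
```

Search could hang off the same index, e.g. by extracting text from markdown or text snapshots.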
Yeah, there are lots of services that could hook in here.
Personally, besides the Wayback Machine, I’d love to have the archive in my own PDS so I have self-contained archives.
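For the Wayback Machine half (ping it to archive, then fetch the latest link), a minimal sketch could look like the following. It only builds the request URLs and reads the answer; the HTTP calls themselves are left out. The endpoints are the Internet Archive's Save Page Now URL pattern and its availability API as I understand them; treat rate limits and authentication requirements as something to check before relying on this. The function names are my own.

```python
from typing import Optional
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request_url(page_url: str) -> str:
    """URL to request in order to ask for a fresh capture of page_url."""
    return SAVE_ENDPOINT + page_url


def availability_request_url(page_url: str) -> str:
    """URL to request in order to look up the closest existing capture."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot_url(response: dict) -> Optional[str]:
    """Pull the snapshot link out of a decoded availability response.

    Returns None when the response lists no available capture.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")
```

The link returned by `closest_snapshot_url` is what would go into the archive record next to, or instead of, a blob in the PDS.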
Or, I guess, I’m pointing at / sharing someone else’s blob? That’s more of an app-level concern: an app could index everyone’s archives, and I wouldn’t even need to snapshot a website myself. Someone else has a blob stored from 3 years ago, that works for me, and I copy the blob to my PDS.
Yeah, I have no idea what bsky is going to do here. I hope they refactor to actually store it. But are they going to store a copy, or just a pointer to the original, in which case, if it is deleted, it is gone?
They will probably do the latter.
I can see value in AT-URI archiving, but it feels like a very special case. @mfzx.net any ideas how this would be represented in a lexicon differently from what I specced? Or just an at:// URI in the URI record?
The only additional fields that might be useful in the case of archiving records would be historical information about hosting, i.e. which PDS the creator of the archived record was using at the time, a reference to the commit that introduced that particular version of the record, etc. Besides that, it’d be the same as for web content, just with an AT-URI and the raw record CBOR as the archived contents.
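A small sketch of how those extra fields could sit next to the AT-URI, in case it helps the discussion. The key names (`uri`, `archivedAt`, `pdsAtArchiveTime`, `commit`) are invented for illustration and are not part of any existing lexicon. The raw record CBOR would be attached as a blob, the same way as a web snapshot, so it is left out here.

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    authority: str   # DID or handle of the repo owner
    collection: str  # NSID of the record's collection
    rkey: str        # record key


def parse_at_uri(uri: str) -> AtUri:
    """Split a record-level AT-URI of the form at://authority/collection/rkey."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError("not an AT-URI: " + uri)
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected at://authority/collection/rkey")
    return AtUri(*parts)


def archived_record_entry(
    uri: str,
    archived_at: str,
    pds_host: Optional[str] = None,    # PDS the creator used at the time
    commit_cid: Optional[str] = None,  # commit that introduced this version
) -> dict:
    """Build the AT-specific part of an archive entry (hypothetical keys)."""
    parse_at_uri(uri)  # reject anything that does not point at a record
    entry = {"uri": uri, "archivedAt": archived_at}
    if pds_host is not None:
        entry["pdsAtArchiveTime"] = pds_host
    if commit_cid is not None:
        entry["commit"] = commit_cid
    return entry
```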