Adapter Lexicons for representing one lexicon in the shape of another

Hi all,

Been noodling on something that I want to share and get some feedback on. The basic problem I’m trying to solve: if you have a record of some lexicon 1, which is usually displayed in app A, and you want it displayed natively in app B (which handles entirely different lexicons), then app B currently has to specifically implement a UI for lexicon 1. Instead, I think the creator of lexicon 1 should be able to write an “adapter lexicon” (let’s call it 1to2 for simplicity) that lets app B display records of lexicon 1 within the schema of lexicon 2.

Let’s get more concrete. Let’s say we collectively as a community created a lexicon for a scientific Figure in a scientific research paper. That Figure might have the lexicon community.lexicon.science.figure, and look like this:

{
  "lexicon": 1,
  "id": "community.lexicon.science.figure",
  "defs": {
    "main": {
      "type": "record",
      "description": "A scientific figure.",
      "key": "tid",
      "record": {
        "type": "object",
        "required": ["title", "image", "createdAt"],
        "properties": {
          "title": {
            "type": "string",
            "maxLength": 500,
            "description": "Display title of the figure."
          },
          "caption": {
            "type": "string",
            "maxLength": 5000,
            "description": "Full figure caption."
          },
          "image": {
            "type": "blob",
            "accept": ["image/png", "image/jpeg", "image/svg+xml", "image/tiff"],
            "maxSize": 20971520,
            "description": "Full-resolution figure image. May be large; not intended for direct embedding."
          },
          "embedPreview": {
            "type": "blob",
            "accept": ["image/png", "image/jpeg"],
            "maxSize": 976562,
            "description": "Downsampled preview (≤1 MB) intended for use in social platform embeds."
          },
          "doi": {
            "type": "string",
            "description": "DOI of the associated publication, if any. Not a validated URI — store as-is (e.g. '10.1234/example')."
          },
          "authors": {
            "type": "array",
            "items": { "type": "string" },
            "maxLength": 100,
            "description": "Author names or DIDs."
          },
          "dataUri": {
            "type": "string",
            "format": "uri",
            "description": "Optional link to underlying data or supplementary materials."
          },
          "createdAt": {
            "type": "string",
            "format": "datetime"
          }
        }
      }
    }
  }
}

Then, we have the lexicon for a Bluesky microblog post, here:

{
  "lexicon": 1,
  "id": "app.bsky.feed.post",
  "defs": {
    "main": {
      "type": "record",
      "description": "A Bluesky microblog post.",
      "key": "tid",
      "record": {
        "type": "object",
        "required": ["text", "createdAt"],
        "properties": {
          "text": {
            "type": "string",
            "maxGraphemes": 300,
            "maxLength": 3000,
            "description": "Post body text."
          },
          "embed": {
            "type": "union",
            "refs": [
              "app.bsky.embed.images",
              "app.bsky.embed.external",
              "app.bsky.embed.record",
              "app.bsky.embed.recordWithMedia"
            ],
            "description": "Optional media or record embed."
          },
          "createdAt": {
            "type": "string",
            "format": "datetime"
          }
        }
      }
    }
  }
}

// ---- app.bsky.embed.images (separate lexicon, shown for reference) ----
{
  "lexicon": 1,
  "id": "app.bsky.embed.images",
  "defs": {
    "main": {
      "type": "object",
      "required": ["images"],
      "properties": {
        "images": {
          "type": "array",
          "items": { "type": "ref", "ref": "#image" },
          "maxLength": 4
        }
      }
    },
    "image": {
      "type": "object",
      "required": ["image", "alt"],
      "properties": {
        "image": {
          "type": "blob",
          "accept": ["image/png", "image/jpeg"],
          "maxSize": 976562
        },
        "alt": {
          "type": "string",
          "maxGraphemes": 2000,
          "description": "Alt text for accessibility."
        },
        "aspectRatio": {
          "type": "ref",
          "ref": "app.bsky.embed.defs#aspectRatio"
        }
      }
    }
  }
}

And finally, here’s what an adapter lexicon might look like:

{
  "lexicon": 1,
  "id": "com.atproto.lexicon.adapter",
  "defs": {
    "main": {
      "type": "record",
      "description": "Declares a transformation from records of one lexicon type into records of another, enabling cross-platform display or import.",
      "key": "tid",
      "record": {
        "type": "object",
        "required": ["sourceLexicon", "targetLexicon", "outputTemplate"],
        "properties": {
          "sourceLexicon": {
            "type": "string",
            "format": "nsid",
            "description": "NSID of the lexicon being adapted from."
          },
          "targetLexicon": {
            "type": "string",
            "format": "nsid",
            "description": "NSID of the lexicon being adapted into."
          },
          "label": {
            "type": "string",
            "maxLength": 100,
            "description": "Human-readable label for this adapter, shown in the 'Import from' picker UI."
          },
          "outputTemplate": {
            "type": "unknown",
            "description": "A partial target-record object. Any leaf value may be replaced with an interpolation node (see #srcRef, #srcBlobRef, #srcTemplate) to pull in values from the source record at render time. Non-interpolation leaves are passed through as literals."
          }
        }
      }
    },

    "srcRef": {
      "type": "object",
      "description": "Interpolation node. Replaced with the value of a field from the source record.",
      "required": ["$src"],
      "properties": {
        "$src": {
          "type": "string",
          "description": "Dot-notation path into the source record. Examples: 'caption', 'authors.0', 'doi'."
        }
      }
    },

    "srcBlobRef": {
      "type": "object",
      "description": "Interpolation node. Replaced with the blob CID reference from the source record. The blob itself stays on the source PDS; this just copies the reference.",
      "required": ["$srcBlob"],
      "properties": {
        "$srcBlob": {
          "type": "string",
          "description": "Field name of the blob in the source record (e.g. 'embedPreview')."
        }
      }
    },

    "srcTemplate": {
      "type": "object",
      "description": "Interpolation node. Replaced with a string rendered from a template, with {fieldName} placeholders resolved against the source record.",
      "required": ["$template"],
      "properties": {
        "$template": {
          "type": "string",
          "description": "Template string with {fieldName} placeholders. Example: 'Figure by {authors.0}: {title}'"
        }
      }
    }
  }
}

Here’s what an example adapter record might look like:

{
  "$type": "com.atproto.lexicon.adapter",
  "sourceLexicon": "community.lexicon.science.figure",
  "targetLexicon": "app.bsky.feed.post",
  "label": "Share figure to Bluesky",
  "outputTemplate": {
    "text": {
      "$src": "caption"
    },
    "embed": {
      "$type": "app.bsky.embed.images",
      "images": [
        {
          "image": { "$srcBlob": "embedPreview" },
          "alt": { "$src": "title" }
        }
      ]
    }
  }
}

Basically what I’m doing here is defining, via the adapter record, how to compose the shape of an app.bsky.feed.post record from a community.lexicon.science.figure record.

When a user clicks on a button that says “Import from ” (such as “Import from science.lexicon.community”) in the Bluesky post composer they’d be shown all of the types of records they can select, and can search from them. They’d select a specific Figure record, and then the client:

  1. Fetches the adapter record (from the community.lexicon authority’s PDS, collection com.atproto.lexicon.adapter, filtered by sourceLexicon + targetLexicon)

  2. Walks the outputTemplate tree, replacing interpolation nodes with values from the selected figure record

  3. Presents the resulting draft app.bsky.feed.post in the composer for the user to review/edit

  4. Publishes a normal Bluesky post — no foreign NSID anywhere in the created record
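
Step 2 above is the interesting part, so here is a minimal sketch of how a client might walk the outputTemplate. This is only an illustration under assumptions: the function names (lookup, render), the sample figure record, and the placeholder blob CID are all made up, and a node counts as an interpolation node only when its single key is $src, $srcBlob, or $template.

```python
import re
from typing import Any

# Matches {fieldName} or {authors.0} placeholders inside a $template string.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def lookup(record: Any, path: str) -> Any:
    """Resolve a dot-notation path such as 'authors.0' against a record."""
    current = record
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segment indexes into an array
        else:
            current = current[part]
    return current


def render(node: Any, source: dict) -> Any:
    """Recursively replace interpolation nodes with values from the source record."""
    if isinstance(node, dict):
        keys = set(node)
        if keys == {"$src"}:
            return lookup(source, node["$src"])
        if keys == {"$srcBlob"}:
            # Copies the blob reference only; the bytes stay where they are.
            return source[node["$srcBlob"]]
        if keys == {"$template"}:
            return PLACEHOLDER.sub(
                lambda m: str(lookup(source, m.group(1))), node["$template"]
            )
        return {key: render(value, source) for key, value in node.items()}
    if isinstance(node, list):
        return [render(item, source) for item in node]
    return node  # literal leaf, passed through unchanged


# Hypothetical source record; the CID below is a placeholder, not a real blob.
figure = {
    "$type": "community.lexicon.science.figure",
    "title": "Figure 2: Growth curves",
    "caption": "Optical density over 24 hours.",
    "authors": ["A. Researcher"],
    "embedPreview": {
        "$type": "blob",
        "ref": {"$link": "bafy-placeholder"},
        "mimeType": "image/png",
        "size": 48213,
    },
    "createdAt": "2025-01-01T00:00:00Z",
}

# The outputTemplate from the example adapter record above.
output_template = {
    "text": {"$src": "caption"},
    "embed": {
        "$type": "app.bsky.embed.images",
        "images": [
            {"image": {"$srcBlob": "embedPreview"}, "alt": {"$src": "title"}}
        ],
    },
}

draft = render(output_template, figure)
print(draft["text"])
```

Note that the draft has no createdAt, since the template doesn't supply one; presumably the client would set that itself when the user publishes.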

Use Cases

  1. Like the example above: a scientific Figure that lives in a lexicon community.lexicon.science.figure could have an adapter to a Bluesky post so that if a scientist wants to post their Figure on Bluesky, Bluesky doesn’t need to implement the community.lexicon.science.figure lexicon. Instead, when giving the user the list of options they have for records to import as Bluesky posts, Bluesky would look at records in the user’s PDS and cross-reference those records with adapters that are available for those records to Bluesky’s post lexicon.
  2. Another use case: if I have a lexicon that stores information about when and where a public food distribution will happen (like FoodNotBombs does), the lexicon adapter would tell Bluesky how to display that information, if it wants to, as a Bluesky thread (<> denotes a field in the sourceLexicon):
🥘 Food Distribution

Where: <address>
When: <startsAt> to <endsAt>

(next post…)

Location Notes: <locationNotes> (this is stuff like "behind the library" or "east entrance, look for the Student Hall banner," which a structured address field can't capture)
Accessibility: <accessibilityInfo>
(...any other important info...)
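
To make the thread idea a bit more tangible, here is a rough sketch of rendering it with {fieldName} placeholders. Big caveat: the adapter schema proposed above targets a single record, so treating the output as a list of post texts is a hypothetical extension, and the render_text helper, the "n/a" fallback for missing optional fields, and the sample event values are all assumptions for illustration.

```python
import re

# Matches simple {fieldName} placeholders (no nested paths in this sketch).
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_text(template: str, source: dict, missing: str = "n/a") -> str:
    """Fill {fieldName} placeholders from the source record.

    Fields absent from the source are replaced with `missing`; that fallback
    is an assumption, not something the proposal specifies.
    """
    return PLACEHOLDER.sub(lambda m: str(source.get(m.group(1), missing)), template)


# One template per post in the thread (hypothetical extension of the adapter).
thread_templates = [
    "🥘 Food Distribution\n\nWhere: {address}\nWhen: {startsAt} to {endsAt}",
    "Location Notes: {locationNotes}\nAccessibility: {accessibilityInfo}",
]

# Made-up source record; accessibilityInfo is deliberately left out.
event = {
    "address": "123 Main St",
    "startsAt": "2025-06-01T17:00:00Z",
    "endsAt": "2025-06-01T19:00:00Z",
    "locationNotes": "behind the library",
}

posts = [render_text(template, event) for template in thread_templates]
for post in posts:
    print(post)
    print("---")
```

A real client would still have to link the rendered posts together as replies and decide what to do when an optional field is missing, rather than printing a fallback string.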

Open questions

  • Discovery: How does a Bluesky client know where to look for adapters for a given NSID?

    • One approach: the authority for community.lexicon.science.figure is https://lexicon.community, so that domain could host the adapters at the subdomain formed by writing the NSID in forward (DNS) order, like normal, with the target NSID as the URI suffix. Ergo, https://figure.science.lexicon.community/app.bsky.feed.post would hold the definition of the adapter from community.lexicon.science.figure to app.bsky.feed.post.
    • Another approach: create a sidecar record in the com.atproto.lexicon.adapter namespace with the same rkey whenever a record is created that needs an adapter (you’d skip this if it already exists, of course). Simply read a record’s adapter lexicon from the com.atproto.lexicon.adapter namespace in a user’s repo whenever you fetch that record. I’m less fond of this solution, but maybe there’s an optimization I’m not thinking of.
  • Trust boundary: Should users be able to publish their own adapters for any lexicon? Or only the authoritative server of the lexicon itself?

  • Blob ownership: When the resulting Bluesky post is created, the embedPreview blob CID still lives on the original author’s PDS. If that PDS goes away, the image breaks. Maybe a good follow-up step would be for the client to re-upload the blob to the user’s own PDS before posting.

  • Two-way adapters: An adapter record could declare sourceLexicon and targetLexicon in reverse to allow importing from Bluesky into a foreign lexicon. The same schema should be able to handle the backwards case, I think.

  • Partial field validation: The outputTemplate is unknown, so a strict Lexicon validator can’t verify that the rendered output will actually conform to app.bsky.feed.post. Clients would need to validate the rendered output against the target lexicon schema before presenting it to the user.
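
On the partial field validation point, here is a minimal sketch of the kind of check a client could run on the rendered draft before showing it in the composer. It is not a real Lexicon validator: POST_SCHEMA is a hand-trimmed copy of the app.bsky.feed.post definition above, check_draft is a made-up helper, and it only covers required fields and string maxLength (which, as I understand the Lexicon spec, is counted in UTF-8 bytes). maxGraphemes is skipped because counting grapheme clusters needs a library outside the Python standard library.

```python
# Hand-trimmed subset of the app.bsky.feed.post record schema shown earlier.
POST_SCHEMA = {
    "required": ["text", "createdAt"],
    "properties": {
        "text": {"type": "string", "maxLength": 3000},
        "createdAt": {"type": "string"},
    },
}


def check_draft(draft: dict, schema: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for name in schema.get("required", []):
        if name not in draft:
            problems.append(f"missing required field '{name}'")
    for name, rules in schema.get("properties", {}).items():
        if name not in draft:
            continue
        value = draft[name]
        if rules.get("type") == "string":
            if not isinstance(value, str):
                problems.append(f"'{name}' must be a string")
                continue
            limit = rules.get("maxLength")
            # Assumption: maxLength is measured in UTF-8 bytes, not characters.
            if limit is not None and len(value.encode("utf-8")) > limit:
                problems.append(f"'{name}' exceeds maxLength of {limit} bytes")
    return problems


# A caption at the figure lexicon's own 5000 limit, mapped straight into text.
long_draft = {"text": "x" * 5000}
ok_draft = {"text": "Short caption.", "createdAt": "2025-01-01T00:00:00Z"}

print(check_draft(long_draft, POST_SCHEMA))
print(check_draft(ok_draft, POST_SCHEMA))
```

This also surfaces a wrinkle in the example adapter: caption allows up to 5000 while a post's text is capped at 3000 (and 300 graphemes), so a straight caption-to-text mapping can produce an invalid draft unless the client truncates or asks the user to edit.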


Great minds think alike!

I haven’t been actively working on this in the past couple of months (things have been so crazy busy!!) but this is something I prototyped a bunch with relationaltext.org, and which Aaron Steven White (who I don’t think is on here :eyes:) has extended a lot with panproto.dev.

I’ve been using the approach both in pre-atproto past work, and in pannacotta.org, ionosphere.tv, and others - it works really well and I think is going to be an important tool in avoiding “lexicon centralization” and keeping people feeling creative and not stuck in having to come up with the “perfect” lexicon. I gave a talk about these things in Vancouver at AtmosphereConf (videos appear to not be currently loading, @iame.li?)

More (pre-atproto) background reading is Ink & Switch’s Cambria Project.


Nice!! The Cambria Project looks like it’s got a similar vision of being a “schema to map one lexicon to another”. I think one reason I lean towards preferring my implementation above is that it’s not another schema language, it’s just Lexicon itself. So the Lexicon for transforming one Lexicon to another is a Lexicon (proposed NSID is com.atproto.lexicon.adapter). I also personally prefer the term “adapter” to “lens”, but I realize that’s likely going to vary from person to person.

Curious what other thoughts you and others have :slight_smile:

Yes, absolutely. This is how panproto works - the translations are all defined as a lexicon, and are pretty comprehensive in their definitions.

There’s a lot there, but I’d encourage you to take a look since what Aaron has built with panproto is a very robust implementation of what you’re proposing.

(Cambria predates atproto by several years, or it would have almost certainly been done as a lexicon implementation, but serves as a useful reference for the high level concept)
