Is there a way to convert content to bytes for use in plugins?

Hi all,

It would be nice to have the option to convert content to bytes using bytes(…). This would be useful when developing plugins. For instance, if one develops a plugin using Rust, I imagine that one could use the Typst library and then deserialize the bytes. It would open up a lot of possibilities for future plugins.

For now, is there a better way to convert content to a string than using repr(…)?

Best regards, F

There is no built-in way, but it’s not too hard to build one yourself (with limitations). There’s a post on handling content that I think is at least somewhat relevant:

A main takeaway here is that, whenever you can, you should strive to not have content as the basis of your computation. Use structured data as long as possible and then format it.
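To make that concrete, here is a small sketch (the `people` data and the `format-person` helper are made up for illustration). The data stays as plain dictionaries, is only turned into content when it is displayed, and the very same data can be CBOR-encoded for a plugin without any conversion step:

```typst
// Made-up example data: plain values, no content involved yet.
#let people = (
  (name: "Alice", age: 30),
  (name: "Bob", age: 25),
)

// Formatting into content happens only at the very end ...
#let format-person(p) = [*#p.name* (#p.age)]
#people.map(format-person).join([, ])

// ... while the unformatted data can be CBOR-encoded as-is, e.g. to be
// passed to a plugin function (no content-to-AST conversion needed).
#let people-bytes = cbor.encode(people)
```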

In your case, you specifically want to create bytes to communicate arbitrary content to a plugin. The approach is similar to what is presented in that post, and is based on “Extracting plain text” from the Typst Examples Book (which is in the section “Typstonomicon, or The Code You Should Not Write” – be warned!).
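Regarding your second question about getting a string: a sketch in the spirit of that Examples Book technique (my own approximation, not the book's exact code; `to-string` is a hypothetical name) walks the content tree and concatenates the text it finds. Elements it doesn't know about contribute nothing, so it is lossy by design:

```typst
// Lossy plain-text extraction: walks the content tree and keeps only text.
#let to-string(it) = {
  if it == none {
    ""
  } else if type(it) == str {
    it
  } else {
    let func = repr(it.func())
    if func == "space" {
      " "
    } else if func == "linebreak" {
      "\n"
    } else if func == "smartquote" {
      // render smart quotes as plain ASCII quotes
      if it.at("double", default: true) { "\"" } else { "'" }
    } else if it.has("text") {
      // text (and e.g. raw) elements store their string in `text`
      it.text
    } else if it.has("children") {
      // sequences: concatenate the children's text
      it.children.map(to-string).sum(default: "")
    } else if it.has("body") {
      to-string(it.body)
    } else if it.has("child") {
      // styled content: keep the child, ignore the styles
      to-string(it.child)
    } else {
      // anything else (e.g. images) contributes nothing
      ""
    }
  }
}
```

Compared to repr(…), this gives you the text itself rather than Typst syntax, but anything without a text representation (images, for example) simply disappears.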

You don’t actually need to go all the way to bytes yourself: once you have converted content into plain data, you can just use CBOR. To convert content into a data structure (I’m calling it an AST because it at least resembles an abstract syntax tree, and naming is hard), you can recursively go through the content:

#let content-to-ast(it) = {
  // function (as string representation) and fields is what makes some content
  let func = repr(it.func())
  let fields = it.fields()

  // children (for content containing many things)
  // and body (for stuff containing a single thing)
  // are the ones we need to recursively transform
  if "children" in fields {
    fields.children = fields.children.map(content-to-ast)
  } else if "body" in fields {
    fields.body = content-to-ast(fields.body)
  }

  // return a dictionary, which should no longer contain content
  (function: func, ..fields)
}

#let ast-to-content(it) = {
  // separate function and fields
  let (function: func, ..fields) = it

  // recursively un-transform
  if "children" in fields {
    fields.children = fields.children.map(ast-to-content)
  } else if "body" in fields {
    fields.body = ast-to-content(fields.body)
  }

  // convert to content according to the function
  // TODO other types
  if func == "sequence" {
    fields.children.join()
  } else if func == "linebreak" {
    linebreak()
  } else if func == "space" {
    [ ]
  } else if func == "text" {
    [#fields.text]
  } else if func == "box" {
    let (body, ..fields) = fields
    box(body, ..fields)
  } else if func == "smartquote" {
    smartquote(..fields)
  }
}

With these two functions, you can now translate content to bytes and back, at least for the few elements I chose to support here:

Example
#let example = [Hello\ #box["World"]]
// serialize
#let example-ast = content-to-ast(example)
#let example-bytes = cbor.encode(example-ast)
// deserialize
#let example-ast = cbor.decode(example-bytes)
#let example-roundtrip = ast-to-content(example-ast)

#table(
  columns: 2,
  [
    #example
    
    #repr(example)
  ],
  [
    #example-roundtrip
    
    #repr(example-roundtrip)
  ],
)
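The `// TODO other types` chain in `ast-to-content` grows quickly, and anything unsupported silently becomes `none`. One possible alternative (a sketch of my own; `constructors` and `decode-ast` are hypothetical names, and `strong`/`emph` are only there as examples of extending it) is a lookup table from function names to constructors that fails loudly on unknown elements. It assumes the same AST shape that `content-to-ast` produces:

```typst
// Hypothetical lookup table: function name -> constructor taking the fields.
#let constructors = (
  sequence: f => f.children.join(),
  space: f => [ ],
  linebreak: f => linebreak(),
  text: f => [#f.text],
  smartquote: f => smartquote(..f),
  box: f => {
    let (body, ..rest) = f
    box(body, ..rest)
  },
  // two extra elements, to show how the TODO could be filled in
  strong: f => strong(f.body),
  emph: f => emph(f.body),
)

#let decode-ast(it) = {
  let (function: func, ..fields) = it
  // recursively decode nested ASTs first
  if "children" in fields {
    fields.children = fields.children.map(decode-ast)
  } else if "body" in fields {
    fields.body = decode-ast(fields.body)
  }
  // fail loudly instead of silently producing `none`
  assert(func in constructors, message: "unsupported element: " + func)
  let make = constructors.at(func)
  make(fields)
}
```

Supporting a new element then means adding one entry to the table, as long as its content lives in `children` or `body`.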


However…

Of course it would be nice if things were that easy, but in general they are not:

#let example2 = {
  set text(1em)
  [Hello]
}

#repr(example2)
// styled(child: [Hello], ..)
#example2.styles
// ..

The styled element (for example), which is generated by set rules, contains a styles field that is opaque, so we can’t convert it. I’m not an expert on Typst internals, but I’m pretty sure there is simply no way to represent these elements without also referencing some part of Typst’s Rust runtime, so we’re simply out of luck. Even if we could represent set rules, a show rule can contain user-written Typst code, so we’d need to convert arbitrary Typst code into a plain Typst value.
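If losing the styles is acceptable for your plugin, one workaround (a lossy sketch of my own; `content-to-ast-lossy` and its `strict` parameter are made up) is to unwrap `styled` elements and keep only their child, or to refuse them outright in strict mode:

```typst
// Hypothetical lossy variant of content-to-ast: `styled` elements carry an
// opaque `styles` value, so they are either rejected (strict) or unwrapped.
#let content-to-ast-lossy(it, strict: false) = {
  let func = repr(it.func())
  if func == "styled" {
    assert(not strict, message: "cannot serialize styled content (set/show rules)")
    // drop the styles, keep only the styled child
    return content-to-ast-lossy(it.child, strict: strict)
  }
  let fields = it.fields()
  if "children" in fields {
    fields.children = fields.children.map(c => content-to-ast-lossy(c, strict: strict))
  } else if "body" in fields {
    fields.body = content-to-ast-lossy(fields.body, strict: strict)
  }
  (function: func, ..fields)
}
```

This really discards the set rules: the `set text(1em)` from the example above is gone after the conversion, so only use it where that is acceptable.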


So to recap:

  • Regular content (the kind you could also write in, e.g., Markdown or HTML without CSS) can be recursively converted into Typst dictionaries.
  • This is necessarily limited and lossy, which is why the basic technique is part of the Typstonomicon.
  • You should generally prefer to work on structured data, and only convert that to content after everything is done.
  • Nonetheless, I think this (ideally coupled with a Rust library that handles the other side of the conversion) could be fairly useful in general. In one of my showcases I used Serde to convert Rust structs and enums to Typst values, so that they could be formatted on the Typst side. This would be very similar, except that the values passed to Rust wouldn’t be strings to parse, but already prepared data that can simply be deserialized.

btw @frisbro, I will move this post to Questions (and give it tags and a title that is a question you’d ask a friend). I know you wrote about Typst providing this and only asked for a workaround as a last resort, but given the limitations I described, and because stabilizing a more complex plugin protocol is very unlikely in the near future, I think it’s more likely that this will be handled in userspace for the time being.


Thank you very much for the detailed reply. This helped me enormously.