How can I parse content into a custom structure?

Is it possible to have a function parse content into a custom structure like so?

#parse[
  [a[b[c][d]]]
] == (body: [a], children: (
  (body: [b], children: (
      (body: [c], children: ()),
      (body: [d], children: ()),
    )
  ),
))

I know how to do this for strings and have done so, but I would like a solution that allows any content as the nodes’ bodies. I could try going into the sequence and messing around, but I wanted to see if there’s a better way to approach this first.

As a note, square brackets are typically used for this (linguistics syntax trees), but if it’s not possible I’d still be interested in how to do this for other delimiters.

Since you’ve already got a way to turn a string like “a[b[c]]” into that type of structure, maybe all you need is a way to convert content to string:

#let parseAsString(it) = {
  let asString = it.fields().at("children")
    .map(c => c.fields())
    .map(v => v.at("text", default: none))
    .filter(v => v != none).join()
  
  return asString
}

A more general solution for converting content to string was provided by GitHub user lvjr here:

That would work if I only wanted string output. I was able to find a solution that preserves content (i.e. allows [$x$[*y*][@z]]). Because the input uses square brackets, all content passed into the function ends up being a sequence (as far as I can tell), in which [ and ] are never grouped with anything else (each is its own child). So I ended up taking the sequence’s children and matching on [[] and []] instead of "[" and "]", then concatenating the children that make up each body.
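For anyone landing here later, the approach described above might look roughly like the sketch below. It is not the poster's actual code: the names `parse` and `is-bracket` are made up for illustration, and it assumes that each bracket arrives as its own text child and that brackets are balanced. Instead of comparing against content literals, it checks the child's element function and text.

```typst
// Hypothetical helper: true if `child` is a text element holding exactly `ch`.
#let is-bracket(child, ch) = child.func() == text and child.text == ch

// Sketch of a content-preserving tree parser (illustrative, not the original).
#let parse(it) = {
  // A lone element is not a sequence, so fall back to a one-item array.
  let children = it.fields().at("children", default: (it,))
  // Stack of open nodes; the bottom entry is a virtual root.
  let stack = ((body: (), children: ()),)
  for child in children {
    if is-bracket(child, "[") {
      // Opening bracket: start a new node.
      stack.push((body: (), children: ()))
    } else if is-bracket(child, "]") {
      // Closing bracket: finish the node and attach it to its parent.
      let node = stack.pop()
      node.body = node.body.join()
      let parent = stack.pop()
      parent.children.push(node)
      stack.push(parent)
    } else {
      // Anything else is part of the innermost open node's body.
      let top = stack.pop()
      top.body.push(child)
      stack.push(top)
    }
  }
  // Return the first top-level tree, or none if there was no bracket pair.
  stack.first().children.at(0, default: none)
}

#parse[[a[b[c][d]]]]
```

Spaces inside a node are kept as part of its body, so something like [a [b]] would need extra trimming if that matters for your use case.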

I ended up needing this for an extension to my parser, so it was super helpful. Thanks! I modified it for my use case to preserve quotation marks, so here’s that for posterity.

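#let stringify(it) = {
  // Join each child's string form; children without text contribute nothing.
  for child in it.children {
    if child.func() == smartquote {
      // Keep quotation marks as plain ASCII quotes.
      if child.double { "\"" } else { "'" }
    } else {
      child.at("text", default: none)
    }
  }
}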

For converting content to string, there is t4t package:

#import "@preview/t4t:0.4.2": get
#get.text[content]

It doesn’t always work as expected, for example in math mode, but I think there are other packages that can handle math conversion.