How to best create a list of marked terms?

Rik · August 31, 2025, 3:01am

I am trying to produce a list of the terms mentioned in a text. Each term is marked as #x[term], and the list should preserve the formatting of each term (regular, italic, bold, …). Terms may be single words or phrases. I have found three ways to do this, each with slightly different results. The Typst code below demonstrates the three mechanisms.

My questions:

- Is there a better way to dynamically build an array or dictionary than using state?
- Is there a better way to extract the string value than the content-to-string function used here?
- Are there more efficient ways to produce any of these variants?

#let collectedwords = state("unique_name", ())
#let content-to-string(content) = {
  if content.has("text") {
    lower(content.text)
  } else if content.has("children") {
    content.children.map(content-to-string).join("")
  } else if content.has("body") {
    content-to-string(content.body)
  } else if content == [ ] {
    " "
  }
}
#let x(word) = {
  word
  collectedwords.update(
    wd => {
      wd.push((content-to-string(word), word))
      return wd// return updated state
    }
  )
}
#let showwords1() = context {// Array method
  let words = ()
  for (word) in collectedwords.get() {
    words.push(word)
  }
  return words.slice(2,)
    .sorted(key: k => k.at(0))
    .map(((a,b)) => (b))
    .join(", ", last: ", and ")
}
#let showwords2() = context { // Dedup array method
  let words = ()
  for (word) in collectedwords.get() {
    words.push(word)
  }
  return words.slice(2,)
    .dedup(key: ((a,b)) => a)
    .sorted(key: k => k.at(0))
    .map(((a,b)) => (b))
    .join(", ", last: ", and ")
}
#let showwords3() = context {// Dictionary method
  let words = ()
  for (word) in collectedwords.get() {
    words.push(word)
  }
  return words
    .slice(2,)
    .sorted(key: k => k.at(0))    
    .to-dict().values()
    .join(", ", last: ", and ")
}

= Mechanism to list marked words in a text

The marked words in order of occurrence in the text: #x[def], #x[_def_], #x[ghi], #x[ghi], #x[_jkl_], #x[मुदिता], #x[abc], and #x[Abc]. (In use they would be scattered throughout a larger text.) These routines each produce a sorted list of the words. Each variant has different characteristics.

Sorted lists of marked words:
- Array method: #showwords1().
- Dedup array method: #showwords2().
- Dictionary method: #showwords3().

Differences:
- The array method preserves the order in which spelling duplicates (#underline[Abc] and #underline[abc], here) are found.
- The dedup method shows the first occurrence of each unique spelling.
- The dictionary method shows the last occurrence of each unique spelling.
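
The three behaviours can also be reproduced on plain data, outside of state and context. This is only a minimal sketch: the sample pairs are hypothetical, and plain strings stand in for the formatted content that x stores.

```typst
// Hypothetical sample data: (sort key, displayed form) pairs.
// Plain strings stand in for the formatted content stored by `x`.
#let pairs = (
  ("def", "def (regular)"),
  ("def", "def (italic)"),
  ("abc", "abc"),
  ("abc", "Abc"),
)

// Array method: `sorted` is stable, so equal keys keep their source order.
#let all = pairs.sorted(key: array.first).map(array.last)

// Dedup method: only the first pair for each key is kept.
#let firsts = pairs.dedup(key: array.first).sorted(key: array.first).map(array.last)

// Dictionary method: a later pair overwrites an earlier one with the same key.
#let lasts = pairs.sorted(key: array.first).to-dict().values()

#assert.eq(all, ("abc", "Abc", "def (regular)", "def (italic)"))
#assert.eq(firsts, ("abc", "def (regular)"))
#assert.eq(lasts, ("Abc", "def (italic)"))
```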

Andrew · September 2, 2025, 10:56am

On a better way than state: no, assuming you don’t want to inline #arr.push(func[the thing]); every time you want to write the thing.

On extracting the string value: there is get.text from the t4t package on Typst Universe, but currently you can’t do that for 100% of values, such as context.

On efficiency: well, there is a shorter way to update the state array:

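#import "@preview/t4t:0.4.3": get

// Assumed: the state from the question, renamed to `collected-words`.
#let collected-words = state("unique_name", ())

#let x(word) = {
  word
  collected-words.update(wd => wd + ((get.text(word), word),))
}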
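
The trailing comma in ((get.text(word), word),) matters: it builds a one-element array that holds the pair, so + appends the pair as a single entry. A small sketch with a hypothetical pair of plain strings and an ordinary array in place of the state value:

```typst
// Hypothetical pair; in the real code the second item is content.
#let pair = ("abc", "Abc")
#let collected = ()

// One-element array holding the pair: the trailing comma is required.
#let nested = collected + (pair,)

// Without the comma the parentheses only group, so the two
// strings of the pair are concatenated onto the array instead.
#let flat = collected + (pair)

#assert.eq(nested, (("abc", "Abc"),))
#assert.eq(flat, ("abc", "Abc"))
```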

Also, you don’t need the for loops, and several of the parentheses can go.

/// Array method.
#let showwords1() = context {
  collected-words
    .get()
    .sorted(key: array.first)
    .map(((a, b)) => b)
    .join(", ", last: ", and ")
}

/// Dedup array method.
#let showwords2() = context {
  collected-words
    .get()
    .dedup(key: ((a, b)) => a)
    .sorted(key: array.first)
    .map(((a, b)) => b)
    .join(", ", last: ", and ")
}

/// Dictionary method.
#let showwords3() = context {
  collected-words
    .get()
    .sorted(key: array.first)
    .to-dict()
    .values()
    .join(", ", last: ", and ")
}

This is pretty efficient for what it does. Don’t worry about performance until you actually run into performance issues. You can always write a Wasm plugin in Rust or whatever and hope that implementation will be faster; you never know.

The golden rule says “get it done first, optimize later”, and it normally works well. Unless, of course, you can’t get it done because it’s too slow to keep working on.

Rik · September 4, 2025, 10:48pm

Thank you very much for that. I had not known about t4t before, but I see that the routine I have for converting content to strings is similar and, for my purposes, sufficient. I also appreciate the simplification and loop elimination.

One thing I have added is a new feature from the development version (August 27, 2025), normalize:

#let xn(word) = {
  words.update(
    wd => wd + ((lower(content-to-string(word)
      .normalize(form: "nfkd")// disable for typst 0.13.1 use
    ), word),)
  )
}

This improves the sort order to something I consider more reasonable, at least until sorting offers more control to support other schemes.
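
To illustrate the effect on plain strings, here is a rough sketch. It assumes a Typst version that provides normalize on strings with the form parameter used above, and the sample words are hypothetical.

```typst
// Requires a Typst version that provides `str.normalize` (not 0.13.1).
// Hypothetical sample words; "\u{e9}" is a precomposed "é".
#let sample = ("zebra", "\u{e9}cole", "eagle")

// A plain sort compares code points, so "é" (U+00E9) lands after "z".
#let plain = sample.sorted()

// NFKD splits "é" into "e" plus a combining accent, so it sorts among the e's.
#let normalized = sample.sorted(key: s => s.normalize(form: "nfkd"))

#assert.eq(plain, ("eagle", "zebra", "\u{e9}cole"))
#assert.eq(normalized, ("eagle", "\u{e9}cole", "zebra"))
```

Only the sort key is normalized here; the strings in the result are unchanged, just as the displayed content is kept separate from the key in xn.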