How can I replace markup tags in a CSV file with Typst markup?

There’s a video game that stores its dialogue transcript in all the languages supported by the game inside a single .csv file. With Typst, I can turn this into a table, which helps me a lot in learning/practising some of the languages.

The .csv file contains some game-specific markup that makes pieces of the text show up as bold or italics, or with a different size or colour in the game. Here are a few examples:

{i}This{/i} should be emphasized. {i}Another{/i} example.

{color=#3761d9}Blue{/color} coloured text.

{size=20}This{/size} should be smaller.

My end-goal would be to turn the example above into this:

#emph[This] should be emphasized. #emph[Another] example.

#text(rgb("#3761d9"), [Blue]) coloured text.

#text(0.7em, [This]) should be smaller.
[Screenshot of the intended output: typst-csv-formatting-goal]

Here’s my attempt so far:

Source code with rules and turning the CSV into a table
#let transcript = csv("transcript.csv", row-type: dictionary)

// Italics and bold
#show regex("\{i\}.*\{/i\}"): it => {
  show regex("\{i\}"): ""
  show regex("\{/i\}"): ""
  emph[#it]
}
#show regex("\{b\}.*\{/b\}"): it => {
  show regex("\{b\}"): ""
  show regex("\{/b\}"): ""
  strong[#it]
}

// Size and colour
#show regex("\{size=\d+\}.*\{/size\}"): it => {
  show regex("\{size=\d+\}"): ""
  show regex("\{/size\}"): ""
  text(rgb("#666"), it)
}
#show regex("\{color=#......\}.*\{/color\}"): it => {
  show regex("\{color=#......\}"): ""
  show regex("\{/color\}"): ""
  text(rgb("#666"), it)
}

#table(
  columns: 4,
  stroke: 0.3pt,
  table.header([*Key*], [*English*], [*Spanish*], [*Japanese*],),
  ..transcript.map(row => (row.Key, row.Standard, row.Spanish, row.Japanese)).flatten()
)

For handling the markup for colour and size changes, I believe I would need to capture the values for that (like #3761d9 and 20), which I don’t know how to do (yet). So, for now, I’ve just made two rules that turn anything wrapped in a {color} or {size} into grey. (While not the ideal solution, at least this would be a clear indication that the passage in question is supposed to have some special markup.)

As for text wrapped in {i} or {b}, the rules mostly work. The only problem is when there are two non-consecutive pieces of text meant to be displayed in italics, everything between them is also turned into italics. (This might have something to do with how greedy the regex rule is.)
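One way to check the greediness suspicion is to count the matches with `str.matches`. This is only a quick sketch run on the first example line; the variable names are made up for illustration.

```typst
#let line = "{i}This{/i} should be emphasized. {i}Another{/i} example."

// Greedy `.*` runs to the last {/i} on the line, so both tag pairs land in one match.
#let greedy = line.matches(regex("\{i\}.*\{/i\}"))

// Lazy `.*?` stops at the first {/i}, so each tag pair is matched separately.
#let lazy = line.matches(regex("\{i\}.*?\{/i\}"))

Greedy: #greedy.len() match, lazy: #lazy.len() matches.
```

If the greedy pattern reports a single match covering both tags, that would explain why everything in between is italicised.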

[Screenshot of the output with the italics rule: typst-csv-formatting-italics]

P.S.: I’ve also thought about manipulating the .csv file with sed in a way that Typst’s own markup is read from there instead of relying on show rules, but I’m not sure if Typst would recognise the markup that way.
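A small sketch of what the pre-processing route would involve, assuming the rewritten CSV field already contains Typst markup (the `cell` string below is a stand-in for such a field, not real data from the game):

```typst
// A CSV field is an ordinary string, so markup inside it is displayed literally.
#let cell = "#emph[This] should be emphasized."
#cell

// Evaluating the same string in markup mode interprets the markup.
#eval(cell, mode: "markup")
```

With this route, any character in the dialogue that has a special meaning in Typst markup would also be interpreted, so it would need escaping during pre-processing.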

Hey! I put together a little Typst snippet that does what you described: it handles italics, bold, color, and size, and it extracts the actual color and size values from the tags.

I wasn’t sure what font sizes the game actually uses apart from 20, so I just picked an arbitrary scaling factor for now. If there are only a few distinct sizes, it might make more sense to map them explicitly, but for that I’d need to know which sizes actually occur.
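For reference, an explicit mapping could look roughly like the sketch below. Only size 20 appears in the examples so far, and 0.7em for it is taken from the goal stated in the question; the other keys and values, and the names `size-map` and `game-size`, are placeholders invented for illustration.

```typst
// Hypothetical lookup table: only "20" is known from the thread,
// the remaining entries are made-up placeholders.
#let size-map = ("20": 0.7em, "28": 1em, "36": 1.3em)

// Fall back to the proportional factor for sizes missing from the table.
#let game-size(n) = size-map.at(str(n), default: n * 0.03em)

#text(size: game-size(20))[This] should be smaller.
```

The snippet below keeps the simple scaling factor instead.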

// Italics and bold
#show regex("\{i\}.*?\{/i\}"): it => {
  emph[#it.text.replace(regex("\{\/?i\}"), "")]
}
#show regex("\{b\}.*?\{/b\}"): it => {
  strong[#it.text.replace(regex("\{\/?b\}"), "")]
}

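// Size and colour
#show regex("\{size=\d+\}.*?\{/size\}"): it => {
  // Capture the number from the opening tag, e.g. 20.
  let size = int(it.text.match(regex("\{size=(\d+)\}.*?\{/size\}")).captures.at(0))
  // Arbitrary scaling factor: size 20 becomes 0.6em.
  text(size: size * 0.03em, it.text.replace(regex("\{\/?size(?:=\d+)?\}"), ""))
}

#show regex("\{color=#[0-9a-fA-F]+\}.*?\{/color\}"): it => {
  // Capture the hex colour from the opening tag, e.g. #3761d9.
  let color = rgb(it.text.match(regex("\{color=(#[0-9a-fA-F]+)\}.*?\{/color\}")).captures.at(0))
  // Strip the opening and closing tags, keeping only the enclosed text.
  text(color, it.text.replace(regex("\{\/?color(?:=#[0-9a-fA-F]+)?\}"), ""))
}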

{i}This{/i} should be emphasized. {i}Another{/i} example.

{b}This{/b} should be strong. {b}Another{/b} example.

{color=\#3761d9}Blue{/color} coloured text.

{size=20}This{/size} should be smaller.

Which results in:

Let me know if you run into any issues adapting it to your use case! Also, if you can share a short sample of your CSV, that might help refine things even further. Would be happy to help you tweak it if needed!


Thank you very much for the solution! It seems to work perfectly!

Here’s a sample of the CSV file. I included the very first line from the original (which has the field names) and also 200 lines from somewhere in the middle, which also include lines with markup for italics, size, and color.

transcript-sample.csv (87.6 KB)


Since the data arrives as strings, str.replace works well here:

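// Turn the game's tags into Typst markup and evaluate the result.
#let process-text(s) = {
  import "@preview/oxifmt:0.2.1": strfmt
  // Build the two arguments for str.replace: a pattern matching
  // {left}...{/right} and a function that fills the captures into `substitution`.
  let args(left, right, substitution) = (
    regex("\{" + left + "}(.*?)\{/" + right + "}"),
    x => strfmt(substitution, ..x.captures),
  )
  // The final replace escapes `$` so it cannot start math mode when evaluated.
  let markup = s
    .replace(..args("i", "i", "#emph[{}]"))
    .replace(..args("b", "b", "#strong[{}]"))
    .replace(..args("size=(\d+)", "size", "#text({} * 0.03em)[{}]"))
    .replace(..args("color=#(\w+)", "color", "#text(rgb(\"{}\"))[{}]"))
    .replace(regex("[$]"), x => "\\" + x.text)
  eval(markup, mode: "markup")
}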

// Named `sample` rather than `text` so the built-in `text` function is not shadowed.
#let sample = ```
{i}Just italic{/i}.
{b}Just bold{/b}.
{color=#00FF00}Just colored{/color}.
{size=20}Just resized{/size}.

Here {b} is {i}{size=20}a{/size} mixed{/i} {color=#3761d9}situation{/color}{/b}.
```.text

#process-text(sample)

#let transcript = for row in csv("transcript.csv", row-type: dictionary) {
  (row.pairs().map(((k, v)) => ((k): process-text(v))).join(),)
}

#show table.cell.where(y: 0): strong
#table(
  columns: 4,
  stroke: 0.3pt,
  table.header[Key][English][Spanish][Japanese],
  ..transcript
    .map(row => (row.Key, row.Standard, row.Spanish, row.Japanese))
    .flatten(),
)


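This solution is much shorter, but it needs an extra processing pass over the data, and any character that would error or be misread in markup mode (such as $) has to be escaped before the string is passed to eval.

Also, the table header can be simplified, as in the table.header call above.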
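To make the escaping point more concrete, here is a rough sketch of a broader escape helper. The name `escape-markup` and the choice of characters are my own assumptions, and the list is not exhaustive (line-start syntax such as `=` or `-` and `//` comments are not covered).

```typst
// Prefix a backslash to characters with a special meaning in Typst markup:
// backslash, #, $, *, _, backtick, @, <, [ and ].
#let escape-markup(s) = s.replace(
  regex("[\\\\#$*_`@<\[\]]"),
  m => "\\" + m.text,
)

// The escaped string is displayed literally when evaluated as markup.
#eval(escape-markup("Costs $5 *each* #1"), mode: "markup")
```

One caveat: if this ran before the tag substitution in process-text, the `#` inside `{color=#...}` would become `\#`, so the colour pattern would have to be adjusted to expect that.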

Note that this will struggle with strangely nested tags:

{i}{i}This{/i}{/i} should be emphasized.

Result:

This{/i} should be emphasized.

This happens because regular expressions can only recognize regular languages, while arbitrarily nested formatting tags form a context-free language, which is a strictly more general class.

For more info, consult this classic text: "RegEx match open tags except XHTML self-contained tags" on Stack Overflow :wink:

Strangely nested tags would be strange to see, since they don’t do any good. So for this use case it doesn’t matter. For nested stuff, you need to do much more than just find and replace.
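For what it's worth, "more than find and replace" could start with something like the depth counter below. It is only a sketch that handles `{i}` tags, and `split-italics` and `render-italics` are hypothetical names, not something from the earlier posts. Instead of matching whole tag pairs, it walks over the individual tags and tracks how many are currently open.

```typst
// Split a string into pieces, each flagged as inside or outside {i}...{/i}.
#let split-italics(s) = {
  let parts = ()
  let depth = 0 // number of currently open {i} tags
  let pos = 0   // byte offset just after the previous tag
  for m in s.matches(regex("\{/?i\}")) {
    let piece = s.slice(pos, m.start)
    if piece != "" { parts.push((text: piece, italic: depth > 0)) }
    // A closing tag lowers the depth (never below zero), an opening tag raises it.
    depth = if m.text.starts-with("{/") { calc.max(depth - 1, 0) } else { depth + 1 }
    pos = m.end
  }
  let rest = s.slice(pos)
  if rest != "" { parts.push((text: rest, italic: depth > 0)) }
  parts
}

// Emphasize the flagged pieces once, regardless of how deeply they are nested.
#let render-italics(s) = {
  split-italics(s).map(p => if p.italic { emph(p.text) } else { p.text }).join()
}

#render-italics("{i}{i}This{/i}{/i} should be emphasized.")
```

Supporting several tag types at once would need a stack of open tags rather than a single counter, which is where it starts to look like a small parser.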
