How to format transcripts (GAT Style)

How would you format a transcript for conversation analysis (GAT style), with line numbers and speaker labels?

It could be done with a grid, but I find grids very cumbersome to edit once there are many transcripts. Is there a better way?

It depends on the kind of API you are looking for, but this is what I came up with. Essentially, a show rule for raw blocks with the language `transcript` does some string manipulation on their lines and turns on paragraph line numbering:

```typ
#show raw.where(lang: "transcript"): it => {
  let (lines, ..fields) = it.fields()
  // Speaker labels look like "S1:", "S2:", ...
  let r = regex("^S\\d+:$")
  assert(
    lines.first().text.trim().contains(r),
    message: "must start with a speaker"
  )

  let num-spaces = 3
  // Remember the raw font and size so the line numbers match the transcript.
  let raw-font = text.font
  let raw-size = text.size

  set par.line(
    numbering: n => text(
      font: raw-font,
      size: raw-size,
      // HACKY! Subtract the lines used by earlier transcripts,
      // so that every transcript starts counting at 1.
      str(n - counter("numbering-offset").at(it.location()).first())
    )
  )

  // Group the utterance lines by speaker turn.
  let speakers = (lines.first().text.trim(),)
  let words = ()
  let current-word = ()
  for line in lines.slice(1) {
    let line-text = line.text
    if line-text.trim().contains(r) {
      speakers.push(line-text.trim())
      words.push(current-word)
      current-word = ()
      continue
    }
    current-word.push(line-text)
  }
  words.push(current-word)

  // Record how many lines this transcript used.
  counter("numbering-offset").update(n => n + words.flatten().len())

  let num-inter-spaces = calc.max(..speakers.map(s => s.len()))

  // Indent continuation lines so they line up under the first line of the turn.
  words = words.map(
    w => w.join(
      "\n" + " " * (num-spaces + num-inter-spaces)
    )
  )
  speakers.zip(words).map(
    w => w.join(" " * num-spaces)
  ).join("\n")
}
```

You can use it like so, switching speakers by putting a line that contains only `S<n>:` (e.g. `S1:`):

```transcript
S1:
What do you like to do in your free time?
S2:
I love going to the movies!
S1:
That's great!
What about aligning brackets?
S2:
uh
yeah, that's cool I guess [SCHEIden
S1:
                          [ja
```
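The grouping loop inside the show rule can also be pulled out into a pure function, which makes it easy to check without rendering anything. This is only a sketch: `split-turns` is a hypothetical name, and it assumes the lines are passed as plain strings (for example `lines.map(l => l.text)` inside the show rule).

```typ
// Hypothetical helper: groups transcript lines into (speaker, lines) turns.
// Speaker labels use the same "S<n>:" pattern as the show rule above.
#let split-turns(lines) = {
  let speaker-re = regex("^S\\d+:$")
  assert(
    lines.len() > 0 and lines.first().trim().contains(speaker-re),
    message: "must start with a speaker",
  )
  let turns = ()
  let speaker = lines.first().trim()
  let current = ()
  for line in lines.slice(1) {
    if line.trim().contains(speaker-re) {
      // A new speaker label closes the previous turn.
      turns.push((speaker, current))
      speaker = line.trim()
      current = ()
    } else {
      // Leading spaces are kept, so manual bracket alignment survives.
      current.push(line)
    }
  }
  // Close the last turn.
  turns.push((speaker, current))
  turns
}

#let turns = split-turns(("S1:", "Hi", "S2:", "uh", "yeah", "S1:", "  [ja"))
// turns == (("S1:", ("Hi",)), ("S2:", ("uh", "yeah")), ("S1:", ("  [ja",)))
```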

You can change the font and font size through the usual show-set rules for raw blocks:

```typ
#show raw: set text(font: "Fira Code", size: 11pt)
```

Note that the current implementation relies on the built-in paragraph line numbering. Its counter cannot be adjusted, so the code subtracts a custom "numbering-offset" counter that only tracks transcript lines. If you use line numbering elsewhere in your document, those lines are not part of the offset, and the transcript numbers will be off. (The numbering could instead be done “manually” in the code, but if this caveat doesn't concern you, I personally think this approach is best.)
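To make the caveat concrete, here is the arithmetic the numbering closure performs, as a small sketch. `displayed-numbers` is a hypothetical helper (not part of the code above), and it assumes `par.line` numbers lines consecutively over the whole document.

```typ
// Hypothetical helper mirroring `n - counter("numbering-offset").at(...)`:
// `first-global` is the document-wide number of the block's first line,
// `offset` is the number of lines in earlier transcripts.
#let displayed-numbers(first-global, line-count, offset) = range(
  first-global,
  first-global + line-count,
).map(n => n - offset)

// Second transcript (3 lines) right after a 4-line first one: (1, 2, 3).
#displayed-numbers(5, 3, 4)

// Same, but with 2 numbered lines of ordinary text in between: (3, 4, 5).
#displayed-numbers(7, 3, 4)
```

The two stray numbered lines are counted by `par.line` but never added to the offset, which is exactly why the second transcript would start at 3.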

Thanks. That’s a start!

It’s quite inflexible, though. Ideally the transcript would sit in a box so that the line numbers are not outside the text margin.

I checked the par.line documentation and there doesn’t seem to be a good way around this…

So I played around with this a bit and got something I can live with:

```typ
#show raw.where(lang: "transcript"): it => {
  let (lines, ..fields) = it.fields()

  // A speaker label is any line ending in ":".
  let is-speaker(line) = line.trim().ends-with(":")

  // "S1:" -> "S1"
  let speaker-name(line) = line.trim().slice(0, -1)

  assert(
    lines.len() > 0 and is-speaker(lines.first().text),
    message: "transcript must start with a speaker label"
  )

  // One (speaker, text) row per utterance line; the speaker label is
  // only shown on the first line of each turn.
  let rows = ()
  let current-speaker = speaker-name(lines.first().text) + ":"
  let first-line = true

  for line in lines.slice(1) {
    let txt = line.text
    if is-speaker(txt) {
      current-speaker = speaker-name(txt) + ":"
      first-line = true
      continue
    }
    rows.push((
      if first-line { current-speaker } else { "" },
      txt,
    ))
    first-line = false
  }

  // Width of the widest speaker label (at least 2pt).
  let speaker-width = calc.max(
    2pt,
    ..rows.map(r => measure(text(r.at(0))).width),
  )

  set par(leading: 1.15em, spacing: 0pt)
  set par.line(numbering: n => str(n))

  grid(
    stroke: none,
    columns: (speaker-width, 1fr),
    column-gutter: 1em,
    row-gutter: 1.15em,
    ..rows.map(((speaker, txt)) => ([#speaker], [#txt])).flatten(),
  )
}
```
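If you prefer the grid layout, one option is to drop `par.line` entirely and put a manually formatted number into an extra first column, so the numbers stay inside the text area. This is a sketch, not tested against the show rule above: `numbered-cells` is a hypothetical helper that takes `(speaker, text)` rows like the ones built there.

```typ
// Hypothetical helper: flattens (speaker, text) rows into grid cells,
// prefixing each row with a zero-padded line number starting at 1.
#let numbered-cells(rows) = {
  let width = str(rows.len()).len()
  let cells = ()
  for (i, row) in rows.enumerate() {
    let (speaker, txt) = row
    let num = str(i + 1)
    cells.push("0" * (width - num.len()) + num)
    cells.push(speaker)
    cells.push(txt)
  }
  cells
}

#let rows = (("S1:", "What do you like to do?"), ("S2:", "uh"), ("", "movies!"))
#grid(
  columns: (auto, auto, 1fr),
  column-gutter: 1em,
  row-gutter: 0.65em,
  ..numbered-cells(rows),
)
```

Because the numbering starts from 1 on every call, this also avoids having to reset a line counter between transcripts.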

Question

Is there a way to reset the line counter after each transcript? I tried this but couldn’t get it to work:

Yeah, unfortunately the built-in paragraph line numbering always places the numbers in the margin. On the one hand this can be beneficial, as it leaves more room for the text content, but I also see why it would be undesirable. One way around this is to not use line numbering at all and add the numbers to the raw block's text manually:

```typ
#show raw.where(lang: "transcript"): it => {
  let (lines, ..fields) = it.fields()

  let is-speaker(line) = line.trim().ends-with(":")

  assert(
    lines.len() > 0 and is-speaker(lines.first().text),
    message: "transcript must start with a speaker label"
  )

  // Group the utterance lines by speaker turn.
  let speakers = (lines.first().text.trim(),)
  let words = ()
  let current-word = ()
  for line in lines.slice(1) {
    let line-text = line.text
    if is-speaker(line-text) {
      speakers.push(line-text.trim())
      words.push(current-word)
      current-word = ()
      continue
    }
    current-word.push(line-text)
  }
  words.push(current-word)

  let numlines = words.flatten().len()
  let max-digits-amt = str(numlines).len()
  let num-spaces = 3
  let longest-speaker = calc.max(..speakers.map(s => s.len()))

  // Indent continuation lines past the widest speaker label.
  words = words.map(
    w => w.join("\n" + " " * (num-spaces + longest-speaker))
  )
  // Pad each speaker label so the first lines of all turns line up too.
  let unnumbered = speakers.zip(words).map(
    w => w.join(" " * (num-spaces + longest-speaker - w.first().len()))
  ).join("\n")

  // Prefix every line with a zero-padded number, starting from 1.
  // Joining (instead of appending "\n") avoids a trailing empty line.
  unnumbered.split("\n").enumerate().map(((idx, line)) => {
    let num = str(idx + 1)
    "0" * (max-digits-amt - num.len()) + num + " " * num-spaces + line
  }).join("\n")
}
```

The code also includes some slight adjustments to the function from the solution you provided, like using the is-speaker function.

This way, the numbers do not go beyond the margins.
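One caveat with padding via `.len()`: in Typst, `str.len()` counts UTF-8 bytes, not characters, so a label such as `Jürgen:` (7 characters, 8 bytes) gets one space too few and the columns shift. A hedged sketch of a padding helper that counts grapheme clusters instead; `pad-label` is a hypothetical name, and it assumes every character takes one column in the monospace font.

```typ
// Hypothetical helper: right-pads `label` to `width` characters.
// Grapheme clusters are counted because `str.len()` counts UTF-8 bytes.
#let pad-label(label, width) = {
  let chars = label.clusters().len()
  label + " " * calc.max(0, width - chars)
}

#let speakers = ("S1:", "Jürgen:")
// Widest label in characters (7 here), not bytes (8).
#let width = calc.max(..speakers.map(s => s.clusters().len()))

// The bars should line up in a monospace raw block.
#raw(speakers.map(s => pad-label(s, width) + "|").join("\n"), block: true)
```

In the code above, this would amount to replacing `s.len()` and `w.first().len()` with `.clusters().len()`.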

This solution seems to also place the numbering inside the margins, or am I misunderstanding?

Yes, actually my original solution bypassed this by keeping track of a custom counter and adjusting the displayed number manually:

```typ
set par.line(
  numbering: n => text(
    font: raw-font,
    size: raw-size,
    // HACKY!
    str(n - counter("numbering-offset").at(it.location()).first())
  )
)
...
counter("numbering-offset").update(it => it + words.flatten().len())
```
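The offset itself is just a running sum. Assuming, as the code above relies on, that each transcript reads the counter at its own location before its own update takes effect, block k sees the total line count of all earlier blocks. A hypothetical pure model of that bookkeeping, with made-up line counts:

```typ
// Hypothetical model of the "numbering-offset" counter: each block reads
// the running total, then adds its own line count for later blocks.
#let offsets(line-counts) = {
  let total = 0
  let result = ()
  for count in line-counts {
    result.push(total)
    total += count
  }
  result
}

// Three transcripts with 4, 3 and 5 lines: (0, 4, 7).
#offsets((4, 3, 5))
```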