Sentence spacing

Will Typst get a concept of sentence spacing (see "Sentence spacing" on Wikipedia)?

It seems that, apart from the GitHub issue "Supporting non-frenchspacing" (typst/typst#1281), there has been little discussion about this.

In LaTeX, \frenchspacing and \nonfrenchspacing turn the larger inter-sentence spacing off and on, respectively, and \nonfrenchspacing is the default for English-language documents.

Personally, I like the visual separation of sentences.

I guess the problem is defining a sentence as a semantic concept. That is already a bit tricky for paragraphs (see the Paragraph Function page in the Typst documentation and the GitHub issue "Semantic and non-semantic paragraphs and how `set par()` affects both", typst/typst#6416), and sentences seem even harder. A sentence is roughly `[\w\s]+\.\s*`, but that becomes very non-trivial once you start mixing in inline code, math, and other elements. The boundary between sentences, however, is much easier to detect, so maybe this is enough?

#show regex("\\w\\. "): it => it.text.replace(" ", "") + h(1em)
#lorem(200)

P.S. I don’t need this feature.

Wow, I love this solution 🙂 Of course, special cases like "Prof. Farnsworth" will still require manually specifying the correct spacing, but that is also the case in LaTeX.
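For the "Prof. Farnsworth" case, one manual fix that should work with the regex rule above (much like `Prof.~Farnsworth` in LaTeX): Typst's `~` shorthand inserts a non-breaking space (U+00A0), while the pattern only matches a regular space, so the abbreviation keeps an ordinary word space and cannot be separated from the name by a line break. A minimal sketch, assuming the show rule from the first reply:

```typst
// Widen the space after a word character followed by a period.
#show regex("\\w\\. "): it => it.text.replace(" ", "") + h(1em)

// `~` is a non-breaking space (U+00A0), which the pattern above does not
// match, so "Prof." keeps an ordinary-width space. The period after "first"
// is followed by a regular space and is widened as before.
Prof.~Farnsworth speaks first. The others follow.
```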

Unfortunately, a regex show rule can only match text, so this does not work: the sentence ending in an inline equation is not text, and `\w` does not match the emoji either.

#show regex("\\w[\\.:!] "): it => it.text.replace(" ", "") + h(1em)
#lorem(50) So, we have $x=42$. That's a fascinating result #emoji.face!
#lorem(20)

I'm not a native speaker, but larger inter-sentence spacing seems to me to be the default in English-language typography. Are there opinions on whether it would warrant dedicated syntax or commands in the language?

Calling it the default is likely an exaggeration; for a strongly opposing view, see e.g. "One space between sentences" in Butterick's Practical Typography.

That said, supporting it is a reasonable ask. Ideally it wouldn’t require extra syntax or commands, but we’ll see.

Here you go:

#let space-after-sentence-ending-with-equation(space: [ ], doc) = {
  // Grab the element functions for "styled" and "sequence" content.
  let styled = text(red)[].func()
  let sequence = [].func()
  assert(doc.func() in (sequence, styled))
  let wrap
  let elements
  if doc.func() == sequence {
    wrap = inner => inner
    elements = doc.children
  } else {
    // Re-apply the document's styles after rebuilding its children.
    wrap = inner => styled(inner, doc.styles)
    elements = doc.child.children
  }
  let skip = false
  // Append an empty sentinel so that the last real element is also visited
  // as `a`; plain `windows(2)` would silently drop it.
  let inner = for (a, b) in (elements + ([],)).windows(2) {
    if skip {
      skip = false
      continue
    }
    // An inline equation directly followed by text that starts with ". ".
    if (
      a.func() == math.equation
        and not a.block
        and b.func() == text
        and b.text.starts-with(". ")
    ) {
      skip = true
      a
      "."
      space
      b.text.replace(". ", "", count: 1)
      continue
    }
    a
  }
  wrap(inner)
}

#show regex("[\\w\\p{Emoji}][\\.:!] "): it => it.text.replace(" ", "") + h(1em)
#show: space-after-sentence-ending-with-equation.with(space: h(1em))
#lorem(50) So, we have $x=42$. That's a fascinating result #emoji.face!
#lorem(20)

You would more or less have to handle each element type individually… unless you don't mind, for example, a longer space after a block-level equation that is directly followed by a period. In that case, you can just collect all supported elements in an array, or bite the bullet and simply check for `a.func() != text`, which might behave unpredictably for some elements, I don't know.

#let space-after-sentence-ending-with-supported-elements(space: [ ], doc) = {
  // Inline elements after which a sentence may end.
  let supported-sentence-ending-elements = (math.equation, ref, cite)
  let styled = text(red)[].func()
  let sequence = [].func()
  assert(doc.func() in (sequence, styled))
  let wrap
  let elements
  if doc.func() == sequence {
    wrap = inner => inner
    elements = doc.children
  } else {
    wrap = inner => styled(inner, doc.styles)
    elements = doc.child.children
  }
  let skip = false
  // Empty sentinel at the end so that the last element is not dropped.
  let inner = for (a, b) in (elements + ([],)).windows(2) {
    if skip {
      skip = false
      continue
    }
    // Block-level equations are not excluded here (see above).
    if (
      a.func() in supported-sentence-ending-elements
        and b.func() == text
        and b.text.starts-with(". ")
    ) {
      skip = true
      a
      "."
      space
      b.text.replace(". ", "", count: 1)
      continue
    }
    a
  }
  wrap(inner)
}

#show regex("[\\w\\p{Emoji}][\\.:!] "): it => it.text.replace(" ", "") + h(1em)
#show: space-after-sentence-ending-with-supported-elements.with(space: h(1em))

#lorem(50) So, we have $x=42$. This also works @a. And this: #cite(<a>). That's
a fascinating result #emoji.face! #lorem(20)

// @typstyle off
#bibliography(bytes(```yaml
a:
  type: article
  author:
```.text,
))

I don't know why there is a colon, though. Also, I think you would have to change `\w` to `\p{L}` to support non-ASCII letters.
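A sketch of the `\p{L}` variant. It assumes Typst's `regex` follows the syntax of Rust's `regex` crate, where `\w` is already Unicode-aware but also matches digits and `_`; `\p{L}` matches letters only. As a side effect, German ordinals such as "3. Mai" are then left alone. `\p{Emoji}` is deliberately omitted here, because the Unicode Emoji property also includes the digits 0 to 9.

```typst
// Letters only: \p{L} matches "ß" or "é" but not digits or "_".
#show regex("\\p{L}[\\.:!] "): it => it.text.replace(" ", "") + h(1em)

// "3. " (ordinal) stays normal; the space after "Mai." is widened.
Das Café öffnet am 3. Mai. Es gibt Kuchen!
```

Whether excluding digits is desirable depends on the text: an English sentence ending in a number ("version 3. Then…") would no longer get the wider space.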
