How can I insert unbreakable space automatically after some characters?

Astra3 · February 24, 2025, 4:01pm

In Czech, when writing more formal documents, there is a rule that conjunctions must not be stranded by a line wrap; in other words, one cannot be the last character on a line. If the rendered output looked like this:

Dnes jsme přijeli domů a
rozbalili nákup.

it’d be incorrect, as the first line ends with “a”. A correct example would have “a” at the start of the second line. This also applies if the word is the first word in a sentence. This example:

Mánička dnes dojela domů. K
tomu všemu nestihla ani rozbalit nákup.

is also incorrect.

The conjunctions (they aren’t only conjunctions, but that’s the closest word) are these: ["a","A","k","K","i", "I", "u", "U", "s", "S", "o", "O", "v", "V"].

The most manual solution is to insert a non-breaking space after each conjunction, like so:

Máma dnes dojela domů a~rozbalila nákup.

It gets very tedious, and I was wondering: is there a way to do it automatically for every paragraph?

I tried using regex to match and replace; however, Typst’s regex cannot use lookahead and cannot replace only specific capture groups. Another solution is to use individual show rules, like so:

#show " a ": [ a~]
#show " i ": [ i~]

However, as you can see, it can get very tedious.

Tiggax · February 24, 2025, 7:29pm

If I understood correctly, you need the conjunctions (ish) to automatically wrap together with the next word?

Would something like this be helpful?

#let const = ("a","A","k","K","i", "I", "u", "U", "s", "S", "o", "O", "v", "V")
#show regex(" [" + const.join(",") + "] " ): it => [ #it.text.trim()~]

Since there are spaces, you can just trim them and don’t need to capture them, unless you have more complex cases, like a dot right before the word with no space, and so on.

Astra3 · February 25, 2025, 11:58am

Your solution works great; however, I don’t think the regex should contain commas, so I updated it to the following:

#let const = ("a","A","k","K","i", "I", "u", "U", "s", "S", "o", "O", "v", "V")
#show regex(" [" + const.join() + "] " ): it => [ #it.text.trim()~]

…and that does the job very well. There are no edge cases that I know of that need to be handled, so I will go with that. Thank you very much.
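For anyone wondering why the commas matter: inside a regex character class a comma is just another literal character, so a pattern built with join(",") would also match a lone comma surrounded by spaces. A small sketch of the difference, using a shortened list (the name letters is only for illustration):

```typst
// Shortened, hypothetical list; the real one has all fourteen letters.
#let letters = ("a", "A", "k", "K")

// With a separator, the commas end up inside the character class.
#assert.eq(" [" + letters.join(",") + "] ", " [a,A,k,K] ")

// Without a separator, only the letters themselves are in the class.
#assert.eq(" [" + letters.join() + "] ", " [aAkK] ")
```

Both assertions should pass silently when the file compiles.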

sijo · February 25, 2025, 1:02pm

I think you can simplify your code to the following:

#show regex(" [aAkKiIuUsSoOvV] "): it => [ #it~]

gezepi · February 25, 2025, 3:02pm

Trimming the spaces off of it is necessary here, because the match already includes the surrounding spaces and otherwise they start adding up. But putting all the “conjunction” characters together is more readable than joining a list.

Dnes jsme přijeli domů a rozbalili nákup.\
#show regex(" [aAkKiIuUsSoOvV] "): it => [ #it~]
Dnes jsme přijeli domů a rozbalili nákup.
#show regex(" [aAkKiIuUsSoOvV] " ): it => [ #it.text.trim()~]
Dnes jsme přijeli domů a rozbalili nákup.