This post is about the new capability of hy-dro-gen:0.1.2 (repo: Vanille-N/hy-dro-gen) to load custom hyphenation patterns for languages not natively supported by Typst.
If you sometimes write documents in languages that have non-builtin hyphenation rules, this is for you.
Disclaimer: I am not an expert on hyphenation.
Basically everything I know comes from this blog post about hypher.
Background
Currently Typst can handle hyphenation for 34 (perhaps soon-to-be 35) languages. Hyphenation points are computed by the crate typst/hypher, which compiles TeX patterns into a finite automaton.
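To give an idea of what a TeX pattern file encodes, here is a naive sketch, in Typst script, of how patterns determine break points. It is only an illustration and not how hypher works internally: hypher compiles the patterns into an automaton, whereas this scans every substring against a dictionary. The names `parse-pattern` and `naive-hyphenate` are made up for this example, and the nine patterns are a commonly quoted subset of the English ones. Digits in a pattern are priorities for the gap they sit in; the highest priority wins, and a break is allowed only where the winner is odd.

```typst
// Split a pattern such as "hy3ph" into its letters ("hyph") and the
// priority of every gap around them ((0, 0, 3, 0, 0)).
#let parse-pattern(pat) = {
  let letters = ""
  let levels = (0,)
  for c in pat.clusters() {
    if c in "0123456789" {
      levels.last() = int(c)
    } else {
      letters += c
      levels.push(0)
    }
  }
  (letters: letters, levels: levels)
}

// Hypothetical helper: returns `word` in lowercase with "-" at every
// allowed break. `bounds` is (left, right)-hyphenmin, as in `hy.trie`.
#let naive-hyphenate(word, patterns, bounds: (2, 3)) = {
  let levels-of = (:)
  for p in patterns {
    let parsed = parse-pattern(p)
    levels-of.insert(parsed.letters, parsed.levels)
  }
  // TeX patterns use "." to mark the start and end of a word.
  let chars = ("." + lower(word) + ".").clusters()
  let n = chars.len()
  // scores.at(i) is the priority of the gap just before chars.at(i).
  let scores = (0,) * (n + 1)
  for start in range(n) {
    for stop in range(start + 1, n + 1) {
      let key = chars.slice(start, stop).join()
      let levels = levels-of.at(key, default: none)
      if levels != none {
        for (k, level) in levels.enumerate() {
          scores.at(start + k) = calc.max(scores.at(start + k), level)
        }
      }
    }
  }
  let (lmin, rmin) = bounds
  let out = ""
  // Skip the two boundary dots at indices 0 and n - 1.
  for i in range(1, n - 1) {
    let enough-left = i - 1 >= lmin
    let enough-right = n - 1 - i >= rmin
    if calc.odd(scores.at(i)) and enough-left and enough-right {
      out += "-"
    }
    out += chars.at(i)
  }
  out
}

#let patterns = (
  "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n",
)
// Renders as: hy-phen-ation
#naive-hyphenate("hyphenation", patterns)
```

The combined priorities for this word are h0y3p0h0e2n5a4t2i0o2n, so only the gaps after "hy" and "hyphen" carry an odd value. The `bounds` parameter plays the same role as the `bounds` argument in step 2 below: it forbids breaks that would leave too few letters on either side.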
As has been noted multiple times,
- The hyphenation patterns are not always licensed in a way that would permit their distribution embedded in the Typst compiler.
- Even if they are permissively licensed, every additional language increases the size of the compiler, so it is impractical to hope that eventually every possible language/variant will be supported.
As far as I can tell, the main obstacles to beginning to resolve issue #5223, which requests the ability to dynamically load user-specified hyphenation patterns, are simply that (1) it takes work to add to typst/hypher the capability to load user-specified patterns, and (2) it takes time to decide on a stable API for doing so. Since a package has much less inertia than the entire compiler when it comes to settling on an API, implementing this in a package seems to be the more effective approach in the short term.
I happen to be maintaining the package hy-dro-gen because it is a dependency for meander, and it occurred to me that having (1) working knowledge of Rust, (2) a hand on the already existing hyphenation package for Typst, and (3) a bit of free time, I was in a good position to be the person to implement dynamically loaded hyphenation patterns.
About hy-dro-gen
hy-dro-gen is a thin wrapper around a WASM module on top of typst/hypher. It provides bindings to the same library that Typst uses for native hyphenation, and thus guarantees consistent results.
For hy-dro-gen:0.1.2 specifically, I forked hypher and added the ability to dynamically load precompiled patterns and to compile new patterns on the fly.
Setting up hyphenation for a new language
As an example, the Romanian language (ISO code “ro”) is not supported by Typst. Because its patterns are not licensed, it is unclear whether it will ever be supported natively. Fortunately, hyphenating Romanian is feasible right now using hy-dro-gen.
1. Procure pattern files
From www.hyphenation.org you can get a list of languages for which patterns exist.
The patterns themselves can be downloaded from hyphenation/tex-hyphen. In this example, I grabbed hyph-ro.tex and saved it to a local folder patterns/.
2. Load the patterns into hy-dro-gen
#import "@preview/hy-dro-gen:0.1.2" as hy
#let trie_ro = hy.trie(
  // Downloaded from github:hyphenation/tex-hyphen
  tex: read("patterns/hyph-ro.tex"),
  // See column '(left,right)-hyphenmin' on hyphenation.org
  bounds: (2, 3),
)
// The patterns are compiled on the fly exactly once, and stored in
// hy-dro-gen's global registry of languages. Hypher is pretty efficient,
// so this one-time cost at startup is barely noticeable.
// It is technically possible -- though not convenient yet -- to precompile
// the patterns for even less loading time.
#hy.load-patterns(
  ro: trie_ro,
)
3. Apply patterns
Below is a comparison of how the same excerpt gets hyphenated with different settings.
Left:
// This would be the right solution if Romanian was natively supported.
// Because it is not, this doesn't work at all.
#set par(justify: true)
#set text(hyphenate: true, lang: "ro")
#excerpt
Middle:
// If we lie to Typst and pretend it's English, we get some hyphenation.
// However this is neither correct (hyphenation may occur where forbidden)
// nor pleasant (some lines have unnatural spacing).
// We could find another supported language that is more similar to Romanian,
// but that would just be another hack, and other features that depend on
// the language (e.g. selecting the right quotation marks) may be wrong.
#set par(justify: true)
#set text(hyphenate: true, lang: "en")
#excerpt
Right:
// This time we get the right hyphenation points.
// Fewer excessive spaces, and slightly more compact overall.
// Semantically correct in that it accurately states the text's language.
#set par(justify: true)
#set text(hyphenate: true, lang: "ro")
#show: hy.apply-patterns("ro")
#excerpt
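The three snippets above rely on an `excerpt` variable that is not defined in this post. For reference, a self-contained sketch of such a side-by-side comparison could look like the following. The grid layout and the sample text (Article 1 of the Universal Declaration of Human Rights in Romanian, quoted from memory, so substitute your own text) are my choices for illustration; only `hy.trie`, `hy.load-patterns` and `hy.apply-patterns` come from the steps above, and the pattern file is assumed to be at patterns/hyph-ro.tex.

```typst
#import "@preview/hy-dro-gen:0.1.2" as hy

// Steps 1 and 2: compile the downloaded patterns and register them as "ro".
#let trie_ro = hy.trie(
  tex: read("patterns/hyph-ro.tex"),
  bounds: (2, 3),
)
#hy.load-patterns(
  ro: trie_ro,
)

// Sample text (assumption): replace with any Romanian paragraph.
#let excerpt = [
  Toate ființele umane se nasc libere și egale în demnitate și în drepturi.
  Ele sunt înzestrate cu rațiune și conștiință și trebuie să se comporte
  unele față de altele în spiritul fraternității.
]

#set par(justify: true)
#grid(
  columns: (1fr, 1fr, 1fr),
  gutter: 1em,
  // Left: correct language, but no built-in patterns for it.
  [
    #set text(hyphenate: true, lang: "ro")
    #excerpt
  ],
  // Middle: pretending the text is English.
  [
    #set text(hyphenate: true, lang: "en")
    #excerpt
  ],
  // Right: correct language with the custom patterns applied.
  [
    #set text(hyphenate: true, lang: "ro")
    #show: hy.apply-patterns("ro")
    #excerpt
  ],
)
```

Each cell is its own content block, so the `set` and `show` rules inside it stay scoped to that column and do not leak into the others.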
You can view this example on typst.app.
Further reading
For more details, you can consult the full documentation of hy-dro-gen.
If you encounter an issue, please file a bug report.
