How to implement dictionary-based word recognition for Thai for line breaking/hyphenation?

I have a ~900-page Thai document that I am beginning to typeset. In the first paragraph I notice that lines sometimes break mid-word. Thai, like its cousin Lao, puts spaces between phrases and clauses rather than between words. Recently I typeset a 700-page Lao book using LuaLaTeX. Using a script and working from a dictionary file, I wrapped each word in a macro \lw{} (Lao word) and put in weighted hyphenation points. That was the only workaround I could find. The document turned out very nice. For this Thai book, I want to use Typst. What are my options? Proper names such as Melchizedek, Chedorlaomer, and Nebuchadnezzar are not in most dictionaries. Also, a word that I would expect to be in the dictionary, สดุดี, is splitting mid-word.
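To make the preprocessing idea concrete, here is a minimal Python sketch of that kind of dictionary-driven wrapping. It is not the script used for the Lao book: it assumes a plain greedy longest-match over a word list, and the function name `wrap_words` and the sample words are hypothetical. Only the macro name \lw comes from the description above, and weighted hyphenation points are left out.

```python
def wrap_words(text, dictionary, macro="\\lw"):
    """Wrap every dictionary word found in `text` as macro{word}.

    Assumption: greedy longest-match, scanning left to right. Characters that
    do not start a known word are copied through unchanged.
    """
    max_len = max((len(word) for word in dictionary), default=0)
    out = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so longer words win over their prefixes.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is not None:
            out.append(f"{macro}{{{match}}}")
            i += len(match)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical two-word dictionary, for illustration only.
sample_words = {"สดุดี", "พระเจ้า"}
print(wrap_words("สดุดีพระเจ้า", sample_words))
# \lw{สดุดี}\lw{พระเจ้า}
```

Greedy matching can mis-segment ambiguous sequences, so a real word list would need checking against the text, and proper names would have to be added to the list by hand.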

How can I get Typst to recognize a dictionary of Thai words so that it doesn’t split in the middle of a word unless there is a hyphenation point there (in which case it inserts a hyphen)?

Here is my current setup, in case there is something of interest.

#set page(
  width: 176mm,
  height: 250mm,
  margin: (inside: 20mm, outside: 15mm, top: 15mm, bottom: 20mm)
)

#set text(font: "Sarabun", 
     size: 11pt,
     lang: "th",
     hyphenate: true)

#set par(justify: true, 
     justification-limits: (
       spacing: (min: 90% - 0.01em, max: 100% + 0.02em), 
       tracking: (min: -0.01em, max: 0.01em)),
     first-line-indent: (amount: 1.5em, all: true),
     linebreaks: "optimized",
     leading: 0.85em,
     spacing: 0.85em)

tags: thai, hyphenation, line-breaking, justification

Hi! Unfortunately, it looks like hypher, Typst’s library for separating words into syllables, does not support Thai.
I suggest you open an issue in that repo, so that Thai people can have a central place to discuss it and get it implemented.

“recognize a dictionary of Thai words”

The hyphenation algorithm is based on patterns instead of a dictionary.
For example, if I coin the word donatinoumous and paste it into typst.app/tools/hyphenate, the result looks fine.
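To illustrate why a made-up word can still be hyphenated, here is a minimal Python sketch of pattern-based (Liang-style) hyphenation. It is not hypher's implementation. The handful of patterns in `TOY_PATTERNS` is a tiny illustrative set rather than the real en-US data, and the function names and the `left_min`/`right_min` parameters are assumptions made for this example.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and inter-letter weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight of the gap before letter `pos`
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the pieces of `word`, split where the winning weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = f".{word.lower()}."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)  # scores[k] = gap before padded[k]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for offset, weight in enumerate(weights):
                    k = start + offset
                    scores[k] = max(scores[k], weight)
    pieces, last = [], 0
    # The gap before word[j] is scores[j + 1] because of the leading dot.
    for j in range(left_min, len(word) - right_min + 1):
        if scores[j + 1] % 2 == 1:
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces


# Toy pattern set, for illustration only.
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print(hyphenate("hyphenation", TOY_PATTERNS))
# ['hy', 'phen', 'ation']
```

Nothing here looks the whole word up anywhere: only letter sequences are matched, which is why an invented word gets break points too. It is also why a fixed word list, which is what Thai would need, does not map directly onto this mechanism.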

If you want to add Thai support to hypher, you can take the pull request “Add Catalan patterns” by reknih (Pull Request #15 in typst/hypher on GitHub) as an example.

However, I can’t read Thai and I’m not sure if pattern-based hyphenation is theoretically possible for Thai…

Thank you for your reply. Being able to control where hyphens and word breaks are possible and where they are forbidden is necessary for Thai, Lao, and Khmer typesetting.

I followed the links in the repositories you linked to, but do not understand the patterns.

Here are the English hyphenation patterns.

https://github.com/typst/hypher/blob/main/patterns/hyph-en-us.tex

Digging deeper, I see that they link to a TeX repository, which includes a folder with Thai dictionaries and exception rules.

https://github.com/hyphenation/tex-hyphen/tree/master/source/th

I can create the dictionary files, but I do not know how to compile them for local use in a way that Typst can access. (My long list of proper names is not relevant to the majority of users and shouldn’t clutter the general code.)

Would I need to recompile Typst each time I add a word to a dictionary or change a hyphenation point? There must be a better way.