Unicode drafts a report on `text(cjk-latin-spacing)`

Typst provides the option text(cjk-latin-spacing: auto) to automatically insert spacing between CJK and Latin characters. (PR #2334 / v0.9.0)

Example
#text(cjk-latin-spacing: auto)[第4章介绍了基本的API。]  // default
#text(cjk-latin-spacing: none)[第4章介绍了基本的API。]

A month ago (2024-12-16), the Unicode Consortium (and W3C experts) published a Proposed Draft Technical Report, describing the algorithm and data for the spacing:

https://www.unicode.org/reports/tr59/

The draft report differs from Typst’s current implementation in the following respects.

  • It includes all East Asian scripts (Bopomofo, Tangut, Yi, etc.), not only Han/CJK.

    Most notably, it adds space for Hangul (≈ Korean, K). (Typst dropped K on purpose? See PR #2829.)

  • It suggests different treatment of 增加20%以后 (Chinese, C) and 進捗は20%です (Japanese, J).

  • It uses a systematic way to avoid adding double spacing.
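
To make the three points concrete, here is a toy model in Python of what "auto-spacing between CJK and Latin" means. It is only a sketch: the code point ranges, the `is_cjk` / `is_latin` helpers and the `mark_spacing` function are my own assumptions for illustration, and they are neither the data from the draft report nor Typst's actual implementation. It marks each boundary with `|` instead of inserting real glue.

```python
# Toy model only. The ranges and helper names below are assumptions for
# illustration; they are not the UTR #59 data and not Typst's implementation.

def is_cjk(ch: str) -> bool:
    """True for basic Han ideographs, Hiragana and Katakana (no Hangul etc.)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs (basic block)
        or 0x3040 <= cp <= 0x309F   # Hiragana
        or 0x30A0 <= cp <= 0x30FF   # Katakana
    )

def is_latin(ch: str) -> bool:
    """True for ASCII letters and digits only; '%' is deliberately excluded."""
    return ch.isascii() and ch.isalnum()

def mark_spacing(text: str, marker: str = "|") -> str:
    """Put `marker` at every direct CJK/Latin boundary in `text`."""
    out = []
    prev = ""
    for ch in text:
        boundary = prev and (
            (is_cjk(prev) and is_latin(ch)) or (is_latin(prev) and is_cjk(ch))
        )
        if boundary:
            out.append(marker)
        out.append(ch)
        prev = ch
    return "".join(out)

print(mark_spacing("第4章介绍了基本的API。"))  # 第|4|章介绍了基本的|API。
print(mark_spacing("中文 English"))            # 中文 English (already spaced)
print(mark_spacing("增加20%以后"))             # 增加|20%以后
```

Because an existing space belongs to neither class, the toy never doubles spacing. The last line shows where the interesting decisions are: whether `%` counts as Latin, and whether scripts such as Hangul count as CJK, is exactly what the per-character data has to settle.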

UTR #59 is open for public review until 2025-04-02 and is likely to be released with Unicode 17 in September 2025.
Maybe we should provide feedback on the differences, in case we someday update Typst’s implementation to follow a finalized UTR #59?

I wonder how well grounded the W3C spacing proposal is. They do insist that their algorithm matches current practice, but if a PR lands to implement it, I would be interested in reading input from native speakers of each script.

The property values are chosen first to match existing practice in East Asian contexts in their respective environments.

The Acknowledgments section lists three authors: Koji Ishii, Yasuo Kida, and Fuqiao Xue. The first two have Japanese names and are likely to be Japanese, and the third is the editor of Requirements for Chinese Text Layout (《中文排版需求》). Therefore, I think UTR #59 is worth considering.

Oh, it is by Unicode, not W3C, although it “originated from a discussion at W3C” according to the Acknowledgments.

Unicode is not the author of the proposal; the three authors work at different organizations (Google, Apple and W3C). I did read too fast. It does seem worth considering, but it’s probably worth waiting for the final version before starting any work.

Sounds strange. There should be spacing on both sides in Japanese, too, because % is a Latin character.

And there should be spacing on both sides of inline math, as in “エネルギーは$E$と表す” (“energy is denoted by $E$”).

It looks like jlreq does not distinguish % from other Western characters (cl-27) either.

Could you provide a sample image? A book page, newspaper, poster, or so.

According to Japanese Industrial Standard JIS X 4051 (2004) 日本語文書の組版方法, on which jlreq is mostly based:

横書きでは,和文と欧文との間の空き量は,四分アキを原則とする。(In horizontal writing, the amount of space between Japanese text and Latin text should be quarter em in principle.)

和文と欧文を区別する方法は,処理系定義とする。(Distinction between Japanese and Latin text is implementation defined.)

単位記号と和文との間の空き量は,四分アキを原則とする。(In principle, the amount of space between the unit symbol and the Japanese text should be quarter em.)

So the problem is whether the percent sign is Japanese or Latin (or a unit symbol). I think ‘％’ (fullwidth, U+FF05) is Japanese, ‘%’ (U+0025) is Latin, and both are unit symbols.

Various Japanese LaTeX document classes handle ‘%’ differently. The widely used jsarticle, jsbook, etc. (COI: I made them!) insert a quarter-em space after ‘%’, but older document classes don’t. Surprisingly, Microsoft Word doesn’t insert spaces around ‘%’!

In short, logically one should insert glue of about 0.25em around Latin text (including ‘%’ and inline math), but some software tools don’t. (For us, one of the serious weaknesses of Typst compared to LaTeX is that it does not insert spaces around inline math.)

BTW, according to ISO 80000, there should also be a space between a number and its unit symbol. Knuth prefers \, (unbreakable thin space).

I’ve just learned a lot from my Twitter followers and have finally solved part of the mystery. jlreq (and JIS X 4051) classifies “%” as a “postfixed abbreviation (cl-13)” rather than a Latin character, which doesn’t seem to require quarter-em spacing (according to Table 1 of jlreq). Unless this is a bug in the specification, no automatic space is needed between “20%” and “です”. Please ignore my previous statements. My apologies. :man_bowing:

I’ve just checked relevant sections of jlreq more carefully. So character classes are NOT mutually exclusive?

Edit: I opened a discussion on GitHub: “Should character classes be mutually exclusive?” (w3c/jlreq, Discussion #455).

jlreq is not perfect :thinking:

As a compromise, I hit upon a solution in which ‘％’ (U+FF05) is a postfix abbreviation and ‘%’ (U+0025) is a Western character, but jlreq does not distinguish these two code points.
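
For anyone who wants to play with that compromise, here is a minimal Python sketch. The two code points and their Unicode names and East Asian Width values come from the standard `unicodedata` module; the class labels and the `percent_class` / `needs_space_before_japanese` helpers are hypothetical names of mine, not something defined by jlreq, JIS X 4051 or Typst.

```python
import unicodedata

HALFWIDTH_PERCENT = "\u0025"  # '%', PERCENT SIGN
FULLWIDTH_PERCENT = "\uff05"  # fullwidth form, FULLWIDTH PERCENT SIGN

def percent_class(ch: str) -> str:
    """Hypothetical class labels that follow the compromise described above."""
    if ch == FULLWIDTH_PERCENT:
        return "postfix-abbreviation"   # treated like jlreq cl-13
    if ch == HALFWIDTH_PERCENT:
        return "western"                # treated like jlreq cl-27
    raise ValueError(f"not a percent sign: {ch!r}")

def needs_space_before_japanese(ch: str) -> bool:
    """Under the compromise, only the Western percent sign gets auto-spacing."""
    return percent_class(ch) == "western"

for ch in (HALFWIDTH_PERCENT, FULLWIDTH_PERCENT):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),  # 'Na' for U+0025, 'F' for U+FF05
        percent_class(ch),
        needs_space_before_japanese(ch),
    )
```

Note that NFKC normalization maps U+FF05 to U+0025, so a rule like this only works if it runs before any compatibility normalization of the text.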