Unicode drafts a report on `text(cjk-latin-spacing)`

Y.D.X · January 17, 2025, 7:58am

Typst provides the option text(cjk-latin-spacing: auto) to automatically insert spacing between CJK and Latin characters. (PR #2334 / v0.9.0)

Example

#text(cjk-latin-spacing: auto)[第4章介绍了基本的API。]  // default
#text(cjk-latin-spacing: none)[第4章介绍了基本的API。]
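
To make the idea concrete, here is a rough Python sketch that marks the positions where such spacing would go. It is not Typst's implementation and not the UTR #59 algorithm. The helpers `is_cjk`, `is_latin`, and `mark_boundaries` are hypothetical names, and the character tests are deliberately crude assumptions (Han and kana on one side, ASCII letters and digits on the other). A real typesetter inserts adjustable glue rather than a visible character.

```python
import unicodedata


def is_cjk(ch: str) -> bool:
    # Assumption for this sketch: only Han ideographs and kana count as CJK.
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA"))


def is_latin(ch: str) -> bool:
    # Assumption for this sketch: only ASCII letters and digits count as Latin.
    return ch.isascii() and ch.isalnum()


def mark_boundaries(text: str, marker: str = "|") -> str:
    """Insert `marker` wherever a CJK character is adjacent to a Latin one."""
    out = []
    prev = ""
    for ch in text:
        if prev and (
            (is_cjk(prev) and is_latin(ch)) or (is_latin(prev) and is_cjk(ch))
        ):
            out.append(marker)
        out.append(ch)
        prev = ch
    return "".join(out)


print(mark_boundaries("第4章介绍了基本的API。"))  # 第|4|章介绍了基本的|API。
```

Under these assumptions, punctuation such as 。 and symbols such as % are in neither class, so no marker is placed next to them.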

A month ago (2024-12-16), the Unicode Consortium (and W3C experts) published a Proposed Draft Technical Report, describing the algorithm and data for the spacing:

https://www.unicode.org/reports/tr59/

The draft report disagrees with typst’s current implementation in the following aspects.

It includes all East Asian scripts (Bopomofo, Tangut, Yi, etc.), not only Han/CJK.

Most notably, it adds space for Hangul (≈ Korean, K). (Typst dropped K on purpose? See PR #2829.)
It suggests different treatment of 增加20%以后 (Chinese, C) and 進捗は20%です (Japanese, J).
It uses a systematic way to avoid adding double spacing.
…

UTR #59 is open for public review until 2025-04-02, and is likely to be released with Unicode 17 in September 2025.
Maybe we should provide feedback on the disagreements now, in case we someday update typst’s implementation to follow a finalized UTR #59?

quachpas · January 17, 2025, 8:05am

I wonder how well grounded the W3C spacing proposal is. They do insist that their algorithm matches current practice, but if a PR lands to implement it, I would be interested in reading input from native speakers of each script.

The property values are chosen first to match existing practice in East Asian contexts in their respective environments.

Y.D.X · January 17, 2025, 8:24am

The Acknowledgments section lists three authors: Koji Ishii, Yasuo Kida, and Fuqiao Xue. The first two have Japanese names and are likely to be Japanese, and the third is the editor of Requirements for Chinese Text Layout (《中文排版需求》). Therefore, I think UTR #59 is worth considering.

Oh, it is by Unicode, not W3C, although “originated from a discussion at W3C” according to Acknowledgments.

quachpas · January 17, 2025, 8:31am

Unicode is not the author of the proposal; all three authors work at different companies (Google, Apple and W3C). I did read too fast. It does seem worth considering, but it’s probably worth waiting for the final version before starting any work.

Haruhiko_Okumura · February 10, 2025, 3:11pm

Sounds strange. There should be spacing on both sides in Japanese, too, because % is a Latin character.

And there should be spacing on both sides of inline math, such as “エネルギーは$E$と表す”.

Y.D.X · February 10, 2025, 4:31pm

It looks like jlreq does not distinguish % from other Western characters (cl-27) either.

Could you provide a sample image? A book page, newspaper, poster, or the like.

Haruhiko_Okumura · February 11, 2025, 1:42am

According to Japanese Industrial Standard JIS X 4051 (2004) 日本語文書の組版方法, on which jlreq is mostly based:

横書きでは，和文と欧文との間の空き量は，四分アキを原則とする。(In horizontal writing, the amount of space between Japanese text and Latin text should be quarter em in principle.)

和文と欧文を区別する方法は，処理系定義とする。(Distinction between Japanese and Latin text is implementation defined.)

単位記号と和文との間の空き量は，四分アキを原則とする。(In principle, the amount of space between the unit symbol and the Japanese text should be quarter em.)

So the problem is whether the percent sign is Japanese or Latin (or unit symbol). I think ‘％’ is Japanese, ‘%’ is Latin, and both are unit symbols.

Various Japanese LaTeX document classes handle ‘%’ differently. The widely used jsarticle, jsbook, etc. (COI: I made them!) insert a quarter-em space after ‘%’, but older document classes don’t. Surprisingly, Microsoft Word doesn’t insert spaces around ‘%’!

In short, logically one should insert glue of about 0.25em around Latin text (including ‘%’ and inline math), but some software tools don’t. (For us, one of the serious weaknesses of Typst compared to LaTeX is that it does not insert spaces around inline math.)

BTW, according to ISO 80000, there should be a space between a number and its unit symbol, too. Knuth prefers \, (unbreakable thin space).

Haruhiko_Okumura · February 11, 2025, 6:47am

I’ve just learned a lot from my Twitter followers and finally solved the mystery a bit. jlreq (and JIS X 4051) classifies “%” as a “postfixed abbreviation (cl-13)” rather than a Latin character, and that class doesn’t seem to require quarter-em spacing (according to Table 1 of jlreq). Unless it is a bug in the specification, no automatic space is needed between “20%” and “です”. Please ignore my previous statements. My apologies.

Y.D.X · February 12, 2025, 5:08am

I’ve just checked relevant sections of jlreq more carefully. So character classes are NOT mutually exclusive?

％ (U+0025) ∈ Postfixed abbreviations (cl-13) ∩ Western characters (cl-27).
! (U+0021) ∈ Dividing punctuation marks (cl-04) ∩ Western characters (cl-27).
， (U+002C) ∈ Commas (cl-07) ∩ Grouped numerals (cl-24) ∩ Western characters (cl-27).
…
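
The overlap can be made explicit with a small lookup. This is a sketch that encodes only the three memberships listed above, by code point; it is not the full jlreq class tables, and `JLREQ_CLASSES` and `classes_of` are hypothetical names.

```python
# Memberships as listed in this post only (code points), not the full jlreq tables.
JLREQ_CLASSES = {
    "cl-04": {0x0021},                  # dividing punctuation marks
    "cl-07": {0x002C},                  # commas
    "cl-13": {0x0025},                  # postfixed abbreviations
    "cl-24": {0x002C},                  # grouped numerals
    "cl-27": {0x0021, 0x0025, 0x002C},  # Western characters
}


def classes_of(code_point: int) -> list[str]:
    """Return every listed class that contains the given code point."""
    return sorted(
        name for name, members in JLREQ_CLASSES.items() if code_point in members
    )


for cp in (0x0025, 0x0021, 0x002C):
    print(f"U+{cp:04X}", classes_of(cp))
```

Each of the three code points comes back with more than one class, which is the ambiguity in question: a spacing table indexed by class needs a rule for which class wins.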

Edit: I created Should character classes be mutually exclusive? · w3c/jlreq · Discussion #455 · GitHub.

Haruhiko_Okumura · February 12, 2025, 5:36am

jlreq is not perfect

As a compromise, I hit upon a solution: treat ‘％’ (U+FF05) as a postfixed abbreviation and ‘%’ (U+0025) as a Western character. However, jlreq does not distinguish these two code points.
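
For reference, the two percent signs are easy to tell apart programmatically. The sketch below uses Python's standard `unicodedata` module to show their names and East_Asian_Width values; `describe` is a hypothetical helper name, and whether a spacing algorithm should key on this property is an assumption, not something jlreq or UTR #59 is claimed to do here.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, East_Asian_Width) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
    )


print(describe("\u0025"))  # ('U+0025', 'PERCENT SIGN', 'Na')
print(describe("\uFF05"))  # ('U+FF05', 'FULLWIDTH PERCENT SIGN', 'F')

# Compatibility normalization folds the fullwidth form into the ASCII one,
# so the distinction is lost if text is NFKC-normalized before layout.
print(unicodedata.normalize("NFKC", "\uFF05") == "\u0025")  # True
```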

Y.D.X · May 6, 2025, 2:37am

Updates in May 2025

The specification for East Asian spacing will be changed from a Unicode Technical Report (UTR) to a Unicode Technical Standard (UTS).
This change is necessary because the document will be referenced by the CSS spec for text-autospace, which is in development and being implemented in web browsers.

For more information:

The Unicode Blog: Highlights from UTC #183.
Public Review Issues #510: Proposed Draft UTR #59, East Asian Spacing (closing date: 2025.07.01)
Status of UTR#59 - Google Docs (L2/25-138)