Prism language definition for Typst

Out of need and lack of availability (see here), I’ve created a Prism language definition for Typst.

Mc-Zen/prism-typst: Syntax-highlighting Typst code with Prism (github.com)

It is far from perfect or complete, so you are invited to open PRs and add test cases! I won't be able to maintain this on a large scale, so any help is appreciated :)

You can view a demo here:

Typst/Prism (mc-zen.github.io)

It shows the capabilities as well as some of the limits.

I wrote a TextMate grammar for Typst, which is another regex-based grammar. It comes with a set of test cases that might interest you: tinymist/syntaxes/textmate/tests/unit at main · Myriad-Dreamin/tinymist · GitHub. The test cases were accumulated over the last year of development and daily use.

Wow, crazy, this is a lot of tests!

I wonder: can the grammar handle arbitrary nesting of code/markup mode? (It is used in the tinymist extension, right? If so, I guess it handles it fine.) In Prism I think this is impossible, because recursive regexes are not supported, and it is a known issue for other Prism definitions as well.
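
To illustrate the limitation, here is a minimal sketch of the usual workaround when regexes cannot recurse: expanding the pattern to a fixed nesting depth. It uses Python's built-in re module, which (like JavaScript regexes) has no recursion. The helper name nested_braces is made up for this example and is not part of prism-typst or Prism.

```python
import re

def nested_braces(depth: int) -> str:
    """Build a regex source matching balanced {...} up to `depth` levels (hypothetical helper)."""
    pattern = r"\{[^{}]*\}"  # depth 1: braces with no braces inside
    for _ in range(depth - 1):
        # Wrap the previous level: any non-brace character or one inner group, repeated.
        pattern = r"\{(?:[^{}]|" + pattern + r")*\}"
    return pattern

two_levels = re.compile(nested_braces(2))

print(two_levels.fullmatch("{ a { b } c }") is not None)      # True: nesting depth 2
print(two_levels.fullmatch("{ a { b { c } } }") is not None)  # False: nesting depth 3
```

Any fixed depth covers typical documents, but input nested one level deeper than the pattern was built for falls through.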

TextMate is limited as well, but we do handle arbitrary nesting of code/markup/math mode. The grammar shows that we can parse all Typst syntax by sacrificing some corner cases. Among them, the most cursed thing is not nested modes but accurately switching from an expression back to markup. For example, it cannot handle the following case:

#for idx in range(10) {} {}
// this is code brace ^^ ^^ that is plain text

In the example, whether a { is parsed as a code brace depends on whether it comes right after the "range expression" of the for (here, range(10)). Deciding this requires unbounded backtracking if we parse the expression with regexes.

But as I said above, we can sacrifice some accuracy: either parse some text (at the boundary between markup text and code) as code, or treat some code (at the boundary) as text. If we assume that #for is not mixed into markup content, we can handle a for expression simply with:

for ([^\n\{\[]*|BRACE_EXPR|BRACKET_EXPR)*

It starts handling a for expression when it sees a for keyword, consumes any content up to the next newline, and matches braces and brackets in pairs within that content.
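
The heuristic above can be sketched in Python as follows. This is only an illustration, not the actual tinymist grammar: paired and for_span are hypothetical names, BRACE_EXPR and BRACKET_EXPR are approximated by a pattern expanded to a fixed depth of 3, and the inner * of the original regex is dropped (the outer * already repeats single characters, so the same text is matched).

```python
import re

def paired(depth: int) -> str:
    """Regex source for a balanced {...} or [...] group, nested up to `depth` levels."""
    plain = r"[^{}\[\]]"  # any character that is not a brace or bracket
    group = r"(?:\{" + plain + r"*\}|\[" + plain + r"*\])"  # depth 1
    for _ in range(depth - 1):
        body = "(?:" + plain + "|" + group + ")*"
        group = r"(?:\{" + body + r"\}|\[" + body + r"\])"
    return group

# "#for", then any run of: a non-newline character that does not open a
# group, or a balanced brace/bracket group (which may span several lines).
FOR_EXPR = re.compile(r"#for\b(?:[^\n{\[]|" + paired(3) + r")*")

def for_span(source: str) -> str:
    """Return the text the heuristic would treat as one for expression."""
    match = FOR_EXPR.search(source)
    return match.group(0) if match else ""

print(repr(for_span("#for idx in range(10) {} {}\nplain text")))
# '#for idx in range(10) {} {}'
```

On the problematic example, the second {} is swallowed as code rather than left as plain text. That is exactly the accuracy being traded away at the boundary.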

Yes, it has been used for a long time.

Oh yeah, I noticed the same problem when working on the Prism definition.

Anyway, you’ve done excellent work here! I’ve never noticed any issues with the highlighting actually.