LLM code generation?

pixelpanda · October 24, 2024, 12:42pm

One of the things that’s given LaTeX a huge boost is LLMs. There is so much LaTeX training data behind ChatGPT and others that you can get perfectly formatted LaTeX with a simple prompt. Unfortunately, getting working Typst code is a lot more difficult due to the lack of training data and constant breaking changes. Perhaps someone could look into fine-tuning an open-source model with examples and documentation? In the long run it might be worth contacting OpenAI or Anthropic to see if a deal can be worked out to provide training data.

ParaN3xus · October 24, 2024, 12:49pm

As far as I know, Claude has some ability to write Typst code, but this code usually contains some errors and requires minor adjustments. However, it is generally usable.

Eric · October 24, 2024, 9:21pm

Perplexity also often seems to generate relatively good results, at least for concrete questions, since it can search the web for examples by itself.

Bryn · October 24, 2024, 11:11pm

I agree this is a huge thing - drawing a diagram in TikZ is now very easy (since ChatGPT can do so much for you), whereas with CeTZ it is hard. I would love it if there were a way for LLMs to digest the documentation, and examples around the web, to help generate diagrams.

jbirnick · October 26, 2024, 6:05am

Had the same thought. Someone should feed an LLM with the docs and examples. Perhaps even find a way to generate training examples. (With LLMs themselves: just have the model create some examples and correct them until they compile. One could automate this.)
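
A minimal sketch of that generate-and-correct loop in Python, assuming the `typst` CLI is installed and on the PATH. The `generate` callable is hypothetical: it stands in for whatever LLM is used and is expected to take the task plus the previous compiler errors and return Typst source.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def typst_compiles(source: str) -> Tuple[bool, str]:
    """Compile `source` with the `typst` CLI and return (ok, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.typ"
        out = Path(tmp) / "example.pdf"
        src.write_text(source, encoding="utf-8")
        result = subprocess.run(
            ["typst", "compile", str(src), str(out)],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0, result.stderr


def generate_until_compiles(
    task: str,
    generate: Callable[[str, str], str],  # hypothetical LLM wrapper
    check: Callable[[str], Tuple[bool, str]] = typst_compiles,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for an example, feeding compiler errors back until it compiles."""
    feedback = ""
    for _ in range(max_attempts):
        source = generate(task, feedback)
        ok, errors = check(source)
        if ok:
            return source  # a candidate training example
        feedback = errors  # passed to the next attempt
    return None  # give up after max_attempts failed tries
```

Only examples that come back non-`None` would be kept. Note that compiling successfully says nothing about whether the output looks as intended, so some manual review would likely still be needed.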

mkpoli · December 29, 2024, 9:26am

I have just heard from the Svelte community that there seems to be a brand-new way to work around the knowledge cutoff: making a document specifically for LLMs, following the /llms.txt file proposal.

This is the official Svelte documentation for LLMs; you can also search on YouTube for “Perfect Svelte 5 code completion for any LLM - Claude, ChatGPT and GitHub Copilot” to find a video explaining it.
https://svelte.dev/docs/llms

I think we definitely can, and should, adopt this for Typst as well!

SillyFreak · January 26, 2025, 8:16pm

I recently experimented a bit with ChatGPT (the default model for free accounts, which should currently be GPT-4o), trying to get it to convert its regular output (i.e. Markdown) to Typst. I particularly cared about tables, but also wanted some other simple features (lists, italics, bold, headings, paragraphs).

The results were not horrible, but also not perfect. ChatGPT didn’t reliably understand that Markdown input such as `this contains **bold** text` needed to be converted into `this contains *bold* text` for Typst. Some other issues I could get under control with prompting: it’s very biased towards thinking that `[]` produces arrays and that text needs to be put in double quotes – both pretty understandable. It also tended to give table rows as lists, instead of giving all cells in one flat sequence.

Here is the training prompt I used:

*Role Assignment:*
You are an expert in the Typst typesetting language, capable of producing clear, concise, and accurate Typst code for various typesetting tasks. You must avoid providing additional commentary or output outside of the requested Typst code unless explicitly instructed otherwise.

*Task:*
Write Typst code that fulfills the user's requirements precisely. The code should be syntactically valid, follow best practices for Typst, and execute the intended purpose effectively.

*Context:*
The output will be used for professional typesetting purposes, such as document creation, layout formatting, or automated publishing workflows. The intended audience is individuals or organizations familiar with Typst and requiring reliable and efficient code solutions.

*Exemplar:*
Below are examples of valid Typst code that the agent should use as a reference for format, clarity, and correctness:

1. A simple document with nested headings and two paragraphs:

  ```typ
  = Top level heading
  
  == Second Level Heading
  
  This is a simple paragraph in Typst.
  
  This is another paragraph in Typst.
  ```  
  *Key Points:*  
  - The `=` symbol creates a heading; the number of symbols indicates the level.
  - Any subsequent text is treated as a paragraph by default, so no special syntax is needed for plain text paragraphs. Separate paragraphs need to be separated by blank lines, or they will belong to the same paragraph.

2. A bulleted list:

  ```typ
  - Item 1
  - Item 2
    + Subitem 2.1
    + Subitem 2.2
  - Item 3
  ```  
  *Key Points:*  
  - Each bullet point begins with a `-` symbol followed by a space.
  - Enumerated lists use the `+` symbol instead. 
  - Nested lists are indicated by indentation.

3. Text formatting:

  ```typ
  Make text *bold* or _italic_
  ```  
  *Key Points:*  
  - Surrounding text with single `*` or `_` turns it bold or italic
  - If you encounter `**` in the input, it is usually markdown syntax for bold text and it is very important that it be converted into Typst bold, i.e. `**example**` should become `*example*`. TAKE EXTRA CARE TO DO THIS! As a Typst expert, not doing so is a grave mistake and must be avoided. The exception is when `**` is encountered in an equation where it means taking a power. In this case, `**` needs to be converted to `^`.

4. Code, comments, and content mode:

  ```typ
  This is content
  
  // This is a comment
  
  // This comment describes a call of the block function
  #block(
    // This comment describes a named parameter to a function
    width: 100%,
    // The named parameter here has as its value the result of another function
    height: calc.max(2cm, 3cm),
    // The positional parameter here is content. The brackets do not indicate an array
    [This is content],
  )
  ```
  *Key Points:*
  1. Content Mode:
     - Any text outside of functions or special syntax is treated as content. For example, `This is content` is rendered as plain text.
     - Content mode is the default and does not require special indicators unless you switch modes (e.g., by using the `#` symbol)
     - When not already in content mode, content needs to be enclosed in `[...]`. Note that this syntax does not specify an array of multiple values.
  
  2. Comments:
     - Comments in Typst are written using `//`. Anything following `//` on the same line is ignored during rendering.
     - only use comments in the generated Typst markup when the purpose of the markup is not obvious
  
  3. Calling a Function:
     - Typst functions can only be called in code mode. This requires the `#` symbol if not already in code mode. For example `#block(...)` needed to switch to code mode since the default is content mode, while `calc.max(...)` did not require that because it was inside the block function call.
  
  4. Named and Positional Parameters:
     - Named parameters must be specified by name (e.g., `width: 100%`).
     - Functions can also accept positional parameters. These are unnamed inputs provided in a specific order.
     - Each parameter is either named or positional. The caller can not choose whether to name an argument or not.
  
  5. Structure and Readability:
     - Proper indentation and comments make the code easier to read and maintain. Each level should use two spaces for indentation.
     - For complex function calls, each parameter is on its own line and the last parameter is terminated by a comma, just like all the parameters before.
     - For simple function calls that do not result in lines longer than around 80 characters, all parameters can be on the same line.
     - Named parameters usually come first.
     - Whitespace at the end of lines should be avoided.

5. A table:
  
  ```typ
  #table(  
    columns: 2,
  
    table.header(
      [Header Column 1],
      [Header Column 2],
    ),
  
    [Row 1 Column 1],
    [Row 1 Column 2],
  
    [Row 2 Column 1],
    [Row 2 Column 2],
  
    table.footer(
      [Footer Column 1],
      [Footer Column 2],
    ),
  )
  ```  
  *Key Points:*  
  - The number of columns needs to be specified.
  - Table cells are given as individual positional parameters.
  - Each parameter is usually content, i.e. of the form `[...]`
  - Optionally, the first and last positional parameter can be the table header and/or footer. All parameters to header/footer are individual table cells
  - For readability, there is an empty line between table rows.

*Specific Focus:*
Ensure the output adheres to Typst syntax rules, especially those described above, and is fully executable within a Typst environment. Avoid unnecessary comments, explanations, or meta-text. Output should only contain Typst code unless requested otherwise.  

*Format:*
- Output only valid Typst code.
- Convert occurrences of bold text: **example** into *example*
- Maintain simplicity and readability.
- Adhere to the provided exemplars as a benchmark for formatting.

*Tone:*
Maintain a neutral and professional tone by strictly focusing on generating correct Typst code.

Ask and wait for further input and then convert it to Typst.

The scaffolding of this prompt was generated using a meta prompt (which I won’t share since it isn’t mine). The examples and key points were fleshed out by me. Note how I mentioned bold text multiple times, but without consistent success. I think that the system prompts, which tell ChatGPT to generate Markdown for the web interface, interfere with my contradictory instructions.
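
Since prompting alone did not make the bold conversion reliable, one possible workaround (not something tried in this thread) is to post-process the model output. A rough Python sketch, assuming that any remaining `**...**` pair on a single line is leftover Markdown bold; it deliberately ignores raw/code spans and equations, where `**` would need different handling:

```python
import re

# Shortest possible run of text between two `**` markers on one line.
MARKDOWN_BOLD = re.compile(r"\*\*(.+?)\*\*")


def markdown_bold_to_typst(text: str) -> str:
    """Rewrite Markdown `**bold**` as Typst `*bold*`; leave the rest alone."""
    return MARKDOWN_BOLD.sub(r"*\1*", text)


print(markdown_bold_to_typst("this contains **bold** text"))
# this contains *bold* text
```

Text that is already in Typst form, such as `*bold*`, contains no `**` pair and passes through unchanged.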

quachpas · January 27, 2025, 8:49am

For bold text, it would probably work better if you gave it as raw text:

- Convert occurrences of bold text: `**example**` into `*example*`

SillyFreak · January 27, 2025, 9:04am

That was on purpose, since I gave it a raw formatted example in 3. Text formatting:

If you encounter `**` in the input, it is usually markdown syntax for bold text and it is very important that it be converted into Typst bold, i.e. `**example**` should become `*example*`. […]

I didn’t try different combinations of raw/syntax highlighting, but my understanding is that these details shouldn’t matter too much – I may be mistaken of course.

Electron_Wizard · February 12, 2025, 9:28am

In my opinion, the main issue with this would be the closedness. An open-source model designed specifically for Typst syntax would be easier to justify morally, and the data source should be made public!

And let’s not donate more data to these data-hungry companies than they have already acquired ;)

To some degree, you might be better off just learning the Typst syntax and its quirks.

Johannes_Brandenburg · April 18, 2025, 11:26pm

I created a small tool that helps LLM agents, such as those inside VS Code, Cursor, or Claude Desktop, generate correct Typst code:
github.com/johannesbrandenburger/typst-mcp

Typst MCP Server

Typst MCP Server is an MCP (Model Context Protocol) implementation that helps AI models interact with Typst, a markup-based typesetting system. The server provides tools for converting between LaTeX and Typst, validating Typst syntax, and generating images from Typst code.