I’m looking to convert a Typst document containing headings, equations, figures, citations, and references into a LaTeX file programmatically. The layout is relatively simple, since it’s meant for research manuscripts.
Instead of manually parsing the Typst file, I’d like to know whether there’s a way to extract these components, ideally into a structured format such as JSON or plain text, so I can process them and generate a .tex file more easily.
I’ve tried exporting to HTML, but it skips some elements.
Is there any method or tool available to help with this kind of extraction?
Maybe this could be done with filtering and #show: doc => { } rules?
Have you tried Pandoc? I haven’t seen any other solutions. That said, a converter written specifically for these two formats can be more precise. I don’t know how much Pandoc’s Typst support has improved recently.
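If you want to try Pandoc first as a baseline, a minimal Python sketch that shells out to it could look like this. It assumes a Pandoc version with the Typst reader (`--from typst`) is on your PATH; the helper names and file names are placeholders, not part of any existing tool.

```python
import subprocess
from pathlib import Path


def pandoc_command(src: Path, dst: Path) -> list[str]:
    """Build a Pandoc invocation that reads Typst and writes standalone LaTeX."""
    return [
        "pandoc", str(src),
        "--from", "typst",
        "--to", "latex",
        "--standalone",  # full document with preamble, not just a fragment
        "--output", str(dst),
    ]


def convert_with_pandoc(src: Path, dst: Path) -> None:
    """Run Pandoc; raises CalledProcessError if Pandoc exits with an error."""
    subprocess.run(pandoc_command(src, dst), check=True)
```

Usage would be `convert_with_pandoc(Path("paper.typ"), Path("paper.tex"))`. Comparing its output with a hand-written converter is a quick way to see which elements Pandoc drops.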
I agree. I think your best bet is to use the typst crate directly (or the tool mentioned above, with some caveats). That will give you the “truth” about the document layout.
If a rough approximation is enough (and your document doesn’t use context), you can always print the repr.
Yep, my tool ttt (Frans Skarman / ttt on GitLab) sounds like it would do what you want, either to use directly or as inspiration for your own tool.
The idea is simply to use the Typst compiler as a library and extract the expanded but not-yet-laid-out document from the middle of the compilation process.
I have submitted a paper written using this tool and have not heard any complaints yet.
There are some limitations, which are listed in the README.
Elaborating on @quachpas’ answer: to get a repr-like output as structured data instead of a string, you can wrap the whole document in a labelled metadata element and retrieve it with typst query, for example in JSON format:
```typ
#show: it => [#metadata(it)<all>] + it
#set math.equation(numbering: "(1)")
= A
$ x $<x>
See @x.
```
Running typst query file.typ '<all>' --pretty gives a JSON representation of the document:
Note that the content is in a “styled” container, which corresponds to the set rule for equation numbering. In the JSON we cannot see what this rule does. If you have meaningful styling rules in your document, this might be a problem. And if you have opaque elements (anything that uses context), you also won’t see their content in the JSON.
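To script this step, a minimal Python sketch could run the same query and unwrap the result. It assumes `typst` is on your PATH and that the output is a JSON list of metadata elements, each carrying the serialized document in a `"value"` field; `run_query` and `document_value` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any


def run_query(path: str, selector: str = "<all>") -> str:
    """Run `typst query` and return its raw JSON output (needs typst on PATH)."""
    result = subprocess.run(
        ["typst", "query", path, selector],
        check=True,  # raise if typst reports a compile error
        capture_output=True,
        text=True,
    )
    return result.stdout


def document_value(query_json: str) -> Any:
    """Return the content stored in the single labelled metadata element.

    Assumes a JSON list of metadata elements, each with a "value" field.
    """
    results = json.loads(query_json)
    if len(results) != 1:
        raise ValueError(f"expected one element for the selector, got {len(results)}")
    return results[0]["value"]
```

Usage: `doc = document_value(run_query("file.typ"))`. The `typst query` CLI also offers `--field value --one` to do this unwrapping itself; the sketch keeps it in Python so the check for exactly one match is explicit.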
This works quite well: having a JSON file makes it easy to extract the content and generate unstyled versions of the document in other formats.
Thanks for sharing your approach.
I ran some tests, and here’s what I’ve observed so far:
Styles disappear: That’s fine, since they’re not needed.
Comments disappear: Acceptable, though preserving them would be preferable.
Custom functions are expanded: Great.
Equations using package-defined functions become opaque “context” elements: Not ideal, but depending on their complexity, I can define custom equivalents of those functions, so it’s manageable.
Labels are retained: In your example they were removed, but in my tests they remained, which is great.
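Before writing a converter, it can help to audit what the JSON actually contains: how many opaque `context` elements there are and which labels survived. A minimal Python sketch, assuming every element is a JSON object with a `"func"` key, that opaque elements show up as `"func": "context"`, and that labels appear as a `"label"` string such as `"<x>"` (all assumptions to check against your own output):

```python
from collections import Counter
from typing import Any, Iterator


def walk(node: Any) -> Iterator[dict[str, Any]]:
    """Yield every element dict in the tree, depth-first, in document order."""
    if isinstance(node, dict):
        if "func" in node:
            yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def audit(root: Any) -> tuple[Counter, list[str]]:
    """Count element kinds and collect labels, to see what a converter must handle."""
    kinds = Counter(el["func"] for el in walk(root))
    labels = [el["label"] for el in walk(root) if isinstance(el.get("label"), str)]
    return kinds, labels
```

A nonzero `kinds["context"]` points at the equations that need custom function equivalents, and the label list shows which `\label{}` targets the LaTeX output will need.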
I intend to start using this method to generate LaTeX versions of moderately simple manuscripts.
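For that last step, a minimal sketch of a JSON-to-LaTeX walker in Python follows. The field names (`func`, `children`, `child`, `body`, `text`, `level`/`depth`, `block`, `label`, `target`) are what I would expect from Typst’s JSON serialization, but they are assumptions; compare them with your own `--pretty` output. Because Typst writes both cross-references and citations as `@key`, the sketch takes a hypothetical `bib_keys` set to decide between `\cite` and `\ref`. Anything it does not recognise (including `context` and most math) is emitted as a LaTeX comment rather than guessed.

```python
from typing import Any

SECTIONS = ["section", "subsection", "subsubsection", "paragraph"]
SPECIAL = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape(text: str) -> str:
    """Escape LaTeX special characters in ordinary (non-math) text."""
    return "".join(SPECIAL.get(ch, ch) for ch in text)


def label_of(node: dict[str, Any]) -> str:
    """Map an assumed '"label": "<x>"' field to '\\label{x}' (or '' if absent)."""
    label = node.get("label")
    return rf"\label{{{label.strip('<>')}}}" if isinstance(label, str) else ""


def to_latex(node: Any, bib_keys: frozenset[str] = frozenset(), math: bool = False) -> str:
    """Convert one node of the (assumed) `typst query` JSON tree to LaTeX."""

    def rec(child: Any) -> str:
        return to_latex(child, bib_keys, math)

    if isinstance(node, str):  # defensive: treat bare strings as text
        return node if math else escape(node)
    func = node.get("func")
    if func == "sequence":
        return "".join(rec(c) for c in node.get("children", []))
    if func == "styled":  # set/show rules are not visible in the JSON; keep the content
        return rec(node["child"])
    if func == "text":
        return node["text"] if math else escape(node["text"])
    if func == "space":
        return " "
    if func == "parbreak":
        return "\n\n"
    if func == "emph":
        return rf"\emph{{{rec(node['body'])}}}"
    if func == "strong":
        return rf"\textbf{{{rec(node['body'])}}}"
    if func == "heading":
        level = node.get("level")
        if not isinstance(level, int):  # may be "auto" or missing; fall back to depth
            level = node.get("depth", 1)
        cmd = SECTIONS[max(1, min(level, len(SECTIONS))) - 1]
        return f"\\{cmd}{{{rec(node['body'])}}}{label_of(node)}\n"
    if func == "equation":
        body = to_latex(node["body"], bib_keys, math=True)
        if node.get("block"):
            return f"\\begin{{equation}}{label_of(node)}\n{body}\n\\end{{equation}}\n"
        return f"${body}$"
    if func == "ref":
        # Typst uses @key for both cross-references and citations.
        key = node["target"].strip("<>")
        return rf"\cite{{{key}}}" if key in bib_keys else rf"\ref{{{key}}}"
    # Anything else (e.g. opaque `context` elements, most math) is flagged, not guessed.
    return f"% unsupported element: {func}\n"
```

The output is only the document body; wrapping it in a preamble with `\begin{document}` and the bibliography commands is left to your template. Extending it is a matter of adding one branch per element kind that the audit reports.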