I have been experimenting with typst as a replacement report-generation engine and have noticed extremely large memory usage, at the gigabyte level, when large tables are rendered. I am not the first to hit this problem, and the tips I have found and tried are:
Use one of the data loading functions. I am personally using #json to parse an array that contains all the row data (4 MiB), and using the scripting language to build the cells used in the table.
Use fixed column widths.
I would like to know if there is anything else worth trying: loading from JSON helped a little, and fixed column widths didn't show a noticeable improvement.
I may try using the API to build the cells in Rust instead of scripting their creation, but I presume the problem is related to the amount of data the compiler retains while the table is being laid out.
Note: I am currently using a Java library called iText that has the same problem if you programmatically build a single table. Its way of reducing memory consumption is to let the application add the table in batches, simulating multiple tables and flushing state, while still functioning as a single one (headers, footers, etc.). That may be easier to do in an API designed for programs than in one that is more document based, but it is a hint at how other PDF generation tools try to solve the problem.
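To make the batching idea concrete, here is a minimal generic sketch of the pattern in Python (this is an illustration of the concept, not iText's actual API): rows accumulate in a small buffer, and once the buffer reaches the batch size they are flushed to the output, so memory stays bounded while the output remains a single logical table.

```python
import io

class BatchedTableWriter:
    """Writes one logical table while flushing rows in batches,
    so at most batch_size rows are ever held in memory."""

    def __init__(self, out, header, batch_size=1000):
        self.out = out
        self.batch_size = batch_size
        self._buffer = []
        self._write_row(header)  # the header is written once, up front

    def _write_row(self, row):
        self.out.write("\t".join(str(cell) for cell in row) + "\n")

    def add_row(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        for row in self._buffer:
            self._write_row(row)
        self._buffer.clear()

out = io.StringIO()
writer = BatchedTableWriter(out, ("Name", "Age", "City"), batch_size=2)
for i in range(5):
    writer.add_row((f"Person {i}", 20 + i, "New York"))
writer.flush()  # flush whatever remains in the last partial batch
print(out.getvalue().count("\n"))  # 6 lines: 1 header + 5 rows
```

The consumer of the output still sees one table; only the producer's working set is limited.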
Hi, welcome to the forum! Could you provide further info?
Is the content of this table mainly numbers or texts, or both?
You used json, rather than binary formats like cbor, so I assume they are mostly texts?
Do you want to print the table on a single page (via set page(height: auto)) or split it across several pages? (Or do you not care either way?)
The typst compiler may be doing a lot of useless work determining page breaks, I guess.
How many tables are there to be generated? In other words, what is the frequency of generating tables? Is it 100+ tables per single run, or a single table every day?
The GB-level memory usage might be caused by the cache for incremental compilation. For low frequency demands, perhaps it can be turned off.
Did you use the typst executable (CLI + sys api like exec(["typst", "compile", …], stdin=…)) or the typst library (rust/python/… package)?
Greetings. It is example data: names, cities, and ages. I tried using text for the age instead of a number, in case some alignment algorithm was kicking in; same gigabyte-level memory usage. The data source doesn't seem to be the cause, since just loading it and transforming it into a cell array, without generating the table, uses reasonable memory.
I am using the default behavior, with the table spanning standard-sized pages.
It is a single table (1440 pages in my test). I don't see an option for disabling incremental compilation. If that ends up being the source of the problem, such an option may be needed for the final compilation of documents; I would be fine with it existing only in the typst Rust crate.
For testing I am using the CLI. I plan to experiment with the Rust crate later, probably not loading from JSON but building the cells through the Rust-based model.
#let data = json("data.json")
#{
  // Get all row data
  let rows = data.map(element => {
    (element.at("Name"), [#element.at("Age")], element.at("City"))
  }).flatten()

  // Alternative, maybe slower:
  // let rows = ()
  // for element in data {
  //   rows.push(element.at("Name"))
  //   rows.push([#element.at("Age")])
  //   rows.push(element.at("City"))
  // }

  table(
    columns: (4cm, 4cm, 4cm),
    align: center,
    table.header("Name", "Age", "City"),
    ..rows
  )
}
And the data is just a large JSON array of:
{
  "Name": "Alice",
  "Age": 30,
  "City": "New York"
}
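For anyone who wants to reproduce this, a small script like the following (my own sketch, not part of the original test setup) generates data in the same shape; the row count of 80,000 is an assumption chosen so the serialized JSON lands near the ~4 MiB mentioned earlier.

```python
import json
import random

# Sample rows matching the schema above; values are placeholders.
names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
cities = ["New York", "London", "Tokyo", "Madrid", "Berlin"]

rows = [
    {
        "Name": random.choice(names),
        "Age": random.randint(18, 90),
        "City": random.choice(cities),
    }
    for _ in range(80_000)
]

# Write the array so the Typst document can load it with json("data.json").
with open("data.json", "w") as f:
    json.dump(rows, f)
```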
Have you tried something like splitting the table into 10 or 100 smaller tables? What’s the effect on memory usage and document compile time in that case?
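The grouping step itself is straightforward; here is a sketch in Python of the splitting I mean (in Typst's scripting layer the same grouping can be written with a fold or, if I'm not mistaken about recent compiler versions, with the array chunks method):

```python
def chunk(rows, size):
    """Split a flat list of rows into consecutive groups of at most
    `size` rows, one group per smaller table."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

# 1000 placeholder rows split into groups of 100, i.e. 10 smaller tables.
rows = [(f"Person {i}", 20 + i % 60, "New York") for i in range(1000)]
tables = chunk(rows, 100)
print(len(tables))     # 10
print(len(tables[0]))  # 100
```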
Using #box really frees about 1 GiB of peak RAM usage in this example, from 1.7 GiB down to around 0.7 GiB; a great improvement, and it has lower CPU usage too.
This could be an alternative. I would have to replicate with box a lot of the formatting that the table and cell functions give directly, but that is doable; I will be translating a custom XML schema that defines the report format (cells, bold, fields, etc.) to the typst model anyway. However, it will be very hard to replicate the repetition of headers unless all cells have the same height, probably by cutting content.
About the usage of a report like this: it is not very frequent, but there are sometimes legal requirements to produce, for example, a "book" of inventory movements (it used to be a literal book a long time ago), usually requested by fiscal authorities. It must have certain formatting and content, and in a Healthcare Information System those movements can be pretty large when even the delivery of a single pill is accounted for.
I tried splitting the table and it didn't help. There is also the problem of predicting where to split it when a cell's content can be large and span more than one line.
I’m happy to hear that. I have no knowledge of the innards of Typst but I’m wondering if the extra memory use of table is caused by its design; table receives each cell as a separate argument to the table function. Is Typst building the entire argument list before trying to look at the individual cells?
I do not know what your formatting requirements are, but regarding the headers, what I proposed was something like this: