Why does Typst crash with large amounts of content?

Hi all.

I started using Typst to generate PDFs for data tables in LiveTable.

I’m streaming all my table rows into a .typ file, with the following template:

  defp generate_typst_file(query, path, [header_keys, header_labels]) do
    typst_template = """
    #set page(
      paper: "a4",
      margin: (x: 0.5cm, y: 0.5cm),
    )

    #set text(
      font: "Libertinus Serif",
      size: 8pt,
      weight: "regular"
    )

    #table(
      columns: (auto, ) * #{length(header_labels)},
      inset: (x: 4pt, y: 3pt),
      align: left,
      stroke: (thickness: 0.4pt, paint: rgb(80, 80, 80)),
      fill: (col, row) => {
        if row == 0 { rgb(245, 245, 245) }
        else { white }
      },

      #{generate_table_header(header_labels)},
    """

    File.write!(path, typst_template)

    case stream_data_to_file(query, path, header_keys) do
      {:ok, _} ->
        File.write!(path, "\n)", [:append])
        {:ok, path}

      error ->
        error
    end
  end

Then I use System.cmd("typst", ["compile", tp_path, pdf_path]) (an Elixir call) to compile the .typ file into a PDF. (Essentially running typst compile typst_path pdf_path in a terminal.)

This works fine for small exports of 5,000-10,000 rows. But when I increase the export size to 1 million rows, it takes a long time and finally my terminal session crashes. (I’m using this in LiveTable to demonstrate Elixir’s ability to handle large data at scale while keeping it real-time.) I tried this on different systems with varying RAM sizes, but to no avail.

Is this an issue with Typst? Or is there a better way to interface Typst with Elixir?

Any help is appreciated :smile:

Hi. It sounds like you are pushing the limits of what Typst can do, similar to “How to optimize a `query` that leads to severe performance problems?”, though your code doesn’t actually do any computations, maybe.

Without a concrete example with a data sample, no one can know for sure. Your code snippet is missing parts of the code and any data sample. See https://sscce.org/. Also, what is the “tp” file extension?

Here is what I think the whole template looks like with some data:

#set page(paper: "a4", margin: (x: 0.5cm, y: 0.5cm))
#set text(size: 8pt)

#let header-labels = ("a", "b", "c")
#let header-keys = ("x", "y", "z")

#table(
  columns: header-labels.len(),
  inset: (x: 4pt, y: 3pt),
  stroke: 0.4pt + rgb(..(80,) * 3),
  fill: (col, row) => if row == 0 { rgb(..(245,) * 3) } else { white },
  table.header(..header-labels),
  ..header-keys,
)

You can also use luma() for grayscale colors.
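For instance, the two greys from the snippet above map directly onto luma(80) and luma(245). A minimal sketch (the two placeholder cells are just for illustration and not from your data):

#table(
  columns: 2,
  // luma(80) is the same grey as rgb(80, 80, 80)
  stroke: 0.4pt + luma(80),
  // luma(245) is the same grey as rgb(245, 245, 245)
  fill: (col, row) => if row == 0 { luma(245) } else { white },
  [*A*], [*B*],
  [1], [2],
)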

1 Like

The Typst snippet I included was just for clarity. Parts of the Typst template are generated dynamically in my code, which weren’t included in the above snippet.
I stream all my data to a .typ file and compile it using typst compile input_path output_path

This setup works fine for small documents, some ~50,000 rows or so, amounting to a hundred pages of compiled PDF.

But when I increase the scale to slightly more, that’s when Typst is unable to handle it and takes down my terminal instance.

It seems you are compiling a file with all the data inlined. Have you tried something else, like loading the data from external files? What error message are you getting?

Typst has some debugging(-ish) tools, like the --timings flag. Do the traces look normal?

If you are dealing with tens of gigabytes worth of data, this is probably a memory issue that either your system or Typst can’t manage. But from the minimal amount of information given, it is impossible to know.

If you are able to create multiple smaller files, would you be able to join these files together with other tools?

3 Likes

I can create documents with a line count of ca. 25,000 in 12-13 seconds. The Typst input file is created from a UML model and also contains many diagrams. I guess for a 40x larger document you need a lot of RAM (and some patience).

1 Like

Apologies if the information I’ve provided is insufficient.

I’m running an Ecto database query that returns x records as a list. These are streamed to the .typ file using Elixir’s File.write!(path, "\n)", [:append]).

Here’s the full generation process: Github

The .typ file is then compiled using typst compile input_path output_path.

The records I’m dealing with are 1 million. For some 10,000 records, it took some 10-15 seconds, but the PDF was compiled, and it was a 500-page PDF. When I increase the count to 1 million, the process goes on and on… my system becomes slow, and ultimately the terminal instance is killed automatically. (Probably ran out of memory. My system has 16 GB of RAM.)

The .typ file is getting generated; it takes some 30 seconds, but the browser is unable to open such a large file. And when I try to compile it into a PDF, the terminal crashes. How do I fix this?

Is there any chance you could generate a file of, say, 10 records and post it here? It’s not clear to me what the final .tp Typst file looks like. I see the following in the file you linked to, but it makes me think that your table grows horizontally, not vertically, which is not how I expected it to work when reading your questions.

#set page(
  paper: "a4",
  margin: (x: 0.5cm, y: 0.5cm),
)

#set text(
  font: "Libertinus Serif",
  size: 8pt,
  weight: "regular"
)

#table(
  columns: (auto, ) * /*#{length(header_labels)}*/, // How many columns will this have?
  inset: (x: 4pt, y: 3pt),
  align: left,
  stroke: (thickness: 0.4pt, paint: rgb(80, 80, 80)),
  fill: (col, row) => {
    if row == 0 { rgb(245, 245, 245) }
    else { white }
  },
  /*#{generate_table_header(header_labels)},*/ // What does this actually generate?

// How is the table finished?

Have you tried the --timings flag as @aarnent suggested? Can you verify with some system monitoring tools (top for instance) that Typst is consuming all system memory?

Assuming Linux, I’d suspect out of memory and the OOM killer, but you could maybe check the system logs to figure out what kind of error causes the program to crash?

1 Like

I still haven’t gotten an answer to what the tp extension is, but if it means a Typst file, then the extension should be “typ”.

1 Like

This is the generated Typst file for 10 records, with the .tp extension.

#set page(
  paper: "a4",
  margin: (x: 0.5cm, y: 0.5cm),
)

#set text(
  font: "Libertinus Serif",
  size: 8pt,
  weight: "regular"
)

#table(
  columns: (auto, ) * 8,
  inset: (x: 4pt, y: 3pt),
  align: left,
  stroke: (thickness: 0.4pt, paint: rgb(80, 80, 80)),
  fill: (col, row) => {
    if row == 0 { rgb(245, 245, 245) }
    else { white }
  },

  [*ID*], [*Product Name*], [*Description*], [*Price*], [*Category Name*], [*Category Description*], [*Image*], [*Amount*],
[1], [Premium Coffee Beans], [100% natural ingredients. No artificial preservatives.], [5.60], [Food], [Quality Food from trusted brands], [https://picsum.photos/seed/1/400/300], [4043.20],
[2], [Think and Grow Rich], [Bestseller with excellent reader reviews. Available in paperback and hardcover.], [4.19], [Books], [Quality Books from trusted brands], [https://picsum.photos/seed/2/400/300], [3569.88],
[3], [Basic Product], [Standard quality product with good value for money.], [6.35], [Health & Wellness], [Quality Health & Wellness from trusted brands], [https://picsum.photos/seed/3/400/300], [425.45],
[4], [Generic Item], [Standard quality product with good value for money.], [4.41], [Home & Kitchen], [Quality Home & Kitchen from trusted brands], [https://picsum.photos/seed/4/400/300], [3139.92],
[5], [Generic Item], [Standard quality product with good value for money.], [7.14], [Health & Wellness], [Quality Health & Wellness from trusted brands], [https://picsum.photos/seed/5/400/300], [3805.62],
[6], [Basic Product], [Standard quality product with good value for money.], [4.31], [Automotive], [Quality Automotive from trusted brands], [https://picsum.photos/seed/6/400/300], [4189.32],
[7], [Basic Product], [Standard quality product with good value for money.], [2.20], [Beauty & Personal Care], [Quality Beauty & Personal Care from trusted brands], [https://picsum.photos/seed/7/400/300], [1216.60],
[8], [Generic Item], [Standard quality product with good value for money.], [6.70], [Sports & Fitness], [Quality Sports & Fitness from trusted brands], [https://picsum.photos/seed/8/400/300], [6586.10],
[9], [Generic Item], [Standard quality product with good value for money.], [5.94], [Health & Wellness], [Quality Health & Wellness from trusted brands], [https://picsum.photos/seed/9/400/300], [4098.60],
[10], [Basic Product], [Standard quality product with good value for money.], [3.19], [Automotive], [Quality Automotive from trusted brands], [https://picsum.photos/seed/10/400/300], [3056.02],
[11], [Organic Honey], [100% natural ingredients. No artificial preservatives.], [8.31], [Food], [Quality Food from trusted brands], [https://picsum.photos/seed/11/400/300], [4196.55],

)

.tp extension isn’t Typst?
I was thinking that the fastest way to compile a PDF was to write to the input.tp file and compile it with the typst compile command.

It’s working fine for small batches, so I figured that was it.

The official extension is “typ”: https://www.iana.org/assignments/media-types/text/vnd.typst#:~:text=3.%20File%20extension(s):%20.typ.

Have you tried writing the data to a separate CSV/JSON file and using that for compilation?

1 Like

With the provided data I was able to replicate the 10 s compilation time, but the amount of memory used, compared to reading from a CSV file… is a lot. I was getting around 2.6 GiB of RAM reported by btop. After changing to an external file it dropped significantly. With all the optimizations it’s now at about 800 MiB and the time is slightly above 2 seconds.

#set page(margin: (x: 0.5cm, y: 0.5cm))
#set text(size: 8pt)

#let (header-row, ..data) = csv("data.csv")

#show table.cell.where(y: 0): strong
#table(
  columns: header-row.len(),
  inset: (x: 4pt, y: 3pt),
  stroke: 0.4pt + luma(80),
  table.header(..header-row.map(table.cell.with(fill: luma(245)))),
  ..data.map(row => (row.at(6) = link(row.at(6))) + row).flatten(),
)

Here, the fill callback function was removed because it was executed for each cell, of which there are a lot.

Moreover, with the content added as is (inlined), I wasn’t able to get the timings to save, because it was taking way too long.

In the end, I tried with a CSV file that has the same 10 rows repeated:

ID,Product Name,Description,Price,Category Name,Category Description,Image,Amount
1,Premium Coffee Beans,100% natural ingredients. No artificial preservatives.,5.60,Food,Quality Food from trusted brands,https://picsum.photos/seed/1/400/300,4043.20
2,Think and Grow Rich,Bestseller with excellent reader reviews. Available in paperback and hardcover.,4.19,Books,Quality Books from trusted brands,https://picsum.photos/seed/2/400/300,3569.88
3,Basic Product,Standard quality product with good value for money.,6.35,Health & Wellness,Quality Health & Wellness from trusted brands,https://picsum.photos/seed/3/400/300,425.45
4,Generic Item,Standard quality product with good value for money.,4.41,Home & Kitchen,Quality Home & Kitchen from trusted brands,https://picsum.photos/seed/4/400/300,3139.92
5,Generic Item,Standard quality product with good value for money.,7.14,Health & Wellness,Quality Health & Wellness from trusted brands,https://picsum.photos/seed/5/400/300,3805.62
6,Basic Product,Standard quality product with good value for money.,4.31,Automotive,Quality Automotive from trusted brands,https://picsum.photos/seed/6/400/300,4189.32
7,Basic Product,Standard quality product with good value for money.,2.20,Beauty & Personal Care,Quality Beauty & Personal Care from trusted brands,https://picsum.photos/seed/7/400/300,1216.60
8,Generic Item,Standard quality product with good value for money.,6.70,Sports & Fitness,Quality Sports & Fitness from trusted brands,https://picsum.photos/seed/8/400/300,6586.10
9,Generic Item,Standard quality product with good value for money.,5.94,Health & Wellness,Quality Health & Wellness from trusted brands,https://picsum.photos/seed/9/400/300,4098.60
10,Basic Product,Standard quality product with good value for money.,3.19,Automotive,Quality Automotive from trusted brands,https://picsum.photos/seed/10/400/300,3056.02

And came to the conclusion that the memory usage and time spent compiling increase linearly for 10’000, 20’000, and 30’000 rows of data: 800/1600/2400 MiB, 2.2/4.5/6.8 s.

So, if we blindly divide 15 GiB by about 850 MiB and scale the 10k-row count and compilation time accordingly, we get about 180k rows, 40 s, and fully filled RAM. If swap is used, then you can handle bigger data.

For 1 million rows you would need only about 220 seconds, but… uhh, also about 80 GiB of RAM. I mean, with swap it’s probably achievable, but the compilation time will probably increase drastically, and the system can freeze.
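To make that back-of-the-envelope arithmetic explicit (using only the measured ~800 MiB and ~2.2 s per 10k rows from above):

\[
\text{rows}_{\text{max}} \approx \frac{15\,\text{GiB}}{850\,\text{MiB}} \times 10\,000 \approx 180\,000,
\qquad
t_{1\text{M}} \approx 100 \times 2.2\,\text{s} = 220\,\text{s},
\qquad
m_{1\text{M}} \approx 100 \times 800\,\text{MiB} \approx 78\,\text{GiB}.
\]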

P.S. Not sure if you need the URLs to be links, but I didn’t find that it affected the compilation time, though it might affect RAM usage, and it most definitely increases the file size: from 2.49 MiB to 16.80 MiB for 30k rows.

2 Likes

I would assume this is slower than loading the data from a JSON or similar file. If you present Typst with the data in a data file, it will load it into a more efficient format than if it is Typst source code. Typst source can contain far more varied content than JSON, TOML, YAML, etc., so it will both take up more space in memory and take more processing power when working out what the data mean.

In addition to this surface-level reason why this may improve performance: Typst uses a lot of memoization for incremental compilation, and memoization works on distinct pieces of code. So reducing the amount of Typst code will likely also improve the situation in this regard.
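As a rough sketch, a JSON variant of the CSV example above could look like this (data.json is hypothetical, not something from your setup; it is assumed to hold the rows as an array of string arrays with the header row first):

#let rows = json("data.json")

#table(
  columns: rows.first().len(),
  // header row in bold
  table.header(..rows.first().map(strong)),
  // remaining rows spread as plain cells
  ..rows.slice(1).flatten(),
)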

Since it’s not, and I was also confused by this when reading through this thread, could you correct your posts to say .typ instead? That will help future readers. Thanks!

3 Likes

Sure. I’ll build a CSV file, then try to compile it into a PDF using Typst. Hopefully it takes less memory. Will get back on this.