How to optimize a `query` that leads to severe performance problems?

Hey :smiling_face: I’m working with some fairly large documents (up to 1000 pages) where the headers must guide readers through the sections, chapters and paragraphs.

(Example: Section 3 · Chap. F · Par. 25 — Section 3 · Chapter G · Par. 2)

Thanks to the support of three very helpful members here, I was able to implement a solution using query, as described in: How to get the number of the first and last paragraph on a page and show the result in the header?

The main part to get the paragraph numbers for the header looks like this (full code at the end):

// Helper Functions
#let get-short(label, x) = {
  let shorts = query(selector(label).before(x.location()))
  return shorts.at(-1, default: none)
}
#let shorts-and-id(x) = {
  let h1 = get-short(<h1-short>, x)
  let h2 = get-short(<h2-short>, x)
  return (h1: h1.value, h2: h2.value, id: x.value)
}

// Chunk of the query code
let pars = query(selector.or(<h4-start>, <h4-end>))
  .filter(x => x.location().page() == here().page())
  .map(shorts-and-id)
  .dedup()
if pars.len() == 0 {
  // No paragraph starting or ending on this page
  // -> check if a paragraph starts before and ends after this page
  let prevs = query(selector(<h4-start>).before(here()))
  let nexts = query(selector(<h4-end>).after(here()))
  if prevs.len() == 0 or nexts.len() == 0 {
    return
  }

The problem with this solution is that compilation times grow so steeply (cubic or even worse) that the documents quickly become uncompilable.

I didn’t notice that at first, because I worked only with draft parts. When I tried to compile one of the actual documents, it didn’t finish within 5+ hours and I had to cancel the compilation process.

The minimal example code (you’ll find a copy at the end) generates the following results on my M1 MacBook (measured with time typst compile in zsh):

1 Section:    0,64s user  0,07s system  95% cpu 0,743 total
2 Sections:   4,99s user  0,46s system 127% cpu 4,264 total
3 Sections:  27,57s user  3,39s system 141% cpu 21,922 total
4 Sections:  72,40s user  7,25s system 145% cpu 54,725 total
5 Sections: 151,57s user 24,86s system 135% cpu 2:09,92 total
6 Sections: 257,02s user 60,35s system 133% cpu 3:57,47 total

So while the amount of content went up 6x, the CPU user time went up 400x, showing a bottleneck that scales cubically or worse and quickly leads to unworkable compile times for the actual documents… :snail:

  • Is there maybe an error or inefficiency in my code that leads to this (maybe parts like .before(here()) are too open-ended)?
  • How can I optimize the query code to display the Section/Chapter/Paragraph numbers with compile times that don’t feel like I’m mining Bitcoin on my poor laptop? :smile:

You can quickly test/reproduce this with the following MWE :arrow_down::

Full Code Example 📋
// Helper Functions
#let get-short(label, x) = {
  let shorts = query(selector(label).before(x.location()))
  return shorts.at(-1, default: none)
}
#let shorts-and-id(x) = {
  let h1 = get-short(<h1-short>, x)
  let h2 = get-short(<h2-short>, x)
  return (h1: h1.value, h2: h2.value, id: x.value)
}
// Output assembly
#let header-content(a, b) = {
  return [#a.h1, #a.h2, Par. #a.id --- #b.h1, #b.h2, Par. #b.id]
}

// Actual Query
#set page(
  width: 120mm,
  height: 100mm,
  header: context {
    let pars = query(selector.or(<h4-start>, <h4-end>))
      .filter(x => x.location().page() == here().page())
      .map(shorts-and-id)
      .dedup()
    if pars.len() == 0 {
      // No paragraph starting or ending on this page
      // -> check if a paragraph starts before and ends after this page
      let prevs = query(selector(<h4-start>).before(here()))
      let nexts = query(selector(<h4-end>).after(here()))
      if prevs.len() == 0 or nexts.len() == 0 {
        return
      }
      let prev = shorts-and-id(prevs.last())
      let next = shorts-and-id(nexts.first())
      if prev != next {
        return
      }
      return header-content(prev, next)
    }
    return header-content(pars.first(), pars.at(-1, default: none))
  },
)

// The Elements
#let my-h1(short: none, ..args) = {
  heading(level: 1, ..args)
  [#metadata(short)<h1-short>]
}
#show heading.where(level: 1): it => {
  pagebreak(weak: true)
  it
}
#let my-h2(short: none, ..args) = {
  heading(level: 2, ..args)
  [#metadata(short)<h2-short>]
}
#let my-par(id: none, body) = text({
  [#metadata([#id])<h4-start>]
  if id != none {
    heading(level: 4, id)
  }
  body
  [#metadata([#id])<h4-end>]
  hide("​")
})
#show heading.where(level: 4): it => it.body + " "

// ⬇️ Test Content (setting num-sections below to 1,2,3…6 should roughly lead to the results in my post)
#let num-sections = 1
// Used chapter letters just for better overview in the example output
// ABCDEFGHIJKLMNOPQRSTUVWXYZ
#let chapter-letters = "ABCDEFGH"
#let num-paragraphs-per-chapter = 25

#let s = 1
#while s <= num-sections {
  my-h1(short: "Section " + [#s])[This is Section #s]
  let c = 1
  for letter in chapter-letters {
    my-h2(short: "Chap. " + [#letter])[This is Chapter #letter]
    let p = 1
    while p <= num-paragraphs-per-chapter {
      my-par(id: [#p], lorem(50))
      p = p + 1
    }
  }
  s = s + 1
}

(Note: It is set to output 1 section by default, so it doesn’t crash anyone’s tinymist. You can flexibly adjust the amount of test output at the end to reproduce.)

1 Like

If it’s possible, maybe you could share a project with the complete code (not necessarily the complete document); that would make it easier to work with in practice.

It’s clear that the paragraph scan is at least O(n²) when performed once per page. I wonder now if it could be written like this:

let thispage = here().page()
let pagestart = label("page-begin-" + str(thispage))
let pageend = label("page-end-" + str(thispage))
let pars = query(selector.or(<h4-start>, <h4-end>).after(pagestart).before(pageend))

Where you use labels that delimit each page (you place them in header/footer so that they do). That is assuming of course that query before/after benefits from this.

If that works, it could be an improvement. Then the “no paragraph on this page” case needs the same kind of care too. Maybe you could disable that case and see if it contributes to the slowdown or not.
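An untested sketch of where such per-page delimiter labels could live (the label names and the exact placement are assumptions on my part, not verified code):

// Untested sketch: emit a uniquely-labeled marker at the start (header)
// and end (footer) of every page, so that queries can be narrowed to a
// single page with .after()/.before(). Label names are illustrative.
#set page(
  header: context [#metadata(none)#label("page-begin-" + str(here().page()))],
  footer: context [#metadata(none)#label("page-end-" + str(here().page()))],
)

Whether the introspection engine actually exploits this narrowing is exactly the open question above.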

If none of that works, maybe you can optimize things yourself by performing certain queries only once (and storing the results in state) instead of repeating variants of them on each page.


Based on code from the previous thread, I mean that you can query certain information only once or once per chapter, and then you refer back to it later.

#context {
  query(selector.or(<h4-start>, <h4-end>)).map(x => (x.location().page(), shorts-and-id(x))).dedup()
}

Once you have that data in a dictionary, it should be efficient to use.
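For illustration, a rough and untested sketch of that one-time pass (it reuses shorts-and-id from above; the function name is a placeholder):

// Untested sketch: run the expensive query once and fold the results
// into a dictionary keyed by page number.
#let pars-by-page() = {
  let d = (:)
  for x in query(selector.or(<h4-start>, <h4-end>)) {
    let pg = str(x.location().page())
    d.insert(pg, d.at(pg, default: ()) + (shorts-and-id(x),))
  }
  d
}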

You mean in addition to the full code example at the end (Update: I’ve tried to make it more visible in case it was too easy to overlook)? I thought it would be easier for anyone willing to help if the example was abstracted to the problem (thinking SSCCE) and flexibly scalable for trying different approaches, rather than having to deal with a huge file containing a large chunk of static (test) content? What form should I provide that would be easier to work with? :blush:

That would be the ideal case I guess, but combining state and query in a way that doesn’t make things worse is currently still a bit above my Typst skill level :joy:

But as you say (or rather as I was hoping), and from my limited understanding of how Typst works internally, if the scope of the query were as narrow as possible (which I don’t know: can an element that didn’t start on a given page, but only reaches into it, still be queried there?), it would likely speed up the processing considerably.

I’m sorry, I missed that :blush:

1 Like

I haven’t looked at the details here, but this could be relevant: Quadratic runtime of cache validation · Issue #5220 · typst/typst · GitHub

1 Like

Thank you, that’s good to know. Unfortunately, I can’t verify if that applies here, since I don’t know what the (temporary?) solution would look like in this case for comparison — kind of like a cache buster for the query?

This page-start/end idea helps in the example, but only by around 30%. I don’t think it changes the scaling of the issue, so it’s not a real solution.

I have some success now with the cache strategy. I think this 1) speeds it up massively, 2) changes scaling to be better(?). It’s not blazingly fast, but it’s much better.

The pars.len() == 0 case is unimplemented; it doesn’t affect the example, but it would need to use the same cached data structure too.

Code
// Helper Functions
#let get-short(label, x) = {
  let shorts = query(selector(label).before(x.location()))
  return shorts.at(-1, default: none)
}
#let shorts-and-id(x) = {
  let h1 = get-short(<h1-short>, x)
  let h2 = get-short(<h2-short>, x)
  return (h1: h1.value, h2: h2.value, id: x.value)
}
// Output assembly
#let header-content(a, b) = {
  return [#a.h1, #a.h2, Par. #a.id --- #b.h1, #b.h2, Par. #b.id]
}

// Return array. Index: page number. Value: Array of paragraph info on that page.
#let shorts-per-page() = {
  let pars = query(selector.or(<h4-start>, <h4-end>))
  let d = ()
  for par in pars {
    let pg = par.location().page()
    while d.len() <= pg { d.push(()) }
    d.at(pg).push(shorts-and-id(par))
  }
  return d
}

#let cache = state("parscache", none)


// Actual Query
#set page(
  width: 120mm,
  height: 100mm,
  header: context {
    let pg = here().page()
    let cachedpars = cache.get()
    if cachedpars == none {
      cachedpars = shorts-per-page()
      cache.update(_ => cachedpars)
    }
    let pars = cachedpars.at(pg, default: none)
    if pars == none {
      return
    }
    if pars.len() == 0 {
      return
    }
    header-content(pars.first(), pars.at(-1, default: none))
  }
)

// The Elements
#let my-h1(short: none, ..args) = {
  heading(level: 1, ..args)
  [#metadata(short)<h1-short>]
}
#show heading.where(level: 1): it => {
  pagebreak(weak: true)
  it
}
#let my-h2(short: none, ..args) = {
  heading(level: 2, ..args)
  [#metadata(short)<h2-short>]
}
#let my-par(id: none, body) = text({
  [#metadata([#id])<h4-start>]
  if id != none {
    heading(level: 4, id)
  }
  body
  [#metadata([#id])<h4-end>]
  hide("​")
})
#show heading.where(level: 4): it => it.body + " "

// ⬇️ Test Content (setting num-sections below to 1,2,3…6 should roughly lead to the results in my post)
// Used chapter letters just for better overview in the example output
// ABCDEFGHIJKLMNOPQRSTUVWXYZ
#let chapter-letters = "ABCDEFGH"
#let num-paragraphs-per-chapter = 25

// use sys.inputs for command line parametrization
#let s = 1
#let num-sections = int(sys.inputs.at("N", default: "1"))
#while s <= num-sections {
  my-h1(short: "Section " + [#s])[This is Section #s]
  let c = 1
  for letter in chapter-letters {
    my-h2(short: "Chap. " + [#letter])[This is Chapter #letter]
    let p = 1
    while p <= num-paragraphs-per-chapter {
      my-par(id: [#p], lorem(50))
      p = p + 1
    }
  }
  s = s + 1
}


time typst compile patho.typ  --input N=1
0.20user 0.07system 0:00.27elapsed 105%CPU (0avgtext+0avgdata 54300maxresident)k
time typst compile patho.typ  --input N=2
0.41user 0.10system 0:00.39elapsed 130%CPU (0avgtext+0avgdata 78108maxresident)k
time typst compile patho.typ  --input N=3
0.77user 0.14system 0:00.58elapsed 156%CPU (0avgtext+0avgdata 107572maxresident)k
time typst compile patho.typ  --input N=4
1.34user 0.19system 0:00.83elapsed 184%CPU (0avgtext+0avgdata 129604maxresident)k
time typst compile patho.typ  --input N=5
2.05user 0.24system 0:01.11elapsed 207%CPU (0avgtext+0avgdata 150760maxresident)k
time typst compile patho.typ  --input N=6
3.35user 0.34system 0:01.64elapsed 225%CPU (0avgtext+0avgdata 185552maxresident)k                                                     

Time is still scaling nonlinearly.

2 Likes

I think you are not giving yourself enough credit, so I will: you’ve just sped up the processing time by a couple of thousand percent :face_holding_back_tears: thanks so much! :pray: As you mention, it still scales nonlinearly, but it is WAY faster than before and actually compilable. Out of curiosity… in the issue @laurmaedje mentioned…

What you are most likely observing is a problem with incremental compilation where validating a cache hit is much more expensive than recomputing from scratch in some cases.

Would this mean it could get even faster by switching the…

// Out of your code
let cachedpars = cache.get()

// As in his example (slow)
#for _ in range(n) {
  context bar()
}
// vs. (faster)
#for i in range(n) {
  context bar(i)
}

…for example every 10 or 20 pages?

1 Like

The syntax, especially for the loops, can be simplified to be more readable:

code
// Helper Functions
#let get-short(label, x) = {
  query(selector(label).before(x.location())).at(-1, default: none)
}
#let shorts-and-id(x) = {
  let h1 = get-short(<h1-short>, x)
  let h2 = get-short(<h2-short>, x)
  (h1: h1.value, h2: h2.value, id: x.value)
}
// Output assembly
#let header-content(a, b) = {
  [#a.h1, #a.h2, Par. #a.id --- #b.h1, #b.h2, Par. #b.id]
}

// Actual Query
#let header = context {
  let pars = query(selector.or(<h4-start>, <h4-end>))
    .filter(x => x.location().page() == here().page())
    .map(shorts-and-id)
    .dedup()
  if pars.len() != 0 {
    return header-content(pars.first(), pars.at(-1, default: none))
  }
  // No paragraph starting or ending on this page
  // -> check if a paragraph starts before and ends after this page
  let prevs = query(selector(<h4-start>).before(here()))
  let nexts = query(selector(<h4-end>).after(here()))
  if prevs.len() == 0 or nexts.len() == 0 { return }
  let prev = shorts-and-id(prevs.last())
  let next = shorts-and-id(nexts.first())
  if prev != next { return }
  header-content(prev, next)
}

#set page(width: 120mm, height: 100mm, header: header)

// The Elements
#let my-h1(short: none, ..args) = {
  heading(level: 1, ..args)
  [#metadata(short)<h1-short>]
}
#show heading.where(level: 1): it => pagebreak(weak: true) + it

#let my-h2(short: none, ..args) = {
  heading(level: 2, ..args)
  [#metadata(short)<h2-short>]
}

#let my-par(id: none, body) = {
  [#metadata[#id]<h4-start>]
  if id != none { heading(level: 4, id) }
  body
  [#metadata[#id]<h4-end>]
  hide("​")
}
#show heading.where(level: 4): it => it.body + " "

// ⬇️ Test Content (setting num-sections below to 1,2,3…6 should roughly lead to the results in my post)
#let num-sections = 1
// Used chapter letters just for better overview in the example output
// ABCDEFGHIJKLMNOPQRSTUVWXYZ
#let chapter-letters = "ABCDEFGH"
#let num-paragraphs-per-chapter = 25

#for s in range(1, num-sections + 1) {
  my-h1(short: [Section #s])[This is Section #s]
  for letter in chapter-letters {
    my-h2(short: [Chap. #letter])[This is Chapter #letter]
    for p in range(1, num-paragraphs-per-chapter + 1) {
      my-par(id: [#p], lorem(50))
    }
  }
}

And for the last solution:

code
// Helper Functions
#let get-short(label, x) = {
  query(selector(label).before(x.location())).at(-1, default: none)
}
#let shorts-and-id(x) = {
  let h1 = get-short(<h1-short>, x)
  let h2 = get-short(<h2-short>, x)
  (h1: h1.value, h2: h2.value, id: x.value)
}
// Output assembly
#let header-content(a, b) = {
  [#a.h1, #a.h2, Par. #a.id --- #b.h1, #b.h2, Par. #b.id]
}

// Return array. Index: page number. Value: Array of paragraph info on that page.
#let shorts-per-page() = {
  let pars = query(selector.or(<h4-start>, <h4-end>))
  let page-pars-array = ((),) * (counter(page).final().first() + 1)
  for par in pars {
    page-pars-array.at(par.location().page()).push(shorts-and-id(par))
  }
  page-pars-array
}

#let cache = state("parscache", none)

// Actual Query
#let header = context {
  let page = here().page()
  let cachedpars = cache.get()
  if cachedpars == none {
    cachedpars = shorts-per-page()
    cache.update(_ => cachedpars)
  }
  let pars = cachedpars.at(page, default: none)
  if pars == none or pars.len() == 0 { return }
  header-content(pars.first(), pars.at(-1, default: none))
}

#set page(width: 120mm, height: 100mm, header: header)

// The Elements
#let my-h1(short: none, ..args) = {
  heading(level: 1, ..args)
  [#metadata(short)<h1-short>]
}
#show heading.where(level: 1): it => pagebreak(weak: true) + it

#let my-h2(short: none, ..args) = {
  heading(level: 2, ..args)
  [#metadata(short)<h2-short>]
}

#let my-par(id: none, body) = {
  [#metadata[#id]<h4-start>]
  if id != none { heading(level: 4, id) }
  body
  [#metadata[#id]<h4-end>]
  hide("​")
}
#show heading.where(level: 4): it => it.body + " "

// ⬇️ Test Content (setting num-sections below to 1,2,3…6 should roughly lead to the results in my post)
// Used chapter letters just for better overview in the example output
// ABCDEFGHIJKLMNOPQRSTUVWXYZ
#let chapter-letters = "ABCDEFGH"
#let num-paragraphs-per-chapter = 25

// Use sys.inputs for command line parametrization.
#let num-sections = int(sys.inputs.at("N", default: "1"))
#for s in range(1, num-sections + 1) {
  my-h1(short: [Section #s])[This is Section #s]
  for letter in chapter-letters {
    my-h2(short: [Chap. #letter])[This is Chapter #letter]
    for p in range(1, num-paragraphs-per-chapter + 1) {
      my-par(id: [#p], lorem(50))
    }
  }
}

However, from limited testing, this doesn’t affect performance.

Using a bar(page) for cache.get() doesn’t seem to make a difference, at least for ~1 s runs.

By the way, you should check out hyperfine for benchmarking and the --timings flag of typst compile.

hyperfine --warmup 1 'typst c file.typ'
1 Like

Thank you for the refactoring, @Andrew; I’ve learned a couple of random shortcuts there too. I’ve marked the answer of @bluss as the solution, as everything else builds upon his cache strategy, which I want to thank him for again; I can now compile the actual documents. It is still somewhat slow, but ~10 minutes is MUCH better than 5+ hours (without finishing).

And also thanks for the tip with hyperfine, @Andrew, I’ve checked it out. About the…

Can you show me how you tested it? I would like to try to fiddle with --jobs and the “busted” cache for the longer runs. My “idea” (rather my understanding of what Laurenz said in the issue) would have been to try “recomputing from scratch” every couple of pages, but I don’t know WHEN validating the cache hit becomes more taxing than recomputing (so e.g. whether it makes sense to recompute every time or only after x pages).

1 Like

I’m happy that it helps! I didn’t realize you had documents even bigger than these examples; 10 minutes of compile time is still an insane scale of document :smile:

I did some experiments in the direction of using different label names in different chapters of the document (<h4-start-chapter1> and so on), but from early testing it seems the count of each label type doesn’t matter, only the total number of labels or the total count of pages where the query is used. (But maybe there is some kind of improvement still to be found?)

I think that at this scale, every label and locatable element has some overhead, so it’s worth thinking about and testing every little detail.

There might have been a small win if you call it like header(here().page()) (using Andrew’s code and function name) and remove the here().page() lookup from inside the header function. I saw in the timings that there was a small improvement that way. I would expect that the caching in a state variable solves most of the cache validation problem, but I don’t know.

1 Like

I changed let cachedpars = cache.get() to let cachedpars = bar(page) with let bar(_) = cache.get() and ran hyperfine.
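In code, the change was roughly this (a sketch with the same names as above):

// Sketch of the tested variant, following the pattern from the linked issue:
// wrap the cache read in a function that takes the page number.
#let bar(_) = cache.get()
// ...and inside the header:
// let cachedpars = bar(page)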

1 Like

Thank you :pray: I will test everything mentioned and report back.

I played with this a bit as well, and @bluss’s solution basically adds two big improvements: the caching, and the removal of the two additional queries.

For me the runtime (real, N=3) goes from ~25.6s with the original code, to ~10.7s with only the additional queries removed (no caching yet), to ~0.5s with bluss’s code.

This makes sense: even without caching, not performing these two per-page queries (which are distinct on every page due to the before(here()) filter) adds up.

So I tried to also get rid of the queries in get-short; that’s four queries per paragraph (shorts-and-id() performs two queries, and it’s called for both paragraph start and end). I ended up with the following shorts-per-page():

// Return array. Index: page number. Value: Array of paragraph info on that page.
#let shorts-per-page() = {
    let elems = query(selector.or(<h1-short>, <h2-short>, <h4-start>, <h4-end>))

    let h1 = none
    let h2 = none
    let d = ()
    for elem in elems {
      let lbl = elem.label
      if lbl == <h1-short> { h1 = elem }
      else if lbl == <h2-short> { h2 = elem }
      else {
        let pg = elem.location().page()
        while d.len() <= pg { d.push(()) }
        d.at(pg).push((h1: h1.value, h2: h2.value, id: elem.value))
      }
    }

    return d
}

(The helpers are no longer needed.) Overall this means that Typst is now only running a single query per layout iteration – just that that query has more results.

This brought the runtime (real, N=10) from ~3.1s to ~2.1s. So the speedup is not as extreme as what bluss was already able to achieve, but it probably will still add up. :slight_smile:

5 Likes