Do function calls clone arguments, and can I avoid it?

In order to make Pergamon faster, I’m looking for a way in which I can construct trees from subtrees without copying the subtrees.

The roadblock I’m hitting is this: When I pass an array or dictionary as an argument to a function, Typst seems to make a copy of it. In the code below, when I use the nop(arr) version, the compilation time grows linearly with n; when I use the result = arr version, it stays constant.

Can someone confirm that function arguments are cloned when the function is called? I couldn’t find anything about this in the Typst documentation. I suppose this makes sense, given that the function could call push on the array it received as argument, but the function as a whole still needs to stay pure.

Does anyone have ideas on how to work around this? Is there any mechanism for passing a function argument by reference? I’d be okay with using only immutable data structures (like an array that doesn’t support push).

#let nop(data) = data

#let n = 10000 // vary this from 100 to 10000
#let arr = range(n) 

#for i in range(10000) {
  // let result = arr
  let result = nop(arr)
}

I guess this is probably it:

So, you can’t reference/avoid copying, IIUC.

Typst is also memoizing, so that might be another explanation (I don’t know which one is correct); reading the whole array parameter to check it against previous calls is also at least O(n).

Functions arguments are cloned, but shallowly with reference counting, so this is likely not the problem. The problem is likely (as @bluss mentioned) the fact the the array is repeatedly hashed for memoization purposes.

2 Likes

Thank you for the definitive answer!

I don’t suppose it is in the cards that we could have immutable arrays that only need to calculate their hash values once, so subsequent access to the hash is constant time?

Do you see any other way to accumulate values generated within different functions in time linear in the size of the accumulated structure?

Passing an accumulator object into the different functions will run into the same hashing complexity issue, and accumulating the values in a state (s.update(it => { it. push(value); it })) seems to have quadratic runtime.

I don’t suppose it is in the cards that we could have immutable arrays that only need to calculate their hash values once, so subsequent access to the hash is constant time?

It is in the cards, though not with an explicit immutable array type. I’ve been thinking about having a lazily initialized hash in arrays and dictionaries that is cleared upon mutation. (It works the same way already for content.) However, I don’t want this to incur two allocations, so it would need to share the same backing allocation as the array itself, which is somewhat difficult to realize in the current implementation. But something like this could happen in the future.

Do you see any other way to accumulate values generated within different functions in time linear in the size of the accumulated structure?

The hashing only happens at call sites of user defined functions (and not for built-ins like array.push), so perhaps you could solve the problem iteratively in a loop, without recursion or any kind of user-space function call.

s.update(x => x + (value,))

s.update(x => (..x, value))

Well, that’s actually a huge surprise to me - you’re right, the code below might have linear runtime in n. (Measured the runtimes, plotted a linear and quadratic interpolation - the quadratic interpolation is a slightly better fit, but only barely.)

How is this possible? I would have bet a lot that x + ("a",) is linear time in the length of x. Doesn’t Typst have to keep all intermediate values of a state around so they can be accessed later?

#let n = 5000000 // vary this from 1000 to 10000000
#let s = state("s", ())
#for i in range(n) {
  s.update(x => x + ("a",))
}

That would work equally well, thanks! This would be a big step forward in our ability to write efficient code in Typst packages. (Pergamon is at ~5000 lines of code right now and takes ~500 ms on first compilation to typeset a moderately-sized bibliography.)

In this case, Pergamon renders a bibliography, and the pieces of each reference are constructed in different functions. Squishing all the code into a single function would force me into code duplication and make it difficult for users of the package to customize the bibliography style. That’s why I need a mutable data structure that can be accessed from different functions - whether it’s passed around as a function argument or held as a global variable in a state.

I just know how to make the code look clean.

It does. But I assume the closures are lazily evaluated upon access. Which you don’t do. And only all closures up to the requested location. The rest are probably ignored. Though .push closures should work the same way.