I’m trying to make a tool that fetches data from public databases on chemicals, proteins, microorganisms, etc., so that hazard and precautionary statements can be displayed automatically: you input SMILES or another kind of formula and get the name, or the other way around, you write the name and get the formula. I’m not sure it would make sense to bundle these databases inside the library, and I would prefer the tool to fetch and cache the data on change or on first compilation. I’m not seeing anything in the documentation about networking and making web requests…
I’m sure there are plenty of other tools that would benefit from this functionality as well.
This post will probably be helpful:
From the image documentation I didn’t see any mention of importing images from the web, so I just want clarification on whether or not this is possible in Typst. Thanks.
If you want to know more about why this is not one of Typst’s features, I find this comment on a proposed feature illuminating:
This is unfortunately quite problematic for security reasons. Typst puts a strong focus on being safe to run on untrusted input. There are two sides to this:
- **Input:** We try to make sure that Typst code has as little access to files on your computer as possible. For this reason, you can only read files within the root directory of your project.
- **Output:** Even if your Typst code managed to read secret information, we aim to make it as hard as possible to exfiltrate that information.
Both sides are important because both have some inherent weaknesses:
- On the input side, it's sometimes hard to define what the sandbox should even be! If the user compiles in their home directory, the project sandbox is all of their home directory (which is a lot). If the project contains a symlink pointing out of it, that might have been intentional, or it could be part of an attack (though transferring a symlink to a person is already more complicated than a plain file).
- On the output side, we are in a pretty good position because Typst forbids any kind of environment interaction. The only exfiltration route I'm currently aware of is by embedding the information directly in the PDF and getting the user to send it to you. That requires some social engineering, but it's not impossible. It's something we can't really prevent, underlining the importance of sandboxing on the input side.
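To make the input-side rule concrete, here is a minimal Python sketch of a "reads must stay inside the project root" check. It is not Typst's implementation; the function name `is_inside_root` is hypothetical. It resolves symlinks before comparing, which is one possible policy for the symlink case mentioned above (it rejects symlinks that point out of the project, even intentional ones).

```python
from pathlib import Path


def is_inside_root(root: Path, requested: str) -> bool:
    """Return True if `requested` resolves to a location inside `root`.

    Hypothetical helper for illustration only, not Typst's actual check.
    """
    root = root.resolve()
    # resolve() normalizes ".." segments and follows symlinks, so a link
    # pointing out of the project is treated as being outside of it.
    target = (root / requested).resolve()
    return target == root or root in target.parents
```

With a project root of `/home/user/project`, a request for `chapters/intro.typ` would pass, while `../secret.txt` or an absolute path such as `/etc/passwd` would be rejected.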
So with these things in mind, it may become clear why this PR is problematic: It opens up an exfiltration route on the output side. Concretely, in this implementation, if a script gets ahold of valuable information, it could encode that information as a base64 package name and then try to import that from a custom server that processes that information. This means effectively a total breakdown of our security promises.
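As a rough illustration of that route, the following Python sketch shows how a secret could be packed into a package name and recovered by whoever runs the server. The `@git:` spec format and the host are placeholders borrowed from the example given further down in this comment, and both function names are hypothetical.

```python
import base64

# Placeholder host, borrowed from the example later in this comment.
HOST = "git@myserver.com/myuser"


def secret_to_import_spec(secret: bytes) -> str:
    """Encode `secret` as a URL-safe base64 package name inside an import spec."""
    name = base64.urlsafe_b64encode(secret).decode("ascii").rstrip("=")
    return f"@git:{HOST}/{name}:0.0.1"


def import_spec_to_secret(spec: str) -> bytes:
    """What the receiving server could do: recover the bytes from the name."""
    name = spec.rsplit("/", 1)[1].split(":", 1)[0]
    padded = name + "=" * (-len(name) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(padded)
```

The server never has to serve a real package: the request for the name alone carries the data.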
Now, upon thinking further, one might come up with the idea of only allowing static string literals in import statements: Then, the string can't be data-dependent and we're good, right?
Unfortunately we aren't. Let me give one example of a possible attack: You prepare an array with 400 functions containing import statements à la `@git:git@myserver.com/myuser/package:0.0.1` .. `@git:git@myserver.com/myuser/package:0.0.400`. Then, you transmit a byte sequence by calling the function at the index corresponding to that byte, and then removing that index from the array. That way, you can unambiguously transmit $400-256+1 = 145$ bytes. There are probably myriads of further ways that could be used to exfiltrate information (e.g. timing attacks). In short: Any kind of code-triggered remote fetching to an untrusted server is very problematic.
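The counting argument can be checked with a small simulation. This Python sketch models only the bookkeeping, not Typst or any network traffic: the pool holds the version numbers 1 to 400, the sender requests the version at the index equal to each byte and removes it, and the server mirrors the same pool to decode. The names `encode`, `decode`, and `POOL_SIZE` are made up for this example.

```python
POOL_SIZE = 400  # number of prepared import statements, as in the example


def encode(data: bytes, pool_size: int = POOL_SIZE) -> list[int]:
    """Return the sequence of version numbers the sender would request."""
    remaining = list(range(1, pool_size + 1))
    requests = []
    for byte in data:
        # Every byte value 0..255 must still be a valid index.
        if len(remaining) < 256:
            raise ValueError("pool exhausted: no unambiguous slot left")
        requests.append(remaining.pop(byte))
    return requests


def decode(requests: list[int], pool_size: int = POOL_SIZE) -> bytes:
    """Server side: mirror the pool and read each byte off the index."""
    remaining = list(range(1, pool_size + 1))
    out = bytearray()
    for version in requests:
        index = remaining.index(version)
        out.append(index)
        del remaining[index]
    return bytes(out)
```

Starting from 400 entries, the pool still has at least 256 entries for the first 145 bytes and drops to 255 afterwards, which is where the limit of 400 - 256 + 1 bytes comes from.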
I'm not yet certain of its safety, but there is at least an _idea_ for how to support Git imports without such problems. The idea is to only support Git or other kinds of remote non-Typst-Universe imports in a `typst.toml` manifest. That way, we can fetch all Git dependencies upfront (including, very importantly, transitive ones!) with no data dependence and then do not allow for any network access during code evaluation. We've [discussed it a bit on Discord](https://discord.com/channels/1054443721975922748/1176122103355953162/1191795494326906880) before, but it's still very much in an early design stage, so more discussion is necessary!
This would probably still be opt-in, since it's still a network access and the remote party would still know _that_ you compiled the document, but I think it would solve the data exfiltration problem.