Is there any way to get a character count of the entire body text? I can’t find anything in the docs, and GitHub Copilot keeps suggesting things like body.text().len(), which does not exist.
The answer depends on what body actually means in this context.
For a string of text, it’s simply "text".len() (or "text".clusters().len() to count grapheme clusters instead of UTF-8 bytes). For content that consists only of plain text, it will be #[text].text.clusters().len().
For more complex examples, a more verbose solution is needed. A solution by @Eric for something similar mentioned here would be:
#let to-string(content) = {
  if content.has("text") {
    content.text
  } else if content.has("children") {
    content.children.map(to-string).join("")
  } else if content.has("body") {
    to-string(content.body)
  } else if content == [ ] {
    " "
  }
}
#to-string[This _*cool*_ project] // => "This cool project"
I haven’t tested this and it might still miss some text, but it’s a great start nevertheless. You would use it like this:
#let content-variable = [This _*cool*_ project]
#to-string(content-variable).clusters().len()
// Output: 17
Here’s another method of counting each character in a document that uses regular expressions:
#let charCounter = counter("charCounter")
#show regex("[a-zA-Z]"): it => {
  charCounter.step()
  it
}
Total characters: #charCounter.display()
In this example the result is 15, which is the number of letters in “Total characters:” itself.
The regex string can be adjusted to include spaces and special characters as well.
I don’t know what the performance impact of doing it this way is, but I assume it’s not a good option for large documents.
Aside from the large performance impact, it will also negatively affect text layout. Each letter is seen individually by the layout engine (at least currently), which prevents e.g. ligatures. If the performance is acceptable, this can be used temporarily, but I would not use this for a final deliverable.
Just came across this plugin: wordometer.
I haven’t tried it, but it looks like it may help.
I’ve been using wordometer for word count. It does character count as well.
There are some things it can’t count properly. Look at some open issues on GitHub.
A follow-up question: what is your use case? Do you need the count in your rendered document, or outside it, just as a metric?
I think at the moment the most accurate way to get a whole-document character count is to use an external script after rendering a PDF: add a toggle to disable the things you don’t want counted (page numbers, etc.), compile, convert the PDF to text, and count the characters.
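For what it’s worth, here is a minimal sketch of that external-script route in Python. It assumes Poppler’s pdftotext is installed and on your PATH; the function names and main.pdf are placeholders I made up. It counts Unicode code points rather than grapheme clusters, and it treats every run of whitespace (including line breaks and page breaks in the extracted text) as a single space, which is my assumption about what should count.

```python
import subprocess


def extract_text(pdf_path: str) -> str:
    """Convert a PDF to plain text by calling the pdftotext CLI (Poppler).

    Assumption: pdftotext is installed; "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8")


def count_characters(text: str, include_spaces: bool = True) -> int:
    """Count code points after collapsing whitespace runs.

    Line breaks and form feeds (page breaks) in the extracted text are
    layout artifacts, so each run of whitespace counts as one space,
    or as nothing when include_spaces is False.
    """
    words = text.split()  # splits on spaces, newlines, tabs and form feeds
    if include_spaces:
        return len(" ".join(words))
    return sum(len(word) for word in words)


# Hypothetical usage, after compiling the document to main.pdf:
#   print(count_characters(extract_text("main.pdf")))
```

Hyphens inserted at line breaks and anything you forgot to toggle off (headers, page numbers) will still be counted, so check the extracted text once by eye.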
Thanks! Yeah, I can see that wordometer has some issues that are dealbreakers in my case (I need to include spaces, for one). It seemed perfect otherwise.
Yeah, my use case is including the character count in the actual document. So I think you’re right that doing a character count on a rendered PDF is probably the best solution.
Thanks for all of your suggestions! What a great community
I’ll experiment with your suggestions, but I have a feeling that the most trustworthy solution in the end is probably to run a character count on the final PDF.
Still trying to get a handle on the Typst scripting language, but it is quite powerful!