How to text wrap inside a table cell?

Marius · March 25, 2025, 2:34am

I am struggling since several hours when trying to let words within a table cell wrap so that they don’t overlap the table cell’s width. I would appreciate any hint in the right direction.

#set page(
  width: 297mm,   // A4 landscape width
  height: 210mm,  // A4 landscape height
  margin: 10mm    // 10mm margin on all sides
)

#set text(size: 6pt)

#let results = csv("example.csv")

// Filter to match the exact 14 columns
#let expected-headers = (
  "id", "volume", "asset_group", "asset_id", "date", "source_type", 
  "gain_loss", "gain_loss_within_1_year", "gain_loss_ge_1_year", 
  "consumed_volumes", "years_passed_list", "consumed_ledger_ids", 
  "gain_loss_list", "source_reference"
)
#let headers = results.at(0)  // Extract the first row as headers
#let header-indices = headers.enumerate().filter(((i, h)) => expected-headers.contains(h)).map(((i, _)) => i)
#let filtered-headers = expected-headers  // Use exact headers
#let data = results.slice(1)  // Remaining rows as data
#let filtered-data = data.map(row => header-indices.map(i => row.at(i)))

// Adjusted table-columns for 14 columns, summing to ~277mm (297mm - 20mm margins)
#let table-columns = (
  15mm,  // id
  20mm,  // volume
  20mm,  // asset_group
  20mm,  // asset_id
  25mm,  // date
  18mm,  // source_type
  18mm,  // gain_loss
  23mm,  // gain_loss_within_1_year
  23mm,  // gain_loss_ge_1_year
  20mm,  // consumed_volumes
  20mm,  // years_passed_list
  20mm,  // consumed_ledger_ids
  20mm,  // gain_loss_list
  15mm   // source_reference
)  // Total: 277mm

#let wrap-long-content(content) = {
  // Debug: Log the type and raw content
  let debug-prefix = "TYPE: " + type(content) + " | "
  if type(content) == str {
    // Debug: Confirm function is processing strings
    let result = debug-prefix
    // Handle timestamp-like strings
    if content.match(regex("\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\+\d{2}")) != none {
      let parts = content.split(" ")
      if parts.len() == 2 {
        result += parts.at(0) + "\n" + parts.at(1)
        return result
      }
    }
    // Handle empty or NULL arrays
    if content == "{}" or content == "{NULL}" {
      return result + ""
    }
    // Handle arrays (split on commas)
    if content.starts-with("{") and content.ends-with("}") {
      let trimmed = content.trim("{", "}")
      if trimmed.len() == 0 { return result + "" }
      let parts = trimmed.split(",")
      result += parts.join("\n")
      return result
    }
    // Handle long strings with no spaces
    if content.len() > 10 and content.match(regex("\s")) == none {
      let chars = content.clusters()
      let broken = ""
      for (i, c) in chars.enumerate() {
        broken += c
        if i > 0 and calc.rem(i, 10) == 0 { broken += sym.zws }
      }
      result += broken
      return result
    }
    return result + content
  } else if type(content) == content {
    if content.has("text") {
      return wrap-long-content(content.text)
    }
  }
  // Fallback for non-string/non-text content
  return debug-prefix + repr(content)
}

// General cell styling with wrapping
#show table.cell: it => block(
  width: 100%,
  breakable: true,
  inset: 2pt
)[
  #set text(hyphenate: true)
  #box(width: 100%, wrap-long-content(it))
]

// Specific header row styling with forced breaking on underscores
#show table.cell.where(y: 0): it => block(
  width: 100%,
  breakable: true,
  inset: 2pt
)[
  #set text(hyphenate: true, size: 7pt, weight: "bold")
  #let header-text = if type(it.body) == str { it.body } else if it.body.has("text") { it.body.text } else { repr(it.body).trim("[]") }
  #box(width: 100%, header-text.replace("_", "_\n"))
]

#table(
  columns: table-columns,
  table.header(
    ..filtered-headers.map(h => [#h]),  // Convert headers to content
    repeat: true  // Ensure the header repeats on every page
  ),
  ..filtered-data.flatten()  // Flatten the filtered data rows
)

gezepi · March 25, 2025, 6:38am

Could you provide some data that exhibits the behavior you are running into? It’s much easier to answer questions with a minimum working example (MWE). Currently I can’t recreate your problem because compilation fails on the line FILENAME because the variable isn’t defined anywhere (plus I don’t have any data to display in the table).

Andrew · March 25, 2025, 6:53am

A website that can be helpful: https://sscce.org.

The set page rule can be simplified like this:

#set page(flipped: true, margin: 1cm)

I suspect that you just need to use these:

The problem the way it’s phrased should be very trivial and the fact that there is so much code is very surprising.

Marius · March 25, 2025, 7:25am

Thanks for your replies. My template code reflects multiple iterations with the Groke AI because I thought that would be the easiest to get quick results. For instance I would prefer to let the header rows auto-break on underscore but only got it working with the custom code for that. While I can live with that the cell overlapping is a real issue.

I simplied the “#set page” part as suggested and adapted my filename placeholder so you can more easily reproduce.

I am afraid this only works well for natural words but not for identifiers like 3ddGwdKkUuUzcxwbMTXXooHMBMDKUieSmb69sRiZzdsVez7xfr2aJhopWk5fV3UvebuZ7vwhheCwzFB2Yx24S7hi.

#set page(flipped: true, margin: 1cm)

#set text(size: 6pt)

#let results = csv("example.csv")

// Filter to match the exact 14 columns
#let expected-headers = (
  "id", "volume", "asset_group", "asset_id", "date", "source_type", 
  "gain_loss", "gain_loss_within_1_year", "gain_loss_ge_1_year", 
  "consumed_volumes", "years_passed_list", "consumed_ledger_ids", 
  "gain_loss_list", "source_reference"
)
#let headers = results.at(0)  // Extract the first row as headers
#let header-indices = headers.enumerate().filter(((i, h)) => expected-headers.contains(h)).map(((i, _)) => i)
#let filtered-headers = expected-headers  // Use exact headers
#let data = results.slice(1)  // Remaining rows as data
#let filtered-data = data.map(row => header-indices.map(i => row.at(i)))

// Adjusted table-columns for 14 columns, summing to ~277mm (297mm - 20mm margins)
#let table-columns = (
  15mm,  // id
  20mm,  // volume
  20mm,  // asset_group
  20mm,  // asset_id
  25mm,  // date
  18mm,  // source_type
  18mm,  // gain_loss
  23mm,  // gain_loss_within_1_year
  23mm,  // gain_loss_ge_1_year
  20mm,  // consumed_volumes
  20mm,  // years_passed_list
  20mm,  // consumed_ledger_ids
  20mm,  // gain_loss_list
  15mm   // source_reference
)  // Total: 277mm

#let wrap-long-content(content) = {
  if type(content) == str {
    // Handle timestamp-like strings
    if content.match(regex("\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\+\d{2}")) != none {
      let parts = content.split(" ")
      if parts.len() == 2 {
        return parts.at(0) + "\n" + parts.at(1)
      }
    }
    // Handle empty or NULL arrays
    if content == "{}" or content == "{NULL}" {
      return ""
    }
    // Handle arrays (split on commas)
    if content.starts-with("{") and content.ends-with("}") {
      let trimmed = content.trim("{", "}")
      if trimmed.len() == 0 { return "" }
      let parts = trimmed.split(",")
      return parts.join("\n")
    }
    // Handle long strings with no spaces
    if content.len() > 10 and content.match(regex("\s")) == none {
      let chars = content.clusters()
      let broken = ""
      for (i, c) in chars.enumerate() {
        broken += c
        if i > 0 and calc.rem(i, 10) == 0 { broken += sym.zws }
      }
      return broken
    }
    return content
  } else if type(content) == content {
    if content.has("text") {
      return wrap-long-content(content.text)
    }
  }
  // Fallback for non-string/non-text content
  return content
}

// General cell styling with wrapping
#show table.cell: it => block(
  width: 100%,
  breakable: true,
  inset: 2pt
)[
  #set text(hyphenate: true)
  #box(width: 100%, wrap-long-content(it))
]

// Specific header row styling with forced breaking on underscores
#show table.cell.where(y: 0): it => block(
  width: 100%,
  breakable: true,
  inset: 2pt
)[
  #set text(hyphenate: true, size: 7pt, weight: "bold")
  #let header-text = if type(it.body) == str { it.body } else if it.body.has("text") { it.body.text } else { repr(it.body).trim("[]") }
  #box(width: 100%, header-text.replace("_", "_\n"))
]

#table(
  columns: table-columns,
  table.header(
    ..filtered-headers.map(h => [#h]),  // Convert headers to content
    repeat: true  // Ensure the header repeats on every page
  ),
  ..filtered-data.flatten()  // Flatten the filtered data rows
)

example.csv

id,volume,asset_group,asset_id,date,source_type,gain_loss,gain_loss_within_1_year,gain_loss_ge_1_year,consumed_volumes,years_passed_list,consumed_ledger_ids,gain_loss_list,source_reference
530285709,121.0587935,abc,3ddGwdKkUuUzcxwbMTXXooHMBMDKUieSmb69sRiZzdsVez7xfr2aJhopWk5fV3UvebuZ7vwhheCwzFB2Yx24S7hi,2013-12-31 01:00:00+01,2,,0,0,{NULL},{NULL},{NULL},{NULL},3ddGwdKkUuUzcxwbMTXXooHMBMDKUieSmb69sRiZzdsVez7xfr2aJhopWk5fV3UvebuZ7vwhheCwzFB2Yx24S7hi

Renders to

quachpas · March 25, 2025, 7:40am

You can follow the answer at How to wrap long "unbreakable" text in a table cell? - #2 by Eric and use a regex to split the ID.

P.S.: I have renamed your topic to “How to tex wrap inside a table cell?”, see

Marius · March 25, 2025, 7:45am

I am already giving long strings are special treatment

#let wrap-long-content(content) = {
  if type(content) == str {
    // Handle timestamp-like strings
    if content.match(regex("\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\+\d{2}")) != none {
      let parts = content.split(" ")
      if parts.len() == 2 {
        return parts.at(0) + "\n" + parts.at(1)
      }
    }
    // Handle empty or NULL arrays
    if content == "{}" or content == "{NULL}" {
      return ""
    }
    // Handle arrays (split on commas)
    if content.starts-with("{") and content.ends-with("}") {
      let trimmed = content.trim("{", "}")
      if trimmed.len() == 0 { return "" }
      let parts = trimmed.split(",")
      return parts.join("\n")
    }
    // Handle long strings with no spaces
    if content.len() > 10 and content.match(regex("\s")) == none {
      let chars = content.clusters()
      let broken = ""
      for (i, c) in chars.enumerate() {
        broken += c
        if i > 0 and calc.rem(i, 10) == 0 { broken += sym.zws }
      }
      return broken
    }
    return content
  } else if type(content) == content {
    if content.has("text") {
      return wrap-long-content(content.text)
    }
  }
  // Fallback for non-string/non-text content
  return content
}

// General cell styling with wrapping
#show table.cell: it => block(
  width: 100%,
  breakable: true,
  inset: 2pt
)[
  #set text(hyphenate: true)
  #box(width: 100%, wrap-long-content(it))
]

Marius · March 25, 2025, 7:54am

this regex works good for me

#show regex("[A-Za-z0-9]+"): it => {
  it.text.codepoints().join(sym.zws)
}

quachpas · March 25, 2025, 7:55am

Don’t forget to tick the answer checkbox if you found an answer satisfying ;). This marks the topic as solved.

Marius · March 25, 2025, 8:09am

Allow me a follow-up question. When inserting sym.zws after each character selecting the whole string via double click does not work anymore which is different to behavior we are used to see on websites. Is there a workaround for that?

Andrew · March 25, 2025, 9:41am

A simpler regex would be \w+, but it also matches letters/numbers from any script.

Andrew · March 25, 2025, 9:47am

In Okular v24.12.3 selecting a whole line at a time works just fine.

#show regex("\w+"): it => it.text.clusters().intersperse(sym.zws).join()
sthciesatsctihascthiesathcsietahcstiehasctiheastchienastchiensatchiensatchieasnctiheasnctiheasntchieasntciheasnticheascitea

Oh, and also for some situations there is a difference between code points and code point clusters, so generally speaking .clusters() is superior, unless you specifically need to cut by code points.