How could I merge pairs with the same first member in an array?

I have an array of pairs (gloss, headword), generated as part of a function creating dictionary entries.

(
  ("tea", "Tee"),
  ("the", "das"),
  ("the", "der"),
  ("the", "die"),
  ("title", "Titel")
)

As you can see, the same gloss can apply to several headwords. I would like to transform this array into:

(
  ("tea", "Tee"),
  ("the", ("das", "der", "die")),
  ("title", "Titel")
)

The goal is to get one line per gloss in the final layout.

My real use case has 1290 pairs, so I am interested in solutions that don’t require reading the entire array 1290 times. Having first sorted it, I tried to find a way to compare each pair with its immediate neighbour only, without success.

Here’s a function that does this. It first produces a dictionary (because I had that kind of function lying around), and then translates it to an array again.

Wrapping the values in an array only when there is more than one can either help or hinder the code that consumes the result, so pick whichever form suits your use.

Using a dictionary here keeps the operation efficient: each pair is visited once, and its group is looked up by key instead of by rescanning the array.

/// Take an array of (key, v) and create a dictionary mapping key to values where
/// values are all `v` with the same `key`
#let group-pairs(values) = {
  let result = (:)
  for element in values {
    let key = element.at(0) 
    if key not in result {
      result.insert(key, ())
    }
    result.at(key).push(element.at(1))
  }
  result
}

#group-pairs(
  (
    ("tea", "Tee"),
    ("the", "das"),
    ("the", "der"),
    ("the", "die"),
    ("title", "Titel")
  )
).pairs().map(((k, v)) => if v.len() == 1 { (k, ) + v } else { (k, v) })
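If the consuming code is simpler when every gloss maps to an array, the final `map` step can be dropped. The following is a minimal sketch of that variant, not a definitive implementation; the name `group-pairs-uniform` is hypothetical and chosen only for this illustration.

```typst
// Hypothetical variant of group-pairs: returns an array of
// (key, values) pairs in which `values` is always an array,
// even when a key has only one value.
#let group-pairs-uniform(values) = {
  let result = (:)
  for (key, value) in values {
    if key not in result {
      result.insert(key, ())
    }
    result.at(key).push(value)
  }
  // Convert the dictionary back into an array of (key, values) pairs.
  result.pairs()
}

#let grouped = group-pairs-uniform((
  ("tea", "Tee"),
  ("the", "das"),
  ("the", "der"),
  ("the", "die"),
  ("title", "Titel"),
))

// One line per gloss, with the headwords separated by commas.
#for (gloss, headwords) in grouped [
  #strong(gloss): #headwords.join(", ") \
]
```

Because `headwords` is always an array here, the layout loop needs no type check before calling `join`.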

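Alternatively, a functional approach. It only compares each pair with the previous group, so it relies on the array already being sorted by gloss: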

#let group-entries(data) = {
  data.fold((), (groups, (gloss, headword)) => {
    if groups.len() == 0 or groups.last().first() != gloss {
      // Start a new entry if the accumulator is empty or the gloss changes
      groups.push((gloss, headword))
    } else if type(groups.last().last()) == array {
      // If the current gloss already has multiple headwords, append to the existing array
      groups.last().last().push(headword)
    } else {
      // If this is the second headword for the gloss, convert the single headword into an array
      groups.last().last() = (groups.last().last(), headword)
    }
    groups
  })
}

See fold() for details on the core of this. `groups` is the already processed part, and each iteration either adds a new pair, converts a single-word pair into one with a list of words, or adds an extra word to a pair that already had two or more.
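Since the neighbour comparison only works when equal glosses are adjacent, the sorting step can be folded into the function itself. Here is a rough sketch under that assumption; `group-sorted` is a hypothetical name, and unlike the function above it always stores the headwords in an array, which removes the type check.

```typst
// Hypothetical helper: sorts by gloss first, then merges neighbours.
// Headwords are always collected in an array.
#let group-sorted(data) = {
  // Sort by the first member of each pair so equal glosses are adjacent.
  let ordered = data.sorted(key: pair => pair.first())
  ordered.fold((), (groups, (gloss, headword)) => {
    if groups.len() > 0 and groups.last().first() == gloss {
      // Same gloss as the previous group: append to its headword array.
      groups.last().last().push(headword)
    } else {
      // New gloss: start a group with a one-element headword array.
      groups.push((gloss, (headword,)))
    }
    groups
  })
}

#let merged = group-sorted((
  ("the", "die"),
  ("tea", "Tee"),
  ("the", "das"),
))
```

The array is sorted once and then traversed once, so no pair is compared against the whole array.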
