Optimizing Comparisons Using Combinatorial Design

TL; DR: I think we can do this but I am still working on finding the most efficient way of doing it.1 This post is a work-in-progress.

Suppose we need to compare every pair of elements from a list of length $n$. Then, we would need to make exactly $\frac{n(n-1)}{2}$ comparisons. Depending on our end goal, there are several ways to reduce this number or at least reach it more efficiently. For instance, if I compared the natural numbers 3 and 4, and then I compared 4 and 5, AND if I know that the underlying order is transitive, then, I do not need to compare 3 and 5. I already know that $5 > 3$ since $5 > 4$ and $4 > 3$ and $>$ is transitive. This can be very useful but in this post, I want to focus on another approach which can always be combined with this one.

Let me introduce some elements of the problem I am trying to solve so that it makes more sense. In my setup, I need to achieve the comparison through an API call. So, if I were to do this naively, I would send two elements for comparison in each call and then I would need to make $\frac{n(n-1)}{2}$ calls, which would be both expensive and time-consuming. However, I realized that I could in fact pass more elements in one call and receive their (linear) order with approximately the same cost and accuracy as doing this through binary comparisons. So, instead of sending $<A, B>$ and receiving $A < B$, I can send $<A, B, C>$ and receive $A < B < C$.2

Once I noticed this, I realized this can reduce the cost significantly since longer API calls carry significantly more information but cost roughly the same. To see this, note that each binary comparison $<A, B>$ resolves exactly one unknown3 whereas a comparison of $<A, B, C, D>$ would resolve six unknowns4 at once! In fact, I found out that I could go up to 20 elements at once without a significant decrease in quality or increase in cost. This is 195 more efficient than making binary comparison calls, if I could ensure that each binary comparison happens exactly once in my calls.

When I started thinking about how I could ensure that each binary comparison happens exactly once through these longer calls, I realized it is not that trivial. I first formulated my new problem as follows. Let $k<n$ be the maximum number of elements I can compare at once. Given a set $N$ with $n$ elements, I need to find a collection $C$ of its subsets such that:

The first condition is simple enough, it just says that we are looking at subsets of a certain size. We can re-write the second condition above as follows to unpack it easier:

If we achieve this, every pair would be included in one and only one subset so that each binary comparison would be done exactly once.

It took only a few minutes of playing with examples to realize that this is impossible in general for many combinations of $n$ and $k$. I considered the simplest example that would make sense: $N = { 1, 2, 3, 4}$ and $k=3$. (Unless it is ambiguous, I’ll use $ijk$ to denote the set ${i, j, k}$ in what follows.) Here, no matter which subsets we choose, we either cannot cover every binary comparison or some binary comparisons happen several times. For instance, if I choose $123$ (meaning ${1, 2, 3}$) as my first subset, this leaves out all comparisons with 4. But if I add any other subset (of cardinality 3), it would have to have two common elements with $123$.

At this point, I thought this is one of those things that you look up from CLRS so I reached my copy and started browsing the table of contents as well as index. I was looking for graph related algorithms that might be useful for my problem. There are several ways of mapping this problem to graph theory. Here is one:

Let $G = <V, E>$ be a complete graph (so that any two pair of vertices constitute an edge). Then, I want to find a set of simple cycles such that:

After not finding anything that seems close enough, I started browsing some graph theory texts but I couldn’t find it there either so I started googling and asking ChatGPT to do some research for me. This is how I came across Wikipedia pages for Block Design and Steiner System. Kirkman’s Schoolgirl Problem5 might be the simplest description of the issue at hand:

Fifteen young ladies in a school walk out three abreast for seven days in succession: it is required to arrange them daily so that no two shall walk twice abreast.

I found out that what I want can be stated as follows: I want to find edge-disjoint Cycle 1-Cover of a complete graph $G$ with the added constraint that the cycles are the same size.6 At this point, I have many statements of the problem but I can’t tell the same about any solutions.

Getting back on track, here is my current situation. I already knew from the example above with a 4 elements set that doing what I wanted is not feasible for any pair of $k$ and $n$. So, now the problem I want to solve is something like this:

Given a set $N$ with $n$ elements, I need to find the smallest collection $C$ of its subsets such that:

In other words, we drop the requirement that each binary comparison happens exactly once. Instead, we require that (i) each binary comparison happens at least once, (ii) our collection of subsets is minimal in the sense that it has the smallest cardinality among all collections that satisfies our other requirements. Given that we are dealing with finite sets, such a smallest collection always exist. What I need is an efficient way of finding one such set.

Looking for a computational/heuristic way of solving this problem, I found Lindner and Rodger’s textbook on design theory. There, they define minimal covering in the context of Steiner triple system, which means case of $k=3$ in my language in this post. They provide the relevant definition and consider some examples but ultimately, everything is for $k=3$ so this might be another deadend.

I went back to Google and started looking for ways of constructing these subsets/cycles. Intuitively, it should be possible to find a backtracking/DP solution but I also feel like there must be a better way than what I have in mind. I came across this thread about finding a cycle of a fixed length. Some random self-citing nerd explains that the best known algorithms for this is $O(n^2)$ for even $k$ and basically $O(n^3)$ for odd $k$. Note that this is for finding one single cycle. Considering the number of potential cycles that I would need to extract, this does not seem to be a very realistic approach for large $k$ and $n$.7 However, he also mentions another result which shows that we can find paths in linear time but it is linear in $|V| + |E|$ and since the graph is complete, $|V|$ is essentially $n^2$, although it decreases as we extract cycles.

Not sure what will be the most fruitful direction from here. Will update the post when I make any improvements, hopefully soon.


  1. I could have implemented a sub-optimal solution and moved on with my life much faster than it will take me to resolve this issue but that’s clearly not happening. ↩︎

  2. We don’t need to worry about indifferences for now so I am using strict inequalities. ↩︎

  3. Since there is only one combination of size 2 in a doubleton set. ↩︎

  4. Suppose we find out that $A < B < C < D$. We now know that $A < B$, $B < C$, $C < D$, $A < C$, $A < D$, and $C < D$. Equivalently, there are 6 ways of choosing 2 elements out of 4 (without replacement). ↩︎

  5. This page seems to be a great introduction to the history and development of the field. ↩︎

  6. Cycle k-cover of a graph is apparently a family of cycles that cover every edge of $G$ exactly $k$ times. Edge-disjoint means what you’d expect it to mean. ↩︎

  7. My recently deceased kernel (which passed away while trying to solve the problem this way) agrees. ↩︎