TLDR: I wrote a small algorithm that efficiently approximates the Pareto front for a narrow type of multi-objective optimization problems. Despite it being narrow, it has many possible applications. There's nothing novel to the algorithm itself, so you won't get anything out of this post if you are knowledgeable about optimization algorithms. If not, there's a non-null probability you'll learn something new. The algorithm is written in Rust, but packaged as a Python library.
A Pokémon team has 6 Pokémons in it. Each Pokémon is of a certain type (water, fire, etc.), has a different set of moves, and stats. You want your team to have the best possible set of properties: have good attack, good defense, good type coverage, etc. But those properties are often at odds with each other. And, if you think about it, the best Pokémon team might not be the collection of the best individual Pokémons. It might pay off, for instance, to choose a mix of defensive and offensive Pokémons, rather than choosing to have every single one of them balanced.
How do you build the best possible Pokémon team?
This is what I have been doing at work lately. Well, not quite. Actually, I have been trying to choose the best possible management scenarios for different forest stands. But deep down they are the same thing. You have a whole (Pokémon team; country) made up of a bunch of stuff (Pokémons; forest stands) for which you have some options (moves/types/stats; management options), and some target objectives (be the very best, like no one ever was; more wood, less CO2 emissions). And your goal is to optimize the whole.
What else follows the same pattern?
| Domain | Problem | Groups (i) | Options per group (j) | Objectives (k) |
|---|---|---|---|---|
| Pokémon | Be the very best | Teams | Pokémons with moves | Offensive output, speed, type coverage |
| Land use & Environment | Forestry land allocation | Forest stands | Management scenarios | Yield, fertilizer cost, water use, soil carbon |
| Portfolio & Resource Allocation | Investment portfolio construction | Asset classes | Position sizes / fund choices | Return, volatility, ESG score, liquidity |
| Energy Systems | Microgrid configuration | Network nodes | Equipment configurations | Resilience, cost, peak shaving, renewable fraction |
| Transportation & Logistics | Fleet composition | Routes / segments | Vehicle types | Operating cost, emissions, coverage, comfort |
| Healthcare & Public Health | Hospital resource configuration | Departments | Staffing / equipment configs | Throughput, cost, quality, staff workload |
| Manufacturing & Product Design | Modular product configuration | Product subsystems | Component variants | Unit cost, performance, weight, manufacturability |
| Urban Planning & Infrastructure |
Zoning policy | City districts | Zoning options | Housing units, green space, traffic, tax revenue |
Indeed, this kind of optimization problem may arise in many places. But it is actually a very narrow slice of all optimization problems. Here are the crucial properties that it needs to have in order to qualify as a problem that takes the same shape as the one I have been working out.
If you are not afraid of math spells, here's the actual incantation (if you are or don't have the time, you may skip this section):
The optimization problem may formally be categorized as "multi-objective multiple-choice combinatorial optimization problem with additive separable objectives". We have:
This is a purely combinatorial product space:
with cardinality
which is large but finite.
The structure of the objective has the following properties:
In each group there are variables of interest, indexed by :
The individual objectives for each group
and scenario
are precomputed, which means we have the following vector for each
pair:
where
is the contribution of branch
of group
to objective
.
It is in this sense that the problem is a "table-lookup" problem,
because the
are precomputed.
The objective function depends only on the sum of each objective
independently. The total objective vector for a configuration
is
We are interested in finding the Pareto
front, so we are minimizing2 the following
functions:
I really wanted to make this a example about Pokémons, but I found out mid-writing that I had to make too many pirouettes to massage that problem into the specific problem that my library solves, to the point that it made the piece less understandable. Instead, I present you the problem that provided the original motivation.
You have stands, and each stand can be managed in different ways (where the number might differ from stand to stand). For instance, you could raise the water level of the nearby ditches, you could apply fertilizer, etc. Then, crucially, you also have a model that tells you the values of the variables of interest for each management scenario. Let's restrict ourselves to because screens have a hard 2 dimensional limit, and I do not want our brains to hurt too much with the plots I am going to show. So, for instance, we might be interested in total wood volume produced and the total C emitted by our management choices.
Then, we might end up with a table like this (keep it handy, we'll use it later on)
| stand number | management | wood volume (m3/ha) | CO2 emissions (kg/ha/year) |
|---|---|---|---|
| 1 | default | 424.2 | 256.3 |
| 1 | fertilizer A | 449.11 | 280.3 |
| 1 | fertilizer B | 450.12 | 293.4 |
| 2 | default | 482.12 | 154.2 |
| 2 | fertilizer A | 519.49 | 180.3 |
| 2 | fertilizer B | 534.56 | 170.5 |
| 3 | ... | ... | ... |
and similarly for all stands.
The question is: what management option should we choose for each stand such that the volume growth is maximized and the CO2 emissions are minimized?
Now that the problem is set up, we can explain how the algorithm works. I will do that by explaining every piece of the convoluted title of this post, bit by bit.
The title is:
Pareto front approximation via dynamic programming for multiple-choice combinatorial optimization problems.
I have already explained what I mean by "multiple-choice combinatorial optimization problems". Now let's understand the rest. Here are the steps:
But before that, let me tell you why the title of the post is not different.
This seems like an easy enough problem. A computer will surely be able to go through all options, right? That table is not that large, after all!
Let's put some quick numbers. Let's say we have 100 stands, with 3 management options each. That means we have different combinations. No computer on Earth can look at all of them and finish before tomorrow, which is usually as far as my patience usually goes.
So, if we cannot evaluate every possible option, then it looks like we are stuck with some heuristic optimization algorithm. And, at first glance, this problem looks like a textbook case for a genetic algorithm. You encode each stand-management choice as a chromosome, define a fitness function, and let evolution do its thing.
And in fact, that was my first implementation.
The only reason I was not convinced of the result was that the code I
had written with the pymoo library in Python didn't look
very elegant and performant to me. And so I began studying the problem a
bit closer.
It turns out that there's a ton of structure in the problem (independent groups, added to get the objective function, precomputed target vectors) that can be exploited. And, as I will show in the following sections, this allows to investigate the whole design space deterministically (yes!).
In comparison, heuristic evolutionary algorithms are solving a much more general class of problems than the one we actually have. They explore the design space stochastically, hoping to discover good regions through mutation and recombination. But our problem, as we have seen, has an enormous combinatorial space. Even if evolutionary algorithms are clever and can perform well in gigantic search spaces, they will only ever inspect a small fraction of them.
Moreover, evolutionary algorithms are usually designed to converge toward some good region of the Pareto front. Depending on initialization and random seeds, they may miss large regions entirely. Two runs of the same algorithm can return different approximations.
As promised, let's dive into the algorithm, one part of the title at a time.
Before discussing the algorithm itself, we need to clarify what we are actually optimizing.
Remember we wanted to simultaneously maximize volume growth while minimizing CO2 emissions. Take a look at the previous table again, and focus on the points for stand 1. Try to pick the best management choice for stand 1. You will run into a problem. If you use any of the fertilizers, you will get more wood, but you will also emit more carbon.
The problem boils down to this: how much extra CO2 are we willing to tolerate for one extra cubic meter of wood? And this is a philosophical problem! There is no mathematical answer to that question.
Different people in different situations might value wood volume vs CO2 emissions differently. So choosing weights beforehand is undesirable. The Pareto front solves exactly this problem. It shows what are the available optimal solutions, before making any decisions about how to put relative values to different objective variables.
We need some more concepts to nail this down. A solution is called Pareto-dominated if there exists another solution that is strictly better in at least one objective and no worse in all the others. And the Pareto front is simply the collection of all non-dominated solutions.
Take another look at the numbers for stand 2 in the table above.
For stand 1, every point is Pareto-optimal because improving wood production always increases emissions. You can imagine the Pareto front as a line connecting all three points.
However, stand 2 has a Pareto-dominated point: fertilizer A is dominated by fertilizer B. Adding fertilizer B to the second stand results in both higher wood yield and lower emissions than fertilizer A.
Let's now try to understand how to compute this Pareto front for this set of problems.
Now comes the fun part.
In our example of a Pareto-dominated point above, we saw that there's no point in ever choosing fertilizer A for the second stand, as we could always improve our solution by choosing to manage that stand with fertilizer B, regardless of the choices for the rest of the stands. So we should permanently discard fertilizer A for stand 2 before the algorithm even begins.
This is where we are using the structure of our problem. This "pruning" of options is only possible because the stands are independent and the objectives are additive. If one stand affected another, we could not safely remove choices this way.
The key observation to construct the algorithm is that we can construct the Pareto front incrementally, one stand at a time. And, at each step, we can prune out dominated options.
Let's give a name to the Pareto front obtained using only the first stands. We'll call it call .
We start with stand 1. Its management options define a small Pareto front of three non-dominated management options (see table above). This is .
Then we process stand 2. Take every point in , and combine it with every management option for stand 2. Then prune the dominated ones. The survivors form . Then repeat until you get to , the solution Pareto front.
The whole algorithm is basically:
for pareto_point in pareto_front:
for scenario in stand:
new_points.append(
pareto_point + scenario
)
pareto_front = pareto_epsilon_prune(new_points)
Isn't it elegant?
I am not exaggerating, this snippet is taken directly from the Rust code:
pub fn build_pareto_front(data_table: &DataTable, epsilon: f64) -> Vec<ParetoFrontSolution> {
let positive_data_table = data_table.shift_to_positive_values();
let mut pareto_front = get_first_stand_scenarios(&positive_data_table.data[0]);
// Remove stand 0 from loop: already considered in the initialization
for stand_ix in 1..positive_data_table.n_stands {
let mut pareto_front_new: Vec<Rc<PartialParetoPoint>> = vec![];
for pareto_point in &pareto_front {
for (scenario_ix, scenario_vector) in
positive_data_table.data[stand_ix].iter().enumerate()
{
let new_target_vector = add_vectors(&pareto_point.target_vector, scenario_vector);
pareto_front_new.push(Rc::new(PartialParetoPoint {
target_vector: new_target_vector.into(),
parent_point: Some(Rc::clone(pareto_point)),
current_scenario_choice: scenario_ix,
}));
}
}
pareto_front = pareto_epsilon_prune(pareto_front_new, epsilon);Ignoring Rust’s type and ownership boilerplate, you get the algorithm above.
So the crucial point is that the algorithm never needs to remember partial solutions that are Pareto-dominated. Once a partial configuration becomes dominated, we know that no future additions can rescue it.
I always find it helpful to put some example numbers in to verify my understanding of algorithms. Here are the first two steps using our table above.
We start computing :
| choice | wood | CO2 |
|---|---|---|
| D | 424.2 | 256.3 |
| A | 449.11 | 280.3 |
| B | 450.12 | 293.4 |
None are dominated; all survive.
Now we combine these with the points for stand 2 by computing all pairwise sums. Remember that we already pruned fertilizer A for stand 2 because it was Pareto-dominated by fertilizer B.
| stand 1 | stand 2 | wood | CO2 |
|---|---|---|---|
| D | D | 906.32 | 410.5 |
| D | B | 958.76 | 426.8 |
| A | D | 931.23 | 434.5 |
| A | B | 983.67 | 450.8 |
| B | D | 932.24 | 447.6 |
| B | B | 984.68 | 463.9 |
There are some dominated points that we can prune. For example, , which corresponds to the choice is Pareto-dominated by , which corresponds to the choice , because it has lower wood and higher emissions.
After pruning all dominated options we are left with:
| surviving point | wood | CO2 |
|---|---|---|
| D1 + D2 | 906.32 | 410.5 |
| D1 + B2 | 958.76 | 426.8 |
| A1 + B2 | 983.67 | 450.8 |
| B1 + B2 | 984.68 | 463.9 |
Even though there were possible combinations, only 4 survived the pruning step. Dynamic programming repeatedly applies this compression after every stand.
You now understand the dynamic
programming bit. It describes this fact that we recursively reuse
the Pareto-optimal partial solutions from the first
groups to construct the solutions for the first
groups. There's one last catch, though, and it is how
pareto_epsilon_prune() works.
Unfortunately, exact Pareto pruning does not cut it.
Even after aggressive pruning, the frontier may explode combinatorially. The fundamental reason is that floating-point objective vectors can differ by arbitrarily tiny amounts.
Imagine two solutions:
| solution | volume growth (m3/ha) | CO2 emissions (kg/ha/year) |
|---|---|---|
| solution 1 | 100.001 | 20.001 |
| solution 2 | 100.002 | 20.002 |
Technically, they do not dominate each other, and thus they should be part of the Pareto front. However, from a planning perspective, they are effectively indistinguishable.
If we have different combinations (as we saw above, this is what we get with 100 stands and 3 management options each), and we naively keep every microscopic variation, the Pareto front will not fit into our processor's memory.
This is where the "approximation" part enters the picture. Instead of requiring exact dominance, we deliberately introduce a small tolerance.
A point -dominates another point if: Note that in the case we recover the exact Pareto-dominance equation.
Geometrically, this means that each Pareto point occupies a little box around itself 34. Points falling into the same box are treated as essentially equivalent. So rather than representing objective space with infinite precision, we quantize it into coarse-grained regions.
And since objective space is quantized into sized regions, only one representative solution must be retained per region. This converts a continuum of nearly identical floating-point vectors into a finite approximation grid.
This has some very useful consequences.
So: yes, we are deciding not to evaluate every single point in search space, but we are not randomly skipping parts of it as heuristic algorithms would. Instead, we are explicitly deciding the resolution at which we care to distinguish solutions. The algorithm therefore behaves more like an ajustable microscope than than a series of coin tosses.
pareto_dp libraryYou may install this Python library as you would any other in PyPI. And you can poke around in the source code and criticize my Rust skills.
There are some technicalities here with the magnitudes of each variables. The most elegant solution is to use logarithms and make epsilon multiplicative. But I won't go into details here.↩︎
Internally, all objectives are be converted into minimization form without loss of generality.↩︎
There are some technicalities here with the magnitudes of each variables. The most elegant solution is to use logarithms and make epsilon multiplicative. But I won't go into details here.↩︎
You might still be a little confused. If exact Pareto already gets rid of points, why do we need this version? The reason is that in high dimensions the number of non-dominated points can still grow exponentially. Approximate pruning is therefore not just a shortcut to make optimization go faster; it is what keeps the algorithm practically feasible in higher dimensions.↩︎
In the worst case, the Pareto frontier can still grow exponentially with the number of objectives and groups. The algorithm is therefore not polynomial-time in the general case. Its practical usefulness comes from the fact that approximate pruning often keeps the frontier surprisingly small in real-world datasets.↩︎