iurzainki.com

TLDR: I wrote a small algorithm that efficiently approximates the Pareto front for a narrow type of multi-objective optimization problems. Despite it being narrow, it has many possible applications. There's nothing novel to the algorithm itself, so you won't get anything out of this post if you are knowledgeable about optimization algorithms. If not, there's a non-null probability you'll learn something new. The algorithm is written in Rust, but packaged as a Python library.

A Pokémon team has 6 Pokémons in it. Each Pokémon is of a certain type (water, fire, etc.), has a different set of moves, and stats. You want your team to have the best possible set of properties: have good attack, good defense, good type coverage, etc. But those properties are often at odds with each other. And, if you think about it, the best Pokémon team might not be the collection of the best individual Pokémons. It might pay off, for instance, to choose a mix of defensive and offensive Pokémons, rather than choosing to have every single one of them balanced.

This is what I have been doing at work lately. Well, not quite. Actually, I have been trying to choose the best possible management scenarios for different forest stands. But deep down they are the same thing. You have a whole (Pokémon team; country) made up of a bunch of stuff (Pokémons; forest stands) for which you have some options (moves/types/stats; management options), and some target objectives (be the very best, like no one ever was; more wood, less CO2 emissions). And your goal is to optimize the whole.

Indeed, this kind of optimization problem may arise in many places. But it is actually a very narrow slice of all optimization problems. Here are the crucial properties that it needs to have in order to qualify as a problem that takes the same shape as the one I have been working out.

The problem in mathspeak (optional)

If you are not afraid of math spells, here's the actual incantation (if you are or don't have the time, you may skip this section):

The optimization problem may formally be categorized as "multi-objective multiple-choice combinatorial optimization problem with additive separable objectives". We have:

Domain	Problem	Groups (i)	Options per group (j)	Objectives (k)
Pokémon	Be the very best	Teams	Pokémons with moves	Offensive output, speed, type coverage
Land use & Environment	Forestry land allocation	Forest stands	Management scenarios	Yield, fertilizer cost, water use, soil carbon
Portfolio & Resource Allocation	Investment portfolio construction	Asset classes	Position sizes / fund choices	Return, volatility, ESG score, liquidity
Energy Systems	Microgrid configuration	Network nodes	Equipment configurations	Resilience, cost, peak shaving, renewable fraction
Transportation & Logistics	Fleet composition	Routes / segments	Vehicle types	Operating cost, emissions, coverage, comfort
Healthcare & Public Health	Hospital resource configuration	Departments	Staffing / equipment configs	Throughput, cost, quality, staff workload
Manufacturing & Product Design	Modular product configuration	Product subsystems	Component variants	Unit cost, performance, weight, manufacturability
Urban Planning & Infrastructure	Zoning policy	City districts	Zoning options	Housing units, green space, traffic, tax revenue

This is a purely combinatorial product space:

\mathcal{X} = \prod_{i=1}^n {1,\dots,m_i},

with cardinality

|\mathcal{X}| = \prod_{i=1}^n m_i,

which is large but finite.

Example: a set of forest stands

I really wanted to make this a example about Pokémons, but I found out mid-writing that I had to make too many pirouettes to massage that problem into the specific problem that my library solves, to the point that it made the piece less understandable. Instead, I present you the problem that provided the original motivation.

You have

n

stands, and each stand can be managed in

m

different ways (where the number

m

might differ from stand to stand). For instance, you could raise the water level of the nearby ditches, you could apply fertilizer, etc. Then, crucially, you also have a model that tells you the values of the

v

variables of interest for each management scenario. Let's restrict ourselves to

v=2

because screens have a hard 2 dimensional limit, and I do not want our brains to hurt too much with the plots I am going to show. So, for instance, we might be interested in total wood volume produced and the total C emitted by our management choices.

Then, we might end up with a table like this (keep it handy, we'll use it later on)

The question is: what management option should we choose for each stand such that the volume growth is maximized and the CO2 emissions are minimized?

The algorithm

Now that the problem is set up, we can explain how the algorithm works. I will do that by explaining every piece of the convoluted title of this post, bit by bit.

I have already explained what I mean by "multiple-choice combinatorial optimization problems". Now let's understand the rest. Here are the steps:

Why not exhaustive search?

This seems like an easy enough problem. A computer will surely be able to go through all options, right? That table is not that large, after all!

stand number	management	wood volume (m3/ha)	CO2 emissions (kg/ha/year)
1	default	424.2	256.3
1	fertilizer A	449.11	280.3
1	fertilizer B	450.12	293.4
2	default	482.12	154.2
2	fertilizer A	519.49	180.3
2	fertilizer B	534.56	170.5
3	...	...	...

Let's put some quick numbers. Let's say we have 100 stands, with 3 management options each. That means we have

3^{100} = 5\times 10^{47}

different combinations. No computer on Earth can look at all of them and finish before tomorrow, which is usually as far as my patience usually goes.

Why not heuristic evolutionary algorithms?

So, if we cannot evaluate every possible option, then it looks like we are stuck with some heuristic optimization algorithm. And, at first glance, this problem looks like a textbook case for a genetic algorithm. You encode each stand-management choice as a chromosome, define a fitness function, and let evolution do its thing.

The only reason I was not convinced of the result was that the code I had written with the pymoo library in Python didn't look very elegant and performant to me. And so I began studying the problem a bit closer.

It turns out that there's a ton of structure in the problem (independent groups, added to get the objective function, precomputed target vectors) that can be exploited. And, as I will show in the following sections, this allows to investigate the whole design space deterministically (yes!).

In comparison, heuristic evolutionary algorithms are solving a much more general class of problems than the one we actually have. They explore the design space stochastically, hoping to discover good regions through mutation and recombination. But our problem, as we have seen, has an enormous combinatorial space. Even if evolutionary algorithms are clever and can perform well in gigantic search spaces, they will only ever inspect a small fraction of them.

Moreover, evolutionary algorithms are usually designed to converge toward some good region of the Pareto front. Depending on initialization and random seeds, they may miss large regions entirely. Two runs of the same algorithm can return different approximations.

Understanding the title: Pareto front

Before discussing the algorithm itself, we need to clarify what we are actually optimizing.

Remember we wanted to simultaneously maximize volume growth while minimizing CO2 emissions. Take a look at the previous table again, and focus on the points for stand 1. Try to pick the best management choice for stand 1. You will run into a problem. If you use any of the fertilizers, you will get more wood, but you will also emit more carbon.

The problem boils down to this: how much extra CO2 are we willing to tolerate for one extra cubic meter of wood? And this is a philosophical problem! There is no mathematical answer to that question.

Different people in different situations might value wood volume vs CO2 emissions differently. So choosing weights beforehand is undesirable. The Pareto front solves exactly this problem. It shows what are the available optimal solutions, before making any decisions about how to put relative values to different objective variables.

We need some more concepts to nail this down. A solution is called Pareto-dominated if there exists another solution that is strictly better in at least one objective and no worse in all the others. And the Pareto front is simply the collection of all non-dominated solutions.

For stand 1, every point is Pareto-optimal because improving wood production always increases emissions. You can imagine the Pareto front as a line connecting all three points.

However, stand 2 has a Pareto-dominated point: fertilizer A is dominated by fertilizer B. Adding fertilizer B to the second stand results in both higher wood yield and lower emissions than fertilizer A.

Let's now try to understand how to compute this Pareto front for this set of problems.

Understanding the title: Dynamic programming

In our example of a Pareto-dominated point above, we saw that there's no point in ever choosing fertilizer A for the second stand, as we could always improve our solution by choosing to manage that stand with fertilizer B, regardless of the choices for the rest of the stands. So we should permanently discard fertilizer A for stand 2 before the algorithm even begins.

This is where we are using the structure of our problem. This "pruning" of options is only possible because the stands are independent and the objectives are additive. If one stand affected another, we could not safely remove choices this way.

The key observation to construct the algorithm is that we can construct the Pareto front incrementally, one stand at a time. And, at each step, we can prune out dominated options.

Let's give a name to the Pareto front obtained using only the first

i

stands. We'll call it call

P_i

We start with stand 1. Its management options define a small Pareto front of three non-dominated management options (see table above). This is

P_1

Then we process stand 2. Take every point in

P_1

, and combine it with every management option for stand 2. Then prune the dominated ones. The survivors form

P_2

. Then repeat until you get to

P_n

, the solution Pareto front.

pub fn build_pareto_front(data_table: &DataTable, epsilon: f64) -> Vec<ParetoFrontSolution> {
    let positive_data_table = data_table.shift_to_positive_values();

    let mut pareto_front = get_first_stand_scenarios(&positive_data_table.data[0]);

    // Remove stand 0 from loop: already considered in the initialization
    for stand_ix in 1..positive_data_table.n_stands {
        let mut pareto_front_new: Vec<Rc<PartialParetoPoint>> = vec![];
        for pareto_point in &pareto_front {
            for (scenario_ix, scenario_vector) in
                positive_data_table.data[stand_ix].iter().enumerate()
            {
                let new_target_vector = add_vectors(&pareto_point.target_vector, scenario_vector);

                pareto_front_new.push(Rc::new(PartialParetoPoint {
                    target_vector: new_target_vector.into(),
                    parent_point: Some(Rc::clone(pareto_point)),
                    current_scenario_choice: scenario_ix,
                }));
            }
        }
        pareto_front = pareto_epsilon_prune(pareto_front_new, epsilon);

So the crucial point is that the algorithm never needs to remember partial solutions that are Pareto-dominated. Once a partial configuration becomes dominated, we know that no future additions can rescue it.

I always find it helpful to put some example numbers in to verify my understanding of algorithms. Here are the first two steps using our table above.

Now we combine these with the points for stand 2 by computing all pairwise sums. Remember that we already pruned fertilizer A for stand 2 because it was Pareto-dominated by fertilizer B.

choice	wood	CO2
D	424.2	256.3
A	449.11	280.3
B	450.12	293.4

stand 1	stand 2	wood	CO2
D	D	906.32	410.5
D	B	958.76	426.8
A	D	931.23	434.5
A	B	983.67	450.8
B	D	932.24	447.6
B	B	984.68	463.9

There are some dominated points that we can prune. For example,

(931.23, 434.5)

, which corresponds to the choice

(A,D)

is Pareto-dominated by

(958.76, 426.8)

, which corresponds to the choice

(D,B)

, because it has lower wood and higher emissions.

surviving point	wood	CO2
D1 + D2	906.32	410.5
D1 + B2	958.76	426.8
A1 + B2	983.67	450.8
B1 + B2	984.68	463.9

Even though there were

3\times 3=9

possible combinations, only 4 survived the pruning step. Dynamic programming repeatedly applies this compression after every stand.

You now understand the dynamic programming bit. It describes this fact that we recursively reuse the Pareto-optimal partial solutions from the first

i

groups to construct the solutions for the first

i+1

groups. There's one last catch, though, and it is how pareto_epsilon_prune() works.

Understanding the title: approximation

Even after aggressive pruning, the frontier may explode combinatorially. The fundamental reason is that floating-point objective vectors can differ by arbitrarily tiny amounts.

Technically, they do not dominate each other, and thus they should be part of the Pareto front. However, from a planning perspective, they are effectively indistinguishable.

solution	volume growth (m3/ha)	CO2 emissions (kg/ha/year)
solution 1	100.001	20.001
solution 2	100.002	20.002

If we have

5\times 10^{47}

different combinations (as we saw above, this is what we get with 100 stands and 3 management options each), and we naively keep every microscopic variation, the Pareto front will not fit into our processor's memory.

This is where the "approximation" part enters the picture. Instead of requiring exact dominance, we deliberately introduce a small tolerance.

A point

q

\epsilon

-dominates another point

p

if:

q_k \le (1+\epsilon) p_k \quad \forall k

Note that in the case

\epsilon =0

we recover the exact Pareto-dominance equation.

Geometrically, this means that each Pareto point occupies a little box around itself ³ ⁴. Points falling into the same box are treated as essentially equivalent. So rather than representing objective space with infinite precision, we quantize it into coarse-grained regions.

And since objective space is quantized into

\epsilon

sized regions, only one representative solution must be retained per region. This converts a continuum of nearly identical floating-point vectors into a finite approximation grid.

So: yes, we are deciding not to evaluate every single point in search space, but we are not randomly skipping parts of it as heuristic algorithms would. Instead, we are explicitly deciding the resolution at which we care to distinguish solutions. The algorithm therefore behaves more like an ajustable microscope than than a series of coin tosses.

The pareto_dp library

You may install this Python library as you would any other in PyPI. And you can poke around in the source code and criticize my Rust skills.

There are some technicalities here with the magnitudes of each variables. The most elegant solution is to use logarithms and make epsilon multiplicative. But I won't go into details here.↩︎
Internally, all objectives are be converted into minimization form without loss of generality.↩︎
There are some technicalities here with the magnitudes of each variables. The most elegant solution is to use logarithms and make epsilon multiplicative. But I won't go into details here.↩︎
You might still be a little confused. If exact Pareto already gets rid of points, why do we need this $\epsilon$ version? The reason is that in high dimensions the number of non-dominated points can still grow exponentially. Approximate pruning is therefore not just a shortcut to make optimization go faster; it is what keeps the algorithm practically feasible in higher dimensions.↩︎
In the worst case, the Pareto frontier can still grow exponentially with the number of objectives and groups. The algorithm is therefore not polynomial-time in the general case. Its practical usefulness comes from the fact that approximate pruning often keeps the frontier surprisingly small in real-world datasets.↩︎

Pareto front approximation via dynamic programming for multiple-choice combinatorial optimization problems. And Pokémons, and forests

The problem in mathspeak (optional)

Example: a set of forest stands

The algorithm

Why not exhaustive search?

Why not heuristic evolutionary algorithms?

Understanding the title: Pareto front

Understanding the title: Dynamic programming

Understanding the title: approximation

The `pareto_dp` library

The problem in mathspeak (optional)

Example: a set of forest stands

The algorithm

Why not exhaustive search?

Why not heuristic evolutionary algorithms?

Understanding the title: Pareto front

Understanding the title: Dynamic programming

Understanding the title: approximation

The pareto_dp library

The `pareto_dp` library