GA: Waterborne Urban Logistics in Amsterdam#
Note:
Because of time constraints during the Friday session, we continue working with the same problem size as before (300 demand zones and 36 TPs). For this scale, the performance benefits of a GA may be limited. Nonetheless, metaheuristics become especially valuable for larger instances, where exact methods struggle (e.g., For 1500 HoReCa entities, GA obtained near-optimal solutions in less than 10 minutes, while MILP was not able to reach the optimal solution within an hour.)
Genetic Algorithm#
As we discussed, it is challenging to use MILP for large-scale location and flow assignment problems. Therefore, in this assignment, we’re going to use a genetic algorithm to address this problem.
Genetic Algorithms (GAs) are powerful optimization techniques inspired by the process of natural evolution. They have gained prominence in solving complex problems across various fields, ranging from engineering and economics to artificial intelligence. Here, we give a brief overview of Genetic Algorithms, highlighting their fundamental principles, components, and applications in solving optimization problems. At the heart of a Genetic Algorithm are populations of potential solutions, represented as individuals or chromosomes. These individuals evolve over generations to improve their fitness with respect to the given optimization objective. Basic Components of a Genetic Algorithm:
Population: A collection of individuals representing potential solutions to the problem.
Objective Function (or fitness function): Quantifies the quality of each individual’s solution with respect to the optimization objective.
Selection: Individuals are chosen based on their fitness to serve as parents for the next generation.
Crossover: Genetic material from parents is combined to create offspring with potentially improved traits.
Mutation: Random changes are introduced in offspring to maintain diversity and explore new solution spaces.
Replacement: New offspring replace some of the least fit individuals in the population.
Termination Criteria: Conditions under which the algorithm stops, e.g., a maximum number of generations or satisfactory fitness level.
Adaptation for GA#
In order to make the GA implementation efficient and manageable, we decompose the problem into two linked sub-problems:
Network design sub-problem (handled by GA): The GA decides which transfer points (TPs) are opened.
Flow assignment sub-problem (solved for each GA solution):Given the selected TPs, HoReCa demand is assigned optimally using a simpler linear optimization model.
This decomposition significantly reduces the size of the search space and ensures stable evaluation of candidate solutions.
Network Design sub-problem (Handled by GA)#
The Network Design sub-problem determines the set of TPs to be opened. The GA attempts to minimise the total system cost:
In the objective function, only the first term depends directly on the chromosome. The remaining two terms depend on the flows \(y_{ij}(x)\), which are unknown until the flow-assignment sub-problem is solved.
Each GA individual consists of a binary vector: $\(x=(x_1,x_2,...,x_{|J|}), \qquad x_j \in \{0,1\}\)$
where \(x_j=1\) indicates that TP \(j\) is opened. The GA explores different combinations of these TP-opening decisions using crossover and mutation.
Flow Assignment sub-problem (MILP)#
For each candidate TP configuration \(x\) produced by the GA, we solve the following linear optimisation problem to compute the optimal flows:
This model determines how HoReCa demand is distributed across open TPs such that transport cost and emissions are minimized, while respecting TP capacities and service distance limits.
Feasibility#
Since the GA generates terminal-opening patterns freely, some candidate solutions will violate constraints. We distinguish between two types of infeasibility and treat them differently.
Flow infeasibility#
Flow infeasibility occurs when no valid cargo assignment exists for a given TP configuration. In these cases, the flow-assignment sub-problem has no solution and the objective function cannot be evaluated. These situations must therefore be corrected before evaluation. Two common causes are:
Insufficient total capacity: If the total capacity of selected TPs is less than total demand, there is no feasible way to serve all demand.
If a demand zone \(i\) has no open terminal within the maximum service distance, then its demand cannot be assigned.
In such cases, a repair operator is applied before evaluation. This operator modifies the solution by activating additional TPs until sufficient capacity is available, and every demand zone has at least one reachable open terminal.
The choice of which TPs to activate can follow a simple rule, such as selecting the nearest or lowest-cost feasible terminal. After repair, the flow-assignment problem becomes solvable and the full objective function can be evaluated.
Policy infeasibility#
Policy constraints reflect strategic or institutional limits rather than physical impossibility, including budget and maximum number of established TPs. If these constraints are violated, a feasible flow assignment can still be computed: all demand can still be routed and flow balance can be satisfied. Therefore, these solutions remain technically evaluable. Instead of repairing such solutions, we apply a penalty approach: $\(Z_{final}(x)= Z(x)+P. v(x)\)\( where \)v(x)$ represents the magnitude of violation and P is the violation penalty factor.
GA steps in summary#

Data#
Task 1:
Run the cells below to:
- Import the required Python packages
- Load the required data
- Transform the data into the inputs required for the GA
Note: You are not expected to fully understand all details of the data processing code at this stage.
These cells make use of Pandas and DataFrames, which will be explained in more depth later in the course.
For now, focus on how the processed data becomes structured inputs for the GA implementation (similar to MILP version)
# Task 1
# Step 1: Import required packages
import math
import pandas as pd
import os
from urllib.request import urlretrieve
import numpy as np
import time
# Task 1
# Step 2: Define hub location and distance function
hub_lat = 52.405141
hub_lon = 4.872129
def haversine_km(lon1, lat1, lon2, lat2):
"""
Compute great-circle distance between two points on Earth (km).
Inputs and outputs are in decimal degrees and kilometers.
"""
# convert degrees to radians
lon1_rad, lat1_rad = math.radians(lon1), math.radians(lat1)
lon2_rad, lat2_rad = math.radians(lon2), math.radians(lat2)
dlon = lon2_rad - lon1_rad
dlat = lat2_rad - lat1_rad
a = math.sin(dlat/2.0)**2 + math.cos(lat1_rad)*math.cos(lat2_rad)*math.sin(dlon/2.0)**2
c = 2 * math.asin(math.sqrt(a))
R = 6371.0 # Earth radius in km
return R * c
# Task 1
# Step 3: Load and define required data
# Global transport cost parameters (€/unit/km)
c_road = 0.03 # LEV last-mile
c_water = 0.01 # vessel hub -> TP
# Global emission parameters (kg CO2 / unit / km)
e_road = 0.045
e_water = 0.015
# Load data
def findfile(fname):
if not os.path.isfile(fname):
print(f"Downloading {fname}...")
urlretrieve('http://files.mude.citg.tudelft.nl/'+fname, fname)
findfile("GA_2_5_data.xlsx")
data_file = "GA_2_5_data.xlsx"
df_lg_dem = pd.read_excel(data_file, sheet_name="Large_Demand")
df_lg_tp = pd.read_excel(data_file, sheet_name="Large_TP")
df_lg_dem = df_lg_dem.rename(columns={
"number": "zone_id",
"latitude": "lat",
"longitude": "lon",
"demand": "demand"
})
df_lg_tp = df_lg_tp.rename(columns={
"number": "tp_id",
"latitude": "lat",
"longitude": "lon",
"capacity": "capacity",
"establishment_cost": "fixed_cost"
})
I_large = df_lg_dem["zone_id"].tolist()
J_large = df_lg_tp["tp_id"].tolist()
d_large = dict(zip(df_lg_dem["zone_id"], df_lg_dem["demand"]))
cap_large = dict(zip(df_lg_tp["tp_id"], df_lg_tp["capacity"]))
f_large = dict(zip(df_lg_tp["tp_id"], df_lg_tp["fixed_cost"]))
# Water distances hub -> TP
t0j_water_large = {}
for _, row in df_lg_tp.iterrows():
j = row["tp_id"]
dist_km = haversine_km(hub_lon, hub_lat, row["lon"], row["lat"])
t0j_water_large[j] = dist_km
# Road distances TP -> demand zone
tij_road_large = {}
for _, dz in df_lg_dem.iterrows():
i = dz["zone_id"]
for _, tp in df_lg_tp.iterrows():
j = tp["tp_id"]
dist_km = haversine_km(tp["lon"], tp["lat"], dz["lon"], dz["lat"])
tij_road_large[(i, j)] = dist_km
D_max_large = 2
CT_large = {}
Em_large = {}
delta_large = {}
for i in I_large:
for j in J_large:
t_water = t0j_water_large[j]
t_road = tij_road_large[(i, j)]
CT_large[(i, j)] = c_water * t_water + c_road * t_road
Em_large[(i, j)] = e_water * t_water + e_road * t_road
delta_large[(i, j)] = 1 if t_road <= D_max_large else 0
M_large = sum(d_large.values())
B_large = 2000
K_large = 20
lambda_large = 1
Run the following cell, which builds a reusable assignment model that will be incorporated inside GA to solve the flow-assignment sub-problem.
# Task 2: Build a reusable LP assignment model
from pyomo.environ import (
ConcreteModel, Set, Param, Var, Objective, Constraint,
NonNegativeReals, minimize, SolverFactory, value
)
from pyomo.opt import TerminationCondition
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.soo.nonconvex.ga import GA
from pymoo.optimize import minimize as ga_minimize
from pymoo.termination import get_termination
from pymoo.operators.crossover.pntx import TwoPointCrossover
from pymoo.operators.mutation.bitflip import BitflipMutation
from pymoo.operators.sampling.rnd import BinaryRandomSampling
# Ordered index lists for the large instance
I_large_list = list(I_large)
J_large_list = list(J_large)
total_demand_large = sum(d_large.values())
# Use the same λ and planning parameters as in the large MILP
lambda_ga = lambda_large # emission penalty factor
penalty_factor = 1e4 # weight for budget / K violations
# Reuse one HiGHS solver for all LP subproblems
lp_solver = SolverFactory("highs")
def build_lp_model(lambda_value):
"""
Build the LP assignment model once, with a mutable parameter x_param[j]
that will be updated for each GA solution.
"""
m = ConcreteModel()
# Sets
m.I = Set(initialize=I_large_list)
m.J = Set(initialize=J_large_list)
# Parameters (using data from the large instance)
m.d = Param(m.I, initialize=d_large)
m.cap = Param(m.J, initialize=cap_large)
m.CT = Param(m.I, m.J, initialize=CT_large)
m.Em = Param(m.I, m.J, initialize=Em_large)
m.delta = Param(m.I, m.J, initialize=delta_large)
m.M = M_large
# Emission penalty
m.Lambda = Param(initialize=lambda_value)
# x_param[j] is mutable → GA changes this per solution
m.x_param = Param(m.J, initialize={j: 0 for j in J_large_list}, mutable=True)
# Decision variables: flows y[i,j]
m.y = Var(m.I, m.J, domain=NonNegativeReals)
# Objective: transport cost + λ * emissions
def obj_rule(m):
return (
sum(m.CT[i, j] * m.y[i, j] for i in m.I for j in m.J) +
m.Lambda * sum(m.Em[i, j] * m.y[i, j] for i in m.I for j in m.J)
)
m.obj = Objective(rule=obj_rule, sense=minimize)
# (1) Demand fulfillment
def demand_rule(m, i):
return sum(m.y[i, j] for j in m.J) == m.d[i]
m.demand = Constraint(m.I, rule=demand_rule)
# (2) TP capacity with x_param
def capacity_rule(m, j):
return sum(m.y[i, j] for i in m.I) <= m.cap[j] * m.x_param[j]
m.capacity = Constraint(m.J, rule=capacity_rule)
# (3) Service feasibility: y[i,j] <= M * delta[i,j] * x_param[j]
def service_feas_rule(m, i, j):
return m.y[i, j] <= m.M * m.delta[i, j] * m.x_param[j]
m.service_feas = Constraint(m.I, m.J, rule=service_feas_rule)
return m
# Build one LP model and reuse it in the GA
lp_model = build_lp_model(lambda_ga)
print("LP assignment model built.")
GA building blocks#
To use a GA, we need a way to evaluate each candidate solution. In this problem, a GA solution is a binary vector saying which TPs are opened.
To evaluate such a solution, we:
Repair it to make sure flows are possible (at least some TPs open with enough capacity, where each zone can reach a TP).
Solve a linear assignment problem (LP) to optimally assign flows from open TPs to demand zones.
Compute a fitness value = fixed cost + transport + emission penalty + (penalties for violating budget / max-TP constraints).
Run the following three cells, to build the repair operator, solve the LP problem, and compute the fitness value.
# Task 3
# Step 0: define a global cache so repeated chromosomes don’t re-solve the LP
# Global cache: maps repaired chromosome (tuple of 0/1) to evaluation result
eval_cache = {} # key: tuple(x_rep) → (fitness, fixed_cost, transport_cost, emissions, penalty)
# Task 3
# Step 1: define the repair operator
def repair_chromosome(x):
"""
Repair a 0/1 TP-opening vector x to avoid flow infeasibility:
- ensure at least one TP is open;
- ensure every demand zone has at least one reachable open TP;
- ensure total capacity ≥ total demand.
Input:
x : numpy array of 0/1 of length |J_large_list|
Output:
x_rep : repaired numpy array of 0/1
"""
x_rep = x.copy()
# 1) If no TP is open, open the cheapest one
if x_rep.sum() == 0:
costs = np.array([f_large[j] for j in J_large_list])
j_idx = np.argmin(costs)
x_rep[j_idx] = 1
# 2) Every demand zone must have at least one reachable open TP
for i in I_large_list:
has_feasible_open_tp = any(
x_rep[j_idx] == 1 and delta_large[(i, J_large_list[j_idx])] == 1
for j_idx in range(len(J_large_list))
)
if not has_feasible_open_tp:
# Choose the cheapest feasible TP (by fixed cost) and open it
feasible_js = [
(j_idx, J_large_list[j_idx])
for j_idx in range(len(J_large_list))
if delta_large[(i, J_large_list[j_idx])] == 1
]
if feasible_js:
j_best_idx = min(feasible_js, key=lambda t: f_large[t[1]])[0]
x_rep[j_best_idx] = 1
# 3) Total capacity of open TPs must be at least total demand
def total_capacity(x_vec):
return sum(
cap_large[J_large_list[j_idx]] * x_vec[j_idx]
for j_idx in range(len(J_large_list))
)
while total_capacity(x_rep) < total_demand_large:
# candidates: currently closed TPs
closed = [
(j_idx, J_large_list[j_idx])
for j_idx in range(len(J_large_list))
if x_rep[j_idx] == 0
]
if not closed:
# all TPs already open
break
# open TP with best capacity per unit fixed cost
j_best_idx, j_best = max(
closed,
key=lambda t: cap_large[t[1]] / (f_large[t[1]] + 1e-6)
)
x_rep[j_best_idx] = 1
return x_rep
print("Repair operator defined.")
# Task 3
# Step 2: solve LP assignment for a fixed TP pattern
def solve_flow_assignment_lp(x_rep):
"""
Solve the LP assignment subproblem for a fixed TP vector x_rep (0/1).
Inputs:
x_rep : repaired numpy array of 0/1 (which TPs are open)
Uses:
- the global LP model 'lp_model' built earlier
- updates lp_model.x_param[j] to match x_rep
Returns:
transport_cost : total transport cost
emissions : total emissions (kg CO2)
lp_obj_value : LP objective value (transport + λ * emissions)
feasible_lp : True/False (was the LP solved optimally?)
"""
# Update mutable params x_param[j] in the existing LP model
for j_idx, j in enumerate(J_large_list):
lp_model.x_param[j] = int(x_rep[j_idx])
res = lp_solver.solve(lp_model, tee=False)
if res.solver.termination_condition != TerminationCondition.optimal:
return None, None, None, False
# Compute transport cost and emissions based on the LP solution
transport_cost = sum(
value(lp_model.CT[i, j]) * value(lp_model.y[i, j])
for i in lp_model.I for j in lp_model.J
)
emissions = sum(
value(lp_model.Em[i, j]) * value(lp_model.y[i, j])
for i in lp_model.I for j in lp_model.J
)
# LP objective = transport cost + λ * emissions
lp_obj_value = value(lp_model.obj)
return transport_cost, emissions, lp_obj_value, True
print("LP assignment solver defined.")
# Task 3
# Step 3: combine everything into a fitness evaluation
def evaluate_chromosome(x, penalty_factor=1e4):
"""
Evaluate a GA solution x (0/1 TP-opening vector) by:
1. Cleaning and repairing x
2. Checking the cache
3. Solving the LP assignment (if not cached)
4. Adding fixed cost and penalties
Returns:
fitness, fixed_cost, transport_cost, emissions, penalty, x_rep
"""
# Make sure x is clean 0/1
x_bin = (np.round(x)).astype(int)
x_bin = np.clip(x_bin, 0, 1)
# 1. Repair to ensure flows are possible
x_rep = repair_chromosome(x_bin)
# --- CACHING: check if we have seen this repaired pattern before ---
key = tuple(x_rep.tolist())
if key in eval_cache:
fitness, fixed_cost, transport_cost, emissions, penalty = eval_cache[key]
return fitness, fixed_cost, transport_cost, emissions, penalty, x_rep
# 2. Solve LP assignment given x_rep
transport_cost, emissions, lp_obj, feasible_lp = solve_flow_assignment_lp(x_rep)
if not feasible_lp:
fitness = 1e9
fixed_cost = 0.0
penalty = penalty_factor * 1e5
# store in cache as well, so we don’t try again
eval_cache[key] = (fitness, fixed_cost, transport_cost, emissions, penalty)
return fitness, fixed_cost, transport_cost, emissions, penalty, x_rep
# 3. Fixed establishment cost
fixed_cost = sum(
f_large[J_large_list[j_idx]] * x_rep[j_idx]
for j_idx in range(len(J_large_list))
)
# LP objective already includes transport cost + λ * emissions
base_objective = fixed_cost + lp_obj
# 4. Policy constraint violations: budget & max number of TPs
total_fixed_cost = fixed_cost
num_open_tps = int(x_rep.sum())
v_B = max(0.0, total_fixed_cost - B_large) # € above budget
v_K = max(0.0, num_open_tps - K_large) # TPs beyond K_max
v_total = v_B + v_K
penalty = penalty_factor * v_total
fitness = base_objective + penalty
# store in cache
eval_cache[key] = (fitness, fixed_cost, transport_cost, emissions, penalty)
return fitness, fixed_cost, transport_cost, emissions, penalty, x_rep
print("Full GA evaluation function defined.")
Run the GA#
Now we connect our evaluation function to PyMOO:
Each GA solution is a binary vector saying which TPs are opened.
PyMOO generates and evolves these vectors.
For each solution, the function
evaluate_chromosome(...):Repairs the TP pattern,
Solves the LP assignment problem,
Computes the total objective (fixed cost + transport + emission penalty + policy penalties).
Run the GA (following cell) once with a baseline configuration and compare the result with the MILP solution for the large instance.
From the MILP part you should have:
MILP objective value for the large instance,
MILP runtime,
Set of open TPs in the MILP solution.
Compare these with the GA results above:
Is the GA objective value close to the MILP optimum?
Is the GA faster or slower than the MILP?
Does the GA open a similar set of TPs?
Note: The code below will take a few minutes to run completely. The results for each new generation is printed as the code runs. You can start with observing how the values change as the code runs.
# Task 4: run GA
class AmsterdamTPProblem(ElementwiseProblem):
"""
PyMOO problem for the large-instance Amsterdam HoReCa TP design.
Decision variables: x_j ∈ {0,1} for each TP j in J_large_list.
"""
def __init__(self, penalty_factor=1e4, **kwargs):
self.penalty_factor = penalty_factor
n_var = len(J_large_list)
super().__init__(
n_var=n_var,
n_obj=1,
n_constr=0, # constraints handled via penalties
xl=0,
xu=1,
type_var=int,
**kwargs
)
def _evaluate(self, x, out, *args, **kwargs):
# Use our evaluation pipeline
fitness, fixed_cost, transport_cost, emissions, penalty, x_rep = evaluate_chromosome(
x,
penalty_factor=self.penalty_factor
)
out["F"] = fitness
out["fixed_cost"] = fixed_cost
out["transport_cost"] = transport_cost
out["emissions"] = emissions
out["penalty"] = penalty
out["x_rep"] = x_rep
# Baseline GA settings
pop_size = 20
n_generations = 15
eval_cache.clear()
problem = AmsterdamTPProblem(penalty_factor=penalty_factor)
# We configure the GA to work with *binary* chromosomes:
# - BinaryRandomSampling(): creates an initial population of 0/1 vectors
# - TwoPointCrossover(): classic binary crossover (parents exchange segments)
# - BitflipMutation(): flips bits with some probability
#
# You are free to try *other* binary operators if you want to explore different GA behaviors. Examples you can plug in:
#
# from pymoo.factory import get_crossover, get_mutation
#
# # Uniform crossover (each bit copied from a random parent)
# crossover = get_crossover("bin_ux")
#
# # Half-uniform crossover (HUX): flips half of differing bits
# crossover = get_crossover("bin_hux")
#
# # Single-point or multi-point crossover:
# from pymoo.operators.crossover.pntx import SinglePointCrossover
# crossover = SinglePointCrossover()
#
# # Alternative mutation operators:
# mutation = get_mutation("bin_bitflip") # same as BitflipMutation()
#
# Feel free to swap these into the GA(...) call: crossover=<your choice>, mutation=<your choice>
algorithm = GA(
pop_size=pop_size,
sampling=BinaryRandomSampling(), # initial population: random 0/1
crossover=TwoPointCrossover(), # classic binary/point crossover
mutation=BitflipMutation(), # flips bits with some prob
eliminate_duplicates=True
)
termination = get_termination("n_gen", n_generations)
start_ga = time.time()
res = ga_minimize(
problem,
algorithm,
termination=termination,
seed=1,
save_history=True,
verbose=True
)
ga_time = time.time() - start_ga
print(f"\nGA completed in {ga_time:.2f} seconds")
print("Best fitness value found:", res.F[0])
# Decode best solution
best_x = (np.round(res.X)).astype(int)
fitness, fixed_cost, transport_cost, emissions, penalty, x_rep = evaluate_chromosome(best_x)
open_tps_ga = [J_large_list[j_idx] for j_idx in range(len(J_large_list)) if x_rep[j_idx] == 1]
print("\n=== Best GA solution (large instance) ===")
print("Open TPs:", open_tps_ga)
print(f"Number of open TPs: {len(open_tps_ga)}")
print(f"Fixed cost = {fixed_cost:.2f}")
print(f"Transport cost = {transport_cost:.2f}")
print(f"Emissions = {emissions:.2f} kg CO2")
print(f"Emission penalty (λ·Em)= {lambda_ga * emissions:.2f}")
print(f"Policy penalty = {penalty:.2f}")
print(f"Total GA objective = {fitness:.2f}")
print(f"GA runtime = {ga_time:.2f} seconds")
Effect of GA settings (population size)#
The performance of a Genetic Algorithm depends on its hyperparameters, including:
pop_size: how many solutions are kept in each generation,
n_generations: how many generations the GA evolves.
Larger values usually:
improve solution quality (more search),
increase runtime.
In the above cell, change
pop_size and rerun the GA.
For each configuration, note:
- The runtime
- The total GA objective value
pop_size = 50,n_generations = 15pop_size = 10,n_generations = 15
How do the results change when you change the population size?
By Nadia Pourmohammadzia and Gonçalo Homen de Almeida Correia, Delft University of Technology. CC BY 4.0, more info on the Credits page of Workbook.