Route Optimisation · Field Data · Operational Research

Postman by day.
Data Scientist
by necessity.

1,780 letterboxes. 300 kg of mail. 17 km on a bike. One person.

When only 74% of the required time is available to do a job officially designed for three, you have two choices: panic, or open a Jupyter notebook. I chose the notebook. The mail got delivered. The data got analysed. Both arrived on time.

Daily time breakdown — 645.6 min / day
1,780
Letterboxes / day
920 non-standard
300 kg
Mail per day
letters, parcels, ads, papers
10.7h
Actual daily work
vs 8h planned
74%
Time available
of what was actually needed
98%
Efficiency achieved
after optimisation
The data

Every Minute,
Accounted For.

The first step was measurement. Every task timed, every stop counted, every door opening logged. The result was the first properly structured operational dataset this district had ever seen — built while doing the job it was documenting.

Task | Min/Day | Detail
Morning sorting & loading | 120 | 2 hours — fixed daily
Registered letters (118/day) | 99 | 50 s each
Non-standard letterboxes (920) | 77 | ~5 s each
Door openings (292) | 58.4 | ~12 s badge/key
Travel between buildings (292) | 51.2 | ~10.5 s per building
Advertising flyers (280/day) | 47 | ~2 s each
Parcel search on bike (292×) | 49 | 10 s each
Standard letterboxes (860) | 43 | ~3 s each
Tracked parcels (105/day) | 35 | 20 s scan + deliver
Newspapers/magazines (142) | 24 | Priority items
Breaks (water, toilets) | 14 | Basic human needs
Reloads (7×/day) | 10.5 | 4 street relays
Bike commute + recovery | 17.5 | Commute time
Total actual | 645.6 | = 10.76 hours
Planned contract | 480 | = 8 hours
Daily deficit | 165.6 | = 2.7 hours over

Note: 2.7 extra hours/day is not an anomaly. It is the structural gap between how many postal workers were needed and how many were actually provided.

Minutes per task — ranked
Time deficit — daily
Planned work time 480 min (8h)
Actual time required 645.6 min (10.76h)
Daily deficit (constant) 165.6 min (2.7h)

Conclusion: Only 74% of the required time was available. This is not a performance problem — it is an allocation problem. Data science was used to bridge the remaining 26% through optimisation.

Field insights

What the Data Revealed
That Nobody Else Bothered to Measure.

These are not estimates. These are measured, recorded, and statistically verified field observations from a researcher who was also carrying the mail bags.

Time wasters — top impacts

Non-standard letterboxes waste 22 min/day

920 out of 1,780 letterboxes (52%) are non-standard format. Each costs ~5 seconds extra. Over a full day: 22 irretrievable minutes, every day, forever. No one had quantified this before.


Community or business absences cost 50 minutes

When residents in certain buildings consistently do not sign for tracked mail, the failed-delivery loop (reattempt + paperwork + supervisor conversation) costs up to 50 minutes — plus one grumpy supervisor, free of charge.


The bike is a gym membership and a logistics hub

7 recharges per day from 4 street relay points. 1,500 kg of mail per week transported. 17 km covered daily. No gym subscription required. Lower back subscription, unfortunately, non-optional.


Bike search time: 49 min/day of entirely preventable chaos

Without a sorted loading system, 292 parcel searches per day averaged 10 seconds each. Predictive loading by delivery frequency cut this by 25% — saving 12 minutes with zero additional equipment.

Research questions

The Questions a Postman
Turned Data Scientist Asked.

A postal worker is a human logistics robot — required to smile, memorise 5,000 names, and pedal in the rain with 20 kg of advertising flyers. Why not make it a serious data project?

How do you optimise a rigid logistics system where the route cannot be changed?
Can standard deliveries be batched every two days without degrading service quality?
What is the real productivity impact of non-conforming letterboxes — and can it be quantified?
Which buildings cost the most time per delivery (height, access method, signature requirement)?
How do you guarantee daily delivery of tracked mail and newspapers while grouping regular letters?
Where the 2.7h daily overrun comes from
Optimisation results

Data Science Applied
to the Actual Problem.

Six data science projects, each targeting a different source of time loss. Together, they brought a structurally impossible job within the range of the merely very difficult.

01

Time Simulation Model (Python + Pandas)

Built a discrete event simulation to model task durations, validate field measurements against actual outcomes, and project the effect of schedule changes before implementing them. The model was calibrated against real daily logs.
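The shape of that model can be sketched in a few lines of plain Python. The per-task minutes come from the measured table above; the ±5% jitter and the task names are my illustrative assumptions, not the calibrated model itself.

```python
import random

# Per-task minutes/day from the measured breakdown above;
# English task names are illustrative labels.
TASKS = {
    "sorting_loading": 120.0, "registered_letters": 99.0,
    "non_standard_boxes": 77.0, "door_opening": 58.4,
    "building_travel": 51.2, "ads": 47.0, "bike_search": 49.0,
    "standard_boxes": 43.0, "tracked_parcels": 35.0,
    "newspapers": 24.0, "breaks": 14.0, "reloads": 10.5,
    "commute": 17.5,
}

def expected_total() -> float:
    """Deterministic total: should reproduce the 645.6 min/day figure."""
    return sum(TASKS.values())

def simulate_day(rng: random.Random, noise: float = 0.05) -> float:
    """One simulated day: each task duration jittered by +/- `noise`."""
    return sum(m * rng.uniform(1 - noise, 1 + noise) for m in TASKS.values())

rng = random.Random(42)
days = [simulate_day(rng) for _ in range(1000)]
mean_day = sum(days) / len(days)   # converges toward 645.6
```

Running many simulated days and comparing the distribution against real logs is what lets a schedule change be tested before it costs an actual afternoon.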

02

Smart Batch Grouping — 20–30% time saved

Tracked mail and newspapers delivered daily (non-negotiable). Standard letters and advertising flyers batched to alternate days. Adjacent buildings grouped into clusters served in sequence. Result: 15–18% reduction in total travel time.

20–30% time saving
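The batching rule itself is small enough to sketch. Item kinds, building IDs, and the even/odd-day split below are my illustrative assumptions; the non-negotiable part — tracked mail and newspapers go out every day — is the rule from the text.

```python
from itertools import groupby

DAILY_KINDS = {"tracked", "newspaper"}   # non-negotiable, every day
DEFERRABLE_KINDS = {"letter", "ad"}      # batched to alternate days

def plan_day(items, day_index):
    """Keep priority mail daily; release deferrable mail on alternate
    days; then cluster the plan by building so adjacent stops are
    served in sequence."""
    keep = [
        it for it in items
        if it[1] in DAILY_KINDS
        or (it[1] in DEFERRABLE_KINDS and day_index % 2 == 0)
    ]
    keep.sort(key=lambda it: it[0])   # stable sort groups stops by building
    return [(b, [k for _, k in grp])
            for b, grp in groupby(keep, key=lambda it: it[0])]

items = [(3, "letter"), (1, "tracked"), (3, "newspaper"),
         (1, "ad"), (2, "letter")]
even_day = plan_day(items, day_index=0)   # everything, clustered
odd_day = plan_day(items, day_index=1)    # priority mail only
```

On an odd day the plan shrinks to the tracked item and the newspaper; the letters and flyers ride along on the next even day, already clustered by building.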
03

Non-Priority Deferral Model

Formal classification of mail by service-level impact. Letters and advertising that can be delayed one day without measurable service degradation are identified and deferred when the day's load exceeds capacity. Reduces peak-day pressure without violating delivery commitments.
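A minimal sketch of that classification, assuming a simple priority ranking (the ranks below are my invention, not the author's calibrated service levels): sort by rank, deliver up to capacity, defer the rest — and never defer commitment-bound mail.

```python
# Hypothetical service-level ranks: lower = must go today.
PRIORITY = {"tracked": 0, "registered": 0, "newspaper": 1,
            "letter": 2, "ad": 3}

def defer_overflow(items, capacity):
    """Deliver the highest-priority items up to `capacity`; defer the
    rest one day. Rank-0 (commitment-bound) mail is never deferred."""
    ranked = sorted(items, key=lambda it: PRIORITY[it])
    today, deferred = ranked[:capacity], ranked[capacity:]
    assert all(PRIORITY[it] > 0 for it in deferred), "never defer commitments"
    return today, deferred

today, deferred = defer_overflow(
    ["ad", "tracked", "letter", "newspaper", "registered", "ad"],
    capacity=4)
```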

04

Predictive Loading — 25% search time reduction

Parcels loaded onto the bike using historical address frequency and delivery sequence rather than random order. Search time per stop dropped by 25%. This required only a spreadsheet and about 20 minutes of planning the night before.

25% bike search time reduction
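The "spreadsheet and 20 minutes" version of predictive loading reduces to a sort key. A minimal sketch — the addresses and the tie-breaking rule (route position first, historical frequency second) are my assumptions about how such a loading order could be built:

```python
from collections import Counter

# Hypothetical historical log: one entry per past parcel delivery.
history = ["Rue A 12", "Rue B 3", "Rue A 12",
           "Rue C 7", "Rue A 12", "Rue B 3"]
freq = Counter(history)

def loading_order(todays_parcels, route_sequence):
    """Order parcels so they come off the bike in route order,
    breaking ties toward historically frequent addresses."""
    pos = {addr: i for i, addr in enumerate(route_sequence)}
    return sorted(todays_parcels,
                  key=lambda a: (pos.get(a, len(pos)), -freq[a]))

order = loading_order(
    ["Rue C 7", "Rue A 12", "Rue B 3"],
    route_sequence=["Rue A 12", "Rue B 3", "Rue C 7"],
)
```

Loaded in reverse of this order, the next parcel is always on top — which is the entire trick behind the 25% search-time drop.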
05

Heat Map Itinerary Prioritisation

Morning loading prioritised toward high-density buildings with fewest stairs, processed while physical energy was highest. Fatigue-adjusted time model ensured the heaviest delivery clusters were front-loaded in the day's sequence.
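One way to express that fatigue-adjusted prioritisation — the scoring formula and the sample buildings here are my illustrative assumptions, not the author's fitted model:

```python
# Hypothetical building records: (id, stops, floors).
buildings = [("A", 40, 0), ("B", 10, 5), ("C", 25, 2)]

def priority_score(stops, floors, fatigue=1.0):
    """Favour high-density, low-stair buildings; a higher fatigue
    weight makes stair-heavy buildings cost more later in the day."""
    return stops / (1 + fatigue * floors)

def morning_sequence(blds):
    """Heaviest, easiest clusters first, while energy is highest."""
    return sorted(blds, key=lambda b: -priority_score(b[1], b[2]))

seq = [b[0] for b in morning_sequence(buildings)]
```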

06

Predictive Absence Detection

Historical delivery logs used to identify buildings with chronic non-signature patterns. Tracked mail for those addresses pre-classified as "likely missed" — reducing failed-delivery loops by anticipating the absence rather than discovering it on the doorstep.

Up to 50 min/day recovered
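The detection rule behind this can be as blunt as a miss-rate threshold over the signature history. A minimal sketch, with hypothetical addresses and a 50% threshold of my choosing:

```python
# Hypothetical signature history per address: True = signed, False = missed.
history = {
    "Bât. 4, Apt 2": [False, False, True, False, False],
    "Bât. 9, Apt 1": [True, True, True, False, True],
}

def likely_missed(history, threshold=0.5):
    """Flag addresses whose historical miss rate exceeds `threshold`,
    so their tracked mail can be pre-classified before the round."""
    return {
        addr for addr, log in history.items()
        if log and sum(not signed for signed in log) / len(log) > threshold
    }

flags = likely_missed(history)
```

An address that misses four signatures out of five gets flagged before the round starts; the failed-delivery loop is planned around instead of discovered on the doorstep.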
Time savings per optimisation strategy
Efficiency gauge — before vs after
98% EFFICIENCY ACHIEVED

Before vs After — Operational Comparison

Metric | Before Optimisation | After Optimisation
Daily route efficiency | Unstructured, ad hoc | 15–18% travel reduction
Bike parcel search time | 49 min/day (random) | 37 min/day (−25%)
Batch grouping | All mail every day | Priority daily, standard alternate
Failed delivery handling | Discovered at door | Predicted in advance
Morning loading strategy | Chronological order | Density + fatigue weighted
Delivery completion rate | Systemic lateness | 100% on-time
Overall efficiency | ~74% of required capacity | 98% — measured and verified
Data engineering

Cleaning the Data
While Delivering It.

The dataset did not arrive clean. Time logs had gaps. Letterbox classifications were inconsistent. Delivery retry logic was undocumented. The first engineering task was making the data usable at all.

D1

Letterbox Classification Model

Modelled all 1,780 letterboxes by type (standard/non-standard), access method (key, badge, code, charm, good humour), and time cost. Built the first complete access map for the district.
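The record type behind that map is simple. A minimal sketch — field names, the 3-second standard baseline, and the sample boxes are my assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Letterbox:
    building: int
    standard: bool
    access: str          # "key" | "badge" | "code" | ...
    time_cost_s: float   # measured seconds per drop

def classify(boxes):
    """Count non-standard boxes and total the extra seconds they cost
    per pass, relative to a standard-box baseline of 3 s."""
    non_std = [b for b in boxes if not b.standard]
    extra_s = sum(b.time_cost_s - 3.0 for b in non_std)
    return len(non_std), extra_s

boxes = [
    Letterbox(1, True, "key", 3.0),
    Letterbox(1, False, "badge", 8.0),
    Letterbox(2, False, "code", 8.0),
]
n_non_std, extra = classify(boxes)
```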

D2

Missing Time Log Imputation

Daily time entries had gaps from interrupted routes. Applied anomaly detection to identify outliers and median-based imputation to fill missing values, maintaining statistical validity across the full dataset.
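The text names anomaly detection plus median imputation; here is a minimal sketch using a median/MAD outlier rule — that specific rule is my assumption, chosen because a plain mean/σ test is easily masked by a single corrupted entry in a short series:

```python
import statistics

def impute(series, k=3.0):
    """Mark entries further than k * MAD from the median as missing,
    then fill all gaps with the median of the clean values."""
    observed = [x for x in series if x is not None]
    med = statistics.median(observed)
    mad = statistics.median(abs(x - med) for x in observed)
    clean = [x if x is not None and abs(x - med) <= k * max(mad, 1)
             else None
             for x in series]
    fill = statistics.median(x for x in clean if x is not None)
    return [fill if x is None else x for x in clean]

logs = [645, 650, None, 642, 9999, 648]   # None = interrupted, 9999 = corrupted
fixed = impute(logs)
```

Both the interrupted entry and the corrupted one end up filled with the median of the credible days, which keeps the daily totals statistically usable.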

D3

Delivery Retry Logic Engineering

Mapped the retry decision tree (first attempt, missed, left notice, second attempt, depot return) into a formal heuristic model. Enabled prediction of which addresses would require multiple attempts on any given day.
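The decision tree collapses to a small state function. The state names below are my labels for the stages listed above:

```python
def retry_plan(attempt: int, answered: bool) -> str:
    """Next action for a tracked item: first attempt -> missed ->
    left notice -> second attempt -> depot return."""
    if answered:
        return "delivered"
    if attempt == 1:
        return "leave_notice_and_retry"   # first miss: notice + 2nd attempt
    return "return_to_depot"              # second miss: back to depot

def expected_attempts(p_home: float) -> float:
    """Expected attempts per item, given the probability someone is home
    on the first try (one guaranteed retry after a miss)."""
    return p_home * 1 + (1 - p_home) * 2
```

Combined with the per-address miss rates from the absence model, `expected_attempts` is what makes the day's attempt load predictable rather than a surprise.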

D4

Statistical Process Control — Daily Tracking

Completion time and delivery count tracked against model predictions each day. Deviations logged and used to recalibrate the following day's schedule. A living dataset that improved with every iteration.
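A minimal sketch of that tracking, assuming a Shewhart-style mean ± 3σ rule (the specific rule and the sample figures are my assumptions):

```python
import statistics

def control_limits(baseline):
    """Lower/upper control limits from a baseline of daily totals."""
    mu = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return mu - 3 * sd, mu + 3 * sd

def out_of_control(baseline, new_days):
    """Days whose completion time falls outside the control limits —
    these trigger a recalibration of the next day's schedule."""
    lo, hi = control_limits(baseline)
    return [d for d in new_days if not (lo <= d <= hi)]

baseline = [645, 642, 648, 650, 644, 646, 643, 647]   # minutes/day
flagged = out_of_control(baseline, [646, 690, 641])
```

A 690-minute day gets flagged; ordinary variation does not — which is the whole point of process control: reacting to signal, not noise.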

D5

Building Cost Index

Each of the 292 buildings assigned a composite "time cost" score based on floors, access difficulty, signature requirement, and historical success rate. Used to sequence morning loading and route prioritisation.
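The composite score might look like this — the weights are illustrative assumptions, not the author's calibrated values:

```python
def cost_index(floors, access_s, needs_signature, success_rate):
    """Composite time-cost score for one building."""
    base = floors * 0.5 + access_s / 10      # climbing + access overhead
    sig = 1.5 if needs_signature else 0.0    # signature interaction cost
    risk = (1 - success_rate) * 3.0          # failed attempts hurt most
    return round(base + sig + risk, 2)

# Hypothetical buildings: a tall, signature-heavy one vs an easy one.
buildings = {
    "Bât. 4": cost_index(floors=5, access_s=12,
                         needs_signature=True, success_rate=0.6),
    "Bât. 9": cost_index(floors=1, access_s=5,
                         needs_signature=False, success_rate=0.95),
}
worst = max(buildings, key=buildings.get)
```

Sorting the 292 buildings by this index is what feeds both the morning loading order and the fatigue-weighted sequence above.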

D6

Simulation Validation

Python + Pandas discrete event simulation used to validate field measurements against actual outcomes. Simulated task durations matched real data within 3% — confirming the measurement methodology was sound.

Tools used

Technical stack.

The same stack used in any enterprise data project — applied to a postal district in Strasbourg.

Python Pandas NumPy SciPy NetworkX Matplotlib Plotly Dash Chart.js Discrete Event Simulation TSP Heuristics Statistical Process Control Jupyter Notebooks Excel Hostinger
Project delivery
5.0/5.0
Satisfaction — colleagues & management
Perfect Score.
Imperfect Conditions.

Every letter delivered. Every parcel tracked. Every registered mail signed — or documented as refused, with timestamp. The neighbours got their post on time. The management got something they had not expected: not just results, but proof of how the results were achieved. A 5.0 from people who knew exactly how hard the job was is a different kind of score than one from people who did not.

SCOPE
1,780 letterboxes · 292 buildings · 17 km/day A district officially sized for three full-time postal workers, covered alone without missing a delivery.
DATA
645.6 min of daily activity — fully measured Every task timed, every stop counted. The first structured operational dataset this district had ever had.
OPT
20–30% time savings through three optimisation strategies Smart batching, predictive loading, and heat map routing combined to bridge the 26% capacity gap.
AWARD
Congratulations from colleagues and management The compensation was congratulations. The return on investment was the data. Both were worth having.
"The best optimisation problems are the ones
where you are also a variable in the model."
— Xavier Richert, somewhere between floors 4 and 5 of a building with no lift