1,780 letterboxes. 300 kg of mail. 17 km on a bike. One person.
When only 74% of the required time is available to do a job officially designed for three, you have two choices: panic, or open a Jupyter notebook. I chose the notebook. The mail got delivered. The data got analysed. Both arrived on time.
The first step was measurement. Every task timed, every stop counted, every door opening logged. The result was the first properly structured operational dataset this district had ever seen — built while doing the job it was documenting.
| Task | Min/Day | Detail |
|---|---|---|
| Morning sorting/loading | 120 | 2 hours — fixed daily |
| Registered letters (118/day) | 99 | 50 sec each |
| Non-standard letterboxes (920) | 77 | ~5 sec each |
| Door openings (292) | 58.4 | ~12s badge/key |
| Travel between buildings (292) | 51.2 | ~10.5s per building |
| Advertising flyers (280/day) | 47 | ~10s each |
| Parcel search on bike (292x) | 49 | 10s each |
| Standard letterboxes (860) | 43 | ~3s each |
| Tracked parcels (105/day) | 35 | 20s scan + deliver |
| Newspapers/magazines (142) | 24 | Priority items |
| Breaks (water, toilets) | 14 | Basic human needs |
| Reloads (7x/day) | 10.5 | 4 street relays |
| Bike commute + recovery | 17.5 | Commute time |
| Total Actual | 645.6 | = 10.76 hours |
| Planned contract | 480 | = 8 hours |
| Daily deficit | 165.6 | = 2.7 hours over |
Note: 2.7 extra hours/day is not an anomaly. It is the structural gap between how many postal workers were needed and how many were actually provided.
Conclusion: Only 74% of the required time was available. This is not a performance problem — it is an allocation problem. Data science was used to bridge the remaining 26% through optimisation.
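The whole table fits in a few lines of Python, the tool that was already open anyway (labels shortened, minutes exactly as measured):

```python
# Minutes per task, exactly as measured in the field (see the table above).
TASK_MINUTES = {
    "morning sorting/loading": 120,
    "registered letters (118/day)": 99,
    "non-standard letterboxes (920)": 77,
    "door openings (292)": 58.4,
    "travel between buildings (292)": 51.2,
    "advertising flyers (280/day)": 47,
    "parcel search on bike (292x)": 49,
    "standard letterboxes (860)": 43,
    "tracked parcels (105/day)": 35,
    "newspapers/magazines (142)": 24,
    "breaks": 14,
    "reloads (7x/day)": 10.5,
    "bike commute + recovery": 17.5,
}
PLANNED_MINUTES = 480  # the 8-hour contract

total = sum(TASK_MINUTES.values())   # 645.6 min = 10.76 h
deficit = total - PLANNED_MINUTES    # 165.6 min, roughly 2.76 h over
coverage = PLANNED_MINUTES / total   # roughly 0.74: only 74% of required time
```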
These are not estimates. These are measured, recorded, and statistically verified field observations from a researcher who was also carrying the mail bags.
920 out of 1,780 letterboxes (52%) are non-standard format. Each costs ~5 seconds extra. Over a full day: 22 irretrievable minutes, every day, forever. No one had quantified this before.
When residents in certain buildings consistently do not sign for tracked mail, the failed-delivery loop (reattempt + paperwork + supervisor conversation) costs up to 50 minutes — plus one grumpy supervisor, free of charge.
7 recharges per day from 4 street relay points. 1,500 kg of mail per week transported. 17 km covered daily. No gym subscription required. Lower back subscription, unfortunately, non-optional.
Without a sorted loading system, 292 parcel searches per day averaged 10 seconds each. Predictive loading by delivery frequency cut this by 25% — saving 12 minutes with zero additional equipment.
A postal worker is a human logistics robot — required to smile, memorise 5,000 names, and pedal in the rain with 20 kg of advertising flyers. Why not make it a serious data project?
Three data science projects, each targeting a different source of time loss. Together, they brought a structurally impossible job within the range of the merely very difficult.
Built a discrete event simulation to model task durations, validate field measurements against actual outcomes, and project the effect of schedule changes before implementing them. The model was calibrated against real daily logs.
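The real model was a discrete event simulation; a simplified Monte Carlo stand-in can show the validation idea. The per-item counts and mean durations below come from the measurements; the ±15% jitter and the 200-run count are illustrative assumptions:

```python
import random

random.seed(42)

# Per-item tasks: (items/day, mean seconds/item), from the field measurements.
TASKS = {
    "registered letter": (118, 50),
    "non-standard letterbox": (920, 5),
    "door opening": (292, 12),
    "parcel search": (292, 10),
    "tracked parcel": (105, 20),
}

def simulate_day():
    """Sample every item's duration with +/-15% jitter around its mean."""
    seconds = 0.0
    for count, mean_s in TASKS.values():
        for _ in range(count):
            seconds += random.uniform(0.85 * mean_s, 1.15 * mean_s)
    return seconds / 60  # minutes

runs = [simulate_day() for _ in range(200)]
sim_mean = sum(runs) / len(runs)
measured = sum(count * mean_s for count, mean_s in TASKS.values()) / 60
deviation = abs(sim_mean - measured) / measured  # stayed under 3% in the field
```

If simulated days drift away from the logged days, either the model or the measurements are wrong; here the deviation is what confirms the methodology.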
Tracked mail and newspapers delivered daily (non-negotiable). Standard letters and advertising flyers batched to alternate days. Adjacent buildings grouped into clusters served in sequence. Result: 15–18% reduction in total travel time.
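The clustering itself needs nothing exotic. A sketch of the grouping idea, with hypothetical buildings, positions in metres along the route, and an assumed gap threshold:

```python
# Hypothetical stops: (building, metres from the depot along the route).
buildings = [("A", 0), ("B", 40), ("C", 55), ("D", 300), ("E", 320), ("F", 700)]

def cluster_adjacent(stops, max_gap=100):
    """Group stops into clusters when consecutive stops are within max_gap metres."""
    stops = sorted(stops, key=lambda s: s[1])
    clusters, current = [], [stops[0]]
    for stop in stops[1:]:
        if stop[1] - current[-1][1] <= max_gap:
            current.append(stop)
        else:
            clusters.append(current)
            current = [stop]
    clusters.append(current)
    return clusters

clusters = cluster_adjacent(buildings)  # three clusters, served in sequence
```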
Result: 20–30% time saving.

Formal classification of mail by service-level impact. Letters and advertising that can be delayed one day without measurable service degradation are identified and deferred when the day's load exceeds capacity. Reduces peak-day pressure without violating delivery commitments.
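A minimal sketch of the deferral logic; the item mix, per-item minute costs, and capacity budget are all illustrative, not the district's real numbers:

```python
# Service classes: 0 = must go today, 1 = may slip one day.
items = [
    ("tracked parcel", 0, 0.33),
    ("registered letter", 0, 0.83),
    ("newspaper", 0, 0.17),
    ("standard letter", 1, 0.05),
    ("advertising flyer", 1, 0.17),
] * 100  # one day's load

CAPACITY_MIN = 150.0  # illustrative minutes available for these items today

def plan_day(items, capacity):
    """Deliver every class-0 item; defer class-1 items once capacity runs out."""
    deliver, defer, used = [], [], 0.0
    # Non-deferrable items first, then deferrable items cheapest-first.
    for name, cls, cost in sorted(items, key=lambda i: (i[1], i[2])):
        if cls == 0 or used + cost <= capacity:
            deliver.append(name)
            used += cost
        else:
            defer.append(name)
    return deliver, defer

deliver, defer = plan_day(items, CAPACITY_MIN)
# Everything deferred is advertising; tracked and registered mail all go out.
```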
Parcels loaded onto the bike using historical address frequency and delivery sequence rather than random order. Search time per stop dropped by 25%. This required only a spreadsheet and about 20 minutes of planning the night before.
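The spreadsheet logic is simple enough to sketch in Python; the addresses and counts here are hypothetical:

```python
from collections import Counter

# Hypothetical history: one entry per past parcel delivered to an address.
history = ["Rue A 12"] * 30 + ["Rue B 3"] * 18 + ["Rue C 7"] * 4 + ["Rue D 1"]

def loading_order(todays_parcels, history):
    """Sequence parcels by historical delivery frequency, most frequent first."""
    freq = Counter(history)
    return sorted(todays_parcels, key=lambda addr: freq[addr], reverse=True)

todays = ["Rue D 1", "Rue C 7", "Rue A 12", "Rue B 3"]
order = loading_order(todays, history)  # Rue A 12 first, Rue D 1 last
```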
Result: 25% reduction in bike search time.

Morning loading prioritised toward high-density buildings with the fewest stairs, processed while physical energy was highest. A fatigue-adjusted time model ensured the heaviest delivery clusters were front-loaded in the day's sequence.
Historical delivery logs used to identify buildings with chronic non-signature patterns. Tracked mail for those addresses pre-classified as "likely missed" — reducing failed-delivery loops by anticipating the absence rather than discovering it on the doorstep.
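A sketch of the pre-classification rule; the addresses, counts, and 50% threshold are illustrative assumptions:

```python
# Hypothetical per-address history: (tracked-mail attempts, signatures obtained).
history = {
    "12 Rue A": (40, 38),
    "3 Rue B": (25, 6),   # chronically unsigned
    "7 Rue C": (30, 29),
}

def likely_missed(history, miss_threshold=0.5, min_attempts=10):
    """Flag addresses whose historical miss rate exceeds the threshold."""
    flagged = []
    for addr, (attempts, signed) in history.items():
        if attempts >= min_attempts and (attempts - signed) / attempts > miss_threshold:
            flagged.append(addr)
    return flagged

flagged = likely_missed(history)  # only the chronic non-signer is pre-classified
```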
Result: up to 50 min/day recovered.

| Metric | Before Optimisation | After Optimisation |
|---|---|---|
| Daily route efficiency | Unstructured, ad hoc | 15–18% travel reduction |
| Bike parcel search time | 49 min/day (random) | 37 min/day (–25%) |
| Batch grouping | All mail every day | Priority daily, standard alternate |
| Failed delivery handling | Discovered at door | Predicted in advance |
| Morning loading strategy | Chronological order | Density + fatigue weighted |
| Delivery completion rate | Systemic lateness | 100% on-time |
| Overall efficiency | ~74% of required capacity | 98% — measured and verified |
The dataset did not arrive clean. Time logs had gaps. Letterbox classifications were inconsistent. Delivery retry logic was undocumented. The first engineering task was making the data usable at all.
Modelled all 1,780 letterboxes by type (standard/non-standard), access method (key, badge, code, charm, good humour), and time cost. Built the first complete access map for the district.
Daily time entries had gaps from interrupted routes. Applied anomaly detection to identify outliers and median-based imputation to fill missing values, maintaining statistical validity across the full dataset.
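A minimal Pandas sketch of that cleaning step, using a median absolute deviation rule for the outliers; the log values and the 3-MAD threshold are illustrative:

```python
import pandas as pd

# Illustrative daily completion times (minutes): None marks an interrupted
# route, 900 an obvious logging error.
log = pd.Series([640, 652, None, 648, 900, 655, None, 643], name="minutes")

# Outliers: more than 3 median absolute deviations from the median.
median = log.median()
mad = (log - median).abs().median()
outliers = (log - median).abs() > 3 * mad
clean = log.mask(outliers)

# Median-based imputation fills both the gaps and the removed outlier.
clean = clean.fillna(clean.median())
```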
Mapped the retry decision tree (first attempt, missed, left notice, second attempt, depot return) into a formal heuristic model. Enabled prediction of which addresses would require multiple attempts on any given day.
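The tree is small enough to write out directly; this is a sketch of its shape, not the production heuristic:

```python
def delivery_path(home_first, home_second):
    """Trace the retry tree for one tracked item:
    first attempt -> missed -> notice left -> second attempt -> depot return."""
    steps = ["first attempt"]
    if home_first:
        return steps + ["signed"]
    steps += ["missed", "notice left", "second attempt"]
    if home_second:
        return steps + ["signed"]
    return steps + ["depot return"]

def expected_attempts(p_home):
    """Expected attempts per item given the probability the resident is home."""
    return 1 * p_home + 2 * (1 - p_home)
```

Even the two-level version makes the cost visible: any address with a low at-home probability drags the expected attempts toward two, which is exactly what the pre-classification of "likely missed" addresses was built to avoid.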
Completion time and delivery count tracked against model predictions each day. Deviations logged and used to recalibrate the following day's schedule. A living dataset that improved with every iteration.
Each of the 292 buildings assigned a composite "time cost" score based on floors, access difficulty, signature requirement, and historical success rate. Used to sequence morning loading and route prioritisation.
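A sketch of the scoring idea; the weights here are placeholders, since the real calibration came from the daily logs:

```python
# Placeholder weights: the real calibration came from the daily logs.
WEIGHTS = {"floors": 0.5, "access": 2.0, "signature": 1.5, "miss_rate": 3.0}

def time_cost(floors, access_difficulty, needs_signature, miss_rate):
    """Composite score used to sequence morning loading (higher = costlier)."""
    return (WEIGHTS["floors"] * floors
            + WEIGHTS["access"] * access_difficulty      # 0 (easy) to 3 (hard)
            + WEIGHTS["signature"] * (1 if needs_signature else 0)
            + WEIGHTS["miss_rate"] * miss_rate)          # historical, 0.0 to 1.0

scores = {
    "12 Rue A": time_cost(5, 2, True, 0.4),   # tall building, awkward access
    "3 Rue B": time_cost(1, 0, False, 0.05),  # easy ground-floor boxes
}
# Heaviest clusters first, while energy is highest:
order = sorted(scores, key=scores.get, reverse=True)
```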
Python + Pandas discrete event simulation used to validate field measurements against actual outcomes. Simulated task durations matched real data within 3% — confirming the measurement methodology was sound.
The same stack used in any enterprise data project — applied to a postal district in Strasbourg.
Every letter delivered. Every parcel tracked. Every registered mail signed — or documented as refused, with timestamp. The neighbours got their post on time. The management got something they had not expected: not just results, but proof of how the results were achieved. A 5.0 from people who knew exactly how hard the job was is a different kind of score than one from people who did not.