Air pollution's hidden link to diabetes
PLUS: Using generative AI for urban planning, and sensor-agnostic cloud detection
Hey guys, here's this week's edition of the Spatial Edge. If you've ever considered naming your pet dog "GDAL", then we have one thing to say to you: welcome home… In any case, the aim is to make you a better geospatial data scientist in less than five minutes a week.
In today's newsletter:
Pollution & Diabetes: Clinical data links PM2.5 to diabetes.
AI & Cities: Generative models propose adaptive urban plans.
Cloud Detection: New model works across multiple satellite sensors.
SAR Dataset: 130k SAR images paired with text captions.
Farmlands Data: Global dataset maps terraced agricultural parcels.
Research you should know about
1. What particulate pollution reveals about diabetes risk
As someone who's based in Southeast Asia, I find myself reading a lot about air pollution, and many of its impacts are pretty widely known. But I was surprised to see a new link between air pollution and type 2 diabetes (T2DM), reported in a recent Scientific Reports publication.
Researchers combined outpatient clinical records from the Italian Association of Diabetologists (AMD) with municipality-level estimates of exposure to PM2.5 and PM10 between 2013 and 2021. Unlike most studies, which rely on self-reported health surveys, this dataset covers almost 700,000 patients and provides clinically verified diagnoses across the country.
The diabetes data came from the AMD network, which is the only national clinical dataset covering outpatient cases across 20 regions. Air pollution exposure came from ISPRA, Italy's environmental agency, which produced population-weighted estimates of PM2.5 and PM10 for more than 8,000 municipalities. These estimates were based on ground monitoring stations combined with Bayesian spatio-temporal models that use meteorological data and satellite aerosol optical depth to fill in gaps. The result is a set of high-resolution (1 km²) daily maps of particulate matter, which were then aggregated to annual averages for each municipality.
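To make that aggregation step a bit more concrete, here's a minimal sketch of how daily 1 km grid values could be collapsed into a population-weighted annual average for a single municipality. The function name, array shapes, and toy numbers are all my own assumptions for illustration. This isn't ISPRA's actual pipeline.

```python
import numpy as np

def population_weighted_annual_mean(daily_pm, population):
    """Collapse daily gridded PM values into one annual figure.

    daily_pm:   array of shape (days, cells), concentrations (ug/m3) for
                the grid cells that fall inside one municipality.
    population: array of shape (cells,), residents per grid cell.
    Both inputs are hypothetical; real data would come from gridded rasters.
    """
    daily_pm = np.asarray(daily_pm, dtype=float)
    population = np.asarray(population, dtype=float)
    if population.sum() <= 0:
        raise ValueError("population must sum to a positive value")

    # Cells where more people live count for more in the exposure estimate.
    weights = population / population.sum()

    # Weighted average across cells for each day, then a plain mean over days.
    daily_weighted = daily_pm @ weights
    return float(daily_weighted.mean())

# Toy example: two days, three grid cells.
daily_pm = [[10.0, 20.0, 30.0],
            [20.0, 30.0, 40.0]]
population = [100, 300, 600]
annual = population_weighted_annual_mean(daily_pm, population)
print(annual)
```

The weighting matters because an unweighted mean would treat an empty hillside cell the same as a dense town centre, which tends to misstate what residents actually breathe.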
Their analysis shows a strong association between T2DM and fine particulate matter, especially PM2.5. While overall diabetes incidence declined over the study period, prevalence kept rising, reflecting Italy's ageing population and worsening lifestyle factors. Municipalities with higher PM2.5-to-PM10 ratios had consistently higher diabetes incidence and prevalence. In other words, where smaller, more harmful particles made up a greater share of air pollution, the diabetes burden was higher. Men and older groups were particularly affected.
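The PM2.5-to-PM10 ratio is simple to compute yourself if you have both series. Here's a rough sketch, assuming annual municipality averages sit in two aligned arrays. The function name and the numbers are made up for illustration and aren't from the paper.

```python
import numpy as np

def fine_fraction(pm25, pm10):
    """Share of PM10 mass made up of PM2.5, per municipality.

    Returns NaN where PM10 is zero or negative, since the ratio is
    undefined there. Inputs are hypothetical annual averages in ug/m3.
    """
    pm25 = np.asarray(pm25, dtype=float)
    pm10 = np.asarray(pm10, dtype=float)
    ratio = np.full(pm25.shape, np.nan)
    valid = pm10 > 0
    ratio[valid] = pm25[valid] / pm10[valid]
    return ratio

# Toy values for three municipalities (the last one has no valid PM10 reading).
pm25 = [12.0, 18.0, 9.0]
pm10 = [24.0, 24.0, 0.0]
ratios = fine_fraction(pm25, pm10)
print(ratios)
```

Since PM2.5 is a subset of PM10, the ratio should normally fall between 0 and 1. Values above 1 usually point to a measurement or modelling mismatch rather than a real signal.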
2. Merging generative AI and urban planning
Urban planning has always been a slow and complex process, shaped by policies, social dynamics, and engineering constraints. But rapid urbanisation, climate change, and ageing infrastructure are pushing traditional methods to their limits. So to address this, a new paper explores the use of generative AI for urban planning. By combining geospatial data, mobility patterns, and environmental constraints, they aim to generate alternative land-use configurations that adapt to changing conditions while still aligning with policy goals.
Recent research frames planning as a conditional generation problem, where AI models learn to produce layouts of zones, streets, and land uses under diverse constraints. Early work has used models like GANs, VAEs, and transformers to generate city plans or zoning schemes, sometimes representing them as graphs of points of interest or grid-based layouts. These studies show promise, but they often overlook the tools and theories that professional planners already use. They also struggle with spatial hierarchies, trade-offs between objectives such as density and green space, and the need for adaptive, multi-step planning.
In this paper, researchers propose integrating multimodal foundation models and agentic AI, where systems can combine visual and textual inputs, simulate long-term outcomes, and refine plans through iterative decision-making. I guess the idea is that planners would interact with AI in natural language, asking it to propose higher-density housing that avoids flood-prone areas or to design greener neighbourhoods that balance resilience and affordability. This vision essentially places AI as a collaborative assistant: goal-driven, context-aware, and able to work alongside humans in shaping the future of cities.
3. A sensor-agnostic model for cloud detection
Most deep learning models in remote sensing work best when they are applied to the same satellite sensor and resolution they were trained on. That's sort of a problem, because it forces researchers to build separate models for each dataset, wasting time and effort. A new model called OmniCloudMask (OCM), published in Remote Sensing of Environment, tackles this by being sensor-agnostic. It uses techniques that let it generalise across Sentinel-2, Landsat 8, and PlanetScope, despite their different resolutions and spectral properties.
To do this, the team introduced two key methods: dynamic Z-score normalisation, which standardises spectral differences on the fly, and mixed resolution training, which randomly resamples data so the model learns to recognise clouds and shadows at multiple scales. The result is a model trained only on Sentinel-2 that can still perform well on other sensors. This is particularly valuable because labelled cloud datasets are scarce and expensive to build.
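Here's a rough NumPy sketch of how I read those two ideas. It's my own interpretation, not the authors' code: I'm assuming the normalisation uses per-band statistics computed from each input image, and I'm standing in for proper resampling with simple strided decimation. The function names and downsampling factors are hypothetical.

```python
import numpy as np

def dynamic_zscore(image, eps=1e-6):
    """Standardise each band using the mean and std of this image alone.

    image: array of shape (bands, height, width). Because the statistics
    come from the input itself, the sensor's value range (reflectance,
    digital numbers, etc.) largely drops out.
    """
    image = np.asarray(image, dtype=np.float32)
    mean = image.mean(axis=(1, 2), keepdims=True)
    std = image.std(axis=(1, 2), keepdims=True)
    return (image - mean) / (std + eps)

def random_downsample(image, rng, factors=(1, 2, 4)):
    """Pick a random factor and decimate the image by it.

    Strided slicing is a crude stand-in for interpolation-based resampling;
    it is only here to show the model seeing several pixel sizes in training.
    """
    factor = int(rng.choice(factors))
    return image[:, ::factor, ::factor], factor

rng = np.random.default_rng(0)
scene = rng.uniform(0, 10000, size=(3, 64, 64))  # fake 3-band scene

patch, factor = random_downsample(scene, rng)
normalised = dynamic_zscore(patch)

# The same patch stored on a different scale normalises to the same values.
rescaled = dynamic_zscore(patch * 0.0001)
print(factor, normalised.shape, np.allclose(normalised, rescaled, atol=1e-3))
```

The last line is the point of the exercise: two sensors that store the same scene on different numeric scales end up looking alike to the network after normalisation.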
OCM reaches state-of-the-art performance across platforms: 92% accuracy for clear pixels and 91% for cloud on Sentinel-2, 91.5% for both clear and cloud on Landsat, and almost 99% for cloud on PlanetScope. It also handles cloud shadows better than many existing approaches, which often struggle with this task. The model has been released as an open-source Python package on PyPI, which makes it nice and easy to use across a bunch of different remote sensing workflows.
You can access the data and code here.
Geospatial Datasets
1. SAR-Text dataset
SAR-TEXT is a large-scale dataset with over 130,000 Synthetic Aperture Radar (SAR) images paired with high-quality text descriptions. It was built using the SAR-Narrator auto-captioning framework.
2. Terraced farmlands dataset
The Global Terraced Parcel and Boundary Dataset (GTPBD) is the first fine-grained, high-resolution dataset focused on complex terraced farmlands worldwide. It contains over 200,000 manually annotated parcels from 47,537 images across seven Chinese regions and multiple global climates, with three-level labels for boundaries, masks, and parcels. You can access the dataset and code here.
3. Gridded emissions dataset
The Gridded Mobile-source Emission Dataset (GMED) offers a decade of detailed monthly emissions data for China (2011–2020). It spans vehicles, ships, aircraft, and construction machinery. It captures regulated pollutants, greenhouse gases, and non-exhaust sources on a 36 km grid. You can access the dataset and code here.
Other useful bits
NASA satellites have recorded a huge spike in fire hotspots across Indonesia, with July 2025 detections up more than tenfold from June and haze already drifting into Malaysia. Many fires are burning in carbon-rich peatlands, raising fears of severe air pollution, health impacts, and a repeat of past regional haze crises like 2015.
EUMETSAT has taken control of Meteosat Third Generation Sounder 1, Europe's first geostationary sounder, now orbiting 36,000 km above the Equator. Over the next few months, scientists will test its cutting-edge instruments before it begins delivering new atmospheric data to improve weather forecasts and climate monitoring.
Airbus will build two PAZ-2 radar satellites for Hisdesat and Spain's Ministry of Defence, ensuring continuity of the PAZ Earth observation mission launched in 2018. The satellites will boost Spain's intelligence, surveillance, and civilian monitoring capabilities, delivering all-weather radar imagery day and night. The first PAZ-2 is set to launch by mid-2031.
Jobs
CoLAB Atlantic is looking for two Data Scientists and Geospatial Programmers (Junior) based in Portugal.
Esri is looking for a GIS Data Analyst based in Canada who will work closely with its Ratio.City Planning Data Team.
Mapbox is looking for a (1) Machine Learning Engineer II, Navigation API based in the UK and a (2) Machine Learning Engineer III, Routing Cost based in Germany.
The Nature Conservancy is looking for a SNAPP Research Fellow: Future Tidal Wetlands based in the US.
Just for Fun
The English Channel swim season is underway, with this year's tracks already painting a lively picture across the map. Each line marks a crossing in progress or completed, showing just how busy this stretch of water can be for determined swimmers.
That's it for this week.
I'm always keen to hear from you, so please let me know if you have:
new geospatial datasets
newly published papers
geospatial job opportunities
and I'll do my best to showcase them here.
Yohan