Flash floods kill more than 5,000 people each year. They are also among the hardest weather events to forecast — too short-lived and localized for traditional monitoring systems to capture consistently.
Google has built a model to address that gap, and it started with old news articles.
Mining 5 Million Articles for Flood Data
Researchers used Gemini, the company’s large language model, to scan 5 million news articles from around the world. The model isolated reports of 2.6 million individual flood events and converted them into a geo-tagged time series the team calls “Groundsource.”
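To make the extraction step concrete, here is a minimal sketch of how a language model's structured output for one article might be parsed into geo-tagged event records. The JSON schema, field names, and function are assumptions for illustration, not Google's actual pipeline:

```python
import json
from dataclasses import dataclass

@dataclass
class FloodEvent:
    date: str    # ISO date of the reported flood
    lat: float   # latitude of the affected location
    lon: float   # longitude of the affected location
    source: str  # identifier of the news article

def parse_llm_output(article_id: str, llm_json: str) -> list[FloodEvent]:
    """Convert a model's (hypothetical) structured JSON output for one
    article into geo-tagged flood event records."""
    events = []
    for item in json.loads(llm_json):
        events.append(FloodEvent(
            date=item["date"],
            lat=float(item["lat"]),
            lon=float(item["lon"]),
            source=article_id,
        ))
    return events

# Example: one article reporting a single flash flood.
sample = '[{"date": "2023-04-12", "lat": -15.4, "lon": 28.3}]'
records = parse_llm_output("article-001", sample)
```

Aggregating records like these across millions of articles, keyed by location and date, would yield the kind of geo-tagged time series the announcement describes.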
According to the announcement, this is the first time Google has used a language model for this kind of data-extraction work, a point confirmed by Gila Loike, a Google Research product manager.
The data scarcity problem it solves is real. Deep learning models need historical records to learn from. Flash floods leave almost none — at least not in structured, machine-readable form. News reports, it turns out, fill that void.
“Data scarcity is one of the most difficult challenges in geophysics,” said Marshall Moutenot, CEO of Upstream Tech. “This was a really creative approach to get that data.”
How the Forecasting Model Works
With Groundsource as a real-world baseline, the team trained a Long Short-Term Memory (LSTM) neural network to ingest global weather forecasts and output flash flood probabilities for specific areas.
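As a rough illustration of the architecture described above, the toy sketch below steps a single-unit LSTM cell over a sequence of hourly rainfall forecasts and squashes the final hidden state into a probability. The weights are untrained placeholders and the scalar input is a stand-in for the multivariate forecast features a production model would use; none of this reflects Google's actual implementation:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One step of a single-unit LSTM cell with scalar state."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate state
    c = f * c + i * g          # update cell memory
    h = o * math.tanh(c)       # expose gated hidden state
    return h, c

# Placeholder (untrained) weights and a toy forecast sequence:
# predicted rainfall intensity over six hours at one location.
w = {k: 0.5 for k in
     ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")}
rainfall = [0.1, 0.3, 0.9, 1.2, 0.8, 0.4]

h, c = 0.0, 0.0
for x in rainfall:
    h, c = lstm_step(x, h, c, w)

# Map the final hidden state to a flash flood probability for the area.
flood_prob = sigmoid(h)
```

The appeal of an LSTM here is its cell memory: soil already saturated by earlier rain makes later rain more dangerous, and the recurrent state lets the model carry that context across time steps.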
The model now flags risks across urban areas in 150 countries through Google's Flood Hub platform, and its forecasts feed directly to emergency response agencies worldwide.
António José Beleza, an emergency response official at the Southern African Development Community, trialed the system and said it helped his organization respond to floods faster.
The model is not without limits. It operates at a resolution of roughly 20 square kilometers per cell, coarser than the US National Weather Service's flood alert system, which draws on local radar data for real-time precipitation tracking. Google's model does not currently incorporate that radar layer.
That limitation is partly by design. The system was built for regions where governments lack the budget for weather-sensing infrastructure or have sparse meteorological records.
“Because we’re aggregating millions of reports, the Groundsource data set actually helps rebalance the map,” said Juliet Rothenberg, a program manager on Google's Resilience team. “It enables us to extrapolate to other regions where there isn’t as much information.”
The team says the same approach — using large language models to extract quantitative data from qualitative written sources — could extend to forecasting other hard-to-track events, including heat waves and mudslides.
This article is a curated summary based on third-party sources.