How can machine learning help water utilities find lead service lines?
March 08, 2023
All utilities face an October 2024 deadline to inventory their water lines. Using artificial intelligence can save time and money and reduce field excavation.
For the first time in 30 years, the Environmental Protection Agency (EPA) recently amended the Lead and Copper Rule (LCR) with the Lead and Copper Rule Revisions (LCRR). These recent changes took effect in December 2021 with a compliance date of October 16, 2024.
First up? All (71,000+) US water utilities must submit an inventory of their system’s service lines. This must include the material type for both the customer and the utility-owned portions of the line.
The challenge? Which lines contain lead?
And it’s not just lead. Galvanized steel pipe can also be a long-term source of lead as the surface zinc coating can contain up to 2 percent lead. Lead tends to accumulate in the rusted interior of aging galvanized iron pipe, which is often found downstream of lead service lines. Particles can detach during a change in water pressure or pipe vibration, thereby increasing lead levels even after the upstream lead service line has been removed.
The Natural Resources Defense Council estimates that there could be as many as 12 million lead service lines, potentially serving more than 22 million people across the nation. And that number could be an underestimate.
The challenge? Excavating pipes to determine their material is costly and time-consuming. Most utilities have no idea where to even begin. To make matters worse, lead pipes are seldom evenly distributed across a city. And no leader would want a patchwork of potholes randomly dug across their municipality.
The good news? Even though the proportion of lines with lead is often small, it’s not a “needle in the haystack” problem anymore with the help of machine learning (ML).
In a previous blog, I shared how artificial intelligence and machine learning can help predict the useful life of water infrastructure. Now, using this same method, we can estimate the probability that a pipe contains lead. The next step? The likeliest locations are flagged for further investigation, which lays the groundwork for funding needs and replacement discussions.
Water agencies tend to have incomplete data, scattered across a mix of paper records, spreadsheets, databases, PDF documents, and GIS data. Stantec has a team of data and GIS analysts that can assess, clean, and combine that data into a centralized database. From there, our ML tool can predict the presence of lead lines with better accuracy than intuitive guesses.
The best available existing inventory is a good place to start. Performance of the model depends on the quality of the data, i.e., garbage in, garbage out. A good base for building an inventory, and the starting point of our modelling approach, is whatever relevant data the utility already has available.
Next, we shift our attention to developing algorithms. Instead of searching aimlessly until results or discoveries arise, we combine the expertise of our subject-matter experts with statistical best practices to deliver a robust model that exploits patterns buried in the data to make predictions. Subtle distinctions in such data tend to be difficult for humans to grasp, but ML can pick them up. This is the “invisible” power of machine learning. This method won’t find all your pipes, but it will get you to the lead that’s out there more quickly.
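To make the idea of a probability estimate concrete, here is a minimal sketch. It is not the model described in this post: it assumes plain logistic regression as a stand-in technique, and the two yes/no features and the eight training records are hypothetical, invented purely for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            predicted = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = predicted - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_proba(weights, bias, x):
    """Estimated probability that a service line with features x is lead."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical features: [built before 1960, neighboring line verified as lead]
rows = [[1, 1], [1, 1], [1, 0], [1, 0], [0, 1], [0, 0], [0, 0], [0, 0]]
labels = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = verified lead, 0 = verified non-lead

weights, bias = train_logistic(rows, labels)
old_home = predict_proba(weights, bias, [1, 1])
new_home = predict_proba(weights, bias, [0, 0])
print(f"older home near known lead: {old_home:.2f}, newer home: {new_home:.2f}")
```

A real model would use far more records and features, and would be validated against field-verified lines before anyone trusts its probabilities.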
By combining the data with AI-led automation on a transparent, integrated platform, our ML tool, part of the Stantec Altitude™ platform, helps turn raw data into something useful. But this is not a simple task. It requires a mix of persistence, creative data engineering, and mindfulness of real-world costs.
First, raw data must be extracted from a variety of analog and digital sources, transformed into a standardized schema, and loaded into a centralized inventory. This process is known as extract, transform, load, or ETL for short. Then the exploratory data analysis, or EDA, begins as we search for relationships between the available variables and the known pipe-material labels.
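The transform and load steps might look something like the sketch below. Everything in it is an assumption made for illustration: the `ServiceLine` schema, the field names, and the material abbreviations are hypothetical, and a real inventory would carry many more fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServiceLine:
    """Hypothetical standardized schema for one service line."""
    address: str
    install_year: Optional[int]
    utility_material: str
    customer_material: str

# Hypothetical spellings found across sources, mapped to one vocabulary.
MATERIAL_ALIASES = {
    "pb": "lead", "lead": "lead",
    "cu": "copper", "copper": "copper",
    "galv": "galvanized", "gi": "galvanized", "galvanized": "galvanized",
}

def normalize_material(raw) -> str:
    """Return a standard material name, or 'unknown' if unrecognized."""
    key = str(raw or "").strip().lower()
    return MATERIAL_ALIASES.get(key, "unknown")

def parse_year(raw) -> Optional[int]:
    """Accept only four-digit years; anything else becomes None."""
    text = str(raw or "").strip()
    return int(text) if text.isdigit() and len(text) == 4 else None

def transform(record: dict) -> ServiceLine:
    """Convert one raw record into the standardized schema."""
    return ServiceLine(
        address=" ".join(str(record.get("address", "")).upper().split()),
        install_year=parse_year(record.get("install_year")),
        utility_material=normalize_material(record.get("utility_material")),
        customer_material=normalize_material(record.get("customer_material")),
    )

def load(records) -> dict:
    """Build an inventory keyed by address; later records overwrite earlier ones."""
    inventory = {}
    for record in records:
        line = transform(record)
        inventory[line.address] = line
    return inventory

raw_records = [
    {"address": "12  main st", "install_year": "1948",
     "utility_material": "Pb", "customer_material": ""},
    {"address": "40 Oak Ave", "install_year": "n/a",
     "utility_material": "CU", "customer_material": "Galv"},
]
inventory = load(raw_records)
print(inventory["12 MAIN ST"])
```

Mapping unrecognized values to "unknown" rather than guessing keeps the gaps visible, so the model and the field crews know which lines still need verification.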
A “feature” is a measurable input we can use to predict a pipe’s material. In machine learning, feature engineering is the work of reshaping raw data into inputs that are easier for a model to analyze. We use the data to find relationships between variables. Many of these relationships are not obvious and would be missed by traditional deterministic methods. The potential range of features is vast, so we measure the importance of each and retain the best. ML helps us to measure a variable’s degree of usefulness.
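One simple way to measure how useful a feature is can be sketched as follows. This example assumes information gain (how much knowing a feature reduces uncertainty about the label) as the importance measure, which is only one of several options and not necessarily the one used in our tool. The feature names and records are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels) -> float:
    """Shannon entropy of a list of labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels) -> float:
    """Reduction in label entropy after splitting on a categorical feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical verified records: 1 = lead, 0 = non-lead.
records = [
    {"era": "pre-1960", "street_side": "even", "is_lead": 1},
    {"era": "pre-1960", "street_side": "odd", "is_lead": 1},
    {"era": "pre-1960", "street_side": "even", "is_lead": 1},
    {"era": "post-1960", "street_side": "odd", "is_lead": 0},
    {"era": "post-1960", "street_side": "even", "is_lead": 0},
    {"era": "post-1960", "street_side": "odd", "is_lead": 0},
]
labels = [r["is_lead"] for r in records]

gains = {
    feature: information_gain([r[feature] for r in records], labels)
    for feature in ("era", "street_side")
}
# Rank features from most to least informative.
ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
for feature, gain in ranked:
    print(f"{feature}: {gain:.3f} bits")
```

In this toy data, construction era separates lead from non-lead perfectly while street side tells us almost nothing, so a utility would know which attribute is worth the effort of collecting.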
All these steps enhance the model’s accuracy. The result? An efficient ML model that identifies lead service lines while requiring utilities to acquire only the features that matter most.
Using this information, our tool builds a library of the findings. By bringing inventory data, modeling outcomes, and field data collection information together in a user-friendly interface, a utility can gain a clearer sense of where to turn next. More to come on that topic in our next blog.
From this point, we develop a lead pipe replacement strategy for the utility owners. This is also where we can add more variables into the mix. Issues like regulatory constraints, equity, and environmental justice; school and daycare locations; and areas with veteran and elderly populations are all variables that can help prioritize next steps.
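As a rough illustration of how such variables could feed into prioritization, the sketch below boosts each line's predicted lead probability when certain community factors apply. The scoring formula, the factor names, and the weights are all hypothetical; in practice they would be set with the utility and its stakeholders.

```python
def priority_score(line: dict, weights: dict) -> float:
    """Scale the predicted lead probability up for each factor that applies."""
    boost = sum(weight for factor, weight in weights.items() if line.get(factor))
    return line["lead_probability"] * (1.0 + boost)

# Hypothetical weights expressing how much each factor raises priority.
weights = {"near_school": 0.5, "elderly_area": 0.25}

# Hypothetical model outputs joined with community data.
lines = [
    {"address": "A", "lead_probability": 0.8,
     "near_school": False, "elderly_area": False},
    {"address": "B", "lead_probability": 0.6,
     "near_school": True, "elderly_area": True},
    {"address": "C", "lead_probability": 0.2,
     "near_school": True, "elderly_area": False},
]

ranked = sorted(lines, key=lambda l: priority_score(l, weights), reverse=True)
order = [l["address"] for l in ranked]
print(order)
```

Here address B moves ahead of address A despite a lower predicted probability, because it sits near a school and in an area with an elderly population.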
Our strategy aims to consider all these observations and formulate a sound approach. The key? The data leads to the strategy, not the opposite.
Our ML strategy starts with historical service line data. This approach directs where to perform initial inspections, a key first step. Address by address and service line by service line, new sampling and verification information brings more certainty to the surface. Little by little, our predictive model evolves and improves.
A focused, location-based data management system used in tandem with precise results from the model surely beats the alternative: a costly field-locating program.
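The loop of inspecting, verifying, and updating can be sketched in a few lines. This example assumes a simple count-based update per area (a Beta-Bernoulli style estimate), which is a stand-in for retraining a full model. The area names and starting counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AreaEstimate:
    """Running evidence for one area, seeded from historical records."""
    lead_count: float       # prior plus verified lead lines
    non_lead_count: float   # prior plus verified non-lead lines

    @property
    def lead_probability(self) -> float:
        return self.lead_count / (self.lead_count + self.non_lead_count)

    def record_inspection(self, found_lead: bool) -> None:
        """Fold one field verification into the estimate."""
        if found_lead:
            self.lead_count += 1
        else:
            self.non_lead_count += 1

def next_area_to_inspect(areas: dict) -> str:
    """Pick the area with the highest current estimated lead probability."""
    return max(areas, key=lambda name: areas[name].lead_probability)

# Hypothetical starting estimates derived from historical data.
areas = {
    "Riverside": AreaEstimate(lead_count=2.0, non_lead_count=8.0),
    "Hilltop": AreaEstimate(lead_count=1.5, non_lead_count=8.5),
}

first_choice = next_area_to_inspect(areas)

# Suppose four inspections in the first area all find non-lead pipe.
for _ in range(4):
    areas[first_choice].record_inspection(found_lead=False)

second_choice = next_area_to_inspect(areas)
print(first_choice, "->", second_choice)
```

Each verified result shifts the estimate, so the next round of inspections goes where the updated evidence points rather than where the original records suggested.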
Machine learning is just that: learning based on data. Each city and water system is different. The ML algorithms must reflect that. Although decisions are informed by the data, flexibility is built into each model to adapt to each community.
Once complete, we can use inventory data in many ways. First, it will be exported to the EPA and various state regulatory templates. It can also be used to run sampling and validation programs, support LCRR compliance and communications, and identify and scope replacement programs.
Our team has supported clients for decades in these mandated efforts to provide safer drinking water by successfully removing lead from drinking water systems. The ML tools allow us to help more effectively.
We optimize efficiency through a proven technical approach. It is machine learning methods like these that can ultimately make lead service pipe identification and replacement faster and more cost-effective.
Time is ticking. October 2024 will be here before we know it.