Predicting Micropollutant Removal by Reverse Osmosis and Nanofiltration Membranes: Is Machine Learning Viable?
Environmental Science & Technology
Predictive models for micropollutant removal by membrane separation are highly desirable for the design and selection of appropriate membranes. While machine learning (ML) models have been applied for such purposes, their reliability might be compromised by data leakage due to inappropriate data splitting. More importantly, whether ML models can truly understand the mechanisms of membrane separation has not been revealed. In this study, we evaluate the capability of the XGBoost model to predict micropollutant removal efficiencies of reverse osmosis and nanofiltration membranes. Our results demonstrate that data leakage leads to falsely high prediction accuracy. Using a model interpretation method based on cooperative game theory, we test XGBoost's knowledge of the mechanisms of membrane separation by quantifying the contributions of input variables to the model predictions. We reveal that XGBoost possesses an adequate understanding of size exclusion, but its knowledge of electrostatic interactions and adsorption is limited. Our findings suggest that future work should focus more on avoiding data leakage and evaluating the mechanistic knowledge of ML models. In addition, high-quality data from more diverse experimental conditions, as well as more informative variables, are needed to improve the accuracy of ML models for predicting membrane performance.
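The data-splitting concern raised above can be illustrated with a minimal sketch. It assumes, purely for illustration, that leakage arises when measurements of the same compound end up in both the training and the test set; the abstract does not specify the grouping used in the study. The function `group_split`, the field names `compound` and `membrane`, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    Assumption: records sharing the same value of `group_key` are
    near-duplicates from the model's point of view, so they must be
    kept on the same side of the split.
    """
    # Collect records by group.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle the group identifiers, not the individual records.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Hold out whole groups for testing (at least one group).
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])

    train = [r for k in keys if k not in test_keys for r in groups[k]]
    test = [r for k in keys if k in test_keys for r in groups[k]]
    return train, test


# Hypothetical toy records: 5 compounds, each measured on 2 membranes.
records = [
    {"compound": c, "membrane": m}
    for c in "ABCDE"
    for m in ("M1", "M2")
]

train, test = group_split(records, group_key="compound")
train_compounds = {r["compound"] for r in train}
test_compounds = {r["compound"] for r in test}

# No compound is shared between the two sets.
print(train_compounds & test_compounds)
```

A purely random split over individual records would usually place the same compound on both sides, which is one way a model can appear more accurate than it is on genuinely unseen compounds.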