1- PhD Candidate, Department of Environmental Engineering, Faculty of Civil and Environmental Engineering, Tarbiat Modares University, Tehran, Iran
2- Assistant Professor, Department of Environmental Engineering, Faculty of Civil and Environmental Engineering, Tarbiat Modares University, Tehran, Iran, f.borhani@modares.ac.ir
Abstract:
Air pollution is a major challenge in megacities, and managing it depends on high-quality data. In developing countries such as Iran, reliable ground-based data are difficult to obtain. Satellite data offer a promising alternative, but missing values and outliers remain significant obstacles. This study addresses incomplete air pollution data in Tehran with a hybrid approach to data refinement and reconstruction. The dataset comprises NO₂, CO, and O₃ from Sentinel-5P and meteorological variables from ERA5-Land, covering December 2018 to March 2025. Results show a high prevalence of incomplete data for all pollutants in December due to weather conditions, with CO showing the highest level of incompleteness. Outliers were identified in two stages: a univariate Robust Z-score followed by a multidimensional Isolation Forest (IF). The cold months had the most outliers, and NO₂ had more outliers than the other pollutants. The LightGBM algorithm was used to reconstruct missing values, yielding r² values of 0.61, 0.50, and 0.38 for NO₂, O₃, and CO, respectively. Despite data limitations and the use of simpler algorithms than the complex spatio-temporal methods of previous studies, the results are considered satisfactory, particularly for NO₂ and O₃. This research demonstrates the potential of integrating satellite and meteorological data with machine learning to enhance air quality monitoring in data-scarce urban environments.
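The first, univariate stage of the outlier screening can be illustrated with a minimal sketch. It assumes the common modified Z-score form, 0.6745 × (x − median) / MAD, and a cutoff of 3.5; the abstract specifies neither the exact formula nor the threshold, so both are assumptions, as are the function names and the sample values. The Isolation Forest stage and the LightGBM reconstruction are not shown.

```python
from statistics import median


def robust_z_scores(values):
    """Modified Z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All deviations collapse to zero; no point can be scored as extreme.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary Z-scores
    # for normally distributed data.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Flag values whose absolute robust Z-score exceeds the (assumed) threshold."""
    return [abs(z) > threshold for z in robust_z_scores(values)]


# Hypothetical pollutant readings with one obvious spike.
sample = [10.0, 11.0, 9.5, 10.5, 10.2, 55.0, 9.8]
flags = flag_outliers(sample)
```

Because the median and MAD are barely affected by extreme values, a single spike such as the sixth reading does not inflate the scale estimate the way it would inflate a standard deviation.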
Article Type: Original Research | Subject: Air pollution
Received: 2025/05/09 | Accepted: 2025/05/31 | Published: 2025/05/31