Geospatial Analysis of Vegetation Encroachment Risk for Power Line Safety in Nigeria

Introduction

In 2024, Nigeria's national grid experienced at least 10 major collapses, leaving millions of people who depend on it in darkness for days. These blackouts disrupt businesses, paralyze homes, and highlight the fragility of a system weakened by aging infrastructure, vandalism, and unreliable gas supply. Each collapse is a reminder of how heavily we rely on a grid that often struggles to keep up.

Policies like the Service-Based Tariff Policy have improved electricity distribution and made power supply schedules more predictable, allowing me to plan my daily activities with greater confidence. Even so, vegetation encroachment remains a major threat to the grid’s stability. In April 2024, reports revealed that in the southern region, unchecked vegetation had encroached on grid facilities, contributing to infrastructure failures. Overgrown vegetation can spark wildfires, cause outages, and delay vital repairs, making it a hidden but significant risk to grid reliability.

To address this, I use geospatial analysis to map and monitor high-risk areas so that the problem can be tackled proactively. A stable power grid is essential for Nigeria’s development, and tackling hidden threats like vegetation encroachment is key to uninterrupted electricity. But how do we pinpoint these risk areas and protect the grid? Let’s dive in.

Data and Methodology

To identify areas at high risk of vegetation encroachment near power lines, I used data analysis techniques and geospatial methods. This involved carefully collecting, processing, and analyzing multiple datasets to gather sufficient information. In this section, I'll guide you through the data acquisition process, the steps I took to prepare the data, and how I used these insights to identify potential risk areas.

To understand the relationship between vegetation and powerlines, I needed reliable information about their locations and the density of surrounding vegetation.

The first source of data was OpenStreetMap (OSM), a widely used platform for accessing spatial data. Using Overpass Turbo, a web front end for the Overpass API, I retrieved data about Nigeria’s powerline network. This dataset provided the mapped locations of powerlines, forming the basis for the entire analysis. Since the data was initially in JSON format, I converted it into GeoJSON, a more compatible format for geospatial processing and analysis, and saved it.
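The JSON-to-GeoJSON conversion can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: it assumes the query returned ways with inline geometry (Overpass `out geom` output), and the function name and sample response are hypothetical.

```python
import json

def overpass_to_geojson(overpass_json):
    """Convert Overpass API JSON (ways returned with `out geom`) to a GeoJSON FeatureCollection."""
    features = []
    for element in overpass_json.get("elements", []):
        # Only ways that carry inline geometry can become LineStrings.
        if element.get("type") != "way" or "geometry" not in element:
            continue
        # GeoJSON orders coordinates as [longitude, latitude].
        coords = [[pt["lon"], pt["lat"]] for pt in element["geometry"]]
        if len(coords) < 2:
            continue
        features.append({
            "type": "Feature",
            "id": element["id"],
            "properties": element.get("tags", {}),
            "geometry": {"type": "LineString", "coordinates": coords},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical response fragment, shaped like Overpass output for a power=line way.
sample = {
    "elements": [
        {"type": "way", "id": 1, "tags": {"power": "line"},
         "geometry": [{"lat": 9.05, "lon": 7.49}, {"lat": 9.10, "lon": 7.55}]},
        {"type": "node", "id": 2, "lat": 9.05, "lon": 7.49},
    ]
}

geojson = overpass_to_geojson(sample)
print(json.dumps(geojson, indent=2))
```

Writing the result to disk with json.dump() gives a file that GeoPandas can read directly.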

Next, I used Google Earth Engine (GEE) to extract NDVI (Normalized Difference Vegetation Index) values from the MODIS MOD13A1 dataset. NDVI is a well-known indicator of vegetation density, making it useful for identifying potential areas of encroachment. With a spatial resolution of 500 meters, the dataset offered a great balance of detail and coverage. I focused on data from January 2023 to December 2024, ensuring the analysis was based on recent vegetation trends. The NDVI data was obtained in raster format and saved as GeoTIFF files for further analysis.
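For readers new to NDVI, the index is the normalized difference between near-infrared and red reflectance. MOD13A1 ships NDVI precomputed, so the pipeline never has to calculate it; the small sketch below is only there to show what the values mean. The reflectance numbers are made up for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return None  # undefined when both bands are zero
    return (nir - red) / total

# Illustrative reflectances: healthy vegetation reflects far more NIR than red light.
dense = ndvi(nir=0.50, red=0.08)
bare = ndvi(nir=0.25, red=0.20)
print(round(dense, 3), round(bare, 3))
```

Because the numerator can never exceed the denominator in magnitude, the result always falls between -1 and 1, with higher values pointing to denser green vegetation.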

Finally, to give regional context to the study, I obtained administrative boundary data of Nigeria in GeoJSON format. These boundaries outlined Nigeria’s states, allowing me to analyze vegetation risks in specific areas. By overlaying these boundaries with powerline and vegetation data, I could focus mitigation strategies where they were most needed.

Each dataset played a distinct role in the analysis. The powerline data showed where vegetation needed to be assessed. The NDVI data measured the vegetation density around these locations, with higher values indicating areas of greater risk. The administrative boundaries provided important context, helping to visualize risks and prioritize actions on a state-by-state basis.


Data Preparation and Processing

To analyze vegetation encroachment near powerlines effectively, the raw datasets needed significant preparation and validation. In this section, I will take you through the process of cleaning, merging, and transforming the data.

Once I loaded the data I had saved in the data extraction stage, multiple cleaning steps were performed to enhance its quality and ensure it was ready for spatial analysis:

  1. Powerline Data:

    • Invalid geometries were identified using shapely.validation.explain_validity() and corrected where possible.

    • Missing values were checked using isnull().sum(), and there were none. I then dropped any duplicate geometries using drop_duplicates().

    • The dataset's coordinate reference system (CRS) was standardized to EPSG:4326 using .to_crs().

  2. Administrative Boundaries:

    • I kept only the columns needed for the analysis, and checked the names and boundaries of states for consistency using methods like value_counts().

    • To keep the CRS consistent across all datasets, the CRS was aligned to EPSG:4326.

    • Columns like state were renamed to City, so the City column used in the rest of the analysis holds state names.

  3. Vegetation Data:

    • Clipping: I clipped the NDVI raster to Nigeria's boundary data using the rasterio.mask.mask() function. This focused the analysis on the country, reducing dataset size and computational requirements.

    • Handling Missing Data: The vegetation data contained missing data represented as NaN values, which I replaced with a placeholder value of -9999 using np.nan_to_num() to prevent processing errors.

    • Reprojection: To work in meter-based coordinates for the spatial calculations, I reprojected the data from EPSG:4326 (WGS 84) to EPSG:3857 (Web Mercator). After the necessary spatial calculations, the dataset is converted back to EPSG:4326 for compatibility with global mapping systems.
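One thing worth keeping in mind about the reprojection step: EPSG:3857 uses meters, but it stretches distances as latitude increases. The sketch below is not from the notebook. It uses the standard spherical Mercator formulas to project a point and to estimate that stretch across Nigeria's approximate latitude range (about 4°N to 14°N). The function names are my own.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857

def to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 longitude/latitude (degrees) to EPSG:3857 x/y (meters)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def mercator_scale_factor(lat_deg):
    """Approximate factor by which Web Mercator stretches distances at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Roughly Nigeria's south-to-north latitude span.
for lat in (4.0, 9.0, 14.0):
    print(lat, round(mercator_scale_factor(lat), 4))
```

The stretch is small at these latitudes, from roughly 0.2% in the far south to about 3% in the far north, so Web Mercator is workable for coarse 500-meter pixels. For buffer or clearance distances that need to be accurate, a local UTM zone would likely be a safer choice.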


Data Merging and Transformation

After cleaning, the datasets were merged and transformed to create a unified framework for analysis.

  1. Extracting NDVI for Powerlines:

    • A custom function, extract_ndvi(), was developed to extract NDVI values from the vegetation raster at each powerline’s centroid. It uses the raster's affine transformation to convert real-world coordinates to pixel coordinates, so that each powerline is associated with a vegetation density value that indicates its encroachment risk.

        # Sample the reprojected NDVI raster at each powerline centroid
        powerline_data["ndvi"] = powerline_centroids.apply(
            lambda point: extract_ndvi(point, vegetation_data_3857, vegetation_data_3857.transform)
        )

  2. Handling powerlines that were out of bounds:

    • NDVI could not be associated with some powerlines, so I tagged them as out of bounds. Of the 771 powerlines, 13 were out of bounds.

        # Separate out-of-bounds rows
        out_of_bounds_powerlines = powerline_data[powerline_data['ndvi'].isnull()]
        in_bounds_powerlines = powerline_data.dropna(subset=['ndvi'])
      
  3. Clipping Powerlines to Boundaries:

    • I clipped the powerlines that had an extracted NDVI value to the boundary data using gpd.clip(), filtering out any that fall outside Nigeria's boundaries.

        # Cut out the powerlines that fall outside the boundaries of the focus area (Nigeria)
        clipped_powerlines = gpd.clip(in_bounds_powerlines, admin_boundaries)
      
  4. Spatial Join:

    • Using gpd.sjoin(), powerlines within boundaries were spatially joined to their corresponding administrative regions. This resulted in a GeoDataFrame, powerlines_with_city, which links each powerline to the Nigerian state it intersects (stored in the City column).

        # Add the name of the city (state) to the dataframe
        powerlines_with_city = gpd.sjoin(clipped_powerlines, 
            admin_boundaries[['City', 'geometry']], how='inner', predicate='intersects')
      
  5. Averaging NDVI Values:

    • An average across the whole powerline geometry gives a more reliable measure of vegetation density than a single-point value, which matters for spotting encroachment risks. The average NDVI value for each powerline was calculated using a custom function, extract_ndvi_for_geometry(), and these averages replaced the centroid values in the ndvi column of the powerlines_with_city GeoDataFrame. I kept the extract_ndvi function because it was useful for separating out the out-of-bounds powerlines.

        # Average NDVI along each powerline geometry
        powerlines_with_city['ndvi'] = powerlines_with_city.geometry.apply(
            lambda geom: extract_ndvi_for_geometry(geom, vegetation_data, vegetation_data.transform)
        )

        # Keep only the columns needed for analysis and tidy the index
        powerlines_with_city = powerlines_with_city.drop(columns=['index_right'])
        powerlines_with_city = powerlines_with_city[['City', 'ndvi', 'geometry']]
        powerlines_with_city = powerlines_with_city.reset_index(drop=True)
    

The final GeoDataFrame holds the average NDVI value along each in-bounds powerline.
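The bodies of extract_ndvi() and extract_ndvi_for_geometry() are not shown in this article, so the following is only a dependency-free sketch of the two ideas behind them: mapping a world coordinate to a pixel through the affine transform, and averaging samples taken along a line. All function names here are hypothetical. It assumes a north-up raster whose transform is given as (a, b, c, d, e, f) in the same order as rasterio's Affine, and it assumes the -9999 placeholder should be treated as missing rather than averaged in.

```python
import math

NODATA = -9999  # placeholder assumed to mark missing pixels, as in the preparation step

def world_to_pixel(x, y, transform):
    """Map world coordinates to (row, col) for a north-up raster.

    Assumes x = a*col + c and y = e*row + f, i.e. the rotation terms b and d are 0.
    """
    a, _b, c, _d, e, f = transform
    col = math.floor((x - c) / a)
    row = math.floor((y - f) / e)
    return row, col

def sample_ndvi(x, y, grid, transform):
    """Return the raster value under (x, y), or None if out of bounds or nodata."""
    row, col = world_to_pixel(x, y, transform)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        value = grid[row][col]
        return None if value == NODATA else value
    return None

def mean_ndvi_along_line(coords, grid, transform, steps=10):
    """Average the raster values sampled at evenly spaced points along a polyline."""
    samples = []
    for seg, ((x0, y0), (x1, y1)) in enumerate(zip(coords[:-1], coords[1:])):
        # Skip the first point of later segments so shared vertices are not counted twice.
        for i in range(0 if seg == 0 else 1, steps + 1):
            t = i / steps
            value = sample_ndvi(x0 + t * (x1 - x0), y0 + t * (y1 - y0), grid, transform)
            if value is not None:
                samples.append(value)
    return sum(samples) / len(samples) if samples else None

# Hypothetical 2x2 raster of scaled NDVI integers; 500 m pixels, top-left corner at (0, 1000).
grid = [[2000, 4000],
        [6000, 8000]]
transform = (500.0, 0.0, 0.0, 0.0, -500.0, 1000.0)

point_value = sample_ndvi(250, 750, grid, transform)      # inside the top-left pixel
outside_value = sample_ndvi(5000, 750, grid, transform)   # beyond the raster extent
line_mean = mean_ndvi_along_line([(250, 750), (750, 750)], grid, transform)
print(point_value, outside_value, line_mean)
```

Returning None for points outside the raster mirrors how the out-of-bounds powerlines ended up with null NDVI values. Skipping the placeholder keeps a single missing pixel from dragging a line's average far below zero.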

Cleaning the Derived Data

Even after merging, additional cleaning steps were necessary to refine the final dataset:

  1. Handling Missing Cities:

    • Some powerlines did not have an associated Nigerian state, which suggested they might lie outside the country. Reverse geocoding with geopy.geocoders.Nominatim showed that some were in Cameroon, while others were in Ekiti and Kwara states within Nigeria. Since the analysis focuses only on Nigeria, the records outside the country were removed so that the findings are specific to Nigerian powerlines. The records in Ekiti and Kwara were kept and assigned their state names, as they are part of the study area.

        powerlines_with_city.loc[231, 'City'] = 'Kwara'
        powerlines_with_city.loc[234, 'City'] = 'Ekiti'
        powerlines_with_city = powerlines_with_city.drop(index=[425, 666])
        powerlines_with_city.isna().sum()
      
  2. Removing Duplicate Geometries:

    • About 14.16% of the records in the data were duplicates, which I removed using drop_duplicates(subset='geometry').

        ratio = powerlines_with_city['geometry'].duplicated().sum() / powerlines_with_city.shape[0]
        print(f'we have {round(ratio * 100, 2)} percent of repetition in the data')
      
  3. Ensuring Accuracy of NDVI values:

    • The standard NDVI range is from -1 to 1. However, my NDVI values were in the thousands. I checked the documentation for the MODIS MOD13A1 dataset on Google Earth Engine, which provides vegetation indices, including NDVI, at a spatial resolution of 500 meters. The scale factor for NDVI in MOD13A1 is 0.0001, meaning NDVI values are stored as scaled integers, with a documented valid range of -2,000 to 10,000. To convert them to the standard range, I divided by 10,000.

        powerlines_with_city.loc[:, 'ndvi'] = powerlines_with_city['ndvi'] / 10000
        powerlines_with_city.describe()
      
  4. Data Validation:

    • Assertions were used to confirm the completeness of the dataset, validity of geometries, correct CRS alignment, and proper scaling of NDVI values.

        print(f'{powerlines_with_city.is_valid.sum()} valid geometries out of {powerlines_with_city.shape[0]} observations')  # Check for ratio of invalid geometries
        print(powerlines_with_city.dtypes)
      
        # Completeness
        assert not powerlines_with_city.drop(columns=['geolocated_city']).isnull().any().any(), "Missing values detected!"
      
        # Validity
        assert powerlines_with_city.is_valid.all(), "Invalid geometries found!"
      
        # CRS
        assert powerlines_with_city.crs == "EPSG:4326", "Incorrect CRS!"
      
        # Attribute Checks
        assert powerlines_with_city['ndvi'].between(-1, 1).all(), "NDVI values out of range!"
      
  5. After completing the assertions, I saved the processed data, ready for analysis.
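One caveat about the scaling and range check is worth illustrating. A placeholder of -9999 divided by 10,000 gives -0.9999, which still passes a between(-1, 1) assertion. The sketch below is not the notebook's code, and the raw values are hypothetical. It shows one way to keep the placeholder from slipping through, assuming it is still present in the values being scaled.

```python
MODIS_NDVI_SCALE = 0.0001   # MOD13A1 scale factor cited in the text
FILL_VALUE = -9999          # placeholder used earlier for missing pixels

def scale_ndvi(raw_value):
    """Convert a stored MOD13A1 integer to NDVI, mapping the placeholder to None."""
    if raw_value is None or raw_value == FILL_VALUE:
        return None
    return raw_value * MODIS_NDVI_SCALE

raw_values = [1532, 4810, FILL_VALUE, 7204]   # hypothetical raw values
scaled = [scale_ndvi(v) for v in raw_values]
valid = [v for v in scaled if v is not None]

# The range check now only sees genuine NDVI values.
assert all(-1 <= v <= 1 for v in valid), "NDVI values out of range!"
print(scaled)
```

In a GeoDataFrame, the same idea would amount to replacing the placeholder with NaN before dividing, then deciding explicitly whether to drop or keep those rows.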

    Please find my embedded notebook below or click here, where I performed these operations:

Data Analysis

To extract meaningful insights from the prepared data, I applied the following analytical techniques:

  1. Descriptive Statistics: To gain an initial understanding of the data's characteristics, I used descriptive statistics. Histograms and boxplots showed the distribution of NDVI values. Bar plots highlighted areas with dense vegetation and displayed the frequency of powerlines in each state, offering insights into regions with higher infrastructure density.

    These analyses revealed that NDVI values are mostly concentrated between 0.0 and 0.2, indicating sparse vegetation or bare land across much of the area, with fewer regions showing moderate vegetation density (0.4 to 0.7).

  2. Spatial Analysis: To identify high-risk regions, I used a choropleth map to highlight states with high average NDVI. Additionally, a bubble plot on the choropleth map helped highlight states with a high frequency of encroachment hotspots, defined by NDVI values greater than 0.4 across powerlines. The zoomed-in maps provided precise locations for vegetation management efforts within states.

  3. Relationship Analysis: I used scatter plots to explore correlations between average NDVI, the number of powerlines, and the frequency of hotspots within states. This helped identify patterns and prioritize areas for intervention.
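The per-state figures behind these plots can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code itself: the 0.4 hotspot threshold comes from the definition above, while the function name and the NDVI values in the sample records are made up.

```python
from collections import defaultdict

HOTSPOT_THRESHOLD = 0.4  # NDVI cut-off for a hotspot, as defined in the text

def summarise_by_state(records, threshold=HOTSPOT_THRESHOLD):
    """Return {state: {"powerlines", "mean_ndvi", "hotspots"}} from (state, ndvi) pairs."""
    grouped = defaultdict(list)
    for state, ndvi in records:
        grouped[state].append(ndvi)
    summary = {}
    for state, values in grouped.items():
        summary[state] = {
            "powerlines": len(values),
            "mean_ndvi": sum(values) / len(values),
            "hotspots": sum(1 for v in values if v > threshold),
        }
    return summary

# Hypothetical records: the state names are real, the NDVI values are invented.
records = [("Ekiti", 0.55), ("Ekiti", 0.62), ("Kwara", 0.18), ("Kwara", 0.45), ("Kano", 0.12)]
summary = summarise_by_state(records)

# Rank states by hotspot count, then by mean NDVI, to suggest an inspection order.
ranking = sorted(
    summary,
    key=lambda s: (summary[s]["hotspots"], summary[s]["mean_ndvi"]),
    reverse=True,
)
print(ranking)
```

With the real data, the same three quantities per state (powerline count, mean NDVI, and hotspot count) are what feed the choropleth, the bubble overlay, and the scatter plots.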

    The insights I gathered are presented in the slide below:

Libraries Used:

  • GeoPandas: For geospatial data manipulation and analysis.

  • Matplotlib and Seaborn: For creating detailed visualizations such as histograms, scatter plots, and maps.

  • Rasterio: For handling raster data (NDVI values).

  • Shapely: For managing geometries in geospatial datasets.

  • Geopy: For reverse geocoding coordinates to provide additional location context.

Discussion and Conclusion

This analysis identified several high-risk locations for vegetation encroachment along power lines in Nigeria.

The insights from this analysis highlight the need for targeted vegetation management strategies. Prioritising high-risk zones for intervention can optimise resource allocation and improve the effectiveness of mitigation efforts. Specific strategies include:

  • Deploying rapid response teams during peak growth seasons.

  • Utilising automated NDVI monitoring systems to track vegetation health.

  • Implementing adaptive management practices tailored to local environmental conditions.

Policy Recommendations

To reduce the risks associated with vegetation encroachment, the following actionable policy recommendations are proposed:

  • Risk Thresholds: Establish NDVI-based thresholds to guide vegetation clearance operations, ensuring resources are focused on areas most at risk.

  • Resource Allocation: Allocate resources to high-risk regions using data-driven prioritisation, reducing the likelihood of power line disruptions.

  • Clearance Regulations: Develop and enforce regulations specifying safe distances between vegetation and power lines, informed by data on encroachment patterns.

  • Data Sharing: Encourage government agencies and organisations to make geospatial and vegetation data readily available to researchers. Access to high-quality data will enable the development of more effective, evidence-based solutions to vegetation management challenges.

Limitations and Future Work

While the analysis provides valuable insights, it is not without limitations. The use of MODIS NDVI data, with its coarse spatial resolution, may overlook finer details of vegetation encroachment. Additionally, preprocessing steps, such as handling missing data, may introduce biases. Future research directions include:

  • Incorporating higher-resolution datasets and alternative vegetation indices to enhance precision.

  • Investigating the impact of other environmental factors, such as rainfall and soil composition, on vegetation growth near power lines.

  • Conducting temporal analyses to predict future encroachment trends and inform proactive management strategies.

  • Strengthening collaboration between researchers, government agencies, and utility companies to improve data accessibility and ensure solutions are grounded in robust datasets.

By addressing these limitations and promoting data availability, stakeholders can leverage geospatial analyses to safeguard infrastructure, improve vegetation management, and enhance the reliability of power distribution systems in Nigeria.

Reference