Dieu_HOA Nguyen

Jul 1, 2020

AI challenge: Combining public data to detect areas in need of electricity

Leveraging public data sources to tackle a social challenge.

I recently discovered Omdena and joined as a collaborator to use AI for good. The eight-week challenge was intense and eye-opening. Collaborators from many countries worked on identifying the most suitable areas in Nigeria for installing sustainable energy. To sum up the problem: approximately 50% of Nigeria’s population is not connected to the national electricity grid. Another 50% use fossil-fuel energy, which is expensive, unsustainable, noisy and health-threatening. Together with Omdena’s partner organization RA365, we are on a mission to bring electricity to areas in need.

I am a big fan of green living and of supporting impoverished people, which is why I find the project meaningful in many ways.

At the kick-off meeting, the collaborators all seemed to share the same confusion: unlike typical hackathons or competitions, we had no dataset and no well-defined requirements in advance. In the following weeks we spent a lot of time brainstorming the problem statement and identifying potential datasets. Ideas came and went; some won out and were followed through. One of the most striking ideas, I think, was using satellite images of nighttime light as a proxy for electricity availability.

In this post I would like to guide you through our process: extracting satellite images, validating them, converting them to tabular format, and then clustering.

Back to the idea of nighttime light

We split into subgroups: one estimating potential sustainable energy (supply), one finding areas of energy demand (demand), and a few more. The demand group quickly adopted the nighttime light idea. We considered two sources of nighttime light data: the Visible Infrared Imaging Radiometer Suite (VIIRS) and the Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS). They cover 2014-01-01 to the present and 1992-01-01 to 2014-01-01, respectively. Both come from NOAA’s National Centers for Environmental Information and are available in Google Earth Engine (GEE). Extracting data from GEE is a bit tricky at first, but it becomes convenient once you pick up some basic knowledge.

Each dataset has its own strengths:

  • VIIRS: most up-to-date, with higher resolution: 15 arc seconds instead of the 30 arc seconds of DMSP-OLS.
  • DMSP-OLS: includes a processed band, “stable_lights”, described as “The cleaned up avg_vis contains the lights from cities, towns, and other sites with persistent lighting, including gas flares. Ephemeral events, such as fires, have been discarded. The background noise was identified and replaced with values of zero” (GEE).

This is where local knowledge comes in. We asked RA365 for some regions known to have or lack electricity in reality, so we could check them against the data.

Nigeria nighttime light generated from the VIIRS & DMSP-OLS datasets.

Note: data is selected over 12 months and aggregated by median. Blue marks areas with electrical light from VIIRS; grey marks those from DMSP-OLS.

In general, the two datasets share the same pattern. Looking in more detail, however, some areas are lit in VIIRS but not in DMSP-OLS, and vice versa.
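The 12-month median composite mentioned in the note could be built with the Earth Engine Python API roughly as follows. This is only a sketch, not the code from my Colab: the years, the function names and the choice of the LSIB boundary collection for Nigeria are assumptions made for illustration, and it needs the earthengine-api package plus an authenticated GEE account to actually run.

```python
# Dataset IDs as listed in the Earth Engine data catalog.
VIIRS_ID = "NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG"   # band: avg_rad
DMSP_ID = "NOAA/DMSP-OLS/NIGHTTIME_LIGHTS"      # band: stable_lights


def median_composite(collection_id, band, start, end, region):
    """Median of one band over [start, end), clipped to a geometry."""
    import ee  # imported lazily; requires `pip install earthengine-api`

    return (
        ee.ImageCollection(collection_id)
        .filterDate(start, end)
        .select(band)
        .median()
        .clip(region)
    )


def nigeria_composites(viirs_year=2019, dmsp_year=2013):
    """Return (viirs, dmsp) median images for Nigeria.

    The default years are hypothetical examples; pick any 12-month
    window inside each dataset's coverage.
    """
    import ee

    ee.Initialize()  # newer client versions may need ee.Initialize(project=...)

    # Assumption: country outline taken from the LSIB simplified boundaries.
    nigeria = (
        ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017")
        .filter(ee.Filter.eq("country_na", "Nigeria"))
        .geometry()
    )
    viirs = median_composite(
        VIIRS_ID, "avg_rad",
        f"{viirs_year}-01-01", f"{viirs_year + 1}-01-01", nigeria,
    )
    dmsp = median_composite(
        DMSP_ID, "stable_lights",
        f"{dmsp_year}-01-01", f"{dmsp_year + 1}-01-01", nigeria,
    )
    return viirs, dmsp
```

Calling nigeria_composites() returns two ee.Image objects that can then be exported or displayed on a map.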

Ekiti state nighttime light generated by VIIRS & DMSP-OLS datasets

Omuo (the easternmost cluster) is lit in DMSP-OLS but not in VIIRS, and in reality it has no electricity supply. VIIRS wins over DMSP-OLS in this case.

Which dataset should we use, or should we move on and look for a new one? Surprisingly, filling the missing pieces of VIIRS with DMSP-OLS gave a positive result, sufficient to move on to the next steps. Refer to my code for downloading data from GEE using Colab here.
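One possible reading of "filling VIIRS with DMSP-OLS" is sketched below: a pixel counts as lit according to VIIRS wherever VIIRS has a value, and falls back to DMSP-OLS only where VIIRS is missing. The function name and both thresholds are assumptions for illustration, and the sketch assumes the two rasters have already been resampled to the same grid.

```python
import numpy as np


def combine_lit_masks(viirs_rad, dmsp_stable,
                      viirs_threshold=0.5, dmsp_threshold=0):
    """Boolean 'lit' mask: VIIRS where available, DMSP-OLS elsewhere.

    Thresholds are hypothetical and would need tuning against
    ground truth such as the regions RA365 confirmed.
    """
    viirs_rad = np.asarray(viirs_rad, dtype=float)
    dmsp_stable = np.asarray(dmsp_stable, dtype=float)

    viirs_missing = np.isnan(viirs_rad)
    viirs_lit = np.nan_to_num(viirs_rad, nan=0.0) > viirs_threshold
    dmsp_lit = dmsp_stable > dmsp_threshold

    # Use the DMSP answer only for pixels where VIIRS has no value.
    return np.where(viirs_missing, dmsp_lit, viirs_lit)


# Toy 2x2 grids: NaN marks a missing VIIRS pixel.
viirs = [[2.0, np.nan],
         [0.1, np.nan]]
dmsp = [[0, 12],
        [30, 0]]
lit = combine_lit_masks(viirs, dmsp)
```

In the bottom-left pixel VIIRS is dark while DMSP-OLS is bright, and the VIIRS answer is kept, which mirrors the Omuo case above.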

Areas with population present but not lit up.

The next step is to combine the areas without electricity with population data, then cluster them into smaller areas so that solar containers can provide enough energy for the residents of one cluster and so that wiring them up is feasible.

Working with .tif files is a bit tricky, so I converted them into tabular format for easier processing. Because of the large data volume, I used dask to join the light and population data. Check out the code for the processing details.
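To illustrate the raster-to-table idea, here is a small sketch using numpy and pandas on toy arrays. It is not the code from my repo: the function name, the grid origin and the pixel size are made up, and in practice the arrays would be read from the .tif files (for example with rasterio) and the join would run on dask dataframes, whose merge works in a similar way.

```python
import numpy as np
import pandas as pd


def raster_to_table(values, west, north, pixel_size, name):
    """Flatten a 2D raster into one row per pixel with its centre lon/lat."""
    values = np.asarray(values)
    rows, cols = np.indices(values.shape)
    rows, cols = rows.ravel(), cols.ravel()
    return pd.DataFrame({
        "row": rows,
        "col": cols,
        "lon": west + (cols + 0.5) * pixel_size,
        "lat": north - (rows + 0.5) * pixel_size,
        name: values.ravel(),
    })


# Toy rasters on the same (hypothetical) grid.
light = raster_to_table([[1, 0], [0, 0]],
                        west=5.0, north=8.0, pixel_size=0.5, name="lit")
population = raster_to_table([[900, 400], [0, 250]],
                             west=5.0, north=8.0, pixel_size=0.5,
                             name="population")

# Join on pixel index, then keep populated pixels with no light.
joined = light.merge(population[["row", "col", "population"]],
                     on=["row", "col"])
unlit_populated = joined[(joined["lit"] == 0) & (joined["population"] > 0)]
```

The resulting table has one row per populated, unlit pixel, which is the input for the clustering step.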


Each solar panel can serve up to 4000 households. This constraint leads to the clustering step, which caps the total population per cluster. We adopted the DBSCAN clustering algorithm. I will update the clustering code in my repo soon.
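Until the clustering code is published, here is a rough sketch of how DBSCAN plus a population check might look with scikit-learn. DBSCAN itself has no population cap, so this version only flags clusters that exceed the cap and would need further splitting. The 1 km radius, min_samples and the figure of 5 people per household are assumptions for illustration, not values from the project.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0


def cluster_points(lat_deg, lon_deg, eps_km=1.0, min_samples=2):
    """DBSCAN on great-circle distance; returns one label per point (-1 = noise)."""
    coords = np.radians(np.column_stack([lat_deg, lon_deg]))
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # haversine works in radians
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(coords)


def population_per_cluster(labels, population):
    """Sum population for every cluster, ignoring noise points."""
    totals = {}
    for label, people in zip(labels, population):
        if label == -1:
            continue
        totals[int(label)] = totals.get(int(label), 0) + people
    return totals


# Toy unlit, populated pixels: two tight groups and one isolated point.
lat = [7.600, 7.601, 7.602, 7.900, 7.901, 8.500]
lon = [5.200, 5.201, 5.199, 5.700, 5.701, 6.500]
people = [9000, 8000, 7000, 1000, 1500, 300]

labels = cluster_points(lat, lon)
totals = population_per_cluster(labels, people)

# Hypothetical cap: 4000 households x 5 people per household.
cap = 4000 * 5
over_cap = [label for label, total in totals.items() if total > cap]
```

Clusters listed in over_cap would then be split further, for example by re-running the clustering on them with a smaller radius.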

Filter out

RA365 prioritizes areas further away from the national electrical grid, areas with public service organizations (e.g. schools and healthcare facilities), and areas with high photovoltaic power potential. All of these become filtering conditions.


Each cluster carries its distance from the national electrical grid, number of public organizations (schools + healthcare facilities), photovoltaic power potential, population, and area density. Photovoltaic power potential data can be extracted here, and school and healthcare locations can be extracted here.

Each attribute is used for a dense rank: the larger the population, the higher the rank; the higher the expected photovoltaic power, the higher the rank; the denser the cluster, the higher the rank; and so on. The final rank is the sum of all the ranks. The higher the final rank, the higher the cluster’s priority.
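The rank-sum scheme can be sketched with pandas as follows. The column names and the numbers are invented for illustration; only the logic (dense rank per attribute, then sum) follows the description above.

```python
import pandas as pd

# Toy clusters with the five attributes described above (values are made up).
clusters = pd.DataFrame({
    "cluster": ["A", "B", "C"],
    "population": [1200, 3500, 3500],
    "pv_potential": [4.2, 4.8, 4.5],
    "distance_to_grid_km": [10, 25, 5],
    "public_services": [1, 3, 0],
    "density": [100, 300, 200],
})

rank_columns = ["population", "pv_potential", "distance_to_grid_km",
                "public_services", "density"]

# Dense rank per attribute: larger value -> larger rank, ties share a rank
# and no rank numbers are skipped after a tie.
ranks = clusters[rank_columns].rank(method="dense", ascending=True)

# Final rank is the sum of the per-attribute ranks.
clusters["final_rank"] = ranks.sum(axis=1)

# Highest final rank first = highest priority.
prioritized = clusters.sort_values("final_rank", ascending=False)
```

With these toy numbers, cluster B comes first, followed by C and then A. Clusters B and C tie on population, so both get the same population rank.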

Lessons learnt

  • I have learnt how to work with Google Earth Engine specifically and geospatial datasets in general.
  • It is fulfilling to know that your effort will benefit underprivileged people out there.
  • I love the vibe of learning and sharing knowledge. It is cool that at every step, another collaborator or the partner organization challenges or validates your result. You all end up with a better solution.

You can refer to my repo here for more detail. The script is still in progress, so stay tuned!

Note: You need to sign up for GEE by filling in the form at the link. Instead of extracting satellite images via Python, you can also process multiple images and cluster them in GEE. Refer here for more detail.