Geeking out Realtime Data at Open Data Hackathon

Events | AUGUST 4 2019

On 26th July 2019, around 35 data geeks and environment experts came together at the Open Data Hackathon, organized by Open Knowledge Nepal, to explore real-time air pollution and hydrology data and find key insights. The event started with a basic introduction to the real-time data dashboard and the datasets available. The participants were then divided into groups to brainstorm ideas and discuss the types of work they could carry out to make full use of the Hackathon time.

Some highlights of the work done at the Hackathon:

— A comparison of PM2.5 levels in Kathmandu, Beijing, and Delhi to find whether pollution in the three cities was correlated. The results indicated that the cities follow similar seasonal patterns in terms of pollution, but the data alone cannot tell us whether pollution in one place causes pollution in another.

The most significant insight was that people should be careful in the morning (5 AM – 10 AM), especially in winter, as that was when pollution (PM2.5) increased the most. The winter months are the worst for pollution. The summer months, especially during the rainy season, are the best for walking around, as they have the lowest pollution levels. August is the best month of the year with an average of 53.38 PM2.5, which is still worse than the EPA's ‘good’ range (0 – 50 PM2.5). The worst month is January with an average of 175.49 PM2.5, more than three times the upper limit of the EPA's ‘good’ range.
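As a rough illustration of how monthly figures like these can be derived, here is a minimal sketch that groups timestamped PM2.5 readings by calendar month, averages each group, and compares a value against the upper bound of the ‘good’ range quoted above. The function names and the sample readings are assumptions made for this example; they are not the team's code or data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Upper bound of the 'good' range quoted in the post (0 to 50).
GOOD_UPPER_BOUND = 50.0


def monthly_means(readings):
    """Group (timestamp, pm25) pairs by calendar month and average each group."""
    by_month = defaultdict(list)
    for timestamp, pm25 in readings:
        by_month[timestamp.month].append(pm25)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def ratio_to_good(value, bound=GOOD_UPPER_BOUND):
    """How many times the upper bound of the 'good' range a value is."""
    return value / bound


# Illustrative readings only, not the hackathon data.
sample = [
    (datetime(2019, 1, 5, 7), 180.0),
    (datetime(2019, 1, 5, 15), 160.0),
    (datetime(2019, 8, 5, 7), 60.0),
    (datetime(2019, 8, 5, 15), 40.0),
]

means = monthly_means(sample)
best = min(means, key=means.get)    # month with the lowest average
worst = max(means, key=means.get)   # month with the highest average
print(f"best month: {best}, worst month: {worst}")

# The January average reported in the post, relative to the 'good' bound.
print(round(ratio_to_good(175.49), 2))
```

The last line is simply 175.49 divided by 50, which is where the "more than three times" comparison comes from.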

The interactive Power BI visualization of all the analysis and insights can be found here.

— Plotting the correlation between different environmental parameters and the concentration of particulate matter in the air. The team started with the DHM dataset for Pokhara, but it did not contain the parameters the team was looking for, such as humidity, rainfall, air temperature, and wind speed. The team then worked with past data from different sites and combined it to make a more complete dataset. They observed the correlation between the different parameters, which, together with the next day's weather forecast and the trend of PM2.5 concentration in the atmosphere, can be used to make helpful predictions about the PM2.5 concentration for the next day.
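The post does not say which correlation measure the team used. As one common choice, the sketch below computes the Pearson correlation coefficient between PM2.5 and each weather parameter. The parameter names mirror those mentioned above, but the values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Illustrative daily values only, not the hackathon data.
observations = {
    "pm25": [120.0, 95.0, 80.0, 60.0, 45.0],
    "humidity": [40.0, 50.0, 55.0, 70.0, 80.0],
    "rainfall": [0.0, 0.0, 2.0, 10.0, 25.0],
    "wind_speed": [1.0, 1.5, 2.0, 3.0, 3.5],
}

# Correlate PM2.5 against every other parameter.
for name, values in observations.items():
    if name != "pm25":
        print(name, round(pearson(observations["pm25"], values), 2))
```

A coefficient near -1 or +1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship. Either way, a correlation on its own does not establish cause.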

The team needed quite a large dataset to make accurate predictions about PM2.5 levels. They tried incorporating a prediction engine into the system, but with little success. In the end, they created an infographic to make people aware of the effects of particulate matter and the ways to stay safe.

The team's air quality analysis and its visualizations of the correlation with different environmental factors can be found here.

— Filtering the datasets to keep only the maximum water level from each month for one particular river tributary (Bagmati, at the stations Khokana, Raigaun, Sundarijal bridge, and Bhorleni). The team plotted the values on a graph and highlighted the risk zones, so that users could check whether their houses fall in a danger zone.
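The filtering step described above can be sketched as follows: for each station and month, keep only the highest water level recorded. The tuple layout, the function name, and the levels are assumptions for illustration. Only the station names come from the post.

```python
from collections import defaultdict
from datetime import datetime


def monthly_max_levels(readings):
    """Return {(station, year, month): highest water level seen that month}."""
    peaks = defaultdict(lambda: float("-inf"))
    for station, timestamp, level_m in readings:
        key = (station, timestamp.year, timestamp.month)
        peaks[key] = max(peaks[key], level_m)
    return dict(peaks)


# Illustrative readings only (levels in metres), not the hackathon data.
sample = [
    ("Khokana", datetime(2019, 7, 1, 6), 2.4),
    ("Khokana", datetime(2019, 7, 12, 18), 3.9),
    ("Khokana", datetime(2019, 8, 3, 9), 3.1),
    ("Sundarijal bridge", datetime(2019, 7, 12, 18), 1.7),
    ("Sundarijal bridge", datetime(2019, 7, 20, 2), 1.2),
]

for (station, year, month), level in sorted(monthly_max_levels(sample).items()):
    print(f"{station} {year}-{month:02d}: {level} m")
```

The resulting monthly peaks are what would then be plotted against each station's risk zones.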

The team is also creating a Twitter bot that checks the datasets from the real-time API and posts an alert tweet whenever the current water level reaches the warning threshold for that particular river tributary. Follow their work here.
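The core check behind such a bot might look like the sketch below. It covers only the threshold comparison and the alert text. Fetching readings from the real-time API and posting to Twitter are left out, since the post does not describe either interface. The function name, the dictionary layout, and the warning levels are all hypothetical.

```python
def build_alerts(latest_levels, warning_levels):
    """Return one alert message per station at or above its warning level."""
    alerts = []
    for station, level_m in sorted(latest_levels.items()):
        threshold = warning_levels.get(station)
        # Stations without a configured warning level are skipped.
        if threshold is not None and level_m >= threshold:
            alerts.append(
                f"Warning: water level at {station} is {level_m:.2f} m "
                f"(warning level {threshold:.2f} m)."
            )
    return alerts


# Hypothetical values in metres, not real thresholds for these stations.
latest = {"Khokana": 4.10, "Bhorleni": 2.30}
warning = {"Khokana": 4.00, "Bhorleni": 3.50}

for message in build_alerts(latest, warning):
    print(message)
```

In a running bot, a check like this would be called on a schedule, and each returned message would be handed to whatever posting client the bot uses.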

— Visualizing the trend of air pollution data from 26th May to 26th July for three different locations, namely Ratnapark, Bhaisipati, and Sauraha. The team worked with the parameters TSP, PM1, PM10, PM2.5, and Black Carbon to plot, in a graph, the hourly values of each parameter averaged over the 3 months.

The team also looked at each parameter's trend in each month for Ratnapark and Sauraha and found that Ratnapark had comparatively more pollution than Sauraha. However, the Sauraha data did not include the black carbon parameter.
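An hour-of-day profile of the kind described above can be sketched by averaging, for each site, all the values recorded at the same hour across the whole period. The function name, record layout, and numbers below are assumptions for illustration. Only the site names come from the post.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(readings):
    """Average the values recorded at each hour of the day, per site."""
    buckets = defaultdict(list)
    for site, timestamp, value in readings:
        buckets[(site, timestamp.hour)].append(value)
    profile = defaultdict(dict)
    for (site, hour), values in buckets.items():
        profile[site][hour] = mean(values)
    # Sort hours so each site's profile is ready to plot from 0 to 23.
    return {site: dict(sorted(hours.items())) for site, hours in profile.items()}


# Illustrative PM2.5 readings only, not the hackathon data.
sample = [
    ("Ratnapark", datetime(2019, 6, 1, 7), 90.0),
    ("Ratnapark", datetime(2019, 6, 2, 7), 110.0),
    ("Ratnapark", datetime(2019, 6, 1, 14), 60.0),
    ("Sauraha", datetime(2019, 6, 1, 7), 40.0),
    ("Sauraha", datetime(2019, 6, 2, 7), 60.0),
]

profile = hourly_profile(sample)
for site, hours in profile.items():
    print(site, hours)
```

Running the same function once per parameter (TSP, PM1, PM10, and so on) would give one profile per parameter and site to plot side by side.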

This team's analysis and insights can be found here.

————————————————————————————————————————————————————————

All the work done by the teams at the Open Data Hackathon has been open-sourced and can be accessed from the OKN GitHub repository.