
AIR QUALITY ANALYSIS
The objective of this project was to perform exploratory data analysis on the given data sets by plotting several types of graphs, sampling the data, and computing summary statistics in order to compare the air quality of two cities in Vietnam, Hanoi and Ho Chi Minh City, for the year 2019.

Course - DANA 4800 | PDD Data Analytics
Description
Two data sets profile the PM2.5 concentration levels recorded in 2019 by the fixed air quality monitoring systems at the U.S. diplomatic missions in two cities of Vietnam: Hanoi and Ho Chi Minh City.
First, accuracy issues and missing values in the data sets were identified and handled. The cleaned data was then used to generate box plots, histograms, and scatterplots to examine its distributions, relationships, outliers, and structure.
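The cleaning rules themselves are not spelled out in this summary. As a rough illustration only, the Python sketch below (not the SAS program used in the project) drops unusable readings and flags outliers with the 1.5 x IQR fences that a box plot conventionally draws. The sentinel value -999, the function names, and the sample readings are all assumptions made for the example.

```python
import statistics

# Assumption: missing readings are coded with a sentinel value such as -999.
MISSING_SENTINEL = -999


def clean_readings(raw):
    """Drop None, sentinel, and negative values (a concentration cannot be negative)."""
    return [x for x in raw if x is not None and x != MISSING_SENTINEL and x >= 0]


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences used to flag box-plot outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical hourly PM2.5 readings, not taken from the project data.
raw = [22.0, 25.0, None, 30.0, -999, 28.0, 24.0, 26.0, 180.0, 27.0, 23.0]
clean = clean_readings(raw)
low, high = iqr_fences(clean)
outliers = [x for x in clean if x < low or x > high]
print(f"{len(clean)} usable readings, outliers: {outliers}")
```

Quartile definitions differ slightly between tools, so the fences computed here may not match those of a SAS box plot exactly.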
The data sets were then sampled using systematic sampling. The mean, median, maximum, and minimum of the sampled data were computed for each city, along with the correlation between the two cities, to compare their air quality patterns.
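A minimal Python sketch of this step follows, assuming that systematic sampling here means taking every k-th record after a starting offset. The interval k, the offset, the helper names, and the readings are hypothetical, and the project itself used SAS Studio and MS Excel rather than this code.

```python
import random
import statistics


def systematic_sample(records, k, start=None, seed=None):
    """Take every k-th record, beginning at a start offset in [0, k)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if start is None:
        start = random.Random(seed).randrange(k)
    return records[start::k]


def summarize(values):
    """Mean, median, minimum, and maximum of a sample."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical hourly PM2.5 values for the same twelve hours in each city.
hanoi = [40.0, 55.0, 62.0, 48.0, 70.0, 66.0, 52.0, 45.0, 58.0, 61.0, 49.0, 53.0]
hcmc = [22.0, 30.0, 33.0, 25.0, 38.0, 35.0, 28.0, 24.0, 31.0, 32.0, 26.0, 29.0]

# Use the same interval and offset for both cities so the samples stay
# aligned hour by hour, which the correlation needs.
hanoi_sample = systematic_sample(hanoi, k=3, start=1)
hcmc_sample = systematic_sample(hcmc, k=3, start=1)

print(summarize(hanoi_sample))
print(summarize(hcmc_sample))
print(pearson(hanoi_sample, hcmc_sample))
```

Sampling both series with one shared offset is a design choice of the sketch: independent random offsets would pair readings from different hours and distort the correlation.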
The analysis concluded that Ho Chi Minh City had better air quality than Hanoi in 2019.
Skills used
- Analyzed data sets of pollutant concentration levels in Vietnam and compared the air quality of the two cities using MS Excel and SAS Studio
- Developed a SAS program to import the data sets, identify the variable types, perform univariate and bivariate analysis, and treat missing values and outliers in order to create reports and visualizations on the refined data
- Resolved accuracy issues by researching the variables used to calculate the Air Quality Index (AQI), analyzing the relationships between these variables, and evaluating the NowCast concentration and AQI using the formulas provided by the US EPA
- Documented the steps used in cleaning, analyzing, and interpreting the data sets, and presented the findings of the exploratory data analysis, using Microsoft Word
- Visualized the distributions of continuous variables for both cities using histograms and box plots and examined the correlation between the two cities' air quality using scatterplots
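
The NowCast and AQI evaluation mentioned in the list above can be sketched as follows. This is a Python illustration under stated assumptions, not the project's SAS code: it assumes the US EPA PM2.5 breakpoints that were in force before the 2024 revision (the ones that applied to 2019 data), the standard 12-hour PM2.5 NowCast weighting with a minimum weight factor of 0.5, and hypothetical function names and readings.

```python
import math

# Assumption: US EPA PM2.5 breakpoints as used before the 2024 revision.
# Each row is (conc_low, conc_high, aqi_low, aqi_high), concentrations in ug/m3.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def truncate_1dp(x):
    """Truncate to one decimal place (small epsilon guards against float error)."""
    return math.floor(x * 10 + 1e-9) / 10


def nowcast_pm25(hourly):
    """NowCast from up to 12 hourly values, most recent first, None if missing.

    Returns None unless at least two of the three most recent hours are valid.
    """
    recent = hourly[:12]
    if sum(v is not None for v in recent[:3]) < 2:
        return None
    valid = [v for v in recent if v is not None]
    c_max, c_min = max(valid), min(valid)
    weight = c_min / c_max if c_max > 0 else 1.0
    weight = max(weight, 0.5)  # minimum weight factor for PM2.5
    num = sum(weight ** i * v for i, v in enumerate(recent) if v is not None)
    den = sum(weight ** i for i, v in enumerate(recent) if v is not None)
    return truncate_1dp(num / den)


def pm25_aqi(concentration):
    """Linear interpolation between the breakpoints that bracket the concentration."""
    c = truncate_1dp(concentration)
    for c_low, c_high, i_low, i_high in PM25_BREAKPOINTS:
        if c_low <= c <= c_high:
            aqi = (i_high - i_low) / (c_high - c_low) * (c - c_low) + i_low
            return math.floor(aqi + 0.5)  # round to the nearest integer
    raise ValueError(f"concentration {concentration} is outside the AQI scale")


# Hypothetical last 12 hourly readings, most recent first.
readings = [20.0] * 12
concentration = nowcast_pm25(readings)
print(concentration, pm25_aqi(concentration))
```

Comparing values recomputed this way against the NowCast and AQI columns already present in a data set is one way to surface the kind of accuracy issues described above.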