MyStories - This is where I tell stories about myself and share my personal opinions. I usually avoid controversial topics, but I am happy to talk about them privately.

SciEssays - This is where I write about the natural sciences. It covers what I have learnt from physics, chemistry and biology, and explores their applications, theories and research. From something as vast as the universe to something as tiny as an atom, every part of the world is included. They are all fascinating to explore.

Python - This is where I share interesting findings from my own data analysis projects. I like creating visualisations and discussing data. Not everyone is familiar with statistics, so, unlike the norm, I try to get past the barrier of technical terms by explaining complex data in a comprehensible way, so that everyone can feel comfortable interpreting numbers.

Tuesday, 22 July 2025

Investigation of HK Covid-19 cases and data exploration

The aim of this project is to explore HK Covid-19 case data and provide an informative description and analysis. The data were obtained from data.gov.hk and cover cases from 2020 to 2022. The data dictionary file can be accessed from this link.
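
As a starting point, here is a minimal sketch of loading the case file with Python's standard library. It assumes the CSV uses the column names `Case no.`, `Report date`, `Date of onset`, `Gender`, `Age`, `HK/Non-HK resident` and `Case classification*`, with dates written as DD/MM/YYYY; these are assumptions to check against the data dictionary. A two-row inline sample with made-up values stands in for the real file so the snippet runs on its own.

```python
from __future__ import annotations

import csv
import io
from datetime import date, datetime

# Hypothetical two-row sample standing in for the downloaded CSV.
# The column names and the DD/MM/YYYY date format are assumptions.
SAMPLE_CSV = """Case no.,Report date,Date of onset,Gender,Age,HK/Non-HK resident,Case classification*
1,23/01/2020,21/01/2020,M,39,Non-HK resident,Imported case
2,24/01/2020,Asymptomatic,F,62,HK resident,Local case
"""


def parse_date(text: str) -> date | None:
    """Parse a DD/MM/YYYY string; return None for entries such as 'Asymptomatic'."""
    try:
        return datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except ValueError:
        return None


def load_cases(handle) -> list[dict]:
    """Read all rows and attach parsed 'report_date' and 'onset_date' fields."""
    cases = []
    for row in csv.DictReader(handle):
        row["report_date"] = parse_date(row["Report date"])
        row["onset_date"] = parse_date(row["Date of onset"])
        cases.append(row)
    return cases


# For the real file (the file name here is hypothetical):
# with open("covid_cases.csv", newline="", encoding="utf-8-sig") as f:
#     cases = load_cases(f)
cases = load_cases(io.StringIO(SAMPLE_CSV))
for case in cases[:5]:  # preview the first few rows
    print(case["Case no."], case["report_date"], case["onset_date"], case["Gender"], case["Age"])
```

Returning None for unparseable dates keeps asymptomatic or unknown onsets in the dataset while making them easy to filter out later.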

Overview of the dataset


These are the first few lines of the dataset. To begin, the number of Covid-19 reports over the two-year period is explored.
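
Counting reports per calendar month is one simple way to build the series behind such a graph. The sketch below assumes the report dates have already been parsed into `date` objects; the dates listed are made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical report dates; in practice these would be the parsed
# "Report date" values of every case in the dataset.
report_dates = [
    date(2020, 3, 20), date(2020, 3, 25), date(2020, 4, 2),
    date(2020, 7, 15), date(2020, 7, 16), date(2020, 7, 30),
]


def monthly_counts(dates):
    """Count reports per (year, month), returned in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return dict(sorted(counts.items()))


for (year, month), n in monthly_counts(report_dates).items():
    print(f"{year}/{month:02d}: {n}")
```

Months with no reports do not appear in the result, so they would need to be filled with 0 before plotting on a continuous time axis.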


From this, several peaks can be seen, suggesting possible outbreaks during the period: 2020/03-04, 2020/07, 2020/11-12 and 2022/02. To validate the data, the graph was compared with news reports and other online resources. According to Wikipedia, these peaks align with the 2nd to 4th outbreaks of Covid-19 in HK, and the last one marks the beginning of the 5th outbreak.
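
The peaks above were identified by eye from the graph. One simple, hypothetical way to flag them automatically is to mark months that are local maxima and also exceed a multiple of the median monthly count. The factor of 2 is an arbitrary assumption, and the monthly figures below are made up for illustration.

```python
from statistics import median


def find_peaks(series, factor=2.0):
    """Return labels of months that are local maxima and exceed factor x the median.

    `series` is a chronologically ordered list of (label, count) pairs.
    A month that ties with a neighbour still counts as a local maximum.
    """
    counts = [n for _, n in series]
    threshold = factor * median(counts)
    peaks = []
    for i, (label, n) in enumerate(series):
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < len(counts) - 1 else float("-inf")
        if n > threshold and n >= left and n >= right:
            peaks.append(label)
    return peaks


# Hypothetical monthly counts, not the real figures.
monthly = [("2020/01", 10), ("2020/02", 80), ("2020/03", 300), ("2020/04", 400),
           ("2020/05", 20), ("2020/06", 30), ("2020/07", 1500), ("2020/08", 900)]
print(find_peaks(monthly))
```

Using the median rather than the mean as the baseline keeps one very large outbreak from raising the threshold so far that smaller waves are missed.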

Looking further at the age distribution, it appears roughly normal, with a mean, median and mode of 43, 42 and 38 respectively. Children under 1 year old are relabelled as 0 in the graph. Unsurprisingly, the number of reports at ages 0-2 is relatively high compared with the surrounding ages. However, not many cases were reported among the elderly.
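
The age summary can be sketched with Python's `statistics` module. How the raw file records infants is an assumption here: the parser treats entries such as "<1" or "6 months" as under one year old and relabels them as 0, and drops anything else that is not a whole number. The raw entries are made up.

```python
from __future__ import annotations

from statistics import mean, median, mode


def parse_age(value: str) -> int | None:
    """Convert a raw age entry to whole years.

    Assumed format: plain integers for ages of one and above, and text such as
    "<1" or "6 months" for infants, which are relabelled as 0.
    Anything else (e.g. "Unknown") returns None so it can be dropped.
    """
    value = value.strip().lower()
    if value.isdigit():
        return int(value)
    if value.startswith("<") or "month" in value or "day" in value:
        return 0
    return None


# Hypothetical raw entries, not real data.
raw_ages = ["43", "38", "<1", "6 months", "38", "70", "55", "38", "Unknown"]
ages = [age for age in map(parse_age, raw_ages) if age is not None]
print(f"mean={mean(ages):.1f}, median={median(ages)}, mode={mode(ages)}")
```

For a symmetric distribution the three averages coincide, so comparing them is a quick check on the "roughly normal" reading.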


The ratio of males to females in the data is roughly 1:1. However, some research shows that males are possibly twice as likely as females to be infected with Covid-19 (Zaher et al., 2023). It is possible that a larger dataset is needed to show this trend. Also, since these data were collected only in HK, findings from research on other populations may not apply.
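
Whether the sample is large enough can be gauged by putting a confidence interval around the male share. The sketch below uses a normal-approximation (Wald) 95% interval, a standard technique swapped in here for illustration rather than the analysis used in the post; it assumes the gender column holds "M" and "F", and the data are made up.

```python
from collections import Counter
from math import sqrt


def male_share_ci(genders, z=1.96):
    """Male share among M/F entries with a normal-approximation 95% interval."""
    counts = Counter(g for g in genders if g in ("M", "F"))
    n = counts["M"] + counts["F"]
    p = counts["M"] / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


genders = ["M", "F"] * 500  # hypothetical: 500 males and 500 females
share, low, high = male_share_ci(genders)
print(f"male share {share:.3f} (95% CI {low:.3f} to {high:.3f})")
```

A 2:1 male-to-female ratio would correspond to a male share of about 0.667, so if the interval from the real data sits well away from that value, sample size alone is unlikely to explain the difference.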


Next, reports from HK residents and non-HK residents are compared. The clear majority of reported cases were HK residents, and the classification chart shows that more than half of the cases were imported. Together, these suggest that a major source of Covid-19 was HK residents returning to HK. This could help the government introduce laws and regulations to stop residents arriving from other countries from spreading Covid-19 in the local community.
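
The conclusion about returning residents rests on how many cases are both imported and from HK residents, which a cross-tabulation of the two columns shows directly. A minimal sketch follows; the category labels are assumptions to check against the data dictionary, and the rows are made up.

```python
from collections import Counter

# Hypothetical (residency, classification) pairs; the category labels are
# assumptions and should be checked against the data dictionary.
cases = [
    ("HK resident", "Imported case"),
    ("HK resident", "Imported case"),
    ("HK resident", "Local case"),
    ("Non-HK resident", "Imported case"),
    ("HK resident", "Epidemiologically linked with imported case"),
]

# Cross-tabulate: each key is a (residency, classification) cell.
table = Counter(cases)
for (residency, classification), n in sorted(table.items()):
    print(f"{residency:<16} {classification:<45} {n}")

imported = sum(n for (_, cls), n in table.items() if cls == "Imported case")
imported_hk = table[("HK resident", "Imported case")]
print(f"HK residents account for {imported_hk} of {imported} imported cases")
```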

Exploring people's awareness and time to report

Given the date of onset and the report date, I investigated whether the time people take to report Covid-19 correlates with people's awareness. My hypothesis is that the more cases are reported on a given day, the less time people will take to report.


The distribution of the number of days from onset to report is plotted. Most people reported about 2 days after onset, and very few took more than 15 days.
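
The reporting delay is simply the report date minus the onset date, counted in days. The sketch below tallies those delays, assuming cases without a usable onset date (e.g. asymptomatic) have already been removed; the date pairs are made up.

```python
from collections import Counter
from datetime import date

# Hypothetical (onset date, report date) pairs. Cases without a usable onset
# date (e.g. asymptomatic) are assumed to have been removed already.
pairs = [
    (date(2020, 7, 1), date(2020, 7, 3)),
    (date(2020, 7, 2), date(2020, 7, 4)),
    (date(2020, 7, 2), date(2020, 7, 3)),
    (date(2020, 7, 5), date(2020, 7, 22)),
]

# Days from onset to report; negative values would point to data-entry errors.
delays = [(report - onset).days for onset, report in pairs]
delays = [d for d in delays if d >= 0]

distribution = Counter(delays)
for days in sorted(distribution):
    print(f"{days:>2} day(s): {distribution[days]} case(s)")
print("most common delay:", distribution.most_common(1)[0][0], "days")
print("cases taking more than 15 days:", sum(d > 15 for d in delays))
```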

Plotting the number of reports on each day against the average number of days to report for people whose symptoms began on that day gives a correlation coefficient of -0.107 with a p-value of 0.0124. This is a very weak negative correlation, in the direction the hypothesis predicts, and it is statistically significant at the 5% level. People's awareness is therefore likely to be a factor in how quickly they report, but the effect is so small that it would rarely be taken into account in a real analysis.
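
The correlation step can be sketched as follows, reading "reports on that day" as the number of cases whose report date falls on that day (an interpretation, not confirmed by the post). Pearson's r is computed by hand, and the p-value uses the usual conversion to a t statistic with a normal approximation, which is only reasonable when there are many days. The (onset, report) pairs are synthetic, generated purely so the sketch runs.

```python
import random
from collections import Counter, defaultdict
from datetime import date, timedelta
from math import sqrt
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def approx_p_value(r, n):
    """Two-sided p-value for r via t = r*sqrt((n-2)/(1-r^2)).

    Uses a normal approximation to the t distribution, which is only
    reasonable when n (the number of days) is large.
    """
    t = r * sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


def awareness_series(pairs):
    """Per onset day: reports filed that day (x) and mean onset-to-report delay (y)."""
    reports_per_day = Counter(report for _, report in pairs)
    delays_by_onset = defaultdict(list)
    for onset, report in pairs:
        delays_by_onset[onset].append((report - onset).days)
    days = sorted(delays_by_onset)
    xs = [reports_per_day[d] for d in days]
    ys = [mean(delays_by_onset[d]) for d in days]
    return xs, ys


# Synthetic (onset, report) pairs purely so the sketch runs; not real data.
random.seed(0)
start = date(2020, 3, 1)
pairs = []
for _ in range(2000):
    onset = start + timedelta(days=random.randrange(200))
    pairs.append((onset, onset + timedelta(days=random.randint(0, 6))))

xs, ys = awareness_series(pairs)
r = pearson_r(xs, ys)
print(f"r = {r:.3f}, approximate p = {approx_p_value(r, len(xs)):.4f}")
```

If SciPy is available, `scipy.stats.pearsonr(xs, ys)` returns r together with an exact two-sided p-value and could replace both helper functions. As for the size of the effect, with the r = -0.107 found in this analysis, r² = 0.107² ≈ 0.011, so the number of daily reports accounts for only about 1% of the day-to-day variation in average reporting delay.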

Reference:


Zaher, K., Basingab, F., Alrahimi, J., Basahel, K. & Aldahlawi, A. (2023) Gender Differences in Response to COVID-19 Infection and Vaccination. Biomedicines. 11 (6), 1677. doi:10.3390/biomedicines11061677.



