The coronavirus disease (COVID-19) has spread rapidly around the world following its initial outbreak in the city of Wuhan, the capital of Hubei province in China. By the beginning of 2021, COVID-19 had affected almost all countries and territories across the world, with the global death toll exceeding two million. Although not all of the factors that contributed to the rapid spread of the virus are precisely known yet, it is believed that socio-economic activities requiring inter-personal interactions, certain long-term health conditions, and lifestyle may have contributed to the unprecedented spread of the disease. To capture the effects of such factors on the number of people infected, as an econometrician, you decide to choose variables representing the level of economic development, population characteristics and the geographical locations of various countries of the world as of 1 February 2021. The dataset [MAE256 T1 2021 Assignment Data] for the assignment is provided on the MAE256 unit site on CloudDeakin and contains information on the continent of each country (Continent), total number of infected people (Cases), Gross Domestic Product per capita (GDP), population density (POP), percentage of population aged more than 70 years (Pop70), and the prevalence of diabetes (Diabetes). The dataset for this assignment has been obtained from: https://ourworldindata.org/coronavirus-data.
NOTE: You need to use the dataset provided by the Unit Team on CloudDeakin for the assignment. Please include all Excel output tables for summary statistics and regressions, and all figures in your submission.
Variable definitions
Country: The name of each country in the dataset
Continent: The continent of each country in the dataset
Cases: Total number of infected people
GDP: Gross Domestic Product per person (in AUD)
POP: Population density (number of people per square kilometre of land area)
Pop70: Percentage of population who are aged over 70
Diabetes: Percentage of people aged 20-79 who have type 1 or type 2 diabetes
Solution: Let us have a closer look at the descriptive statistics of the variables Cases and GDP.
Statistic | Cases | GDP
Mean | 582932.5057 | 23485.84714
Standard Error | 175785.8242 | 1901.699237
Median | 65817.5 | 15075.20898
Mode | 1 | #N/A
Standard Deviation | 2318774.276 | 25085.13579
Sample Variance | 5.37671E+12 | 629264037.7
Kurtosis | 91.2485313 | 4.849364972
Skewness | 8.846743578 | 1.929834591
Range | 26321119 | 149069.6923
Minimum | 1 | 847.7435897
Maximum | 26321120 | 149917.4359
Sum | 101430256 | 4086537.403
Count | 174 | 174
Largest(1) | 26321120 | 149917.4359
Smallest(1) | 1 | 847.7435897
Confidence Level(95.0%) | 346961.0213 | 3753.519445
The mean number of infected people per country is about 582,933, estimated with a standard error of about 175,786. The median (about 65,818) is far below the mean, and the skewness (8.85) and kurtosis (91.25) are far above the values expected for a normal distribution, so the distribution of Cases is strongly right-skewed with a long, heavy upper tail. The range is also very large, from 1 to 26,321,120 cases. Cases in its raw form is therefore far from Gaussian, and a transformation such as the natural logarithm should reduce the skewness and kurtosis and make the variable better suited to regression analysis.
GDP per person has a mean of about AUD 23,486, with a standard error of about 1,902 and a standard deviation of about 25,085. The skewness (1.93) and kurtosis (4.85) are much smaller than for Cases but still indicate a right-skewed distribution, with the median (about AUD 15,075) well below the mean. GDP is therefore closer to normal than Cases but not symmetric, and a logarithmic transformation can still help in the regression analysis.
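As a quick consistency check on the summary table, the standard error of a mean is the standard deviation divided by the square root of the sample size. The sketch below recomputes it from the reported figures. The function name `standard_error` is introduced here for illustration only and is not part of the assignment.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    return std_dev / math.sqrt(n)


# Standard deviations and sample size as reported in the summary table.
se_cases = standard_error(2318774.276, 174)
se_gdp = standard_error(25085.13579, 174)

print(f"SE(Cases) = {se_cases:.1f}")
print(f"SE(GDP)   = {se_gdp:.1f}")
```

Both values should agree with the Standard Error row of the table up to rounding.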
(ii) Estimate the following simple regression model of Cases on GDP:
Cases = b0 + b1GDP + u
Write down the estimated sample regression function and interpret both estimated coefficients.
Solution:
Regression Statistics
Multiple R | 0.281878954
R Square | 0.079455745
Adjusted R Square | 0.073675398
Standard Error | 2294367.428
Observations | 174

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 7.86055E+13 | 7.86E+13 | 14.9323 | 0.000157691
Residual | 173 | 9.10693E+14 | 5.26E+12 | |
Total | 174 | 9.89299E+14 | | |

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A
GDP | 19.58937467 | 5.06940753 | 3.864234 | 0.000157 | 9.583523391 | 29.59523
Model: Cases = b0 + b1GDP + u
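Estimated: Cases = 0 + 19.59 GDP
The R Square of 0.079 means that GDP explains about 8% of the variation in Cases. The regression is significant at the 5% level, since F(1,173) = 14.93 with a p-value of 0.0002 < 0.05. The slope implies that each additional AUD 1 of GDP per person is associated with about 19.59 more infected people, on average. The intercept is reported as 0 with #N/A standard errors because the output was produced with the intercept constrained to zero (regression through the origin), so b0 is not estimated here. Taken literally, it says that a country with zero GDP per person is predicted to have zero cases. Because of that constraint, the reported R Square is measured around zero rather than around the mean of Cases.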
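In a level-level model the slope is read in units of the dependent variable per unit of the regressor. The sketch below, using the GDP coefficient and standard error from the output above, recomputes the t statistic and the F statistic (which equals t squared when there is a single regressor) and evaluates the predicted change in Cases for a hypothetical AUD 1,000 rise in GDP per person. The variable names and the AUD 1,000 figure are illustrative assumptions.

```python
# Slope and standard error for GDP, as reported in the regression output.
slope = 19.58937467
slope_se = 5.06940753

# t statistic for H0: slope = 0, and the matching F statistic.
t_stat = slope / slope_se
f_stat = t_stat ** 2

# Level-level interpretation: change in Cases for a given change in GDP.
gdp_change = 1000.0  # hypothetical AUD 1,000 increase in GDP per person
cases_change = slope * gdp_change

print(f"t = {t_stat:.3f}, F = {f_stat:.3f}")
print(f"Predicted change in Cases: {cases_change:.0f}")
```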
log(Cases) = b0 + b1 log(GDP) + u
Report your regression results in a sample regression function. Interpret the estimated coefficient of log(GDP). Provide an explanation on the sign of the slope coefficient.
Solution:
Regression Statistics
Multiple R | 0.970323
R Square | 0.941528
Adjusted R Square | 0.935747
Standard Error | 2.675318
Observations | 174

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 19937.84 | 19937.84 | 2785.656 | 3.551E-108
Residual | 173 | 1238.217 | 7.157324 | |
Total | 174 | 21176.06 | | |

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A
ln(GDP) | 1.122555 | 0.021269 | 52.77932 | 1.4E-108 | 1.080575459 | 1.164535
This regression uses the natural logarithm of both Cases and GDP. The R Square is 0.94, and the regression is significant with F(1,173) = 2786 and a p-value far below 0.05. The output again has the intercept constrained to zero, so this R Square is computed around zero rather than around the mean of log(Cases) and is not comparable with the R Square of a model that includes an intercept.
Model: log(Cases) = b0 + b1 log(GDP) + u
Estimated: log(Cases) = 0 + 1.123 log(GDP)
The intercept is constrained to zero in this output. The slope of 1.123 is an elasticity: a 1% increase in GDP per person is associated with an increase of about 1.12% in the number of cases. No exponentiation is needed, because in a log-log model the coefficient is already a percentage response to a percentage change.
The sign is positive, so richer countries tend to report more cases. A plausible explanation, in line with the introduction, is that higher-income economies involve more socio-economic activity requiring inter-personal interaction. This is an interpretation, not something the regression itself establishes.
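The elasticity reading of a log-log slope can be checked numerically. The sketch below computes the exact percentage change in Cases implied by the ln(GDP) coefficient for a given percentage change in GDP. The helper name `pct_change_log_log` is an assumption made for illustration.

```python
import math


def pct_change_log_log(elasticity: float, pct_change_x: float) -> float:
    """Exact % change in y implied by a log-log slope for a % change in x."""
    ratio_x = 1 + pct_change_x / 100
    return (math.exp(elasticity * math.log(ratio_x)) - 1) * 100


b1 = 1.122555  # coefficient of ln(GDP) from the output above

small = pct_change_log_log(b1, 1.0)   # 1% rise in GDP
large = pct_change_log_log(b1, 10.0)  # 10% rise in GDP

print(f"1% rise in GDP  -> {small:.2f}% rise in Cases")
print(f"10% rise in GDP -> {large:.2f}% rise in Cases")
```

For small changes the result is very close to the coefficient itself, which is why the slope is read directly as an elasticity.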
log(Cases) = b0 + b1 log(GDP) + b2 log(POP) + u
Report your results in a sample regression function. Based on your estimates, how would you interpret the effect of POP on the number of cases? What can you conclude when you compare the goodness of fit of this regression model and that of the regression model in part (iii)?
Solution:
This is another kind of log-log relationship.
Regression Statistics
Multiple R | 0.970465073
R Square | 0.941802457
Adjusted R Square | 0.935650146
Standard Error | 2.67676773
Observations | 174

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 2 | 19943.67 | 9971.833 | 1391.726 | 1.5682E-106
Residual | 172 | 1232.395 | 7.165085 | |
Total | 174 | 21176.06 | | |

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A
ln(GDP) | 1.065870695 | 0.066385 | 16.05583 | 4.88E-36 | 0.934835971 | 1.196905
ln(POP) | 0.126368468 | 0.140185 | 0.901444 | 0.368613 | -0.150335123 | 0.403072
All variables enter in logarithms. The R Square remains high (0.94), and the regression is jointly significant at the 5% level, with F(2,172) = 1392 and a p-value far below 0.05. Individually, log(GDP) is significant (p-value close to 0), but log(POP) is not: its p-value is 0.369 > 0.05 and its 95% interval (-0.150 to 0.403) includes zero.
Model:
log(Cases) = b0 + b1 log(GDP) + b2 log(POP) + u
log(Cases) = 0 + 1.066 log(GDP) + 0.126 log(POP)
Holding GDP fixed, a 1% increase in population density is associated with an increase of about 0.13% in the number of cases. This effect is not statistically different from zero at the 5% level.
Compared with the model in part (iii), R Square barely changes (0.9418 against 0.9415) and Adjusted R Square falls slightly (0.93565 against 0.93575). Hence adding log(POP) does not improve the goodness of fit, and the simpler model in part (iii) fits essentially as well.
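The comparison of the two fits can also be framed as an F test for the added regressor, using the residual sums of squares from the two ANOVA tables. With one added variable this F statistic should match the square of that variable's t statistic. The sketch below assumes both regressions use the same 174 observations and the same specification apart from log(POP). The function name is hypothetical.

```python
def f_added_variables(ssr_restricted: float, ssr_unrestricted: float,
                      q: int, df_unrestricted: int) -> float:
    """F statistic for q added regressors, from residual sums of squares."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


# Residual SS from part (iii) and from the model with log(POP) added.
f_pop = f_added_variables(1238.217, 1232.395, q=1, df_unrestricted=172)

# t statistic of ln(POP) as reported in the coefficient table.
t_pop = 0.901444

print(f"F for adding log(POP) = {f_pop:.4f}")
print(f"t squared             = {t_pop ** 2:.4f}")
```

An F statistic below 1 is well short of any conventional critical value, in line with the unchanged goodness of fit.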
Solution:
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A
ln(GDP) | 1.065870695 | 0.066385 | 16.05583 | 4.88E-36 | 0.934835971 | 1.196905
ln(POP) | 0.126368468 | 0.140185 | 0.901444 | 0.368613 | -0.150335123 | 0.403072
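The estimated coefficient of log(GDP) is 1.066, which is above 1. However, the p-value reported in the table (4.88E-36) tests H0: b1 = 0, not H0: b1 = 1. To test whether the elasticity exceeds 1, use t = (1.066 - 1)/0.0664 = 0.99 with 172 degrees of freedom. The one-sided 5% critical value is about 1.65, so we fail to reject H0: b1 = 1 in favour of H1: b1 > 1. Consistent with this, the 95% confidence interval (0.935 to 1.197) contains 1. At the 5% level there is not enough evidence that the coefficient of log(GDP) is greater than 1.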
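One way to check a claim that a coefficient exceeds 1 is to test H0: b1 = 1 against H1: b1 > 1, using the coefficient and standard error from the table rather than the reported p-value (which refers to H0: b1 = 0). The sketch below uses the standard normal quantile as an approximation to the t critical value with 172 degrees of freedom. That approximation, and the variable names, are assumptions of this example.

```python
from statistics import NormalDist

# Coefficient and standard error of ln(GDP) from the table above.
coef = 1.065870695
se = 0.066385

# t statistic for H0: b1 = 1 (not the default H0: b1 = 0).
t_stat = (coef - 1.0) / se

# One-sided 5% critical value, approximated by the standard normal
# quantile; with 172 degrees of freedom the t value is only slightly larger.
critical = NormalDist().inv_cdf(0.95)

reject = t_stat > critical
print(f"t = {t_stat:.3f}, critical = {critical:.3f}, reject H0: {reject}")
```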
log(Cases)= b0 + b1 log(GDP) +b2 Pop70 +b3 Diabetes + u
Interpret the coefficient of Pop70. Test whether Pop70 and Diabetes are jointly significant at 5% level of significance.
Solution:
Regression Statistics
Multiple R | 0.972645
R Square | 0.946037
Adjusted R Square | 0.939558
Standard Error | 2.585062
Observations | 174

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 3 | 20033.35 | 6677.782 | 999.2875 | 1.0722E-107
Residual | 171 | 1142.715 | 6.682544 | |
Total | 174 | 21176.06 | | |

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A
ln(GDP) | 1.232289 | 0.06669 | 18.47797 | 1.31E-42 | 1.100647673 | 1.363929
Pop70 | 0.053938 | 0.054051 | 0.997908 | 0.319734 | -0.052754826 | 0.16063
Diabetes | -0.17274 | 0.054876 | -3.14777 | 0.001942 | -0.281057747 | -0.06442
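The coefficient of Pop70 is 0.054. Because the dependent variable is in logs and Pop70 is measured in percentage points, a one percentage point increase in the share of the population aged over 70 is associated with an increase of about 5.4% in the number of cases, holding GDP and Diabetes fixed. The coefficient is not statistically significant (p-value 0.320 > 0.05), so there is no evidence that Pop70 on its own affects the number of cases.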
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 2.474843133 | 1.910687616 | 1.295263084 | 0.196996 | -1.29705 | 6.246732
ln(GDP) | 0.910243025 | 0.252205332 | 3.609134735 | 0.000404 | 0.412364 | 1.408122
Pop70 | 0.146710908 | 0.144840864 | 1.01291102 | 0.312551 | -0.13922 | 0.432641
Diabetes | -0.128867729 | 0.08948488 | -1.440106183 | 0.151687 | -0.30552 | 0.047784
pop70_diabetes | -0.00610965 | 0.018036742 | -0.33873358 | 0.735231 | -0.04172 | 0.029497
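The p-value of 0.735 in the table above belongs to the interaction term pop70_diabetes, so it shows only that the interaction is insignificant. It is not a test of joint significance. The joint test is H0: b2 = b3 = 0 and is an F test comparing the model without Pop70 and Diabetes (part (iii), residual SS = 1238.217) with the model that includes them (residual SS = 1142.715, 171 degrees of freedom): F = [(1238.217 - 1142.715)/2] / (1142.715/171) = 47.751/6.683 = 7.15. The 5% critical value of F(2,171) is about 3.05, so H0 is rejected. Pop70 and Diabetes are jointly significant at the 5% level, driven mainly by Diabetes (p-value 0.002).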
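A joint significance test is usually carried out as an F test on the restricted and unrestricted residual sums of squares rather than through an interaction term. The sketch below applies that formula to the residual SS reported in part (iii) and in the model with Pop70 and Diabetes, assuming both were estimated on the same 174 observations with the same specification otherwise. The p-value uses a closed form that holds only when there are exactly two restrictions. Both function names are hypothetical.

```python
def joint_f_statistic(ssr_restricted: float, ssr_unrestricted: float,
                      q: int, df_unrestricted: int) -> float:
    """F statistic for H0: the q added coefficients are all zero."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


def f_pvalue_two_restrictions(f_stat: float, df_unrestricted: int) -> float:
    """Upper-tail p-value of F(2, df); closed form valid only for q = 2."""
    return (1 + 2 * f_stat / df_unrestricted) ** (-df_unrestricted / 2)


# Residual SS: log(Cases) on log(GDP) only, then with Pop70 and Diabetes.
f_joint = joint_f_statistic(1238.217, 1142.715, q=2, df_unrestricted=171)
p_joint = f_pvalue_two_restrictions(f_joint, 171)

print(f"F(2,171) = {f_joint:.3f}, p-value = {p_joint:.5f}")
print("Jointly significant at 5%:", p_joint < 0.05)
```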
log(Cases)= b0 + b1 log(GDP)+ b2 log(POP)+b3 Oceania + u
Report your regression results in a sample regression function. Interpret the meaning of the coefficient for Oceania.
Solution:
Regression Statistics
Multiple R | 0.978114
R Square | 0.956706
Adjusted R Square | 0.950352
Standard Error | 2.31546
Observations | 174

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 3 | 20259.27 | 6753.09 | 1259.587 | 7.9E-116
Residual | 171 | 916.7915 | 5.361354 | |
Total | 174 | 21176.06 | | |

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A
ln(GDP) | 1.146681 | 0.058383 | 19.64081 | 1.07E-45 | 1.031438 | 1.261925
ln(POP) | 0.012706 | 0.122164 | 0.10401 | 0.917283 | -0.22844 | 0.25385
Oceania | -6.46518 | 0.84265 | -7.67244 | 1.23E-12 | -8.12852 | -4.80184
Estimated: log(Cases) = 0 + 1.147 log(GDP) + 0.013 log(POP) - 6.465 Oceania
The R Square is 0.957, again computed with the intercept constrained to zero. log(GDP) and Oceania are significant, with p-values below 0.05, while log(POP) is not (p-value 0.917).
Oceania is a dummy variable, not a logged regressor, so its coefficient is not an elasticity. Holding GDP and population density fixed, log(Cases) is about 6.47 lower for a country in Oceania than for a country on another continent. Since exp(-6.465) is approximately 0.0016, predicted cases in a country in Oceania are only about 0.16% of those in a comparable country elsewhere, that is, about 99.8% lower.
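For a dummy variable in a model with a logged dependent variable, the exact percentage difference between the two groups is 100 * (exp(b) - 1). The sketch below applies this to the Oceania coefficient and, for contrast, to a small hypothetical coefficient of 0.05, where the result is close to 100 * b. The function name and the 0.05 value are illustrative assumptions.

```python
import math


def dummy_pct_effect(coef: float) -> float:
    """Exact % difference in y for dummy = 1 versus 0 when y is in logs."""
    return (math.exp(coef) - 1) * 100


# Oceania coefficient from the regression output above.
oceania_effect = dummy_pct_effect(-6.46518)

# Hypothetical small coefficient: here 100 * b is a good approximation.
small_effect = dummy_pct_effect(0.05)

print(f"Oceania: {oceania_effect:.2f}% relative to other continents")
print(f"b = 0.05: {small_effect:.2f}%")
```

The approximation 100 * b would give -646.5% for Oceania, which is impossible for a count, so the exact formula is needed when the coefficient is large.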
Solution:
Regression Statistics
Multiple R | 0.62787987
R Square | 0.394233131
Adjusted R Square | 0.383543128
Standard Error | 2.30696728
Observations | 174

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 3 | 588.8157 | 196.2719 | 36.87867 | 2.0737E-18
Residual | 170 | 904.7567 | 5.322098 | |
Total | 173 | 1493.572 | | |

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 99.0% | Upper 99.0%
Intercept | 2.210725098 | 1.470129 | 1.503763 | 0.134498 | -0.6913336 | 5.112784 | -1.61905 | 6.040496
ln(GDP) | 0.945645462 | 0.145795 | 6.486126 | 9.23E-10 | 0.65784349 | 1.233447 | 0.565841 | 1.32545
ln(POP) | -0.050070733 | 0.128676 | -0.38912 | 0.697673 | -0.3040798 | 0.203938 | -0.38528 | 0.285138
Oceania | -6.635032667 | 0.847123 | -7.83243 | 4.94E-13 | -8.3072683 | -4.9628 | -8.84184 | -4.42823
In this version the intercept is estimated (2.21), so R Square is measured around the mean of log(Cases): the model explains about 39% of the variation in log(Cases). log(GDP) and Oceania are significant at the 1% level (p-values below 0.01, and their 99% intervals exclude zero), while log(POP) is not (p-value 0.698).
The coefficient of Oceania is -6.64. Since Oceania is a dummy variable, this means that, holding GDP and population density fixed, predicted cases in a country in Oceania are about exp(-6.64), or roughly 0.0013, times those in a comparable country elsewhere, that is, about 99.9% lower.
log(Cases)= b0 + b1 log(GDP)+ b2 log(POP)+b3 Europe+ u
Test whether Europe has a significant effect at the 1% level of significance. What do you infer about the explanatory power of the model in part (ix) compared to the model that you estimated in part (vii)?
Solution:
Regression Statistics
Multiple R | 0.971197
R Square | 0.943225
Adjusted R Square | 0.936713
Standard Error | 2.65158
Observations | 174

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 3 | 19973.78 | 6657.927 | 946.9552 | 8.1E-106
Residual | 171 | 1202.28 | 7.030879 | |
Total | 174 | 21176.06 | | |

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 99.0% | Upper 99.0%
Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A | #N/A | #N/A
ln(GDP) | 1.02528 | 0.068623 | 14.94074 | 7.79E-33 | 0.889822 | 1.160737 | 0.846524 | 1.204035
ln(POP) | 0.156855 | 0.139645 | 1.123242 | 0.262909 | -0.11879 | 0.432504 | -0.2069 | 0.520613
Europe | 1.036708 | 0.500926 | 2.069582 | 0.039995 | 0.047913 | 2.025502 | -0.26815 | 2.341562
Europe is not significant at the 1% level: its p-value is 0.040 > 0.01, and the 99% confidence interval (-0.268 to 2.342) includes zero. It would be significant at the 5% level, since 0.040 < 0.05. At the 1% level we therefore cannot conclude that being in Europe affects the number of cases, holding GDP and population density fixed.
Compared with the Oceania model in part (vii), the Europe model has less explanatory power: R Square is 0.943 against 0.957, Adjusted R Square is 0.937 against 0.950, and the regression standard error is larger (2.65 against 2.32). The Oceania dummy is highly significant, whereas the Europe dummy is not significant at the 1% level, so the Oceania indicator adds more to the model than the Europe indicator does.
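The significance decision for the Europe dummy depends on the chosen level, and the p-value rule and the confidence-interval rule should always agree. The sketch below checks both at the 5% and 1% levels using the figures reported in the coefficient table. The helper names are assumptions made for illustration.

```python
def is_significant(p_value: float, alpha: float) -> bool:
    """Reject H0: coefficient = 0 when the p-value is below alpha."""
    return p_value < alpha


def interval_excludes_zero(lower: float, upper: float) -> bool:
    """A confidence interval rejects H0 when zero lies outside it."""
    return lower > 0 or upper < 0


# Europe row of the coefficient table.
p_europe = 0.039995
sig_5 = is_significant(p_europe, 0.05)
sig_1 = is_significant(p_europe, 0.01)
ci95_rejects = interval_excludes_zero(0.047913, 2.025502)
ci99_rejects = interval_excludes_zero(-0.26815, 2.341562)

print(f"5% level: p-value rule {sig_5}, interval rule {ci95_rejects}")
print(f"1% level: p-value rule {sig_1}, interval rule {ci99_rejects}")
```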