Statistical Analysis

EXECUTIVE SUMMARY

This business report applies statistical techniques to the 2017 Social Progress Index data set. It focuses on six variables selected from the even-numbered categories, together with the country name and continent variables. Using random selection, a sample of 100 countries was drawn from the list of 182 countries provided. Descriptive statistics were produced for each of the six variables and for the continents to understand the spread of the data. Based on the standard deviations, the data are widely spread for all variables. The statistical analysis finds that the average scores for the category 2 and category 12 variables lie within their confidence intervals.
It is also found that people in European countries have greater access to advanced education than people in African countries. The level of personal safety does not differ significantly between Asian and American countries. In addition, biodiversity and habitat are directly related to the level of environmental quality within countries. Overall, the level of social progress is better in American and European countries than in Asian and African countries.
Table of Contents

Introduction
Analysis
Descriptive Statistics
Confidence Intervals
Hypothesis Testing
Correlation and Regression
Conclusion and Limitation
References
Appendices
Appendix 1: Frequency Table
Appendix 2: Summary Statistics for Water and Sanitation Variable
Appendix 3: Summary Statistics for Personal Safety Variable
Appendix 4: Summary Statistics for Access to Information and Communication Variable
Appendix 5: Summary Statistics for Environment Quality Variable
Appendix 6: Summary Statistics for Personal Freedom and Choice Variable
Appendix 7: Summary Statistics for Access to Advanced Education
Appendix 8: Confidence Interval for Water and Sanitation Variable
Appendix 9: Confidence Interval for Access to Advanced Education Variable
Appendix 10: First Hypothesis Test Result
Appendix 11: Second Hypothesis Test Result
Appendix 12: Third Hypothesis Test Result
Appendix 13: First Regression Output
Appendix 14: Second Regression Output
Appendix 15: Scatterplot First Regression
Appendix 16: Scatterplot Second Regression

Introduction

The purpose of this report is to perform a statistical analysis that presents the data in an understandable way. For this, six variables, one from each of the water and sanitation, personal safety, access to information and communication, environmental quality, personal freedom and choice, and access to advanced education categories of the Social Progress Index, are analyzed for a random sample of 100 countries. The report includes descriptive statistics, confidence intervals, hypothesis testing, and correlation and regression tests to analyze and interpret the selected data.
A statistical analysis approach is applied to analyze and present the data. Regression, t-tests, frequency analysis, descriptive statistics and graphical presentation are used. The main source of data for this analysis is “The 2017 Social Progress Index data set”.

Analysis

This section of the report presents the data analysis and interprets the findings using different statistical techniques and appropriate graphs.
Descriptive Statistics

Continent Name: A frequency table was prepared to determine the number of countries included in this analysis from the Africa, America, Asia, Europe and Oceania continents. As shown in the frequency table and pie chart, the proportion of countries is highest for [...], followed by Africa (29%) and Asia (24%), and lowest for Oceania (5%).

Rural access to improved water source from Category 2: The mean of this variable is 81.96, which is the average country score for access to an improved water source in rural areas. The median is 91.23, which is greater than the mean. This indicates that the distribution is skewed to the left, with the tail stretching toward the lower values. The standard deviation is 21.27 and the range is 91.21. A high standard deviation indicates a large spread in the values collected for the 100 countries. The line chart also shows this variation across countries (Rumsey, 2016).

Traffic deaths from Category 4: In the personal safety category, the average score of the sampled countries is 18.20. The mean is higher than the median, which indicates a right-skewed distribution.
The data set shows limited spread, because the standard deviation of 10.59 is not a high value. The range is 70.50, which is the difference between the maximum and minimum country scores (Motulsky, 2014). The column chart illustrates that the scores of most countries for traffic deaths fall within a similar range, except for Mongolia, which has a very high value.

Press freedom index from Category 6: As shown in the summary statistics table, the mean is 37.41, which is greater than the median.
This indicates a right-skewed distribution. The variance and standard deviation are 305.17 and 17.47 respectively. The standard deviation is relatively high, which means that country values for this variable are widely spread (Webber, 2011). A wide disparity in the press freedom index can be seen across countries.

Greenhouse gas emissions from Category 8: For environmental quality, the sample data show a wide spread. The mean is 723.92, while the standard deviation is 1170.32. The median is also lower than the mean, which indicates a right-skewed distribution. A column graph was prepared to visualise the observed values for the countries. The graph shows a wide spread, with especially high values for Guinea.

Corruption from Category 10: For this variable, the average score is 39.54, which is greater than the median.
The standard deviation is 17.99, which is also high. A higher standard deviation indicates a greater spread in the corruption data.

Globally ranked universities from Category 12: According to the summary statistics, the mean and median are 9.78 and 0.00 respectively. This result implies that the data are skewed to the right, because the mean is greater than the median. There is also considerable variation in the data, as the standard deviation is high (Rumsey, 2016). The column chart also shows differences between countries in the number of globally ranked universities.
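The summary measures discussed in this section can be sketched in a few lines of Python. This is only a minimal illustration: the scores are hypothetical and are not taken from the Social Progress Index, the function name describe is an assumption, and comparing the mean with the median is a rule of thumb for the direction of skew rather than a formal test.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    value_range = max(values) - min(values)

    # Rule of thumb: mean above median suggests a right (positive) skew,
    # mean below median suggests a left (negative) skew.
    if mean > median:
        skew = "right"
    elif mean < median:
        skew = "left"
    else:
        skew = "symmetric"

    return {
        "mean": mean,
        "median": median,
        "stdev": stdev,
        "range": value_range,
        "skew": skew,
    }


# Hypothetical scores with many zeros and a few large values
scores = [0, 0, 0, 0, 1, 2, 5, 10, 30, 50]
summary = describe(scores)
print(summary)
```

In this hypothetical sample the mean lies well above the median, so the rule of thumb reports a right-skewed distribution.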
Confidence Intervals

Category 2 – Water and Sanitation: We are 95% confident that the mean of the “Rural access to improved water source” variable lies between 77.70 and 86.21. In other words, an interval constructed in this way would miss the true mean only about 5% of the time. Because 95% is a high level of confidence, we can say that the mean score for providing rural populations with access to improved water sources, across all countries in the world (estimated from the 100 randomly selected countries), is between 77.70 and 86.21 (Gerstman, 2014).
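As a rough check, a 95% confidence interval can be rebuilt from the summary statistics reported earlier (mean 81.96, standard deviation 21.27, 100 countries). The sketch below assumes a normal approximation and uses the rounded summary figures, so its result will differ slightly from the interval in the report's output; the function name mean_confidence_interval is hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, stdev, n, confidence=0.95):
    """Approximate confidence interval for a mean using the normal distribution."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev / math.sqrt(n)  # critical value times standard error
    return mean - margin, mean + margin


# Summary statistics reported for "Rural access to improved water source"
low, high = mean_confidence_interval(mean=81.96, stdev=21.27, n=100)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

With these inputs the interval is roughly 77.8 to 86.1, which is close to the reported interval of 77.70 to 86.21.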
Category 12 – Access to Advanced Education: Under this category, the Globally ranked universities variable is selected for statistical analysis. For this variable, we are 95% confident that the population mean lies between 5.20 and 14.36. This result implies that the average score of countries from Asia, Africa, America, Europe and Oceania for access to advanced education lies between these two values. The sample mean of 9.78 is the point estimate for this variable and lies within the 95% confidence interval (Webber, 2011).

Hypothesis Testing

First Hypothesis Test: A hypothesis test (a t-test) was performed on the statement that “The Globally ranked universities score is the same in European and African countries”. This is a two-tailed hypothesis test.
According to the test output, the degrees of freedom for this analysis are 50, corresponding to a combined sample of 52 observations. The test statistic is 3.655, while the critical t-value is 1.697. Because the test statistic is greater than the critical value, we can reject the null hypothesis. The p-value supports this finding, as it is reported as approximately zero and is less than 0.05 (Motulsky, 2014). This provides strong evidence against the null hypothesis, so we conclude that Globally ranked universities differs between African and European nations.
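The decision rule used in this test can be illustrated with a short Python sketch. It assumes a pooled (equal-variance) two-sample t-test, which is consistent with degrees of freedom equal to the combined sample size minus two, but the report does not state which variant was run. The two groups below are hypothetical values, not Social Progress Index data, and the critical value 2.228 is the standard two-tailed 5% table value for 10 degrees of freedom.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return the pooled two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2

    # Pooled variance weights each group's variance by its degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t_stat, df


# Hypothetical scores for two groups of countries
group_a = [12, 15, 9, 20, 14, 11]
group_b = [3, 0, 5, 2, 4, 1]

t_stat, df = pooled_t_statistic(group_a, group_b)
critical = 2.228  # two-tailed 5% critical value for df = 10 (t table)

# Two-tailed rule: reject the null hypothesis when |t| exceeds the critical value
reject = abs(t_stat) > critical
print(f"t = {t_stat:.3f}, df = {df}, reject null: {reject}")
```

Here the absolute test statistic exceeds the critical value, so the null hypothesis of equal means is rejected for the hypothetical groups.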
Second Hypothesis Test: Another t-test was performed on the level of Perceived criminality in American and Asian countries. Here, the Traffic deaths variable was used as the sample for this test. The test output shows that the t-statistic is 1.607, while the critical t-value for the two-tailed test is 2.060. The p-value is 0.293 and the degrees of freedom are 40.
Based on these results, the difference is not statistically significant and we fail to reject the null hypothesis, because the test statistic is less than the critical value. This does not prove the null hypothesis, but the sample provides no evidence of a significant difference between Asian and American countries in terms of personal safety (Rumsey, 2016).
Third Hypothesis Test: In this two-tailed hypothesis test, the test statistic is greater than the critical value and the p-value is less than alpha (0.017