# Assignment: Descriptive Statistics Analysis

Describe the Sun Coast data using the descriptive statistics tools discussed in the unit lesson, and establish whether the assumptions for parametric statistical procedures are met. Repeat the tasks below for each tab in the Sun Coast research study data set. Use the Unit IV Scholarly Activity template and the Microsoft Excel Data Analysis ToolPak. The links to the ToolPak are in the Course Project Guidance document.

The assignment covers the following items:

- Produce a frequency distribution table and histogram.
- Generate a descriptive statistics table, including measures of central tendency (mean, median, and mode), kurtosis, and skewness.
- Describe the dependent variable measurement scale as nominal, ordinal, interval, or ratio.
- Analyze, evaluate, and discuss the above descriptive statistics in relation to the assumptions required for parametric testing.
- Confirm whether the assumptions are met or are not met.

The assignment should be no fewer than five pages in length, follow APA-style formatting and guidelines, and use references and citations as necessary. The title and reference pages do not count toward the page requirement.

Attachments: courseprojectguidanceandunitvii_template.docx, unitiv_template.docx, study_guide.docx

## Unit IV Study Guide: Data Analysis: Descriptive Statistics

MBA 5652, Research Methods

### Course Learning Outcomes for Unit IV

Upon completion of this unit, students should be able to:

6. Differentiate between various research-based tools commonly used in businesses.
   - 6.1 Describe various forms of descriptive statistics, including frequency distribution tables, histograms, descriptive statistics tables, Kolmogorov-Smirnov tests, measurement scales, and measures of central tendency.
7. Test data for a business research project.
   - 7.1 Establish whether assumptions are met to use parametric statistical procedures by applying descriptive statistics.

| Course/Unit Learning Outcomes | Learning Activity |
| --- | --- |
| 6.1 | Unit Lesson; Video: Kolmogorov-Smirnov Test of Normality in Excel; Video: Parametric and Nonparametric Statistical Tests; Video: Checking that Data Is Normally Distributed Using Excel; Video: 3. Choosing Between Parametric & Non-Parametric Tests; Article: Difference Between Parametric and Nonparametric; Article: Deciphering the Dilemma of Parametric and Nonparametric Tests; Unit IV Scholarly Activity |
| 7.1 | Unit Lesson; Unit IV Scholarly Activity |

### Reading Assignment

Access the following resources through the links below:

Fields, H. (2018). Difference between parametric and nonparametric. Retrieved from Difference Between Parametric and Nonparametric

Dominguez, V. (2016, April 16). Make a histogram using Excel's histogram tool in the Data Analysis ToolPak [Video file]. Retrieved from https://www.youtube.com/watch?v=xekiDJzajYk

Grande, T. (2017, August 19). Kolmogorov-Smirnov test of normality in Excel [Video file].

Grande, T. (2015, July 30). Parametric and nonparametric statistical tests [Video file]. Retrieved from https://www.youtube.com/watch?v=pWEWHKnwg_0

Macarty, M. (2015, September 21). Get descriptive statistics in Excel with Data Analysis Toolpak [Video file]. Retrieved from https://www.youtube.com/watch?v=h-RzBhBzJOQ

Oxford Academic (Oxford University Press). (2016, November 17). Checking that data is normally distributed using Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=EG8AF2B_dps

Rana, R., Singhal, R., & Dua, P. (2016). Deciphering the dilemma of parametric and nonparametric tests. Journal of the Practice of Cardiovascular Sciences, 2(2), 95. Retrieved from http://link.galegroup.com.libraryresources.columbiasouthern.edu/apps/doc/A488649197/AONE?u=oran95108&sid=AONE&xid=c54eaf34

The Roslin Institute Training. (2016, May 9). 3. Choosing between parametric & non-parametric tests [Video file]. Retrieved from https://www.youtube.com/watch?v=_1mH6CnXKfM

## Unit Lesson: Data Analysis: Descriptive Statistics

The course is now entering the data analysis stage of research design. This is where the methodological fork in the road goes decisively down the quantitative path. The first topic under data analysis is descriptive statistics. As the name suggests, the researcher describes the data that are collected, both visually and statistically. Data may be displayed visually to reveal their distribution, trends, anomalies, and outliers. Visual displays may take the form of graphs, histograms, tables, plots, and other diagrams.

This stage is completed before any statistical procedures are used to test the research hypotheses. This raises the question of why the researcher should not simply start testing hypotheses immediately. The following sections explain why descriptive statistics are used to check that assumptions are met before a parametric test is applied.

### Assumptions: The Importance of Describing Data

There are various benefits of describing the data. One of the most important is determining whether the data meet the assumptions required for the use of parametric statistical procedures. Parametric procedures include, but are not limited to, correlation, regression, t test, and ANOVA.

Parametric tests have different assumptions depending on which test is being considered, but most require that the assumption of normality be met. Normality refers to a normal distribution of data which, when graphed as frequencies, resembles a bell shape. Other common assumptions, depending on the statistical procedure used, include sample size, level of measurement, homogeneity of variance, independence, absence of outliers, and linearity (Field, 2005). It is critical that the researcher understand the assumptions of any parametric statistical procedure being considered and determine whether they are met before employing the procedure in a research study. An Internet search for any parametric test will quickly return results that list its required assumptions.

[Figure: Bell Curve. Normal distribution graph with a bell curve.]

If the assumptions are not met, parametric statistical procedures should not be used, because the results may be invalid. Fortunately, there are corresponding non-parametric tests that can be used when the data do not meet the assumptions for parametric tests. Non-parametric tests also have assumptions that must be met, but they are fewer and less rigid. An example of a parametric procedure for correlation is Pearson's correlation coefficient (Pearson's r), while a corresponding non-parametric test is Spearman's rank correlation coefficient (Spearman's rho). An example of a causal-comparative parametric procedure is ANOVA, while a corresponding non-parametric test is Kruskal-Wallis.

Since non-parametric tests require fewer assumptions, some students wonder why they are not always used. The reason is that parametric tests are more powerful than non-parametric tests when their assumptions are met, and should be used in that case.

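
As a minimal sketch of the correlation pair mentioned above, the following Python code computes Pearson's r and Spearman's rho using only the standard library. The function names and the sample data are hypothetical and are not part of the lesson, which relies on the Excel ToolPak.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: linear association between two interval/ratio variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always rises with x, but not along a straight line.
hours = [1, 2, 3, 4, 5]
score = [1, 4, 9, 16, 25]
print("Pearson r:   ", round(pearson_r(hours, score), 4))
print("Spearman rho:", round(spearman_rho(hours, score), 4))
```

Because Spearman's rho works on ranks, it reports a perfect monotonic relationship for this data, while Pearson's r falls slightly below 1 because the relationship is not linear.
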
A parametric test is more likely than a non-parametric test to find a true effect when one exists, and therefore to reject the null hypothesis (Norusis, 2008). In other words, a parametric test is less likely to commit a Type II error. Norusis (2008) recommends that researchers conduct both parametric and non-parametric tests if they are unsure which is most appropriate. If the test results agree, there is nothing more to worry about. If the results are statistically significant for the parametric test and non-significant for the non-parametric test, the researcher should take a closer look at whether the assumptions were met.

### Assumption of Normality

Assumptions are evaluated both visually and statistically. As mentioned previously, a normal distribution of data is the most commonly required assumption for parametric statistical tests. The following explains how the assumption of normality can be described and tested.

A normal distribution of data exhibits the characteristics of a bell-shaped curve. In a perfect normal curve, the frequency distribution is symmetrical about the center; the mean, median, and mode are all equal; and the tails of the curve approach but do not touch the x-axis (Salkind, 2009). These are all preliminary indicators that a curve may represent a normal distribution, but there are additional factors to consider.

Distribution curves can be short and wide, tall and thin, and anywhere in between. In the distribution curves figure, each of the colored bell-shaped curves has a mean (μ) of zero. Their standard deviations (σ), the measure of how widely the data are dispersed around the mean, differ for each curve. The orange curve has a relatively small standard deviation because the data are closely clustered around the mean. The red curve has a relatively large standard deviation because the data are loosely clustered around the mean.

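
To make the point about spread concrete, here is a small sketch with two hypothetical data sets (not taken from the lesson) that share a mean of zero but differ in standard deviation, computed with Python's `statistics` module.

```python
import statistics

# Hypothetical scores: both sets are centered on 0 but spread differently.
tight = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
wide = [-8, -5, -3, -1, 0, 1, 3, 5, 8]

for name, data in (("tight", tight), ("wide", wide)):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    print(f"{name:>5}: mean = {mean}, standard deviation = {sd:.3f}")
```

The tightly clustered set corresponds to a tall, thin curve, and the loosely clustered set to a short, wide one.
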
[Figure: Distribution curves]

Kurtosis describes how peaked or flat a distribution is, and how heavy its tails are, relative to a normal curve. A platykurtic curve is flatter than a normal curve (think plateau) and has relatively fewer scores in the tails. A leptokurtic curve is more sharply peaked (think leapt for the sky) and has relatively more scores in the tails (Field, 2005). Height alone can mislead: the short, wide red curve and the tall, thin orange curve in the figure differ in standard deviation, and kurtosis is judged against a normal curve with the same standard deviation. Platykurtic and leptokurtic curves can challenge the assumption of normality, even when the curve is bell-shaped.

The data may also be asymmetrical, with scores more heavily distributed to one side of the curve or the other. This asymmetry is referred to as skewness, and it may be negative (left-skewed) or positive (right-skewed). Like platykurtic and leptokurtic curves, curves exhibiting skewness also threaten the assumption of normality.

[Figure: Left-skewed and right-skewed graphs (Sundberg, 2014)]

The assumption of normality can be evaluated visually by describing the frequency of responses in a data set. The lesson's frequency table shows the results of a 120-point safety test administered to 500 employees. For example, two employees scored in the range of 50 to 54, 90 employees scored in the range of 85 to 89, and three employees scored in the range of 110 to 114.

When the frequency data are plotted in a histogram, the curve of the data can be observed. To create a histogram, the data values (test score ranges) are plotted on the x-axis, and the frequencies of those values are plotted on the y-axis. Using the same example, the histogram shows that two employees scored in the range of 50 to 54, 90 employees scored in the range of 85 to 89, and three employees scored in the range of 110 to 114.

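
The frequency table and histogram described above can be sketched in Python as follows. This is an illustration only: it assumes integer scores grouped into five-point ranges such as 50 to 54, the function names are hypothetical, and the scores are made up rather than taken from the 500-employee data set.

```python
from collections import Counter


def frequency_table(scores, bin_width=5):
    """Return (low, high, frequency) rows for consecutive integer score ranges."""
    # Map each score to the lower bound of its range, e.g. 53 -> 50.
    counts = Counter((s // bin_width) * bin_width for s in scores)
    table = []
    for low in range(min(counts), max(counts) + bin_width, bin_width):
        table.append((low, low + bin_width - 1, counts.get(low, 0)))
    return table


def print_histogram(table):
    """Print a text histogram: score ranges down the side, one # per score."""
    for low, high, freq in table:
        print(f"{low:>3}-{high:<3} {freq:>3} {'#' * freq}")


# Hypothetical scores, NOT the lesson's data set.
scores = [53, 67, 71, 74, 76, 78, 79, 80, 81, 81, 82, 83, 84, 86, 88, 91, 95, 102]
table = frequency_table(scores)
print_histogram(table)
```

Empty ranges are kept in the table with a frequency of zero so that gaps in the distribution remain visible in the histogram.
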
From the histogram, the data appear to be approximately normally distributed, and there are no visible outliers. No skewness is apparent, and the kurtosis leans slightly leptokurtic. Skewness and kurtosis can be confirmed by generating descriptive statistics, which is a routine function in statistical packages, including the Excel Data Analysis ToolPak. There is considerable debate among researchers regarding acceptable levels of skewness and kurtosis. George and Mallery (2010) suggest that skewness and kurtosis scores between -2 and +2 are satisfactory for accepting a normal distribution. Researchers generally agree that the closer skewness and kurtosis are to 0, the better. The more they deviate from 0, the greater the chance that the data are not normally distributed (Field, 2005). As shown in the descriptive statistics table, skewness and kurtosis are both relatively close to 0.

It should also be noted that the mean, median, and mode are similar in the descriptive statistics table. As noted above, the mean, median, and mode are identical in a perfect normal distribution. The data presented here suggest an approximately normal distribution.

Descriptive Statistics

| Statistic | Value |
| --- | --- |
| Mean | 80.546 |
| Standard Error | 0.446621439 |
| Standard Deviation | 9.986758969 |
| Sample Variance | 99.73535471 |
| Range | 64 |
| Minimum | 53 |
| Maximum | 117 |
| Sum | 40273 |
| Count | 500 |
| Largest(1) | 117 |
| Smallest(1) | 53 |

The frequency distribution should also be inspected for outliers. Outliers are extreme scores far from the mean, in the left or right tail of the curve, and they can bias the mean. There are different recommendations for how to treat outliers, such as removing them from the data set, but the ramifications should be understood before taking any such action. This is a case where consulting the literature is strongly recommended.

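
A descriptive statistics table like the one above can be approximated outside Excel. The sketch below uses Python's `statistics` module and the bias-adjusted sample formulas for skewness and excess kurtosis that spreadsheet software commonly reports. The function names and the sample scores are hypothetical, and the -2 to +2 check simply encodes the George and Mallery (2010) rule of thumb cited in the text. The formulas need at least four data points.

```python
import statistics
from math import sqrt


def describe(data):
    """Return summary statistics similar to a spreadsheet descriptive table."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    z = [(x - mean) / sd for x in data]

    # Bias-adjusted sample skewness and excess kurtosis (0 for a normal curve).
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return {
        "mean": mean,
        "standard_error": sd / sqrt(n),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "standard_deviation": sd,
        "sample_variance": statistics.variance(data),
        "kurtosis": kurtosis,
        "skewness": skewness,
        "range": max(data) - min(data),
        "minimum": min(data),
        "maximum": max(data),
        "sum": sum(data),
        "count": n,
    }


def within_rule_of_thumb(stats, limit=2.0):
    """True when skewness and kurtosis both fall between -limit and +limit."""
    return abs(stats["skewness"]) <= limit and abs(stats["kurtosis"]) <= limit


# Hypothetical scores for illustration only.
scores = [62, 70, 74, 75, 78, 80, 80, 81, 83, 85, 88, 91, 99]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name:>20}: {value}")
print("skewness and kurtosis within -2..+2:", within_rule_of_thumb(summary))
```

The same relationships can be checked against the lesson's table: the sum divided by the count is 40273 / 500 = 80.546, and the standard deviation divided by the square root of the count is approximately 0.4466, matching the reported mean and standard error.
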
Finally, normality can be tested statistically. Several tests can be used to test objectively for normality, including Kolmogorov-Smirnov, Shapiro-Wilk, chi-square, Jarque-Bera, and Anderson-Darling. Each test has advantages and disadvantages. Once again, the researcher is well served to consult the literature to determine the most appropriate test for his or her project.

The Kolmogorov-Smirnov (KS) test is often used to test for normality. KS compares the frequency distribution of the sample data to a model of normally distributed data with the same mean and standard deviation as the sample data. Like any other statistical test, the KS test is performed to test a null and an alternative hypothesis:

Ho1: There is no statistically significant difference in normality between the sample data and model data.

Ha1: There is a statistically significant difference in normality between the sample data and model data.

If the results are statistically significant (p < .05), the null hypothesis is rejected in favor of the alternative hypothesis that there is a statistically significant difference in normality between the sample data and model data. We would therefore conclude that the assumption of normality is not met, and a non-parametric test would be required to test our data. If the results are not statistically significant (p > .05), the null hypothesis is not rejected: there is no evidence of a statistically significant difference in normality between the sample data and model data. We would therefore conclude that the assumption of normality is met, and a parametric test would be acceptable.

It is important to note that the above steps for evaluating the assumption of normality require a holistic view. No single description of the data is sufficient to make a decision about normality.

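
The KS comparison described above can be sketched in Python with the standard library alone. This is a simplified illustration, not the Excel procedure from the unit videos: it computes the KS statistic D (the largest gap between the sample's cumulative distribution and a normal model with the sample's own mean and standard deviation) and compares it with the common large-sample critical value of about 1.358 / √n at the .05 level, rather than computing an exact p value. Because the mean and standard deviation are estimated from the same sample, that threshold is conservative (the Lilliefors variant of the test corrects for this), so treat the decision as a rough guide. All function names and data are hypothetical.

```python
import statistics
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def ks_statistic(data):
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    d = 0.0
    for i, x in enumerate(sorted(data), start=1):
        model = normal_cdf(x, mean, sd)
        # The empirical CDF steps from (i - 1)/n up to i/n at x; check both sides.
        d = max(d, i / n - model, model - (i - 1) / n)
    return d


def approx_critical_value(n, coefficient=1.358):
    """Approximate large-sample critical value for alpha = .05 (an assumption)."""
    return coefficient / sqrt(n)


# Hypothetical data sets: one roughly symmetric, one strongly right-skewed.
symmetric = [70, 74, 76, 78, 79, 80, 80, 81, 82, 84, 86, 90]
skewed = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 8, 40]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    d = ks_statistic(data)
    limit = approx_critical_value(len(data))
    verdict = "reject normality" if d > limit else "do not reject normality"
    print(f"{name:>9}: D = {d:.3f}, approx. critical value = {limit:.3f} -> {verdict}")
```

A larger D means the sample's cumulative distribution strays further from the normal model, which is what the skewed data set produces here.
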
The KS test, for example, is sensitive to small departures from normality when the sample size is large. As a result, it can be prone to Type I errors, signaling a significant departure from normality even when the deviation is too small to matter. The researcher should therefore consider all the available information, both visual inspection and statistical analysis, before making a decision about normality (Field, 2005). If, after following the steps above, the assumption of normality does not appear to be met, non-parametric statistical procedures should be considered in lieu of parametric tests.

Descriptive Statistics (remaining rows)

| Statistic | Value |
| --- | --- |
| Median | 81 |
| Mode | 75 |
| Kurtosis | 0.095314585 |
| Skewness | 0.065078019 |

### Assumptions Other Than Normality

There are two additional assumptions that should be met for any statistical test. They are measurement scales and measures of central tendency.

Measurement scales: Statistical procedures used to test hypotheses have unique assumptions about the scales on which the data are measured. Data are measured on nominal, ordinal, interval, or ratio scales. It is important to determine the measurement-scale assumption of any statistical procedure being considered to test the data. For example, Pearson's r assumes that data are measured at the interval or ratio level, so it could not be used to analyze ordinal data. The non-parametric test, Spearman's rho, would be required to analyze ordinal data for correlation.

Rules for Measurement Scales

Nominal: Nominal data can be classified but not ordered, and have no meaningful distance between values or unique origin (true zero). This is also referred to as categorical data. Examples include names or categories, like gender and marital status. Examples of statistical procedures that use nominal data include chi-square (Cooper & Schindler, 2014).

Ordinal: Ordinal data can be classified and ordered but have no meaningful distance between data values or unique origin (true zero). Examples include surveys with responses ranked on a five-point Likert scale, such as strongly agree to strongly disagree. Examples of statistical procedures that use ordinal data include Spearman's rho, Mann-Whitney test, Wilcoxon test, Kruskal-Wallis test, and Friedman test (Cooper & Schindler, 2014).

Interval: Interval data can be classified and ordered and have meaningful distance between data values but no unique origin (true zero). A classic example of an interval level of measurement is temperature measured in degrees. The data are ordered and the differences between measures are meaningful, but there is no true zero. Since there is no true zero, it would be improper to say 40 degrees is twice as warm as 20 degrees. Examples of statistical procedures that use interval data include Pearson's r, regression analysis, t test, and ANOVA (Cooper & Schindler, 2014).

Ratio: Ratio data can be classified and ordered, have meaningful distance between data values, and have a unique origin (true zero). Examples include age in years and income in dollars. Examples of statistical procedures that use ratio data include Pearson's r, regression analysis, t test, and ANOVA (Cooper & Schindler, 2014).

It should be noted that parametric tests are used to analyze data measured at the interval and ratio levels but cannot be used to analyze data measured at the nominal and ordinal levels.

Measures of central tendency: It may have become evident by now, from the use of the histogram and the discussion of normality, that there is interest in how the data points are dispersed around the midpoint of the curve. This is called central tendency, and it is the foundation for statistical analysis using linear models. In short, our statistical procedures evaluate how much our data vary from that midpoint when a straight line is fit to the data (Field, 2005). The important takeaway is that the central tendency of that midpoint can be measured in three different ways: a) mean, b) median, and c) mode.

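
The three measures can be tried out with Python's `statistics` module. The sketch below uses two small data sets and a hypothetical helper, `central_tendency`, that picks the measure conventionally paired with each measurement scale (mean for interval and ratio, median for ordinal, mode for nominal). The helper and the marital-status values are illustrative assumptions.

```python
import statistics


def central_tendency(data, scale):
    """Return the measure of central tendency paired with a measurement scale."""
    if scale in ("interval", "ratio"):
        return statistics.mean(data)
    if scale == "ordinal":
        return statistics.median(data)
    if scale == "nominal":
        return statistics.mode(data)
    raise ValueError(f"unknown measurement scale: {scale!r}")


ordered_scores = [1, 3, 5, 7, 9]
repeated_scores = [5, 5, 5, 3, 3, 9, 9, 9, 9, 1, 1, 1, 7, 7, 7, 7, 7]
marital_status = ["single", "married", "married"]  # hypothetical nominal data

print("mean:  ", central_tendency(ordered_scores, "ratio"))
print("median:", central_tendency(ordered_scores, "ordinal"))
print("mode:  ", statistics.mode(repeated_scores))
print("mode:  ", central_tendency(marital_status, "nominal"))
```

Note that the mode works on categories as well as numbers, which is why it is the measure used with nominal data.
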
As was seen in the descriptive statistics output above, mean, median, and mode are usually included in descriptive statistics generated by software. As was the case with normality and levels of measurement, it is important to determine the central-tendency assumption of any statistical procedure being considered to test the data.

Mean: The arithmetic mean is the most commonly used measure of central tendency. It is calculated by adding the data scores and dividing by the number of cases. The mean is the measure of central tendency used with interval and ratio data and is used for statistical procedures like correlation, regression analysis, t test, and ANOVA (Salkind, 2009).

Median: The median is the score in the distribution of data, when ordered from highest to lowest, where half of the data points occur above it and half occur below it. In the data set 1, 3, 5, 7, and 9, the median would be 5 since half of the values occur above and half below. The median is the measure of central tendency used with ordinal data (Salkind, 2009).

Mode: The mode is the data value that occurs most frequently in the data set, regardless of order. In a data set of 5, 5, 5, 3, 3, 9, 9, 9, 9, 1, 1, 1, 7, 7, 7, 7, 7, the mode would be 7 because it occurs most frequently (five times). The mode is the measure of central tendency used with nominal levels of measurement (Salkind, 2009).

### In Closing: A Word About Validity and Reliability

Although some of the most important and common assumptions of statistical testing have been discussed in this lesson, there are still more. This may seem like a very taxing and laborious process to complete before even getting to the point of testing the research hypotheses. It is critical, however, that researchers ensure assumptions are met so they can be confident that their results are valid and reliable. To make confident decisions using research, the statistical results must be both.

Validity means that the statistical procedure measures what was intended to be measured. As was discussed about normality, if a parametric statistical procedure is used for a data set that lacks a normal distribution, the results will be invalid. Reliability refers to repeatability. If a second research study were conducted by replicating the conditions of the original study (e.g., sampling, data collection, levels of measurement, and statistical test), the results should be similar if the original results were reliable.

It should also be noted that research results can be reliable but not valid. It is conceivable that a research study could be replicated multiple times and reliably generate the same invalid results each time. A classic example is the broken bathroom scale. Assume a person's actual weight is 150 pounds. Each morning for a week, the person steps on the bathroom scale and the reading is 145 pounds. The measurement is invalid because, due to calibration problems, it is incorrect. The test, however, is reliable because the same result was replicated each day. For research results to have integrity, they must be both valid and reliable.

## References

Cooper, D. R., & Schindler, P. S. (2014). Business research methods (12th ed.). New York, NY: McGraw-Hill/Irwin.

Field, A. (2005). Discovering statistics using SPSS (2nd ed.). London, England: Sage.

George, D., & Mallery, P. (2010). SPSS for Windows step by step: A simple guide and reference, 17.0 update (10th ed.). Boston, MA: Pearson.

Norusis, M. J. (2008). SPSS 16.0 guide to data analysis (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Salkind, N. J. (2009). Exploring research (7th ed.). Upper Saddle River, NJ: Pearson.

Sundberg, S. (2014). Skewed distribution: Definition, examples [Image]. Retrieved from http://www.statisticshowto.com/probability-and-statistics/skewed-distribution/

## Assignment: Descriptive Statistics Analysis

Assignment: Descriptive Statistics Analysis ORDER NOW FOR CUSTOMIZED AND ORIGINAL ESSAY PAPERS ON Assignment: Descriptive Statistics Analysis Descriptive Statistics Analysis Describe the Sun Coast data using the descriptive statistics tools discussed in the unit lesson. Establish whether assumptions are met to use parametric statistical procedures. Repeat the tasks below for each tab in the Sun Coast research study data set. Utilize the Unit IV Scholarly Activity template here . You will utilize Microsoft Excel ToolPak. The links to the ToolPak are here in the Course Project Guidance document. Assignment: Descriptive Statistics Analysis Here are some of the items you will cover. Produce a frequency distribution table and histogram. Generate descriptive statistics table, including measures of central tendency (mean, median, and mode), kurtosis, and skewness. Describe the dependent variable measurement scale as nominal, ordinal, interval, or ratio. Analyze, evaluate, and discuss the above descriptive statistics in relation to assumptions required for parametric testing. Confirm whether the assumptions are met or are not met. The title and reference pages do not count toward the page requirement for this assignment. This assignment should be no less than five pages in length, follow APA-style formatting and guidelines, and use references and citations as necessary. courseprojectguidanceandunitvii_template.docx unitiv_template.docx study_guide.docx Course Learning Outcomes for Unit IV Upon completion of this unit, students should be able to: 6. Differentiate between various research-based tools commonly used in businesses. 6.1 Describe various forms of descriptive statistics, including frequency distribution tables, histograms, descriptive statistics tables, Kolmogorov-Smirnov tests, measurement scales, and measures of central tendency. 7. Test data for a business research project. 
7.1 Establish whether assumptions are met to use parametric statistical procedures by applying descriptive statistics. UNIT IV STUDY GUIDE Data Analysis: Descriptive Statistics Course/Unit Learning Outcomes Learning Activity 6.1 Unit Lesson Video: Kolmogorov-Smirnov Test of Normality in Excel Video: Parametric and Nonparametric Statistical Tests Video: Checking that Data Is Normally Distributed Using Excel Video: 3. Choosing Between Parametric & Non-Parametric Tests Article: Difference Between Parametric and Nonparametric Article: Deciphering the Dilemma of Parametric and Nonparametric Tests Unit IV Scholarly Activity 7.1 Unit Lesson Unit IV Scholarly Activity Reading Assignment In order to access the following resources, click the links below: Fields, H. (2018). Difference between parametric and nonparametric. Retrieved from Difference Between Parametric and Nonparametric Dominguez, V. (2016, April 16). Make a histogram using Excels histogram tool in the Data Analysis ToolPak [Video file]. Retrieved from https://www.youtube.com/watch?v=xekiDJzajYk Click here for a transcript of the video. Grande, T. (2017, August 19). Kolmogorov-Smirnov test of normality in Excel [Video file]. Retrieved from Click here for a transcript of the video. Grande, T. (2015, July 30). Parametric and nonparametric statistical tests [Video file]. Retrieved from https://www.youtube.com/watch?v=pWEWHKnwg_0 Click here for a transcript of the video. MBA 5652, Research Methods 1 Macarty, M. (2015, September 21). Get descriptive statistics in Excel with DataUNAInTalxysSisTUTDooYlpGaUkI[DVEideo file]. Retrieved from https://www.youtube.com/watch?v=h-RzBhBzJOQ Click here for a transcript of the video. Title Oxford Academic (Oxford University Press). (2016, November 17). Checking that data is normally distributed using Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=EG8AF2B_dps Click here for a transcript of the video. Rana, R., Singhal, R., & Dua, P. (2016). 
Deciphering the dilemma of parametric and nonparametric tests. Journal of the Practice of Cardiovascular Sciences, 2(2), 95. Retrieved from http://link.galegroup.com.libraryresources.columbiasouthern.edu/apps/doc/A488649197/AONE?u=ora n95108&sid=AONE&xid=c54eaf34 The Roslin Institute Training. (2016, May 9). 3. Choosing between parametric & non-parametric tests [Video file]. Retrieved from https://www.youtube.com/watch?v=_1mH6CnXKfM Click here for a transcript of the video. Unit Lesson Data Analysis: Descriptive Statistics The course is now entering the data analysis stage of research design. This is where the methodological fork in the road goes decisively down the quantitative path. The first topic of discussion under data analysis will be what is referred to as descriptive statistics. As the name suggests, the researcher describes the data that are collected. During this stage, the data are described both visually and statistically. Data may be visually displayed to reveal distribution of data, trends, anomalies, outliers, etc. Visual displays of data may take the form of graphs, histograms, tables, plots, and other diagrams. This stage is done before any statistical procedures are used to test the research hypotheses. This begs the question of why the researcher should not simply jump in and immediately start testing their hypotheses using statistical analysis. The following explains the importance of descriptive statistics to test data to ensure assumptions are met before using a parametric test. MBA 5652, Research Methods 2 Assumptions: The Importance of Describing Data UNIT x STUDY GUIDE Title There are various benefits of describing the data. One of the most important benefits is to determine if the data meet the assumptions that are required for the use of parametric statistical procedures. Parametric procedures include, but are not limited to, correlation, regression, t test, and ANOVA. 
Parametric tests have different assumptions that must be met depending on which test is being considered, but most parametric tests require that the assumption of normality be met. Normality refers to a normal distribution of data which, when graphed as frequencies, resembles a bell shape (as in the image to the right). Other common assumptions that must be met, depending on the statistical procedure used, 60 include sample size, levels- of-measurement, homogeneity of variance, independence, absence of outliers, linearity, etc. (Field, 2005). It is critical that the researcher understands the 30 assumptions for any parametric statistical procedure being considered to determine if they are met before employing the 10 procedure in a research study. An Internet search for any parametric test will quickly return results that list required assumptions. Bell Curve 80 70 50 40 20 10 20 30 40 50 60 70 80 90 100 Normal distribution graph with a bell curve If the assumptions are not met, parametric statistical procedures cannot be used. To use them would result in invalid results. Fortunately, there are corresponding non-parametric tests that can be used when the data do not meet assumptions for parametric tests. Non-parametric tests also have assumptions that must be met, but they are fewer and less rigid. An example of a parametric procedure for correlation would be Pearsons correlation coefficient (Pearsons r), while a corresponding non-parametric test for correlation would be Spearmans rank correlation coefficient (Spearmans rho). An example of a causal-comparative parametric procedure would be ANOVA, while a corresponding non-parametric causal-comparative test would be Kruskal-Wallis. Since non-parametric tests do not require that as many assumptions are met, some students wonder why non-parametric tests are not always used. The reason is that parametric tests are superior to and more powerful than non-parametric tests and should be used if the assumptions are met. 
A parametric test is more likely to find a true effect when one exists, therefore rejecting the null hypothesis, than a non-parametric test (Norusis, 2008). In other words, a parametric test is less likely to commit a Type II error. Norusis (2008) recommends that researchers conduct both parametric and non-parametric tests if they are unsure as to which is most appropriate to use. If the test results are the same, there is nothing more to worry about. If the test results are statistically significant for the parametric test, and non-significant for the non-parametric test, the researcher should take a closer look at whether the assumptions were met or not. Assumption of Normality Assumptions are evaluated both visually and statistically. As mentioned previously, a normal distribution of data is the most commonly required assumption for parametric statistical tests. The following will explain how the assumption of normality can be described and tested. A normal distribution of data exhibits the characteristics of a bell-shaped curve, as shown below. In a perfect normal curve, the frequency distribution is symmetrical about the center; the mean, median, and mode are all MBA 5652, Research Methods 3 equal; and the tails of the curve approach but do not touch the x-axis (Salkind,U2N0I0T9x).STThUesDeYaGreUaIDllE preliminary indicators that a curve may represent a normal distribution, but there are additional factors to consider. Distribution curves can be short and wide, tall and thin, and anywhere in between. As shown below, each of the colored bell-shaped curves has a mean (?) of zero. Their standard deviations (?), however, or the measure of how widely the data disperses around the mean, are different for each curve. The orange curve has a relatively small standard deviation because the data is closely clustered around the mean. The red curve has a relatively large standard deviation because the data is loosely clustered around the mean. 
[Figure: Distribution curves]

Kurtosis describes the tallness of the curves. A platykurtic curve is short and squatty (think plateau), which, as shown in the red curve, represents a relatively greater number of scores in the tails of the curve. A leptokurtic curve is tall and thin (think leapt for the sky), which, as shown in the orange curve, represents a data distribution with relatively fewer scores in the tails (Field, 2005). Platykurtic and leptokurtic curves can challenge the assumption of normality, even when the curve is bell-shaped.

The data may also be asymmetrical, with the data more heavily distributed to one side of the curve or the other. Asymmetry in the data distribution curve is referred to as skewness. Below are examples of negative skewness and positive skewness. Like platykurtic and leptokurtic curves, those exhibiting skewness also threaten the assumption of normality.

[Figure: Left-skewed and right-skewed graphs (Sundberg, 2014)]

The assumption of normality can be evaluated visually by describing the frequency of responses in a data set. The frequency table below shows the results of a 120-point safety test administered to 500 employees. For example, two employees scored in the test range of 50–54, 90 employees scored in the range of 85–89, and three employees scored in the range of 110–114.

When the frequency data are plotted in a histogram, the curve of the data can be observed. To create a histogram, the data values (test score ranges) from the data set are plotted on the x-axis, and the frequencies of the values are plotted on the y-axis. So, using the same example from the discussion of the frequency table, it can be seen in the histogram that two employees scored in the test range of 50–54, 90 employees scored in the range of 85–89, and three employees scored in the range of 110–114.
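The frequency table and histogram described above can be sketched in a few lines. This is only an illustration: the scores are hypothetical (not the 500 safety test results), the function name `frequency_table` is an assumption, and the five-point bin width simply mirrors ranges such as 50–54 used in the text. The histogram is drawn as text, with score ranges listed down the side and frequencies shown as bars.

```python
from collections import Counter


def frequency_table(scores, bin_width=5):
    """Count integer scores per bin; each bin is labeled by its inclusive range."""
    # Map every score to the lower bound of its bin, e.g. 52 -> 50.
    counts = Counter((score // bin_width) * bin_width for score in scores)
    table = []
    # Walk every bin from the lowest to the highest so empty bins still appear.
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start}-{start + bin_width - 1}"
        table.append((label, counts.get(start, 0)))
    return table


# Hypothetical integer test scores (not the Sun Coast data).
scores = [52, 67, 71, 73, 76, 78, 79, 80, 81, 81,
          82, 83, 84, 85, 86, 88, 91, 94, 112]

# Print a text histogram: one row per score range, one '#' per employee.
for label, count in frequency_table(scores):
    print(f"{label:>7} | {'#' * count} ({count})")
```

Keeping the empty bins in the table matters for visual inspection, because gaps between the main body of scores and an isolated bar are what make a possible outlier visible.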
By observing the histogram below, it appears the data are approximately normally distributed, and there are no visible outliers. While there is no skewness observed, the kurtosis favors a leptokurtic curve. Skewness and kurtosis can be confirmed by generating descriptive statistics, which is a routine function in statistical packages, including the Excel Data Analysis ToolPak.

There is a lot of debate among researchers regarding acceptable levels of skewness and kurtosis. George and Mallery (2010) suggest that skewness and kurtosis scores between -2 and +2 are satisfactory for accepting a normal distribution. Researchers generally agree that the closer skewness and kurtosis are to 0, the better. The more kurtosis and skewness deviate from 0, the greater the chances that the data are not normally distributed (Field, 2005). As shown in the descriptive statistics table, skewness and kurtosis are both relatively close to 0.

It should also be noted that the mean, median, and mode are similar in the descriptive data table below. As noted above, the mean, median, and mode are identical in a perfect normal distribution. The data presented here suggest an approximately normal distribution.

Descriptive Statistics

| Statistic | Value |
| --- | --- |
| Mean | 80.546 |
| Standard Error | 0.446621439 |
| Standard Deviation | 9.986758969 |
| Sample Variance | 99.73535471 |
| Range | 64 |
| Minimum | 53 |
| Maximum | 117 |
| Sum | 40273 |
| Count | 500 |
| Largest(1) | 117 |
| Smallest(1) | 53 |

The frequency distribution should also be observed for outliers. Outliers are extreme scores far away from the mean in the left or right tails of the curve. Outliers can bias the mean due to their extreme scores. There are different recommendations for how to treat outliers, such as removing the outlier from the data set, but the ramifications should be understood before taking any such action. This is an example where consulting the literature is strongly recommended.
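The skewness and kurtosis check discussed above can be sketched as follows. The two functions use the bias-adjusted sample formulas that Excel documents for its `SKEW` and `KURT` worksheet functions, which is an assumption about what the ToolPak reports; the function names, the sample scores, and the helper `within_rule_of_thumb` are hypothetical and introduced only for illustration.

```python
from statistics import mean, stdev


def sample_skewness(data):
    """Bias-adjusted sample skewness (needs at least 3 values)."""
    n = len(data)
    m, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


def sample_excess_kurtosis(data):
    """Bias-adjusted sample excess kurtosis (needs at least 4 values)."""
    n = len(data)
    m, s = mean(data), stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def within_rule_of_thumb(value, limit=2.0):
    """George and Mallery (2010) guideline: values between -2 and +2."""
    return -limit < value < limit


# Hypothetical scores, used only to exercise the functions.
scores = [62, 70, 74, 77, 79, 80, 81, 83, 86, 90, 98]
skew = sample_skewness(scores)
kurt = sample_excess_kurtosis(scores)
print(f"skewness={skew:.3f}, acceptable={within_rule_of_thumb(skew)}")
print(f"kurtosis={kurt:.3f}, acceptable={within_rule_of_thumb(kurt)}")
```

Both statistics are 0 for a perfectly normal distribution, so the guideline is a tolerance band around 0 and should be weighed together with the histogram rather than used alone.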
Finally, normality can be tested statistically. Several tests can be used to objectively test for normality, including Kolmogorov-Smirnov, Shapiro-Wilk, chi-square, Jarque-Bera, Anderson-Darling, and others. Each test has advantages and disadvantages. Once again, this is where the researcher is well served to consult the literature to determine the most appropriate test for his or her project.

The Kolmogorov-Smirnov (KS) test is often used to test for normality. KS compares the frequency distribution of the sample data set to a model of normally distributed data with the same mean and standard deviation as the sample data. The KS test is performed to test a null and an alternative hypothesis, like any other statistical test. The following are the hypotheses.

Ho1: There is no statistically significant difference in normality between the sample data and model data.

Ha1: There is a statistically significant difference in normality between the sample data and model data.

If the results are statistically significant (p < .05), the null hypothesis is rejected, and the alternative hypothesis is accepted that there is a statistically significant difference in normality between the sample data and model data. Therefore, we would conclude that the assumption of normality is not met, and a non-parametric test would be required to test our data. If the results are not statistically significant (p > .05), the null hypothesis is not rejected; that is, no statistically significant difference in normality was found between the sample data and model data. Therefore, we would conclude that the assumption of normality is met, and a parametric test would be acceptable to test our data.

It is important to note that the above steps for evaluating the assumption of normality require a holistic view. No single description of the data is sufficient to make a decision about normality.
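The KS comparison described above can be sketched with the standard library alone. This is a simplified illustration, not the procedure used by any particular statistics package: it computes the KS statistic D (the largest gap between the sample's cumulative distribution and a normal model with the sample's mean and standard deviation) and compares it with the common large-sample 5% critical value of 1.36/√n instead of computing an exact p value. Because the model's mean and standard deviation are estimated from the same sample, this threshold is conservative; statistical packages often apply a correction (for example, Lilliefors). The function names and both samples are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist, mean, stdev


def ks_statistic_vs_normal(data):
    """Largest gap between the sample's empirical CDF and a fitted normal CDF."""
    n = len(data)
    model = NormalDist(mean(data), stdev(data))
    largest_gap = 0.0
    for i, x in enumerate(sorted(data)):
        cdf = model.cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x; check both sides.
        largest_gap = max(largest_gap, (i + 1) / n - cdf, cdf - i / n)
    return largest_gap


def rejects_normality(data, coefficient=1.36):
    """True when D exceeds the large-sample 5% critical value 1.36 / sqrt(n)."""
    return ks_statistic_vs_normal(data) > coefficient / sqrt(len(data))


n = 200
# Hypothetical samples built from quantiles so the example is deterministic.
bell_shaped = [NormalDist(80, 10).inv_cdf((i + 0.5) / n) for i in range(n)]
right_skewed = [-log(1 - (i + 0.5) / n) for i in range(n)]

for name, sample in (("bell_shaped", bell_shaped), ("right_skewed", right_skewed)):
    d = ks_statistic_vs_normal(sample)
    print(f"{name}: D={d:.4f}, reject normality={rejects_normality(sample)}")
```

Rejecting normality here corresponds to rejecting Ho1 and turning to a non-parametric test; failing to reject corresponds to treating the assumption as met, subject to the visual checks.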
For example, the KS test is sensitive to small departures from normality when sample sizes are large. The result is that it can be prone to Type I errors. Therefore, the researcher should consider all the available information, both visual inspection and statistical analysis, before making a decision about normality (Field, 2005). If, after following the steps above, the assumption of normality does not appear to be met, non-parametric statistical procedures should be considered in lieu of parametric tests.

The remaining values from the descriptive statistics table above are: median 81, mode 75, kurtosis 0.095314585, and skewness 0.065078019.

### Assumptions Other Than Normality

There are two additional assumptions that should be met for any statistical test. They are measurement scales and measures of central tendency.

Measurement scales: Statistical procedures used to test hypotheses have unique assumptions about the scales on which the data are measured. Data are measured on nominal, ordinal, interval, or ratio scales. It is important to determine the assumption of measurement scales for any statistical procedure being considered to test the data. For example, an assumption of Pearson's r is that data be measured at the interval or ratio level. Pearson's r could not be used to analyze ordinal data. The non-parametric test, Spearman's rho, would be required to analyze ordinal data for correlation.

### Rules for Measurement Scales

Nominal: Nominal data can be classified but not ordered and have no meaningful distance between values or unique origin (true zero). This is also referred to as categorical data. Examples include names or categories, like gender and marital status. Examples of statistical procedures that use nominal data include chi-square (Cooper & Schindler, 2014).

Ordinal: Ordinal data can be classified and ordered but have no meaningful distance between data values or unique origin (true zero).
Examples include surveys with responses ranked on a five-point Likert scale, such as strongly agree to strongly disagree. Examples of statistical procedures that use ordinal data include Spearman's rho, the Mann-Whitney test, the Wilcoxon test, the Kruskal-Wallis test, and the Friedman test (Cooper & Schindler, 2014).

Interval: Interval data can be classified and ordered and have meaningful distance between data values but no unique origin (true zero). A classic example of an interval level of measurement is temperature measured in degrees. The data are ordered, and the differences between measures are meaningful, but there is no true zero. Since there is no true zero, it would be improper to say 40 degrees is twice as warm as 20 degrees. Examples of statistical procedures that use interval data include Pearson's r, regression analysis, t test, and ANOVA (Cooper & Schindler, 2014).

Ratio: Ratio data can be classified and ordered, have meaningful distance between data values, and have a unique origin (true zero). Examples include age in years and income in dollars. Examples of statistical procedures that use ratio data include Pearson's r, regression analysis, t test, and ANOVA (Cooper & Schindler, 2014).

It should be noted that parametric tests are used to analyze data measured at the interval and ratio levels but cannot be used to analyze data measured at the nominal and ordinal levels.

Measures of central tendency: It may have become evident by now, from the use of the histogram and the discussion of normality, that there is interest in how the data points are dispersed around the midpoint of the curve. This is called central tendency and is the foundation for statistical analysis using linear models. In short, our statistical procedures evaluate how much our data vary from that midpoint when a straight line is fit to the data (Field, 2005). The important takeaway is that the central tendency of that midpoint can be measured in three different ways: a) mean, b) median, and c) mode.
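The three measures can be computed directly with Python's standard `statistics` module. The sketch below uses a small set of hypothetical scores (not the Sun Coast data), chosen so that the three measures are close to one another but not identical.

```python
from statistics import mean, median, mode

# Hypothetical test scores, used only for illustration.
scores = [72, 75, 75, 78, 80, 81, 81, 81, 84, 93]

score_mean = mean(scores)      # sum of the scores divided by the number of cases
score_median = median(scores)  # middle of the ordered scores
score_mode = mode(scores)      # most frequently occurring score

print(f"mean={score_mean}, median={score_median}, mode={score_mode}")
# prints "mean=80, median=80.5, mode=81"
```

With an even number of scores, `median` averages the two middle values, which is why the result here falls between 80 and 81.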
As was seen in the descriptive statistics output above, mean, median, and mode are usually included in descriptive statistics generated by software. As was the case with normality and levels of measurement, it is important to determine the assumption of central tendency for any statistical procedure being considered to test the data.

Mean: The arithmetic mean is the most commonly used measure of central tendency. It is calculated by adding the data scores and dividing by the number of cases. The mean is the measure of central tendency used with interval and ratio data and is used for statistical procedures like correlation, regression analysis, t test, and ANOVA (Salkind, 2009).

Median: The median is the score in the distribution of data, when ordered from highest to lowest, where half of the data points occur above the median and half of the data points occur below the median. In the data set 1, 3, 5, 7, and 9, the median would be 5 since half of the values occur above and half below. The median is the measure of central tendency used with ordinal data (Salkind, 2009).

Mode: The mode is the data value that occurs most frequently in the data set, regardless of order. In a data set of 5, 5, 5, 3, 3, 9, 9, 9, 9, 1, 1, 1, 7, 7, 7, 7, 7, the mode would be 7 because it is the value that occurs most frequently in the data set. The mode is the measure of central tendency used with nominal levels of measurement (Salkind, 2009).

### In Closing: A Word About Validity and Reliability

Although some of the most important and common assumptions of statistical testing have been discussed in this lesson, there are still more. This may seem like a very taxing and laborious process to undertake before even getting to the point of testing the research hypotheses.
It is absolutely critical that researchers ensure assumptions are met so they can be confident that their results are valid and reliable. To be able to make confident decisions using research, the statistical results must be both valid and reliable.

Validity means that the statistical procedure measures what was intended to be measured. As was discussed about normality, if a parametric statistical procedure is used for a data set that lacks a normal distribution of data, the results will be invalid.

Reliability refers to repeatability. If a second research study were conducted by replicating the conditions of the original research study (e.g., sampling, data collection, levels of measurement, and statistical test), the results should be similar if the original research results were reliable.

It should also be noted that research results can be reliable but not valid. It is conceivable that a research study could be replicated multiple times and reliably generate the same invalid results each time. A classic example is the broken bathroom scale. Assume a person's actual weight is 150 pounds. Each morning for a week, they step on the bathroom scale and the reading is 145 pounds. The measurement is invalid because, due to calibration problems, the measurement is incorrect. The test, however, is reliable because the same result was replicated each day. For research results to have integrity, they must be both valid and reliable.

### References

Cooper, D. R., & Schindler, P. S. (2014). Business research methods (12th ed.). New York, NY: McGraw-Hill/Irwin.

Field, A. (2005). Discovering statistics using SPSS (2nd ed.). London, England: Sage.

George, D., & Mallery, P. (2010). SPSS for Windows step by step: A simple guide and reference, 17.0 update (10th ed.). Boston, MA: Pearson.

Norusis, M. J. (2008). SPSS 16.0 guide to data analysis (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Salkind, N. J. (2009). Exploring research (7th ed.). Upper Saddle River, NJ: Pearson.

Sundberg, S. (2014). Skewed distribution: Definition, examples [Image]. Retrieved from http://www.statisticshowto.com/probability-and-statistics/skewed-distribution/
