CNU BST 322 Regression and Correlation Coefficient

Collaborate Summary: four points for a two-page summary of the Collaborate lecture. Bullets and outline format are fine. Students can also earn credit by annotating the written lecture document with thoughtful notes.

Attachment: week_four_collaborate_slides_revised_june_2020.pptx

BST 322 Week Four Slides
Revised June 22, 2020
Brooks Ensign, MBA, M.Acc.

Deadlines
• Week Four (end of course): Final Exam in MyStatLab, Independent Project, Wk 4 HW, Discussion Questions, MyStatLab
• ASK ME FOR HELP!
  - MyStatLab Final Exam

This Week: Week Four
Our agenda this week: PREDICTIONS?
• Scatterplot → Correlation calculation →
• Correlation calculation → Derive regression equation →
• Regression equation → use to “predict” (“maybe”, if “significant”)
• Consider confounding variables and multivariate regression
• ANCOVA: introduce (lightly)

This Week: Week Four
• Review correlation from Week One (Ch. 4)
• Algebra: draw a line through two points and get the slope and intercept; that gives you the equation (a simplified regression process)
• Regression: simple bivariate (two variables)
• Multivariate: more than one independent variable (x1, x2, x3)
• Ch. 9: simple bivariate; Ch. 10: multivariate
• Ch. 11 (just the first 6 pages): intro to ANCOVA

This Week: “Regression”
• Chapters 9 and 10 (and a tiny intro bit of 11)
• With interval or ratio data:
• From scatterplots, to correlations, to regressions: deriving an equation to describe the data, and (maybe) using the equation to predict values
• Y “prime” = Y' = a + (b × x)
• Y' = the predicted value of y
• b = slope, and a = intercept

“Regression”
• Regress: step back and analyze
• In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables.
• Regression equations can be used to predict values, if …

“Maybe” predict?
• Explanation:
• MyStatLab has a strict rule: if the correlation is statistically significant (p value less than 0.05), then, and only then, you can use the regression equation to predict. Otherwise, you simply use the mean (average) value of the dependent variable.
• Class in DQ1: gray area: the p value is 0.06, but we will use the equation anyway (borderline)

Preview: Correlation → Regression → Prediction
• This week we focus on Chapter 9 (all of it), the first half of Chapter 10 (light treatment), and the first third of Chapter 11 (very light treatment).
• This week, our test statistics are “r” and “R”:
• “r”: lowercase r, for simple correlation and regression with two variables
• “R”: uppercase R, for multivariate regression (more than one independent variable); we give multivariate regression a light treatment

Significance?
• To declare that our results are “significant” (i.e., “probably not random”), we need to “reject the null hypothesis,” and for that we need:
• A LARGE test statistic and a very small p value
• Test statistics: t … F … chi-square (χ²) … and now … “r” and “R”
• For significance of r, see page 199 and page 418
• For significance of R, see page 231 (stay tuned)

“They Work Together”
Think of the test statistic and the p value as the opposite ends of a seesaw. They work in opposite directions. For statistical significance, we want a large r or R test statistic (larger than the table value) and a small p value (smaller than 0.05, i.e., alpha).
• r or R test statistic greater than the table value
• P value less than alpha (0.05), the level of significance

Review
If the absolute value of the test statistic is greater than the tabled value (at a given level of significance, α), or if the p value is < 0.05, then the null hypothesis is rejected and the result is significant.

Correlation and Regression (Review)
We did correlation when we covered scatterplots in Ch. 4 (Pearson’s r).
This value r is calculated from a sample of data.
The population value of the correlation coefficient is “rho” (ρ), the lowercase Greek r.
We study “r” in a sample as an estimate of the “rho” (ρ) correlation in the population.
Remember that r is easily calculated in StatCrunch.

Scatterplot → Correlation, and now …
• Correlation → Regression equation → prediction
• Regression: derive an equation from the correlation (“if” the correlation is statistically significant and strong enough to be predictive)
• y' = a + (b × x), with a = intercept and b = slope
• “y prime” = y' (the predicted value of y)

Correlation in StatCrunch
“Click:” Stat → Summary Stats → Correlation

Slope, Correlation and R²
CONTRAST THESE (they are different)
• 1. Slope: “rise / run”; the “b” in y = a + (b × x)
• Slope: “steep?”
• 2. r = correlation: from Week One, “r” = does a change in y relate to a change in x?
• Strong correlation can have low slope!
• 3. R² = regression answer: “proportion of variance” (how much of the variance is explained?); also known as the “coefficient of determination”
• R² = r times r

Correlation is not “slope” (rise over run)
Tight fit? Or “messy”?
Correlation is the degree of “fit” to a line: is it “tight” (very close to being a line, i.e., a correlation of 0.7 to 0.9), or is it a “messy cloud” (zero or low correlation, e.g., 0.1)?

[Figure: scatterplot of points lying on a line. Perfect correlation (r = 1.0); slope is 0.1. Caption: strong correlation, with low slope.]

Correlation as a test statistic
Now we can look at Pearson’s r as a test statistic. Are the values we see significant?
The null hypothesis here is that the correlation value is …
H0: r = ?
A. zero: there is no relationship
B. not zero: there is a relationship
C. 0.5: there is a weak relationship
Vote now!

Correlation as a test statistic
H0: ρ = 0
H1: ρ ≠ 0
The null hypothesis here is that there is … (no relationship, no correlation, r very small, close to zero)
Any ideas from students? What is the null hypothesis? What is the alternative hypothesis? Discuss…

Correlation as a test statistic
H0: ρ = 0
H1: ρ ≠ 0
The null hypothesis here is that there is no relationship between the variables in the population (ρ = 0). SEE PAGE 199.
So we compare the test statistic r (which we use as an estimate of ρ) to the critical value in the table (p. 418).
Again, if the absolute value of the test statistic is greater than the critical value, then the null hypothesis is rejected and the result is significant. Alternatively, we let the computer calculate the exact p value and just compare that to 0.05.

Easy Way
• The easy way to determine the statistical significance of the regression: look at the p value of the slope (not the p value of the intercept)
• Is the p value of the slope less than 0.05?
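
The contrast between slope, r, and R² can be checked numerically. The following is a minimal sketch, assuming a small made-up data set that lies exactly on a line with slope 0.1; the data points and the helper name `slope_r_r2` are illustrative and do not come from the lecture. It computes all three quantities from the usual sums of squares.

```python
import math

def slope_r_r2(xs, ys):
    """Return (slope b, intercept a, Pearson r, R-squared) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: "rise over run"
    a = mean_y - b * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation: tightness of fit
    return b, a, r, r * r              # R-squared = r times r

# Hypothetical data: every point sits exactly on y = 1 + 0.1x
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1 + 0.1 * x for x in xs]

b, a, r, r2 = slope_r_r2(xs, ys)
print(f"slope b = {b:.3f}, intercept a = {a:.3f}, r = {r:.3f}, R^2 = {r2:.3f}")
```

The fit is perfect (r rounds to 1.000) even though the slope is only 0.1, which is the point of the slide: correlation measures how tightly the points follow a line, not how steep that line is.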
Correlation-Regression Example: Better Charts
[Figure: “Weight Gain After Overeating,” drawn once as a bad chart and once as a good chart. x-axis: Nonexercise activity (calories); y-axis: Fat gain (kilograms). Fitted line: y = -0.0033x + 3.3413, R² = 0.6211.]
See Polit p. 35 for more tips on graphs.

Significance of r
• This was optional in Week One; it is now required (easy with StatCrunch: p value?)
• Four slides from Week One (see p. 199, top):
• Follow these instructions to test the significance of your correlation in the Independent Project, #5 and #6
• Required for Question Six in the Independent Project: test the significance of your correlation coefficient (page 199)

Meaning of r value (page 71)
• Pearson’s r is between 0 and 1 for positive correlation and between 0 and -1 for negative correlation.
• Positive correlation: 0 < r < 0.2 is weak, 0.2 to 0.5 is moderate, 0.5 to 0.7 is stronger, and > 0.7 is very strong (these are “rough” descriptions)
• Negative: strong correlation if less than -0.5, weak if between -0.2 and 0.
• Value of zero or near zero: no correlation

Is the “r” value “significant”?
• The easiest way:
  - Look at the p value of the slope in your StatCrunch results (bottom right corner)
  - Is the p value of the slope < 0.05?

Parameter estimates:
Parameter   Estimate     Std. Err.    Alternative  DF  T-Stat      P-Value
Intercept   665.7143     131.6546     ≠ 0          5   5.0565214   0.0039
Slope       -0.6989286   0.29438862   ≠ 0          5   -2.3741696  0.0636

Is the “r” value “significant”?
• Week 2: “Significant” in statistics means “not random” (rather than “important”)
• Test of significance for Pearson’s r (top of page 199 and page 418)
• Calculate d.f. (degrees of freedom): N - 2, with N being the number of data points
• Notice: at the very bottom of the table, a low r value can be significant with a large data sample
• At the very top of these tables: small samples require LARGE test statistics
• Discussion Question One: a large r value may not be significant with a small sample size (top rows in the tables)
• Contrast this with the bottom of the table: a small r value may be significant with a large data set

Is the “r” value “significant”?
• Refer to page 199 (top)
• Refer to page 418: use the shaded column (0.05)
• Find the row that corresponds to the d.f. (degrees of freedom); e.g., 10 - 2 = 8 d.f.
• If the absolute value of your calculated “r” is greater than the table value, then the calculated “r” value is significant (“non-random”).

Is the “r” value “significant”?
• Question 14 in the W1 homework (Week One):
• Ten data points, d.f. = N - 2 = 8
• Page 418: shaded column (α = 0.05)
• Page 418: Table A.6, row: d.f. = 8
• Table value: 0.632
• The r value is significant if greater than 0.632
• Is the “r” value “significant”? (yes, 0.91 > 0.632)
• Test the significance of the r value in the Independent Project

StatCrunch: Discussion Question One
• W4 DQ One: the 0.73 r value “seems” large (and significant?), but it is not quite significant because the data set is very small
• Remember: we predict “maybe?”
• This is the only “close call” in our course
• 0.728 < table value of 0.754 (how did I find this table value on page 418 using the guidance from page 199?)

Regression
• Regression: use the equation derived from the correlation / scatterplot
• StatCrunch does all of this for us
• Regression: use the equation to PREDICT

Regression in StatCrunch
• Click: Stat → Regression → Simple Linear
• Regression in StatCrunch: fill in the template
• Prediction in StatCrunch (using Regression)

StatCrunch: Discussion Question One
Simple linear regression results:
Equation: y = intercept + (b × x); here b is negative
Dependent Variable: Cholesterol
Independent Variable: Caffeine
Cholesterol = 665.7143 - 0.6989286 Caffeine (roughly 665.7 - 0.70 times Caffeine)
Sample size: 7
R (correlation coefficient) = -0.728
R-sq = 0.52992857
Estimate of error standard deviation: 155.77582
Parameter estimates: same table as above (intercept p value 0.0039, slope p value 0.0636)
Significance? Look at the p value of the slope: 0.06 > 0.05, so not significant (but very close).

Answers
• How do you answer the questions in the discussion questions and the homework?
• See the next few slides!

Discussion Question One (use this for Homework Q-7 also)
• Q: What is the correlation coefficient r, and what does it mean in this case?
• A: The correlation coefficient r = -0.728, which means there is a strong, negative correlation.
• Q: What is the coefficient of determination, and what does it mean in this case?
• A: The coefficient of determination is r². In this case it is equal to 0.53. This means that 53% of the variation in cholesterol is explained by the independent variable (caffeine).
• Q: Is there a statistically significant correlation between caffeine intake and cholesterol levels in this case?
• A: The table value is 0.754 and the absolute value of r is 0.73. Because the calculated value does not exceed the table value, the correlation is not statistically significant (“very close,” but not quite).

Discussion Question One
• The correlation seems strong, but it is not quite significant …
• How many more data points do you need? (one or two)
• Note: this is a “strong” correlation that lacks significance (very small sample)
• We can also have a weak correlation with significance (in a large sample): look at the small values at the bottom of page 418

Discussion Question One
• Using regressions to PREDICT:
• Difference in methodology:
• MyStatLab teaches us that we use regressions to predict “only” if the regression is statistically significant. Otherwise we just use the average value…
• But this Discussion Question asks you to predict using this equation, which is “not quite” significant
• Sometimes statistical approaches differ; this is the only “borderline” example in this class, but there are many in real life

StatCrunch: Discussion Question One
The numbers here are slightly different from your discussion question. USE THE NUMBERS IN THE DQ. DON’T JUST COPY THESE NUMBERS.

Discussion Question One: Predictions
• Q: What is the intercept? (Or: what would your cholesterol level be while ingesting no caffeine?)
• A: The intercept is 665.714. That would be the cholesterol level while ingesting 0 mg of caffeine.
• Q: What is the slope? (Or: what is the value we call b in the linear regression equation?)
• A: The slope (b in the linear regression equation) is -0.636

Discussion Question One
StatCrunch output, as shown earlier: Cholesterol = 665.7143 - 0.6989286 Caffeine, sample size 7, R = -0.728, R-sq = 0.52992857, slope p value 0.0636.
Use the p value of the slope; it is the same as the p value for the correlation. 0.06 is greater than 0.05, so the results are not quite statistically significant.
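
The significance check and the MyStatLab prediction rule for this example can be sketched in a few lines. This is a minimal illustration, assuming the StatCrunch numbers printed on these slides (not the numbers in your own DQ). The function names are hypothetical, and `mean_y=400.0` is a made-up placeholder because the slides do not report the mean cholesterol value.

```python
import math

# Values copied from the StatCrunch output on the slides
A = 665.7143         # intercept
B = -0.6989286       # slope
R = -0.728           # correlation coefficient
N = 7                # sample size
P_SLOPE = 0.0636     # p value of the slope
ALPHA = 0.05         # level of significance

def t_from_r(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def predict(x, p_value, mean_y, alpha=ALPHA):
    """MyStatLab-style rule: use the regression equation only when the
    result is significant; otherwise fall back to the mean of y."""
    if p_value < alpha:
        return A + B * x
    return mean_y

t = t_from_r(R, N)
print(f"t from r = {t:.3f}")   # close to the slope's T-Stat in the output
print("significant" if P_SLOPE < ALPHA else "not significant")

# mean_y = 400.0 is a hypothetical placeholder, not a value from the slides
print("strict rule:", predict(300, P_SLOPE, mean_y=400.0))
print("DQ approach (use the equation anyway):", A + B * 300)
```

The t statistic computed from r agrees with the slope's T-Stat in the StatCrunch table to about three decimals, which is why the slope's p value can stand in for the p value of the correlation in simple (one x) regression.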
P value of the slope is 0.06.

Discussion Question One: use a regression to predict
• c) How many cups of coffee must you drink to lower your total cholesterol to 150 mg/dL (given that 1 cup of coffee equals 100 mg of caffeine)?

ALGEBRA
• x = (150 - 665.714) / (-0.636)
• x ≈ 810 mg of caffeine
• 810 mg / (100 mg per cup) ≈ 8 cups
• Better way: use the StatCrunch prediction tool for the DQ and for HW Q7

StatCrunch: Prediction
Scroll down in the Simple Linear Regression screen until you see the prediction input. Enter the assumed value of X, and StatCrunch will calculate the predicted Y value based on the regression equation.

EC Discussion Question Three
• Optional, but interesting (fun and easy)
• The Most Important Question in This Course
• No math! This is your chance to use what you have learned in this course to…
• Recognize the mistakes and misconceptions in the medical literature; some statistical studies are poorly designed or executed…
• Misadventures…

Skim the Vox article (design of medical research studies):
http://www.vox.com/2015/1/5/7482871/types-of-study-design

Misadventures?
Look at “Misadventures” on this site:
http://www.improvingmedicalstatistics.com/index.html
http://www.improvingmedicalstatistics.com/entry_media.htm
http://www.improvingmedicalstatistics.com/entry_high_school.htm
http://www.improvingmedicalstatistics.com/Biased%20protocol.htm
Choose one of the examples cited. Write a short paragraph: identify the article and identify the abuse or misuse of statistical analysis.
NOTE: These research articles are prominent, recent medical articles (WITH MISTAKES!).

Regression & Multiple Regression
Regression (bivariate, in Ch. 9): one x variable
- used to make predictions about the values of variables once we know their relationship
- the easiest case is linear: use the equation of a line to predict y values from one x
Multiple Regression (multivariate, in Ch. 10): an extension of simple linear regression where we use two or more x variables (“factors”) to predict the value of the dependent variable

YOU LEARNED IN THIS CLASS:
The word “factor” is used in this class instead of “cause.” We recognize that explanations usually involve “multiple factors.”

What is the “cause” of my hypertension?
• Trick question, because there is not “one” cause of hypertension (or of many other medical conditions)
• There are “multiple ‘contributing’ factors”: salt, stress, genetics, diet, exercise, caffeine, decongestants, medicine
• “Multiple factors”: these questions are addressed with “multivariate” regression in Ch. 10 (and MF ANOVA in Ch. 7)

Capital R: Multiple Regression (we just want the basics in Ch. 10)
• Factors: x variables: x1, x2, x3, x4, … etc.
• Adding another “factor” (x variable) will not make the regression worse, but the added benefit will drop as you add x variables (overlapping variance) (*)
• Goal: increase R² (percentage of variance explained)
• Not every x variable has to be significant
• Generally, we may want about 2 to 4 variables (*)
• b values (weights) b1, b2, and b3 are only valid until you change the combination of variables
• (*) Using too many factors may be “overfitting”

Number of Factors and R-squared
• If you can get the same R-squared value with fewer factors, that is a better choice (more robust, more scalable, less “overfitting”)

Is your R value statistically significant?
HOW TO DECIDE? LOOK AT YOUR F VALUE (AND PAGE 231)

Homework
• The F statistic: pages 230-231
• Used for the test of significance with multiple regressions
• Is your R value statistically significant? Use your F value to answer this question…
• Is the calculated F greater than the table F (page 413)? If so, your R value is statistically significant.

Multiple Regression F Statistic
See page 231 for the homework assignment.
Page 231: df (b) = k; df (within) = N - k - 1

F statistic in Multiple Regression
• Using the following information for R², k, and N, calculate the value of the F statistic for testing the overall regression equation and determine whether F is statistically significant at the 0.05 level
Example: R² = 0.53, k = 5, N = 120 (see the table on page 413)
F = (R² / k) / [(1 - R²) / (N - k - 1)]
F = (0.53 / 5) / [(1 - 0.53) / (120 - 5 - 1)] = 0.106 / 0.004123 = 25.71
25.71 > tabled F = 2.29: significant. Reject the null hypothesis.

ANCOVA (Ch. 11, introduction): ANALYSIS of COVARIANCE
Very light treatment with NO MATH, but a very interesting concept (“controlling” for and “adjusting” for confounding variables).
ANCOVA is similar to ANOVA: its assumptions include all of the ANOVA assumptions.

Confounding variables
• In plain English:
  - Are we comparing apples to oranges?
  - I.e., are there differences in the comparison that we are not properly considering? Is this a “fair” comparison?
• Differences: “confounding” variables.
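
Returning briefly to the F test for the overall multiple regression: the page 231 calculation is easy to script. The following is a minimal sketch using the example values from the slides (R² = 0.53, k = 5, N = 120) and the tabled F of 2.29 quoted there; the function name `f_statistic` is illustrative.

```python
def f_statistic(r_squared, k, n):
    """Overall F for a multiple regression with k predictors and n cases.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    """
    df_numerator = k              # one degree of freedom per predictor
    df_within = n - k - 1         # residual degrees of freedom
    f = (r_squared / df_numerator) / ((1 - r_squared) / df_within)
    return f, df_numerator, df_within

F_TABLED = 2.29  # critical value quoted on the slide for the 0.05 level

f, df1, df2 = f_statistic(0.53, 5, 120)
print(f"F({df1}, {df2}) = {f:.2f}")   # F(5, 114) = 25.71
print("reject H0" if f > F_TABLED else "fail to reject H0")
```

For your own homework values, swap in your R², k, and N, and look up the tabled F for your degrees of freedom rather than reusing 2.29.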
Using ANCOVA (in words)
• “Freeze” (“control for”) the variance from lurking (confounding) variables, to study the effect of the variable of interest
• Example in the written lecture (common in clinical studies): control for back pain (or another clinical endpoint variable) at “baseline” (the beginning), in order to study the effects of drugs in reducing this back pain
• Otherwise, the drugs appear to be ineffective, and the people with less back pain at baseline continue to have less back pain at the end

ANC …
