BIO STATISTICS

(1) Explain, with examples, the difference between:
    (a) Discrete and continuous variables
    (b) Dependent and independent variables [10]

(2) In Mr Kwenda’s class of 96 students with roll numbers 1-96, he wishes to take a sample of 10 students. As a bio-statistics student, advise Mr Kwenda on which sampling method would be good for him to use. [10]

(3) The performance of Mr Haanyaka’s Bio 285 class of 230 students is given in the incomplete distribution below.

    Variable    0-10   10-20   20-30   30-40   40-50   50-60   60-70
    Frequency   4      16      f1      f2      f3      6       4

    (a) If the median and the mode of Mr Haanyaka’s class are 33.5 and 34 respectively, find the missing frequencies. [7]
    (b) Hence identify the mode and the range of his distribution. [8]
    (c) Calculate the 20th percentile, quartile deviation, standard deviation, mean deviation and variance. Hence explain the meaning of these values in relation to Mr Haanyaka’s results. [25]

(4) Seven of Mr Nyirenda’s Microbiology BIO 142 students obtained the following results in their CA and final exam.

    CA           50   62   70   25   20   60   60
    Final exam   48   65   74   33   25   55   66

    Find the correlation and explain whether there is a relationship between a student passing the CA and passing the final exam. [10]

(5) Twenty students from Rusangu University were attacked by a disease after eating food from the compound, and only 14 survived. Would you reject the hypothesis that the survival rate, if attacked by this disease, is 85 percent, at the 5 percent level of significance? [10]

(6) The incidence of COVID-19 is such that, on average, twenty percent of people suffer from it. If 10 people are selected at random, what is the chance that not more than two of them suffer from COVID-19? [10]

(7) The mean height of 100 nursing students at the Rusangu University Kitwe campus is 165 cm and the standard deviation is 10 cm. Assuming a normal distribution, find the number of students whose height is between 159 and 178 cm. [10]
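The probability calculations asked for in Questions (6) and (7) can be checked numerically. The following is a minimal sketch, not a model answer: it assumes Question (6) is treated as a binomial experiment with n = 10 and p = 0.20, and that Question (7) uses a normal distribution with mean 165 and standard deviation 10. The function names `binomial_cdf` and `normal_cdf` are illustrative helpers, not part of the question paper.

```python
from math import comb, erf, sqrt


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ Normal(mean, sd^2), using the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


# Question (6): not more than 2 of 10 people affected when p = 0.20 (assumed binomial).
p_at_most_two = binomial_cdf(2, 10, 0.20)

# Question (7): expected number of the 100 students with height in (159, 178) cm.
share_between = normal_cdf(178, 165, 10) - normal_cdf(159, 165, 10)
expected_students = 100 * share_between

print(round(p_at_most_two, 4))
print(round(expected_students))
```

A hand calculation with z-scores (here z = -0.6 and z = 1.3) and standard normal tables should agree with the second figure to within rounding.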

The latest GDP figures released by the Australian Bureau of Statistics (ABS) revealed that in the first three months of 2020, the Australian economy shrank for the first time since March 2011. The real GDP growth rate for 2020Q1 was −0.3% due to the impact of bushfires and the early stages of the coronavirus pandemic.

Suppose that you and your group are hired to produce forecasts for Australia for the last three quarters of 2020 and to prepare a report to the government. To that end, the assignment is divided into the following steps.

1. Collecting the data of interest from the ABS website [10 marks]
Your group is interested in investigating the relationship between real GDP, which is denoted by Y, and a variable called X. Later, X will be used as an independent variable in a regression. Refer to the Appendix below to find which variable is assigned to your group and its corresponding series code.

2. Data cleaning [10 marks]
After obtaining the Y and X series, you need to transform them into growth rates. The growth rates of the Y and X series are computed as
$y_t = 100 \times \frac{Y_t - Y_{t-1}}{Y_{t-1}}$ and $x_t = 100 \times \frac{X_t - X_{t-1}}{X_{t-1}}$,
respectively. For the purposes of the assignment, your group decided to focus on a sample period running from 2000Q1 to 2020Q1 and to use this sample to perform the following exercises. To be clear, the number of observations for both y and x will be 80.

3. Getting to know the data [20 marks]
(a) Present both y and x on one line graph and report the figure in the Appendix as Figure 1.
(b) Create a scatter plot of y against x and report the figure in the Appendix as Figure 2.
(c) Calculate the correlation between y and x and report the result in Figure 2.
(d) Create a table reporting the mean and median of x and y for every 10 quarters and present the results in the Appendix as Table 1.
(e) Based on Figure 1, Figure 2 and Table 1, use the second paragraph of your report to discuss the primary results about the relationship between y and x.
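The data-cleaning and summary steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the level series `Y` and `X` below are made-up numbers, not ABS data, and the helper names `growth_rates`, `pearson_r` and `block_summary` are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def growth_rates(levels):
    """Quarter-on-quarter growth in percent: 100 * (L_t - L_{t-1}) / L_{t-1}."""
    return [100 * (cur - prev) / prev for prev, cur in zip(levels, levels[1:])]


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def block_summary(series, size=10):
    """(mean, median) for each consecutive block of `size` observations."""
    return [
        (mean(series[i:i + size]), median(series[i:i + size]))
        for i in range(0, len(series), size)
    ]


# Made-up level series for illustration only (NOT ABS data).
Y = [100.0, 101.0, 102.5, 102.0, 103.5]
X = [50.0, 50.8, 51.9, 51.5, 52.6]

y = growth_rates(Y)  # one observation fewer than the level series
x = growth_rates(X)
r = pearson_r(y, x)
print(len(y), round(r, 3))
```

Note that taking growth rates drops the first observation, which is why the level data must start one quarter before the first growth rate in the sample.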
The first paragraph of the report is for an introduction, where you highlight the aim of your task. Don’t forget a title for your report!

4. Regression [30 marks]

(a) You first believe that current growth must be related to the previous rate, so you run the following regression, namely M1:
$y_t = \alpha_0 + \alpha_1 y_{t-1} + \epsilon_t$ (1)
and report the estimated values of $\alpha_0$, $\alpha_1$ and $R^2$. Discuss your results in the third paragraph of your report.
(b) To qualify the results, you are required to calculate TSS, ESS and SSR to confirm that $TSS = SSR + ESS$ and $R^2 = 1 - \frac{SSR}{TSS}$, but report these technical results in the Appendix.
(c) After examining the estimated results obtained from the first regression, suppose that your chief economist asked you to run another regression, namely M2:
$y_t = \beta_0 + \beta_1 x_t + \epsilon_t$ (2)
Discuss the new results and compare them with those found previously in the fourth paragraph. Based on $R^2$, which model is considered to be better at predicting y?

5. Forecast [25 marks]
(a) Having obtained the estimated coefficients of M1 and M2, you are now going to conduct a simple forecast for y. In particular, you are interested in forecasting $y_{T+h}$, where T indicates the last period of the sample and h is the forecast horizon, running from 1 to 3. In your sample period, T will be 2020Q1, and thus h = 1, 2, 3 corresponds to 2020Q2, 2020Q3 and 2020Q4. The simple forecast can be computed using the following formulas.
i. Compute forecasts for h = 1, 2, 3 under M1:
$\hat{y}_{T+h} = \hat{\alpha}_1 y_{T+h-1}$ (3)
Precisely, we can derive the 1-step-ahead forecast $\hat{y}_{T+1} = \hat{\alpha}_1 y_T$, the 2-step-ahead forecast $\hat{y}_{T+2} = \hat{\alpha}_1 \hat{y}_{T+1} = (\hat{\alpha}_1)^2 y_T$, and the 3-step-ahead forecast $\hat{y}_{T+3} = \hat{\alpha}_1 \hat{y}_{T+2} = (\hat{\alpha}_1)^3 y_T$.

ii. Compute forecasts for h = 1, 2, 3 under M2:
$\hat{y}_{T+h} = \hat{\beta}_1 x_{T+h}$ (4)
where $x_{T+h}$ is the expected value of x for the 3 scenarios presented in the Appendix.
(b) Use a bar graph (Figure 3) to present your forecast results under M1 and M2, and place this graph in the Appendix.
(c) Discuss your forecast results. Are there any points of concern that should be taken into account when using the model?

6. Conclusion and policy recommendation (last paragraph) [5 marks]
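The regression and forecast steps can be illustrated with a short sketch. It assumes ordinary least squares with an intercept for M1 and M2, and it follows forecast formulas (3) and (4) exactly as written in the assignment, which use only the slope coefficient. The growth series and the scenario values are made-up numbers, and all function names are hypothetical.

```python
def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def decompose(x, y, intercept, slope):
    """Return (TSS, ESS, SSR) for the fitted line."""
    mean_y = sum(y) / len(y)
    fitted = [intercept + slope * a for a in x]
    tss = sum((b - mean_y) ** 2 for b in y)
    ess = sum((f - mean_y) ** 2 for f in fitted)
    ssr = sum((b - f) ** 2 for b, f in zip(y, fitted))
    return tss, ess, ssr


def forecast_m1(alpha1, y_last, horizons=3):
    """Iterate formula (3): each forecast is alpha1 times the previous value."""
    forecasts, prev = [], y_last
    for _ in range(horizons):
        prev = alpha1 * prev
        forecasts.append(prev)
    return forecasts


def forecast_m2(beta1, x_scenario):
    """Formula (4): beta1 times the assumed future values of x."""
    return [beta1 * value for value in x_scenario]


# Made-up growth rates for illustration only (NOT ABS data).
y = [0.6, 0.9, 0.4, 0.7, 0.5, 0.8, 0.3, 0.6, 0.2, -0.3]

# M1: regress y_t on y_{t-1}, so the regressor is the series lagged by one period.
a0, a1 = ols(y[:-1], y[1:])
tss, ess, ssr = decompose(y[:-1], y[1:], a0, a1)
r2 = 1 - ssr / tss

m1_forecasts = forecast_m1(a1, y[-1])
m2_forecasts = forecast_m2(0.5, [-1.0, 0.2, 0.4])  # hypothetical slope and scenario
print(round(tss - (ess + ssr), 10), round(r2, 3), m1_forecasts)
```

The identity TSS = ESS + SSR holds here because the regression includes an intercept; without one, the decomposition would generally fail.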

Assignment 3: Multiple Regression
TRES6030 Quantitative Data Analysis

Case 1
One critical factor that determines the success of a catalogue store chain is the availability of products that consumers want to buy. If a store is sold out, future sales to that customer are less likely. Accordingly, delivery trucks operating from a central warehouse regularly resupply stores. In an analysis of a chain’s operations, the general manager wanted to determine the factors that are related to how long it takes to unload delivery trucks. A random sample of 50 deliveries to one store was observed.

File name: Catalogue.xls
Variables: The time (in minutes) to unload the truck, the number of boxes and the total weight (in hundreds of pounds) of the boxes were recorded.
Determine the multiple regression equation.

Question 1: What are the hypothesis statements of this study?
Question 2: Analyse and explain the correlations between the variables of the study. Is there a relationship between the unloading time, the weight and the number of boxes?
Question 3: Explain the fit of the regression model. Is the model valid?
Question 4: Interpret and test the coefficients.
a. Produce a 95% interval of the amount of time needed to unload a truck with 100 boxes weighing 5,000 pounds.
b. Produce a 95% interval of the average amount of time needed to unload trucks with 100 boxes weighing 5,000 pounds.
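Question 4 distinguishes an interval for a single truck (a prediction interval) from an interval for the average over such trucks (a confidence interval for the mean response). The sketch below shows one way to compute both by hand, assuming OLS with an intercept. The eight deliveries are made-up numbers, not the contents of Catalogue.xls, and the helper names `solve` and `fit_and_intervals` are hypothetical. The t critical value is passed in by hand because the Python standard library has no t quantile function; 2.571 is the two-sided 5% value for 5 degrees of freedom, which matches the toy data only.

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_and_intervals(rows, y, x0, t_crit):
    """OLS with an intercept, plus intervals at the predictor values x0.

    t_crit is the two-sided t critical value with n - p degrees of freedom,
    where p counts the coefficients including the intercept.
    Returns (coefficients, confidence interval, prediction interval).
    """
    X = [[1.0, *r] for r in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [v - sum(b * c for b, c in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - p)  # residual variance
    x0_full = [1.0, *x0]
    # h = x0' (X'X)^{-1} x0, obtained by solving (X'X) z = x0
    z = solve(xtx, x0_full)
    h = sum(u * v for u, v in zip(x0_full, z))
    point = sum(b * c for b, c in zip(beta, x0_full))
    half_ci = t_crit * sqrt(s2 * h)        # mean response
    half_pi = t_crit * sqrt(s2 * (1 + h))  # single new observation
    return beta, (point - half_ci, point + half_ci), (point - half_pi, point + half_pi)


# Made-up deliveries: (number of boxes, weight in hundreds of pounds).
rows = [(80, 40), (95, 48), (100, 55), (110, 52), (120, 61), (90, 43), (105, 50), (115, 58)]
minutes = [72, 85, 92, 94, 105, 80, 90, 100]

# 100 boxes weighing 5,000 pounds is weight = 50 in hundreds of pounds.
beta, ci, pi = fit_and_intervals(rows, minutes, (100, 50), t_crit=2.571)
print(ci, pi)
```

The prediction interval is always wider than the confidence interval at the same point, because it adds the variance of an individual observation to the uncertainty in the estimated mean. With the real sample of 50 deliveries and two predictors, the degrees of freedom would be 47 and the critical value would need to be looked up accordingly.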

Case 2
The survey was designed to explore the factors that affect respondents’ psychological adjustment and wellbeing. For the multiple regression example detailed below, I will be exploring the impact of respondents’ perceptions of control on their levels of perceived stress. The literature in this area suggests that if people feel that they are in control of their lives, they are less likely to experience ‘stress’.

In the questionnaire, there were two different measures of control (see the Appendix for the references for these scales). These include the Mastery Scale, which measures the degree to which people feel they have control over the events in their lives; and the Perceived Control of Internal States Scale (PCOISS), which measures the degree to which people feel they have control over their internal states (their emotions, thoughts and physical reactions). In this example, I am interested in exploring how well the Mastery Scale and the PCOISS are able to predict scores on a measure of perceived stress.

The variables used in the examples covered in this chapter are presented below. It is a good idea to work through these examples on the computer using this data file. Hands-on practice is always better than just reading about it in a book. Feel free to ‘play’ with the data file: substitute other variables for the ones that were used in the example. See what results you get and try to interpret them.

• Total perceived stress (tpstress): total score on the Perceived Stress Scale. High scores indicate high levels of stress.
• Total Perceived Control of Internal States (tpcoiss): total score on the Perceived Control of Internal States Scale. High scores indicate greater control over internal states.
• Total Mastery (tmast): total score on the Mastery Scale. High scores indicate higher levels of perceived control over events and circumstances.
• Total Social Desirability (tmarlow): total score on the Marlowe-Crowne Social Desirability Scale, which measures the degree to which people try to present themselves in a positive light.
• Age: age in years.

Question 1: What are the hypothesis statements of this study?
Question 2: Analyse and explain the correlations between the variables of the study. Is there a relationship between the amount of control people have over their internal states and their levels of perceived stress? Do people with high levels of perceived control experience lower levels of perceived stress?
What you need: Two variables: both continuous, or one continuous and the other dichotomous (two values).
Question 3: Explain the fit of the regression model. Is the model valid?
Question 4: How well do the two measures of control (mastery, PCOISS) predict perceived stress? How much variance in perceived stress scores can be explained by scores on these two scales?
Question 5: Which is the best predictor of perceived stress: control of external events (Mastery Scale) or control of internal states (PCOISS)?
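For Questions 4 and 5, the standardised coefficients and the explained variance of a two-predictor regression can be obtained directly from the three pairwise Pearson correlations. The sketch below illustrates that relationship only: the correlation values are invented for illustration and are not taken from the survey data file, and the function name `two_predictor_summary` is hypothetical.

```python
def two_predictor_summary(r_y1, r_y2, r_12):
    """Standardised betas and R^2 for a regression of y on two predictors.

    r_y1, r_y2: correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


# Hypothetical correlations (NOT from the survey data file):
# stress with mastery, stress with PCOISS, mastery with PCOISS.
beta_mastery, beta_pcoiss, r_squared = two_predictor_summary(-0.6, -0.5, 0.5)
print(round(beta_mastery, 3), round(beta_pcoiss, 3), round(r_squared, 3))
```

With these invented inputs the predictor with the larger absolute standardised coefficient makes the stronger unique contribution, and R^2 gives the share of variance in the outcome explained by the two scales together. The same reading applies to the standardised coefficients reported by a statistics package on the real data.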