MAST90007: Statistics for Research Workers 2021


This assignment contains three (3) questions worth a total of 20 marks. There is some general advice on the assignment at the end of this document, on page 8. 

The overall requirement for this assignment is to carry out and report on data analytics that address three questions about the data from the Framingham heart study. 

You may know about this study from your general knowledge; it is one of the most famous studies in epidemiology. You can learn about the study from Wikipedia, and also through these references: 

Levy, D., National Heart, Lung, and Blood Institute, et al. (1999). 50 years of discovery: medical milestones from the National Heart, Lung, and Blood Institute’s Framingham Heart Study. Hackensack, N.J., Center for Bio-Medical Communication Inc. 

Mahmood, S. S., Levy, D., Vasan, R. S., & Wang, T. J. (2014). The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. The Lancet, 383(9921), 999-1008. 

Oppenheimer, G. M. (2005). Becoming the Framingham study 1947–1950. American Journal of Public Health, 95(4), 602-610. 

You may also find your own useful references. You are not required to read these references for the purposes of the assignment. 

The data file contains some information from long-term follow-up as well as baseline measures. The file contains records for 5,209 people – all the participants in the original cohort of the study. The participants were followed up every 2 years. The data file includes information from baseline, the 2nd examination (one variable), and the 16th examination (30 years after baseline). 
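As a quick check on the timeline, the following minimal sketch converts an examination number into nominal years since baseline. It assumes that baseline is examination 1 and that examinations were held exactly 2 years apart; actual visit dates are likely to have varied. The function name and constant are illustrative only and are not part of the data file.

```python
# Nominal follow-up interval stated in the assignment (years between examinations).
EXAM_INTERVAL_YEARS = 2


def years_since_baseline(exam_number: int) -> int:
    """Nominal years between baseline (examination 1) and the given examination.

    Assumes evenly spaced examinations; this is an approximation, not a
    record of when each participant was actually examined.
    """
    if exam_number < 1:
        raise ValueError("examination numbers start at 1")
    return EXAM_INTERVAL_YEARS * (exam_number - 1)


if __name__ == "__main__":
    for exam in (1, 2, 16):
        print(f"Examination {exam}: {years_since_baseline(exam)} years after baseline")
```

Under these assumptions the 2nd examination falls 2 years after baseline and the 16th examination falls 30 years after baseline, which agrees with the description of the data file.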


SRW MAST90007 2021 Major assignment 


The data file includes:

Sex: Female / Male

Age at baseline (years)

Weight at baseline (pounds)

Diastolic blood pressure at baseline (mmHg)

Systolic blood pressure at baseline (mmHg)

Serum cholesterol (mg/100ml) examination 2: Serum cholesterol (mg/100ml) at the 2nd examination; this variable has 626 missing values.

Metropolitan Relative Weight at baseline: A measure of the percentage of actual weight to desirable weight; a measure very similar to BMI.

Smoker at baseline: Smoker / Non-smoker

Number cigarettes smoked per day at baseline

Survived at last examination: 0 = alive at 16th examination; 1 = died prior to 16th examination

Serum cholesterol (mg/100ml) examination 1: Serum cholesterol (mg/100ml) at baseline; this variable has 2,037 missing values.

Height at baseline (inches)

Body Mass Index at baseline (kg/m²)

Serum cholesterol (mg/100ml) baseline: Baseline serum cholesterol at examination 1, or, when missing at examination 1, the serum cholesterol at the second examination.

Last examination number: Number of the last examination that the person participated in.

Cause of death:
0 = still alive
1 = sudden death from coronary heart disease (CHD)
2 = other coronary heart disease
3 = stroke (cerebrovascular accident, CVA)
4 = other cerebral vascular disease
5 = cancer
6 = other causes of death
9 = cause unknown


Examination at which CHD diagnosed, if