Aims

This module covers three application areas of statistics: multivariate methods, demography and epidemiology, and sampling.

On completion of the module, students should be able to do the following (learning outcomes):

    • Understand and apply multivariate methods;

    • Assess the results of discriminant analysis, principal components, cluster analysis and multivariate analysis of variance;

    • Understand and apply demographic and epidemiological methods;

    • Understand and apply sampling methods.




Syllabus

Multivariate methods
Vectors of expected values. Covariance and correlation matrices.
Discriminant analysis: choice between two populations, calculation of the discriminant function, probability of misclassification, test and training samples, leave-one-out and k-fold cross-validation, idea of extension to several populations.
Principal components: definition, interpretation of calculated components, use in regression.
Cluster analysis: similarity measures, single-link and other hierarchical methods, k-means.
Informal approaches to checking for multivariate Normality. Tests and confidence regions for multivariate means.
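
As an illustration of the first two topics only, the sample mean vector and the sample covariance and correlation matrices might be computed as in the sketch below. It assumes the data are held as a list of rows, one per observation; the function names and the four bivariate observations are hypothetical and chosen purely for illustration.

```python
import math

def mean_vector(rows):
    """Sample mean of each variable (column)."""
    n = len(rows)
    p = len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(p)]

def covariance_matrix(rows):
    """Sample covariance matrix with divisor n - 1."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    return [
        [sum((r[j] - m[j]) * (r[k] - m[k]) for r in rows) / (n - 1)
         for k in range(p)]
        for j in range(p)
    ]

def correlation_matrix(rows):
    """Correlation matrix obtained by scaling the covariance matrix."""
    s = covariance_matrix(rows)
    p = len(s)
    return [
        [s[j][k] / math.sqrt(s[j][j] * s[k][k]) for k in range(p)]
        for j in range(p)
    ]

# Hypothetical data: four observations on two variables.
data = [(2, 1), (4, 3), (6, 5), (8, 11)]
xbar = mean_vector(data)
S = covariance_matrix(data)
R = correlation_matrix(data)
print(xbar, S, R)
```

The divisor n - 1 gives the usual unbiased estimate of each variance and covariance; a divisor of n would give the maximum likelihood estimate under Normality instead.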

Demography and epidemiology
Population pyramids. Life tables. Standardised rates (e.g. mortality). Incidence and prevalence. Design and analysis of cohort (prospective) studies. Design and analysis of case-control (retrospective) studies. Confounding and interaction.
Matched case-control design and analysis, using McNemar's test. Causation.
Relative risk. Odds ratio. Estimation and confidence intervals for 2x2 tables.
Mantel-Haenszel procedure. Sensitivity, specificity, ROC curves, positive predictive value, negative predictive value.
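
To make the 2x2 table quantities concrete, the sketch below computes the relative risk, the odds ratio and an approximate 95% confidence interval for the odds ratio. It assumes the common large-sample interval based on the standard error of the log odds ratio, which may not be the method used in the module. The function name, the cell labels a, b, c, d and the counts are hypothetical.

```python
import math

def two_by_two_summary(a, b, c, d, z=1.96):
    """Summarise a 2x2 table.

    a, b: exposed subjects with and without the disease.
    c, d: unexposed subjects with and without the disease.
    z:    Normal quantile for the interval (1.96 for roughly 95%).
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    # Large-sample standard error of log(odds ratio); needs all cells > 0.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or),
          math.exp(log_or + z * se_log_or))
    return relative_risk, odds_ratio, ci

# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed are cases.
rr, odds_ratio, (lower, upper) = two_by_two_summary(20, 80, 10, 90)
print(rr, odds_ratio, lower, upper)
```

In a case-control study the relative risk cannot be estimated directly from the table, because the numbers of cases and controls are fixed by design; only the odds ratio part of this sketch would then apply.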

Sampling
Census and sample survey design. Target and study populations, uses and limitations of non-probability sampling methods, sampling frames, sampling fraction.
Simple random sampling. Estimators of totals, means and proportions; bias. Estimated standard errors, confidence intervals and precision. Sampling fraction and finite population correction. Ratio and regression estimators.
Stratified random sampling. Estimators of totals, means and proportions; bias. Estimated standard errors, confidence intervals and precision. Cost functions. Proportional and optimal allocations. Limitations of stratified sampling.
One-stage cluster sampling. Estimators for totals, means and proportions with equal cluster sizes and with different cluster sizes. Estimated standard errors, confidence intervals and precision. Link with systematic sampling.
Description of two-stage sampling and of multi-stage sampling. Limitations.
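
As a small illustration of the simple random sampling topics, the sketch below estimates a population mean and total from a sample drawn without replacement, with standard errors that include the finite population correction. The function name, the sample values and the population size are hypothetical, and the confidence interval uses a Normal approximation that is only rough for a sample this small.

```python
import math

def srs_estimates(sample, population_size, z=1.96):
    """Estimates under simple random sampling without replacement."""
    n = len(sample)
    f = n / population_size            # sampling fraction
    mean = sum(sample) / n
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    # (1 - f) is the finite population correction.
    se_mean = math.sqrt((1 - f) * s2 / n)
    return {
        "mean": mean,
        "total": population_size * mean,
        "se_mean": se_mean,
        "se_total": population_size * se_mean,
        "ci_mean": (mean - z * se_mean, mean + z * se_mean),
    }

# Hypothetical sample of 4 units from a population of 100.
est = srs_estimates([3, 5, 7, 9], population_size=100)
print(est)
```

When the sampling fraction is small the correction is close to 1 and is often ignored; when the sample is the whole population it reduces the standard error to zero.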