Medical Statistics at a Glance (eBook)
John Wiley & Sons (publisher)
9781119167839 (ISBN)
Now in its fourth edition, Medical Statistics at a Glance is a concise and accessible introduction to this complex subject. It provides clear instruction on how to apply commonly used statistical procedures in an easy-to-read, comprehensive and relevant volume. This new edition continues to be the ideal introductory manual and reference guide to medical statistics, an invaluable companion for statistics lectures and a very useful revision aid.
This new edition of Medical Statistics at a Glance:
- Offers guidance on the practical application of statistical methods in conducting research and presenting results
- Explains the underlying concepts of medical statistics and presents the key facts without being unduly mathematical
- Contains succinct self-contained chapters, each with one or more examples, many of them new, to illustrate the use of the methodology described in the chapter
- Now provides templates for critical appraisal, checklists for the reporting of randomized controlled trials and observational studies and references to the EQUATOR guidelines for the presentation of study results for many other types of study
- Includes extensive cross-referencing, flowcharts to aid the choice of appropriate tests, learning objectives for each chapter, a glossary of terms and a glossary of annotated full computer output relevant to the examples in the text
- Provides cross-referencing to the multiple choice and structured questions in the companion Medical Statistics at a Glance Workbook
Medical Statistics at a Glance is a must-have text for undergraduate and post-graduate medical students, medical researchers and biomedical and pharmaceutical professionals.
Aviva Petrie is Honorary Associate Professor, Biostatistics Unit, UCL Eastman Dental Institute, London, UK.
Caroline Sabin is Professor of Medical Statistics and Epidemiology, Department of Primary Care and Population Sciences, Royal Free and University College Medical School, London, UK.
Preface
Part 1 Handling data
1 Types of data
2 Data entry
3 Error checking and outliers
4 Displaying data diagrammatically
5 Describing data: the 'average'
6 Describing data: the 'spread'
7 Theoretical distributions: the Normal distribution
8 Theoretical distributions: other distributions
9 Transformations
Part 2 Sampling and estimation
10 Sampling and sampling distributions
11 Confidence intervals
Part 3 Study design
12 Study design I
13 Study design II
14 Clinical trials
15 Cohort studies
16 Case-control studies
Part 4 Hypothesis testing
17 Hypothesis testing
18 Errors in hypothesis testing
Part 5 Basic techniques for analysing data
Numerical data
19 Numerical data: a single group
20 Numerical data: two related groups
21 Numerical data: two unrelated groups
22 Numerical data: more than two groups
Categorical data
23 Categorical data: a single proportion
24 Categorical data: two proportions
25 Categorical data: more than two categories
Regression and correlation
26 Correlation
27 The theory of linear regression
28 Performing a linear regression analysis
29 Multiple linear regression
30 Binary outcomes and logistic regression
31 Rates and Poisson regression
32 Generalized linear models
33 Explanatory variables in statistical models
Important considerations
34 Bias and confounding
35 Checking assumptions
36 Sample size calculations
37 Presenting results
Part 6 Additional chapters
38 Diagnostic tools
39 Assessing agreement
40 Evidence-based medicine
41 Methods for clustered data
42 Regression methods for clustered data
43 Systematic reviews and meta-analysis
44 Survival analysis
45 Bayesian methods
46 Developing prognostic scores
Appendices
A Statistical tables
B Altman's nomogram for sample size calculations (Chapter 36)
C Typical computer output
D Checklists and trial profile from the EQUATOR network and critical appraisal templates
E Glossary of terms
F Chapter numbers with relevant multiple-choice questions and structured questions from Medical Statistics at a Glance Workbook
Index
3 Error checking and outliers
Learning objectives
By the end of this chapter, you should be able to:
Relevant Workbook questions: MCQs 5 and 6; and SQs 1 and 28 available online
In any study there is always the potential for errors to occur in a data set, either at the outset when taking measurements, or when collecting, transcribing and entering the data into a computer. It is hard to eliminate all of these errors. However, you can reduce the number of typing and transcribing errors by checking the data carefully once they have been entered. Simply scanning the data by eye will often identify values that are obviously wrong. In this chapter we suggest a number of other approaches that you can use when checking data.
Typing errors
Typing mistakes are the most frequent source of errors when entering data. If the amount of data is small, then you can check the typed data set against the original forms/questionnaires to see whether there are any typing mistakes. However, this is time-consuming if the amount of data is large. It is possible to type the data in twice and compare the two data sets using a computer program. Any differences between the two data sets will reveal typing mistakes. Although this approach does not rule out the possibility that the same error has been made on both occasions, or that the value on the form/questionnaire is incorrect, it does at least reduce the number of errors. The disadvantage of this method is that it takes twice as long to enter the data, which may have major cost or time implications.
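The double-entry comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the book: the function name `compare_entries`, the column names and the values are all hypothetical, and each data set is assumed to be a list of rows in the same order.

```python
def compare_entries(first, second):
    """Return (row, column, first_value, second_value) for every cell that differs."""
    if len(first) != len(second):
        raise ValueError("the two data sets have different numbers of rows")
    differences = []
    for row_index, (row_a, row_b) in enumerate(zip(first, second)):
        # Use the union of column names so that a field missing from one
        # entry is reported as a difference as well.
        for column in sorted(set(row_a) | set(row_b)):
            value_a = row_a.get(column)
            value_b = row_b.get(column)
            if value_a != value_b:
                differences.append((row_index, column, value_a, value_b))
    return differences


# Hypothetical data: the same two forms typed in twice.
entry_1 = [{"patient": 1, "weight_kg": 3.64}, {"patient": 2, "weight_kg": 2.91}]
entry_2 = [{"patient": 1, "weight_kg": 3.46}, {"patient": 2, "weight_kg": 2.91}]

mismatches = compare_entries(entry_1, entry_2)
for row, column, a, b in mismatches:
    print(f"row {row}, {column}: first entry {a}, second entry {b}")
```

Each reported difference still has to be resolved against the original form; the program only shows where the two typed versions disagree.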
Error checking
- Categorical data – it is relatively easy to check categorical data, as the responses for each variable can only take one of a number of limited values. Therefore, values that are not allowable must be errors.
- Numerical data – numerical data are often difficult to check but are prone to errors. For example, it is simple to transpose digits or to misplace a decimal point when entering numerical data. Numerical data can be range checked – that is, upper and lower limits can be specified for each variable. If a value lies outside this range then it is flagged up for further investigation.
- Dates – it is often difficult to check the accuracy of dates, although sometimes you may know that dates must fall within certain time periods. Dates can be checked to make sure that they are valid. For example, 30th February must be incorrect, as must any day of the month greater than 31, and any month greater than 12. Certain logical checks can also be applied. For example, a patient’s date of birth should correspond to his/her age, and patients should usually have been born before entering the study (at least in most studies). In addition, patients who have died should not appear for subsequent follow-up visits!
With all error checks, a value should only be corrected if there is evidence that a mistake has been made. You should not change values simply because they look unusual.
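The three kinds of check listed above (allowable categories, numerical ranges, valid and logically consistent dates) might be automated along the following lines. This is only a sketch under stated assumptions: the field names, the coding of sex as 1 or 2 and the weight limits are hypothetical and would have to be replaced by the rules of the actual study. In keeping with the advice above, the code flags values for investigation and never alters them.

```python
from datetime import date

ALLOWED_SEX_CODES = {1, 2}      # hypothetical coding: 1 = male, 2 = female
WEIGHT_RANGE_KG = (0.5, 6.0)    # hypothetical plausible limits for birth weight


def parse_date(year, month, day):
    """Return a date, or None if the combination is not a valid calendar date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def check_record(record):
    """Return a list of problems found in one record; values are never changed."""
    problems = []
    # Categorical check: only the listed codes are allowable.
    if record["sex"] not in ALLOWED_SEX_CODES:
        problems.append("sex: code not allowed")
    # Range check: flag values outside the specified lower and upper limits.
    lower, upper = WEIGHT_RANGE_KG
    if not lower <= record["weight_kg"] <= upper:
        problems.append("weight_kg: outside range")
    # Date validity checks.
    birth = parse_date(*record["date_of_birth"])
    entry = parse_date(*record["date_of_entry"])
    if birth is None:
        problems.append("date_of_birth: invalid date")
    if entry is None:
        problems.append("date_of_entry: invalid date")
    # Logical check: birth should normally precede entry to the study.
    if birth is not None and entry is not None and birth > entry:
        problems.append("date_of_birth: after study entry")
    return problems


# Hypothetical record with an unallowable sex code and an impossible date.
record = {
    "sex": 41,
    "weight_kg": 3.2,
    "date_of_birth": (2019, 2, 30),
    "date_of_entry": (2019, 3, 1),
}
problems = check_record(record)
print(problems)
```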
Handling missing data
There is always a chance that some data will be missing. If a large proportion of the data is missing, then the results are unlikely to be reliable. The reasons why data are missing should always be investigated – if missing data tend to cluster on a particular variable and/or in a particular subgroup of individuals, then it may indicate that the variable is not applicable or has never been measured for that group of individuals. If this is the case, it may be necessary to exclude that variable or group of individuals from the analysis. There are different types of missing data [1]:
- Missing completely at random (MCAR) – the missing values are truly randomly distributed in the data set and the fact that they are missing is unrelated to any study variable. The resulting parameter estimates are unlikely to be biased (Chapter 34). An example is when a patient fails to attend a hospital appointment because he is in a car accident.
- Missing at random (MAR) – the missing values of a variable do not depend on that variable but can be completely explained by non-missing values of one or more of the other variables. For example, suppose that individuals are asked to keep a diet diary if their BMI is above 30 kg/m²: the missing diet diary data are MAR because missingness is completely determined by BMI (those with a BMI below the cut-off do not complete the diet diary).
- Missing not at random (MNAR) – the chance that data on a particular variable are missing is strongly related to that variable. In this situation, our results may be severely biased. For example, suppose we are interested in a measurement that reflects the health status of patients and this information is missing for some patients because they were not well enough to attend their clinic appointments: we are likely to get an overly optimistic overall view of the patients’ health if we take no account of the missing data in the analysis.
Provided the missing data are not MNAR, we may be able to estimate (impute [1]) the missing data [2]. A simple approach is to replace a missing observation by the mean of the existing observations for that variable or, if the data are longitudinal, by the last observed value. These are examples of single imputation. In multiple imputation, we create a number (generally up to five) of imputed data sets from the original data set, with the missing values replaced by imputed values which are derived from an appropriate model that incorporates random variation. We then use standard statistical procedures on each complete imputed data set and finally combine the results from these analyses. Alternative statistical approaches to dealing with missing data are available [2], but the best option is to minimize the amount of missing data at the outset.
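The two single-imputation approaches just described, replacement by the mean and by the last observed value, can be illustrated as follows. This is a minimal sketch with hypothetical function names and made-up measurements; missing values are assumed to be coded as `None`. Multiple imputation needs an imputation model and is not attempted here.

```python
from statistics import mean


def impute_mean(values):
    """Replace each missing value (None) by the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_last_observed(values):
    """Replace each missing value by the most recent observed value.

    Intended for longitudinal data in time order. A missing value that
    comes before any observation is left as None.
    """
    filled = []
    last = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical measurements with missing values coded as None.
cross_sectional = [12.0, None, 14.0, 13.0]
longitudinal = [7.1, 6.8, None, None]

print(impute_mean(cross_sectional))         # [12.0, 13.0, 14.0, 13.0]
print(impute_last_observed(longitudinal))   # [7.1, 6.8, 6.8, 6.8]
```

Both functions return a new list and leave the original data untouched, so the analysis can still be run on the observed values alone for comparison.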
Outliers
What are outliers?
Outliers are observations that are distinct from the main body of the data, and are incompatible with the rest of the data. These values may be genuine observations from individuals with very extreme levels of the variable. However, they may also result from typing errors or the incorrect choice of units, and so any suspicious values should be checked. It is important to detect whether there are outliers in the data set, as they may have a considerable impact on the results from some types of analyses (Chapter 29).
For example, a woman who is 7 feet tall would probably appear as an outlier in most data sets. However, although this value is clearly very high, compared with the usual heights of women, it may be genuine and the woman may simply be very tall. In this case, you should investigate this value further, possibly checking other variables such as her age and weight, before making any decisions about the validity of the result. The value should only be changed if there really is evidence that it is incorrect.
Checking for outliers
A simple approach is to print the data and check them by eye. This is suitable if the number of observations is not too large and if the potential outlier is much lower or higher than the rest of the data. Range checking should also identify possible outliers. Alternatively, the data can be plotted in some way (Chapter 4) – outliers can be clearly identified on histograms and scatter plots (see also Chapter 29 for a discussion of outliers in regression analysis).
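As a rough illustration of how a plot makes an outlier stand out, the sketch below prints a crude text histogram. The function `text_histogram` and the heights are hypothetical, and the values are assumed to be whole numbers; in practice a statistical package or plotting library would normally be used.

```python
def text_histogram(values, bin_width):
    """Return one text line per bin, including empty bins between the extremes."""
    counts = {}
    for v in values:
        bin_start = (v // bin_width) * bin_width   # lower edge of the bin
        counts[bin_start] = counts.get(bin_start, 0) + 1
    lines = []
    start = min(counts)
    while start <= max(counts):
        lines.append(f"{start:>5} | " + "*" * counts.get(start, 0))
        start += bin_width
    return lines


# Hypothetical heights of women in cm; 213 cm is roughly 7 feet.
heights_cm = [158, 162, 165, 167, 170, 171, 163, 213]
histogram = text_histogram(heights_cm, 10)
print("\n".join(histogram))
```

The run of empty bins between 180 and 200 separates the last observation from the main body of the data, which is what marks it as a value to investigate.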
Handling outliers
It is important not to remove an individual from an analysis simply because his/her values are higher or lower than might be expected. However, the inclusion of outliers may affect the results when some statistical techniques are used. A simple approach is to repeat the analysis both including and excluding the value – this is a type of sensitivity analysis (Chapter 35). If the results are similar, then the outlier does not have a great influence on the result. However, if the results change drastically, it is important to use appropriate methods that are not affected by outliers to analyse the data. These include the use of transformations (Chapter 9) and non-parametric tests (Chapter 17).
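The include-and-exclude comparison described above can be sketched as follows. The function name and the heights are hypothetical. The median is shown alongside the mean purely as an illustration of a summary that is less affected by a single extreme value; it is not one of the methods named in the text.

```python
from statistics import mean, median


def with_and_without(values, suspect_index, statistic):
    """Compute a statistic including and then excluding one suspect value."""
    included = statistic(values)
    excluded = statistic(values[:suspect_index] + values[suspect_index + 1:])
    return included, excluded


# Hypothetical heights in cm; the last value is the possible outlier.
heights_cm = [158, 162, 165, 167, 170, 171, 163, 213]

mean_with, mean_without = with_and_without(heights_cm, 7, mean)
median_with, median_without = with_and_without(heights_cm, 7, median)

print(f"mean:   {mean_with:.1f} with, {mean_without:.1f} without")
print(f"median: {median_with:.1f} with, {median_without:.1f} without")
```

In this made-up example the mean moves by about 6 cm when the value is excluded, whereas the median moves by 1 cm, so the choice of summary matters more than it would if the two analyses had agreed.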
Example
After entering the data described in Chapter 2, the data set is checked for errors (Fig. 3.1). Some of the inconsistencies highlighted are simple data entry errors. For example, the code of ‘41’ in the ‘Sex of baby’ column is incorrect as a result of the sex information being missing for patient 20; the rest of the data for patient 20 had been entered in the incorrect columns. Others (e.g. unusual values in the gestational age and weight columns) are likely to be errors, but the notes should be checked before any decision is made, as these may reflect genuine outliers. In this case, the gestational age of patient number 27 was 41 weeks, and it was decided that a weight of...
| Publication date (per publisher) | 23.7.2019 |
|---|---|
| Series | At a Glance |
| Language | English |
| Subject area | Medicine / Pharmacy ► General / Reference works |
| | Medicine / Pharmacy ► Medical specialties |
| | Studies ► Cross-sectional subjects ► Epidemiology / Medical biometry |
| Keywords | medical education • Medical Science • Medical Statistics & Epidemiology • medical statistics examples • medical statistics for students • medical statistics guide • medical statistics handbook • medical statistics introduction • medical statistics reference • medical statistics revision aid • medical statistics techniques • medical studies • Medicine • Statistics |
| ISBN-13 | 9781119167839 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID when it is downloaded. You can then read it only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction books. The body text adapts dynamically to the display and font size, so EPUB is also well suited to mobile reading devices.