In statistics, imputation is the substitution of some value for a missing data point or a missing component of a data point. Once all missing values have been imputed, the dataset can be analysed using standard techniques for complete data. Ideally, however, the analysis should take into account that imputed values carry more uncertainty than values that were actually observed, which generally requires some modification of the standard complete-data analysis methods. Many imputation techniques are available; two of the most commonly used are hot-deck imputation and regression imputation.
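As an illustration of regression imputation, the following is a minimal sketch that assumes a single predictor and an ordinary least-squares line fitted on the complete cases only. The function name `regression_impute` and the sample data are hypothetical, and real applications often use several predictors or add a random residual to each prediction.

```python
def regression_impute(x, y):
    """Replace None entries of y with predictions from a least-squares line.

    The line is fitted only on pairs where y was observed (assumed setup).
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs to fit a line")

    mean_x = sum(xi for xi, _ in pairs) / n
    mean_y = sum(yi for _, yi in pairs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor has no variation among complete pairs")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Keep observed values; predict only where y is missing.
    return [yi if yi is not None else intercept + slope * xi
            for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, None, 8.0, 10.0]
filled = regression_impute(x, y)
print(filled)
```

Because every imputed value lies exactly on the fitted line, this deterministic variant understates the variability of the data, which is one reason the analysis needs to be adjusted afterwards.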
Hot-deck imputation fills in missing values on incomplete records using values from similar but complete records in the same dataset. (The term "hot deck" dates back to the storage of data on punch cards, and indicates that the information donors come from the same dataset as the recipients; the stack of cards was "hot" because it was currently being processed. Cold-deck imputation, by contrast, selects donors from another dataset.)
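The donor idea can be sketched in a few lines. This is a minimal illustration, assuming a nearest-neighbour rule in which "similar" means the smallest sum of absolute differences on a few numeric matching variables. The function name `hot_deck_impute`, the field names, and the sample records are hypothetical; practical hot-deck procedures often pick a donor at random within an imputation class instead.

```python
def hot_deck_impute(records, target, match_on):
    """Fill missing `target` values from the most similar complete record.

    records  : list of dicts; a value of None marks a missing item
    target   : name of the field to impute
    match_on : numeric fields used to judge similarity (assumed complete)
    """
    # Donors come from the same dataset, which is what makes the deck "hot".
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no complete records available as donors")

    def distance(a, b):
        # Sum of absolute differences over the matching variables.
        return sum(abs(a[k] - b[k]) for k in match_on)

    imputed = []
    for r in records:
        if r[target] is None:
            donor = min(donors, key=lambda d: distance(r, d))
            r = {**r, target: donor[target]}  # copy, leave the input untouched
        imputed.append(r)
    return imputed


survey = [
    {"age": 25, "hours": 40, "income": 30000},
    {"age": 27, "hours": 38, "income": None},
    {"age": 52, "hours": 45, "income": 61000},
    {"age": 50, "hours": 44, "income": None},
]
completed = hot_deck_impute(survey, target="income", match_on=["age", "hours"])
for row in completed:
    print(row)
```

A cold-deck variant would differ only in where `donors` comes from: it would be drawn from a separate dataset rather than from `records` itself.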
Because standard analysis techniques do not reflect the additional uncertainty introduced by imputing missing data, further adjustments (such as multiple imputation or a Rao-Shao correction) are needed.
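To show how multiple imputation carries the extra uncertainty into the final result, the sketch below pools estimates from several imputed datasets using the combining rules usually attributed to Rubin, which the text above does not spell out. The function name `pool_rubin` and the numbers are hypothetical; the inputs are assumed to be one point estimate and one squared standard error per imputed dataset.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-dataset estimates from m imputed datasets.

    estimates : point estimate from each imputed dataset
    variances : squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching lists from at least two imputations")

    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


pooled, total = pool_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
print(pooled, total)
```

The between-imputation term is what a single imputation followed by a standard analysis leaves out: it grows when the imputed datasets disagree, widening the resulting confidence intervals.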
Alternatives to imputing missing data
Imputation is not the only method available for handling missing data. It usually gives better results than listwise deletion (in which all subjects with any missing values are omitted from the analysis), and may be competitive with a maximum likelihood approach in many circumstances. Computational intelligence methods have also been applied successfully.[1]
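The cost of listwise deletion is easiest to see in terms of how many records survive. The following minimal sketch uses hypothetical data and helper names (`listwise_delete`, `mean_impute`), with simple mean imputation standing in for imputation in general; it is meant only to show the difference in retained sample size, not to recommend mean imputation.

```python
def listwise_delete(records):
    """Keep only records with no missing (None) values in any field."""
    return [r for r in records if all(v is not None for v in r.values())]


def mean_impute(records, field):
    """Replace missing values of `field` with the mean of its observed values."""
    observed = [r[field] for r in records if r[field] is not None]
    if not observed:
        raise ValueError(f"no observed values for {field!r}")
    fill = sum(observed) / len(observed)
    return [{**r, field: fill if r[field] is None else r[field]}
            for r in records]


data = [
    {"height": 170, "weight": 65},
    {"height": None, "weight": 80},
    {"height": 180, "weight": None},
    {"height": 160, "weight": 55},
]

complete_cases = listwise_delete(data)
imputed = mean_impute(mean_impute(data, "height"), "weight")

print(len(complete_cases), "records after listwise deletion")
print(len(imputed), "records after imputation")
```

Here half of the records are discarded by listwise deletion even though each of them is missing only one item, whereas imputation retains the observed items on every record.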
In machine learning, it is sometimes possible to learn a classifier directly over the original data without imputing it first. This was shown to yield better performance in cases where the missing data is structurally absent, rather than missing due to measurement noise.[citation needed]
See also
- Bootstrapping (statistics)
- Censoring (statistics)
- Expectation-maximization algorithm
- Geo-imputation
- Regression estimation
References
- [1] T. Marwala. Computational Intelligence for Missing Data Imputation, Estimation, and Management: Knowledge Optimization Techniques. Information Science Reference. ISBN 978-1-60566-336-4.
External links
- Missing Data: Instrument-Level Heffalumps and Item-Level Woozles
- Multiple-imputation.com
- Multiple imputation FAQs, Penn State U
- A description of hot deck imputation from Statistics Finland.
- Paper extending Rao-Shao approach and discussing problems with multiple imputation.
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors (see full disclaimer)




