We propose a nonparametric compound Poisson model for underreported count data that introduces a latent clustering structure for the reporting probabilities. The latter are estimated with the model's parameters based on experts' opinion and exploiting a proxy for the reporting process. The proposed model is used to estimate the prevalence of chronic kidney disease in Apulia, Italy, based on a unique statistical database covering information on m = 258 municipalities obtained by integrating multisource register information. Accurate prevalence estimates are needed for monitoring, surveillance, and management purposes; yet, counts are deemed to be considerably underreported, especially in some areas of Apulia, one of the most deprived and heterogeneous regions in Italy. Our results agree with previous findings and highlight interesting geographical patterns of the disease. We compare our model to existing approaches in the literature using simulated as well as real data on early neonatal mortality risk in Brazil, described in previous research: the proposed approach proves to be accurate and particularly suitable when partial information about data quality is available.
A Bayesian nonparametric approach to correct for underreporting in count data
Arima, Serena
Primo
Conceptualization
;
2023-01-01
Abstract
We propose a nonparametric compound Poisson model for underreported count data that introduces a latent clustering structure for the reporting probabilities. The latter are estimated with the model's parameters based on experts' opinion and exploiting a proxy for the reporting process. The proposed model is used to estimate the prevalence of chronic kidney disease in Apulia, Italy, based on a unique statistical database covering information on m = 258 municipalities obtained by integrating multisource register information. Accurate prevalence estimates are needed for monitoring, surveillance, and management purposes; yet, counts are deemed to be considerably underreported, especially in some areas of Apulia, one of the most deprived and heterogeneous regions in Italy. Our results agree with previous findings and highlight interesting geographical patterns of the disease. We compare our model to existing approaches in the literature using simulated as well as real data on early neonatal mortality risk in Brazil, described in previous research: the proposed approach proves to be accurate and particularly suitable when partial information about data quality is available.File | Dimensione | Formato | |
---|---|---|---|
Biostat_published_paper.pdf
non disponibili
Tipologia:
Versione editoriale
Licenza:
Copyright dell'editore
Dimensione
858.69 kB
Formato
Adobe PDF
|
858.69 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.