Appl Health Econ Health Policy. 2026 Mar 25.
BACKGROUND: Microsimulation models are increasingly used to project health trajectories of individuals with cardiometabolic diseases, including type 2 diabetes, obesity, cardiovascular disease, and chronic kidney disease. Despite the emergence of practice guidelines on model calibration and validation, it remains unclear whether practices in model development and reporting have improved accordingly.
OBJECTIVE: To summarize the characteristics of studies reporting cardiometabolic disease microsimulation models, assess how calibration and validation processes are reported, and examine variations in reporting practices by study characteristics.
METHODS: We searched PubMed, Embase, and Web of Science for studies reporting the original development of microsimulation models of cardiometabolic diseases published between 2016 and June 1, 2024. Studies reporting calibration and/or validation processes were included. We recorded study characteristics and assessed reporting adherence to six calibration processes (defining parameters, selecting targets, applying search strategies, specifying convergence criteria, establishing stopping rules, and selecting goodness-of-fit measures) and five validation processes (face validity, verification, cross-validation, external validation, and predictive validation) based on published practice guidelines. We further investigated variation in guideline adherence by study characteristics (modeling type, cardiometabolic diseases, publication year, baseline population data source, modeling country, simulation tool, and open-source status). This study is registered in PROSPERO (CRD42024562800).
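The six calibration processes assessed above can be illustrated with a minimal, hypothetical sketch (not any model from the review): a toy microsimulation with one calibrated parameter (annual disease incidence), one calibration target (10-year prevalence), a random parameter search, a squared-error goodness-of-fit measure, a convergence criterion, and an iteration-count stopping rule.

```python
import random

def simulate_prevalence(p_annual, n=5_000, years=10, seed=0):
    """Toy microsimulation: each of n individuals faces an annual
    probability p_annual of developing the disease; returns the
    simulated 10-year prevalence. The fixed seed applies common
    random numbers across candidate parameters, so goodness-of-fit
    is a deterministic function of p_annual."""
    rng = random.Random(seed)
    cases = 0
    for _ in range(n):
        for _ in range(years):
            if rng.random() < p_annual:
                cases += 1
                break
    return cases / n

def calibrate(target=0.30, max_iter=100, tol=1e-4, seed=1):
    """Calibrate p_annual to a target prevalence.
    - calibrated parameter: p_annual
    - calibration target: `target`
    - search strategy: random search over [0, 0.2]
    - goodness-of-fit: squared error vs. target
    - convergence criterion: gof < tol
    - stopping rule: at most max_iter iterations"""
    rng = random.Random(seed)
    best_p, best_gof = None, float("inf")
    for _ in range(max_iter):                      # stopping rule
        candidate = rng.uniform(0.0, 0.2)          # parameter search
        gof = (simulate_prevalence(candidate) - target) ** 2
        if gof < best_gof:
            best_p, best_gof = candidate, gof
        if best_gof < tol:                         # convergence criterion
            break
    return best_p, best_gof
```

Real cardiometabolic models calibrate many parameters against many targets simultaneously and typically use more efficient search algorithms, but reporting each of these six elements, as the guidelines recommend, makes even a simple routine like this one reproducible.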
RESULTS: Of 2646 studies screened, 31 were included in the final sample. Sixteen studies (52%) reported application-based model development and 15 (48%) reported natural history model development; 7 (23%) made their code publicly available; and 8 (26%) simulated three or more diseases. For calibration, 23 studies (74%) reported at least one of the six processes, most often specifying calibration targets (n = 22, 71%) and calibrated parameters (n = 21, 68%). For validation, 26 studies (84%) reported at least one of the five processes, most commonly external validation (n = 19, 61%), but no study reported predictive validation. Studies that developed natural history models more often reported goodness-of-fit measures, stopping rules, and external validation than application-based models. Studies that open-sourced their code reported statistical goodness-of-fit measures more frequently than those that did not. Models simulating three or more diseases more often documented face validity and verification than those simulating fewer diseases.
CONCLUSIONS: Reporting of calibration and validation in recent microsimulation models has improved, but important gaps remain. We suggest that future work prioritize (1) more rigorous calibration and validation in application-based model development; (2) clearer reporting of calibration processes, particularly parameter search strategies and convergence criteria; (3) stronger quantitative performance measures for external validation and greater use of predictive validation; and (4) broader adoption of open-source practices to enhance transparency and reproducibility.