Objective: If a gold standard is lacking in a diagnostic test accuracy study, expert diagnosis is frequently used as the reference standard. However, interobserver and intraobserver agreement is imperfect. The aim of this study was to quantify the reproducibility of a panel diagnosis for pediatric infectious diseases. Study Design and Setting: Pediatricians from six countries adjudicated a diagnosis (i.e., bacterial infection, viral infection, or indeterminate) for febrile children. A panel diagnosis was reached when the majority of panel members assigned the same diagnosis; cases without a majority were left inconclusive. We evaluated intraobserver and intrapanel agreement at time intervals of 6 weeks and 3 years. We calculated the proportion of inconclusive diagnoses for a three-, five-, and seven-expert panel. Results: For both time intervals (i.e., 6 weeks and 3 years), intrapanel agreement was higher (kappa 0.88, 95% CI: 0.81-0.94 and 0.80, 95% CI: NA, respectively) than intraobserver agreement (kappa 0.77, 95% CI: 0.71-0.83 and 0.65, 95% CI: 0.52-0.78, respectively). After expanding the three-expert panel to five or seven experts, the proportion of inconclusive diagnoses (11%) remained the same. Conclusion: A panel consisting of three experts provides more reproducible diagnoses than an individual expert in children with lower respiratory tract infection or fever without source. Increasing the size of a panel beyond three experts has no major advantage for diagnosis reproducibility.
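The two computations the abstract relies on, a majority-based panel diagnosis and a kappa agreement statistic, can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: "majority" is read as a strict majority (more than half of the votes), kappa is taken to be the unweighted Cohen's kappa, and the function names `panel_diagnosis` and `cohen_kappa` as well as all ratings are hypothetical illustrations, not study data or the authors' code.

```python
from collections import Counter


def panel_diagnosis(votes):
    """Return the label given by a strict majority of experts, else 'inconclusive'.

    Assumption: 'majority' means more than half of the panel votes.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "inconclusive"


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long sequences of ratings.

    Assumption: the abstract does not say which kappa variant was used.
    """
    if len(first) != len(second) or not first:
        raise ValueError("rating sequences must be non-empty and equally long")
    n = len(first)
    # Observed agreement: share of cases rated identically on both occasions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the marginal label proportions, summed.
    freq_first, freq_second = Counter(first), Counter(second)
    expected = sum(freq_first[k] * freq_second[k] for k in freq_first) / (n * n)
    if expected == 1:
        # Degenerate case (one label only); returning 1.0 is a choice made here.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four cases by a three-expert panel.
first_round = [
    ["bacterial", "bacterial", "viral"],
    ["viral", "viral", "viral"],
    ["bacterial", "viral", "indeterminate"],
    ["indeterminate", "indeterminate", "viral"],
]
panel_first_round = [panel_diagnosis(votes) for votes in first_round]
```

In this sketch, intrapanel agreement would be obtained by passing the panel diagnoses from two rating rounds to `cohen_kappa`, and intraobserver agreement by passing one expert's diagnoses from the same two rounds.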