Noa Zilberman, Andrew W. Moore
Abstract
Reproducibility: the extent to which consistent results are obtained when an experiment is repeated, is important as a means to validate experimental results, promote integrity of research, and accelerate follow up work. Commitment to artifact reviewing and badging seeks to promote reproducibility and rank the quality of submitted artifacts.
However, as illustrated in this issue, the current badging scheme, with its focus upon an artifact being reusable, may not identify limitations of architecture, implementation, or evaluation.
We propose that to improve the insight into artifact reproducibility, the depth and nature of artifact evaluation must move beyond simply considering if an artifact is reusable. Artifact evaluation should consider the methods of that evaluation alongside the varying of inputs to that evaluation. To achieve this, we suggest an extension to the scope of artifact badging, and describe both approaches and best practice arising in other communities. We seek to promote conversation and make a call to action intended to strengthen the scientific method within our domain.