Reproducibility in Collaborative Machine Learning
Information
Authors: Adam Ekström Hagevall, Carl Wikström
Expected completion: 2021-01
Supervisor: Stefan Hellander
Supervisor's company/institution: Scaleout Systems
Subject reviewer: Olle Gällmo
Other: -
Presentations
Presentation by Adam Ekström Hagevall
Presentation time: 2021-02-11 14:15
Presentation by Carl Wikström
Presentation time: 2021-02-11 15:15
Opponents: Gustav Demmelmaier, Carl Westerberg
Abstract
Many argue that there is an ongoing reproducibility crisis in the scientific community, particularly in the fields of data science and machine learning. Promising and exciting research findings can neither be confirmed nor refuted, let alone accepted by the scientific community, if there is no clear way of reproducing the experiments they are grounded upon. One could assume that this crisis would be less prominent within machine learning: after all, everything needed to reproduce machine learning experiments should be readily available on the computer. This is not the case, however, and machine learning and artificial intelligence are just as exposed to this crisis as any other area of science, if not more so.
The purpose of this thesis was to develop new features in the cloud-native and open-source machine learning platform STACKn, aiming to strengthen the platform’s support for conducting reproducible machine learning experiments through provenance, transparency and reusability. Adhering to the definition of reproducibility as the ability of independent researchers to exactly duplicate scientific results with the same material as in the original experiment, two concepts were explored as means toward this goal: 1) increased support for standardized textual documentation of machine learning models and their corresponding datasets; and 2) increased support for provenance, tracking the lineage of machine learning models by making code, data and metadata readily available and stored for future reference. We set out to investigate to what degree these features could increase reproducibility in STACKn, both when used in isolation and when combined.
Once these features had been implemented through an exhaustive software engineering process, they were evaluated in an effort to quantify the degree of reproducibility that STACKn supports, both with and without the added functionality. The evaluation shows that the implemented features, especially the provenance features, substantially increase the possibilities to conduct reproducible experiments in STACKn, compared to when none of the developed features are used. While the employed evaluation method was not entirely objective, these features are clearly a good first step toward meeting current recommendations and guidelines on how computational science can be made reproducible.