New conference paper for the “Conference on Health, Inference, and Learning” (2021) available

(15 April 2021) Our new conference paper “Trustworthy machine learning for health care: scalable data valuation with the shapley value” by Konstantin Pandl, Fabian Feiland, Scott Thiebes, and Ali Sunyaev was accepted at “CHIL '21: ACM Conference on Health, Inference, and Learning”, which took place as an online event on 7 and 8 April 2021. The paper is available online following the conference.

Abstract:
Collecting data from many sources is an essential approach to generate large data sets required for the training of machine learning models. Trustworthy machine learning requires incentives, guarantees of data quality, and information privacy. Applying recent advancements in data valuation methods for machine learning can help to enable these. In this work, we analyze the suitability of three different data valuation methods for medical image classification tasks, specifically pleural effusion, on an extensive data set of chest X-ray scans. Our results reveal that a heuristic for calculating the Shapley valuation scheme based on a k-nearest neighbor classifier can successfully value large quantities of data instances. We also demonstrate possible applications for incentivizing data sharing, the efficient detection of mislabeled data, and summarizing data sets to exclude private information. Thereby, this work contributes to developing modern data infrastructures for trustworthy machine learning in health care.
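The abstract refers to a heuristic that computes Shapley values using a k-nearest neighbor classifier. As a rough illustration only, the sketch below implements the exact closed-form recursion for an unweighted KNN classifier described by Jia et al. (2019). It is not necessarily the method or code used in the paper. It assumes Euclidean distance and a utility defined as the fraction of the K nearest neighbors whose label matches the test label, and it averages the per-test-point values over the test set. The function name `knn_shapley` is hypothetical.

```python
import numpy as np


def knn_shapley(x_train, y_train, x_test, y_test, k):
    """Hypothetical helper: exact Shapley values of training points for an
    unweighted KNN classifier, averaged over all test points.

    Assumed utility: fraction of the k nearest training neighbours whose
    label equals the test label (Euclidean distance).
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    n = len(y_train)
    values = np.zeros(n)
    for x, y in zip(np.asarray(x_test, dtype=float), y_test):
        # Rank training points from nearest to farthest from this test point.
        order = np.argsort(np.linalg.norm(x_train - x, axis=1), kind="stable")
        match = (y_train[order] == y).astype(float)
        s = np.zeros(n)
        # Farthest point: base case of the recursion.
        s[n - 1] = match[n - 1] / n
        # Walk towards the nearest point; rank r = i + 1 is 1-based.
        for i in range(n - 2, -1, -1):
            r = i + 1
            s[i] = s[i + 1] + (match[i] - match[i + 1]) / k * min(k, r) / r
        # Map values from sorted positions back to the original indices.
        values[order] += s
    return values / len(y_test)


# Toy example: three 1-D training points, one test point near the first.
values = knn_shapley([[0.0], [1.0], [2.0]], [1, 0, 1], [[0.1]], [1], k=1)
print(values)  # ≈ [0.8333, -0.1667, 0.3333]
```

In the toy example, the values sum to 1, which is the utility of the full training set, as the Shapley efficiency property requires. The recursion needs only one sort per test point rather than an enumeration of all subsets. This is what makes a KNN-based approach plausible for valuing large numbers of instances. Sorting `values` in ascending order gives one possible ranking of instances to review first when looking for mislabeled data, which is the kind of application the abstract mentions.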

Following the conference, the article is available online at https://dl.acm.org/doi/abs/10.1145/3450439.3451861 (DOI: 10.1145/3450439.3451861).