Sparse Robust Regression for Explaining Classifiers
Björklund, Anton (2019)
Åbo Akademi
This publication is protected by copyright. It may be read and printed for personal use. Commercial use is prohibited.
The persistent address of the publication is
https://urn.fi/URN:NBN:fi-fe2019081624406
Abstract
A common characteristic of many datasets is the presence of outliers: items that
do not follow the same structure as the rest of the data. If the outliers are
not taken into account, they can have negative consequences when the dataset is
used, such as leading to incorrect conclusions. The field of robust statistics
is concerned with detecting and handling such outliers. This thesis introduces a
novel algorithm for robust regression called SLISE, Sparse LInear Subset Explanations. SLISE is able to ignore
outliers by finding the largest subset of data items that can be represented by
a linear model to a given accuracy.
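The core idea, finding the largest subset of items that a single linear model fits to within a given error tolerance, can be illustrated with a small sketch. This is a naive RANSAC-style stand-in written for this summary, not the actual SLISE optimization procedure; the function name, the random-sampling search, and the parameters are illustrative assumptions.

```python
import numpy as np

def subset_size(X, y, alpha, eps):
    # Count the items whose linear-model residual is within the tolerance eps.
    residuals = y - X @ alpha
    return int(np.sum(residuals ** 2 <= eps ** 2))

def naive_largest_subset(X, y, eps, n_trials=500, seed=0):
    # Illustration only (not SLISE): fit ordinary least squares on small
    # random samples and keep the coefficients that cover the most items
    # within tolerance eps. Outliers are ignored simply because they fall
    # outside the tolerance band of the best-covering model.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_alpha, best_size = np.zeros(d), 0
    for _ in range(n_trials):
        idx = rng.choice(n, size=d + 3, replace=False)
        alpha, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        size = subset_size(X, y, alpha, eps)
        if size > best_size:
            best_alpha, best_size = alpha, size
    return best_alpha, best_size
```

On data where a fifth of the responses are shifted far from the linear trend, the model with the largest covered subset recovers the coefficients of the inlier trend rather than a compromise fit over all items.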
In this thesis SLISE is compared to existing robust regression methods both
theoretically and empirically. We find that SLISE is as robust as
state-of-the-art methods and is faster on large datasets, which is
important given the ever-growing size of modern datasets.
One of the most interesting applications for SLISE is to explain outcomes from
black box models. With the rise of machine learning, such models are becoming
more and more prevalent, but in many situations their opaqueness limits their
usefulness. Thus there has recently been a lot of research into explaining
outcomes from black box models. Among the different kinds of explanations,
local explanations seem to be the easiest to interpret. Local explanations are only valid
for one item or a subset of all possible items but this enables them to focus on
the important features for those specific items.
Similar to many other local explanation methods SLISE gives explanations in the
form of linear models that locally approximate the black box model. An advantage
of SLISE is that, in contrast to many existing methods that rely on
data-specific mutation processes, no new data or new outcomes are required.
This allows SLISE to account for constraints and structures inherent to the
data, such as conservation laws in physical systems.
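The explanation use case can be sketched in the same spirit: centre the data on the item being explained, so any linear model is exact for that item, and then look for coefficients that also match the black box's outcomes on as many existing items as possible. This is again a naive random-search illustration under stated assumptions, not the SLISE algorithm itself; the function and parameters are hypothetical.

```python
import numpy as np

def explain_item(X, p, i, eps, n_trials=500, seed=0):
    # Illustration only (not SLISE): X holds the existing data items and
    # p the black-box outcomes for those items; no new data is generated.
    # Centring on item i makes every candidate linear model exact for it.
    rng = np.random.default_rng(seed)
    Xc, pc = X - X[i], p - p[i]
    n, d = X.shape
    best_alpha, best_size = np.zeros(d), 0
    for _ in range(n_trials):
        idx = rng.choice(n, size=d + 3, replace=False)
        alpha, *_ = np.linalg.lstsq(Xc[idx], pc[idx], rcond=None)
        size = int(np.sum((pc - Xc @ alpha) ** 2 <= eps ** 2))
        if size > best_size:
            best_alpha, best_size = alpha, size
    # The coefficients act as local feature importances for item i.
    return best_alpha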