Knowledge-lean Text Mining
Rönnqvist, Samuel (2017-12-08)
Turku Centre for Computer Science (TUCS)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-12-3622-8
Abstract
This thesis explores the process of introducing text mining to new areas of application, which involves defining appropriate types of analysis and, often, designing computational methods to support that analysis. Because they are targeted toward a particular use, text mining resources tend to become highly specialized and require considerable development effort. The thesis addresses the question of which computational methods can serve practical text analysis needs while avoiding the costly, narrowly targeted development of linguistic resources.
The knowledge-lean approach relies on machine learning and visualization and assumes minimal encoding of prior knowledge into resources. This is essential when entering uncharted text mining territory, that is, areas too new or too marginal to be well served by traditional text mining approaches. Knowledge-lean text mining is explored within the domain of systemic financial risk, where few text mining efforts have previously been pursued.
Without the support of existing linguistic resources for the task, unsupervised and data-driven methods play a key role in providing flexible means of text analysis. The central theme of representation learning is also studied in the context of fully knowledge-free, domain-independent topic modeling and of linguistically resource-lean discourse structure parsing for refining text mining results.
The research has established the value of knowledge-lean text mining by exploring the use of text as a source of information for systemic risk analytics. Furthermore, the work on discourse parsing has shown that competitive, and in some cases state-of-the-art, performance can be achieved without relying on explicit encoding of linguistic knowledge.