CorTexT Manager around the world

2017

PhD Theses

Ruiz, Pablo

Concept-based and relation-based corpus navigation: applications of natural language processing in digital humanities

PSL Research University, 2017 (HAL Id: tel-01575167, version 2).


@phdthesis{Ruiz2017,

title = {Concept-based and relation-based corpus navigation : applications of natural language processing in digital humanities},

author = {Pablo Ruiz},

url = {https://tel.archives-ouvertes.fr/tel-01575167v2},

year = {2017},

date = {2017-06-23},

urldate = {2017-06-23},

school = {PSL Research University},

abstract = {Social sciences and humanities research is often based on large textual corpora that it would be unfeasible to read in detail. Natural Language Processing (NLP) can identify important concepts and actors mentioned in a corpus, as well as the relations between them. Such information can provide an overview of the corpus useful for domain experts, and help identify corpus areas relevant for a given research question. To automatically annotate corpora relevant for Digital Humanities (DH), the NLP technologies we applied are, first, Entity Linking, to identify corpus actors and concepts. Second, the relations between actors and concepts were determined based on an NLP pipeline which provides semantic role labeling and syntactic dependencies among other information. Part I outlines the state of the art, paying attention to how the technologies have been applied in DH. Generic NLP tools were used. As the efficacy of NLP methods depends on the corpus, some technological development was undertaken, described in Part II, in order to better adapt to the corpora in our case studies. Part II also shows an intrinsic evaluation of the technology developed, with satisfactory results. The technologies were applied to three very different corpora, as described in Part III. First, the manuscripts of Jeremy Bentham. This is an 18th–19th century corpus in political philosophy. Second, the Poli Informatics corpus, with heterogeneous materials about the American financial crisis of 2007–2008. Finally, the Earth Negotiations Bulletin (ENB), which covers international climate summits since 1995, where treaties like the Kyoto Protocol or the Paris Agreement get negotiated. For each corpus, navigation interfaces were developed. These user interfaces (UI) combine networks, full-text search and structured search based on NLP annotations.
As an example, in the ENB corpus interface, which covers climate policy negotiations, searches can be performed based on relational information identified in the corpus: the negotiation actors having discussed a given issue using verbs indicating support or opposition can be searched, as well as all statements where a given actor has expressed support or opposition. Relation information is employed, beyond simple co-occurrence between corpus terms. The UIs were evaluated qualitatively with domain experts, to assess their potential usefulness for research in the experts’ domains. First, we paid attention to whether the corpus representations we created correspond to experts’ knowledge of the corpus, as an indication of the sanity of the outputs we produced. Second, we tried to determine whether experts could gain new insight on the corpus by using the applications, e.g. if they found evidence unknown to them or new research ideas. Examples of insight gain were attested with the ENB interface; this constitutes a good validation of the work carried out in the thesis. Overall, the applications’ strengths and weaknesses were pointed out, outlining possible improvements as future work.},

note = {HAL Id : tel-01575167 , version 2},

keywords = {},

pubstate = {published},

tppubtype = {phdthesis}

}

2016

Book Chapters

Baya-Laffite, Nicolas; Cointet, Jean-Philippe

Mapping Topics in International Climate Negotiations: A Computer-Assisted Semantic Network Approach

In: Innovative Methods in Media and Communication Research, pp. 273-291, Springer, 2016.


Proceedings Articles

Ruiz, Pablo; Plancq, Clément; Poibeau, Thierry

Climate Negotiation Analysis

In: Digital Humanities 2016, pp. 663-666, 2016.


LIST OF SCIENTIFIC WORKS THAT HAVE USED CORTEXT MANAGER
(Sources: Google Scholar, HAL, Scopus, WOS and search engines)

We are grateful that you have found CorTexT Manager useful. Over the years, more than 1050 authors have trusted CorTexT for their publicly accessible analyses. This represents a little less than 10% of the CorTexT Manager user community. So, thank you!

We seek to understand how the scientific production that used CorTexT Manager has evolved and to characterise it. You will find here our analysis of this scientific production.

Browse documents by main topics

Documents per language
350 English
121 French
13 Portuguese
12 Spanish
4 Chinese
4 Danish
2 Korean
1 Japanese
1 Norwegian
1 Ukrainian
1 Persian

What types of documents?
234 journal articles
42 conference proceedings
41 conference papers (not in proceedings)
39 Ph.D. theses
31 reports
30 online articles
23 book chapters
21 master's theses
12 workshops
11 bachelor's theses
10 books
5 miscellaneous
2 presentations
1 proceedings
1 working paper
1 manual

Main peer-reviewed journals
Scientometrics
I2D - Information, données & documents
PLOS ONE
Revue d’anthropologie des connaissances
Réseaux
Journal of Rural Studies
Library Hi Tech
Agronomy
Agronomy for Sustainable Development
