Professor Fabio Ciravegna,
University of Sheffield, UK
Fabio Ciravegna is a full Professor of Computer Science at the University of
Sheffield. His area of research is Information Extraction (IE) from documents
where he has been active since 1988. He coordinated the IE activity both at
the Fiat Research Centre, Turin, Italy (1991-1993) and ITC-Irst, Trento, Italy
(1995-2000). He is the author of more than 40 publications in books, journals,
conferences and workshops. He has been principal investigator of four IE
systems: Sintesi, a system for IE from car failure reports (one of
the pioneering pieces of work in IE in Europe); LearningPinocchio, one of the
first industrial systems for adaptive IE in the world; Amilcare, an adaptive
IE system for Semantic Web applications; and Armadillo, a system for
Information Extraction and Integration for the Semantic Web. Currently he is
coordinator of the European consortium for Dot.Kom (a European project on
Information Extraction-based services for the Semantic Web) and
co-investigator in the EPSRC IRC AKT project (a multi-million project for
research on knowledge-based systems for Knowledge Management). He is also
co-investigator for the EPSRC MI-AKT project on research on knowledge-based
tools in medicine. Ciravegna has organised workshops on adaptive IE at
international conferences (IJCAI01, ECAI02, ECML03, AAAI04) and given tutorials
at national and international conferences (ECML03, ECAI02 and AI*IA89). He was
an invited lecturer on IE at the first European Summer School on the Semantic
Web in 2003. He is an invited keynote speaker at the international conference on
Natural Language and Information Systems 2004 and an invited panelist at the
Search Engine Meeting 2004. He holds a PhD in Information Systems from the University
of East Anglia and a doctorship in Computer Science from the University of
Turin, Italy.
Challenges in Harvesting Information for the Semantic Web
The Semantic Web provides opportunities for new ways of retrieving,
managing and exchanging information on the Web. The aim is to produce
documents intended for automatic processing, not only for human reading as on
the current Web.
The SW requires the annotation of documents with ontology-based semantics.
The current expectation is that users will annotate their own documents
manually. However, Web users are very unlikely to annotate their own documents,
and even if they did, the quality of their annotation could be low on
average. Moreover, there is a concrete risk that professional spamming
companies will undermine the usability of the SW with devious annotations. In
this talk, I will propose producing automatic semantic annotation engines
(SAEs). SAEs work in a way similar to today’s search engines and allow
annotating and retrieving information in large repositories for SW uses. If
successful, SAEs will avoid the bottleneck of human-centred annotation and the
dangers of low-quality or devious annotation. Our methodology is based on
exploiting the redundancy of information in large repositories in order to
train information extraction systems in an unsupervised way. Learning is
seeded by integrating information from structured sources (e.g. databases and
digital libraries). Retrieved information is then used to bootstrap learning
for simple Information Extraction (IE) methodologies (e.g. wrappers), which in
turn produce more annotation to train more complex IE engines.
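
The bootstrapping idea described above can be illustrated with a toy sketch. This is not the implementation of Armadillo or of any system named in this abstract: the function names (induce_wrappers, apply_wrappers, bootstrap), the tag-pair notion of a "wrapper", the min_support threshold and the sample pages are all assumptions made for illustration. A small set of seed names stands in for information taken from a structured source. The sketch learns which tag pairs surround known names, and then uses those tag pairs to harvest new names. A name harvested on one page can then help learn a wrapper on another page where it also appears, which is where redundancy helps.

```python
import re
from collections import Counter

# Hypothetical helpers: a "wrapper" here is just the pair of HTML tags
# found immediately before and after a known name.
TAG_BEFORE = re.compile(r"(<[^<>]+>)$")
TAG_AFTER = re.compile(r"(<[^<>]+>)")


def induce_wrappers(pages, known, min_support=2):
    """Return tag pairs that surround at least min_support known mentions."""
    support = Counter()
    for page in pages:
        for name in known:
            for m in re.finditer(re.escape(name), page):
                before = TAG_BEFORE.search(page[:m.start()])
                after = TAG_AFTER.match(page[m.end():])
                if before and after:
                    support[(before.group(1), after.group(1))] += 1
    return [pair for pair, count in support.items() if count >= min_support]


def apply_wrappers(pages, wrappers):
    """Extract every text span enclosed by one of the learned tag pairs."""
    found = set()
    for left, right in wrappers:
        pattern = re.compile(re.escape(left) + r"([^<>]+)" + re.escape(right))
        for page in pages:
            found.update(m.group(1).strip() for m in pattern.finditer(page))
    return found


def bootstrap(pages, seeds, rounds=2):
    """Alternate wrapper induction and extraction, starting from the seeds."""
    known = set(seeds)
    for _ in range(rounds):
        new = apply_wrappers(pages, induce_wrappers(pages, known)) - known
        if not new:
            break
        known |= new
    return known


# Toy data (invented): the same people are listed on two differently
# formatted pages.
pages = [
    "<ul><li>Ada Lovelace</li><li>Alan Turing</li><li>Grace Hopper</li></ul>",
    "<p>Staff: <b>Alan Turing</b>, <b>Grace Hopper</b>, <b>Edsger Dijkstra</b></p>",
]
seeds = {"Ada Lovelace", "Alan Turing"}  # stands in for a database lookup

print(sorted(bootstrap(pages, seeds)))
```

In this sketch the first round learns only the list-item wrapper, because the bold wrapper is supported by a single seed. The name it harvests from the first page also occurs on the second page, so the second round has enough support to learn the bold wrapper as well. The final step in the abstract, using the resulting annotations to train more complex IE engines, is not shown.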