
Professor Fabio Ciravegna, University of Sheffield, UK

Fabio Ciravegna is a full Professor of Computer Science at the University of Sheffield. His area of research is Information Extraction (IE) from documents, where he has been active since 1988. He coordinated the IE activity both at the Fiat Research Centre, Turin, Italy (1991-1993) and at ITC-Irst, Trento, Italy (1995-2000). He is the author of more than 40 publications in books, journals, conferences and workshops. He has been principal investigator of four IE systems: Sintesi, a system for IE from car failure reports (one of the pioneering pieces of work in IE in Europe); LearningPinocchio, one of the first industrial systems for adaptive IE in the world; Amilcare, an adaptive IE system for Semantic Web applications; and Armadillo, a system for Information Extraction and Integration for the Semantic Web. Currently he is coordinator of the European consortium for Dot.Kom (a European project on Information Extraction-based services for the Semantic Web) and co-investigator in the EPSRC IRC AKT project (a multi-million project for research on knowledge-based systems for Knowledge Management). He is also co-investigator for the EPSRC MI-AKT project on research on knowledge-based tools in medicine. Ciravegna has organised workshops on adaptive IE at international conferences (IJCAI01, ECAI02, ECML03, AAAI04) and given tutorials at national and international conferences (ECML03, ECAI02 and AI*IA89). He was an invited lecturer on IE at the first European Summer School on the Semantic Web in 2003. He is an invited keynote speaker at the international conference on Natural Language and Information Systems 2004 and an invited panelist at the next Search Engine Meeting 2004. He holds a PhD in Information Systems from the University of East Anglia and a doctorship in Computer Science from the University of Turin, Italy.

Challenges in Harvesting Information for the Semantic Web

The Semantic Web (SW) provides opportunities for new ways of retrieving, managing and exchanging information on the Web. The aim is to produce documents intended for automatic processing, and not only for human reading as on the current Web. The SW requires the annotation of documents with ontology-based semantics.

The current expectation is that users will annotate their own documents manually. However, Web users are very unlikely to annotate their own documents, and even if they did, the quality of their annotations would probably be low on average. Moreover, there is a concrete risk that professional spamming companies will undermine the usability of the SW with devious annotations. In this talk, I will propose building automatic semantic annotation engines (SAEs). SAEs work in a way similar to today's search engines and allow annotating and retrieving information in large repositories for SW uses. If successful, SAEs will avoid the bottleneck of human-centred annotation and the dangers of low-quality or devious annotation. Our methodology is based on exploiting the redundancy of information in large repositories in order to train information extraction systems in an unsupervised way. Learning is seeded by integrating information from structured sources (e.g. databases and digital libraries). Retrieved information is then used to bootstrap learning for simple Information Extraction (IE) methodologies (e.g. wrappers), which in turn produce more annotation to train more complex IE engines.
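
As a rough illustration of the bootstrapping idea described above (and not the system presented in the talk), the following Python sketch seeds learning with values taken from a structured source, induces simple wrappers from the contexts in which those values recur across pages, and then applies the wrappers to harvest new values. The function names `induce_wrappers` and `bootstrap`, the fixed-width character contexts, the `min_support` threshold and the toy pages are all assumptions made for this example.

```python
import re
from collections import Counter


def induce_wrappers(pages, seeds, width=5, min_support=2):
    """Return (left, right) contexts that surround seed values often enough.

    A context is kept only if it recurs at least `min_support` times, which is
    a crude way of exploiting redundancy across pages (an assumption of this
    sketch, not a method taken from the talk).
    """
    counts = Counter()
    for page in pages:
        for seed in seeds:
            for match in re.finditer(re.escape(seed), page):
                left = page[max(0, match.start() - width):match.start()]
                right = page[match.end():match.end() + width]
                counts[(left, right)] += 1
    return [context for context, n in counts.items() if n >= min_support]


def apply_wrappers(pages, wrappers):
    """Extract every string found between a learned left and right context."""
    found = set()
    for left, right in wrappers:
        pattern = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
        for page in pages:
            found.update(pattern.findall(page))
    return found


def bootstrap(pages, seeds, max_rounds=3):
    """Alternate wrapper induction and extraction until nothing new is found."""
    known = set(seeds)
    for _ in range(max_rounds):
        new = apply_wrappers(pages, induce_wrappers(pages, known)) - known
        if not new:
            break
        known |= new
    return known


# Toy data (hypothetical): two list pages and one free-text page.
pages = [
    '<ul><li class="name">A. Rossi</li><li class="name">B. Bianchi</li></ul>',
    '<ul><li class="name">B. Bianchi</li><li class="name">C. Verdi</li></ul>',
    '<p>A. Rossi wrote some notes.</p>',
]
# Seed values, standing in for entries retrieved from a database.
seeds = {"A. Rossi", "B. Bianchi"}

harvested = bootstrap(pages, seeds)
```

In this toy setting the only context that recurs is the list-item markup, so the free-text mention on the third page does not produce a wrapper, and the value that was not in the seed set is picked up from the second page. A realistic system would need far more robust context generalisation and some form of validation of the extracted values.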
