INFILE: INformation, FILtering, Evaluation

Participation

June 2, 2009: The Interactive Filtering task is about to start. Please download the new client software, infileClient-1.2.tgz, and install it on your side. Please also check the Interactive Filtering evaluation protocol.

April 2, 2009: The Batch Filtering evaluation task has just started. If you have not received the evaluation data, please contact the organizers.

If you want to join the Batch Filtering or Adaptive Filtering evaluation tasks, please register through the CLEF website.

REGISTRATION is now open. If you are interested in participating in INFILE 2009, please visit the CLEF Registration page for instructions and the registration form. Please remember to fill in the registration form and send two copies of the End-User agreement to the CLEF coordinator.

The 2009 call for participation is available.

INFILE welcomes participation from any institution, academic or industrial.

Participation is free of charge. After the evaluations, participants may keep the development and evaluation data and use them at no cost for research and development purposes.

Background

INFILE (INformation, FILtering, Evaluation), sponsored by the French National Research Agency, is a cross-language adaptive filtering evaluation campaign organized by the CEA LIST (M. Laïb, R. Besançon), ELDA (D. Mostefa, K. Choukri), and the University of Lille 3 (S. Chaudiron, I. Timimi).

The goal of the INFILE project is to organize evaluation campaigns for monolingual and multilingual information filtering systems based on close-to-real-usage conditions for intelligence applications. Both methodology and metrics are discussed within a group of experts, set up at the beginning of the project.

The campaign is aimed at R&D laboratories and software publishers that would like to evaluate their technology in a real-use context, according to the needs of information routing for technology watch. The project is not limited to French participants.

The languages under consideration in the evaluation campaign are English, Arabic and French, either in monolingual (or multilingual) filtering, or in interlingual filtering.

The documents (input information) come from two different information domains: scientific and technical on the one hand, and journalistic on the other. The first corresponds to a context of technology watch, the second to a context of general intelligence (watching for political information, for image information, or following up operations such as mergers and acquisitions, etc.). The evaluation campaign takes into account the results of the discussions between the organizers and the community of researchers and software publishers, as well as the previous evaluation campaign. During the first phase, a dry run will be carried out to ensure that the evaluation system works properly.

Two corpora are developed with profiles and the associated field results.

Current evaluation campaign

INFILE 2009 is a pilot track in CLEF (Cross-Language Evaluation Forum) and is scientifically endorsed by NIST TREC (Text REtrieval Conference).

In 2009 two evaluation tasks are considered: interactive filtering and batch filtering.

Tasks

Interactive filtering task

For this task, the evaluating system automatically sends a set of documents to each of the systems being tested. Each tested system must automatically return the assignment of each document to zero, one or several profiles. The evaluating system then returns the errors, allowing the tested system to improve its performance.

This process will be repeated a number of times so that any improvement in the system's performance can be visualized and assessed.

A client-server architecture has been developed to support this protocol.
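The feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the INFILE client or its protocol: the `Profile` class, the keyword weights, the threshold, the update rule and the four-document stream are all hypothetical, and the evaluating server is simulated by a list that already carries the correct profiles for each document.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical profile: a bag of weighted keywords plus a decision threshold."""
    name: str
    weights: dict = field(default_factory=dict)
    threshold: float = 1.0


def score(profile, text):
    """Sum the weights of the tokens of `text` that the profile knows."""
    return sum(profile.weights.get(tok, 0.0) for tok in text.lower().split())


def assign(profiles, text):
    """Return the names of the profiles the document is assigned to (possibly none)."""
    return {p.name for p in profiles if score(p, text) >= p.threshold}


def learn_from_feedback(profiles, text, predicted, truth, step=0.5):
    """Use the returned errors: reinforce missed profiles, penalize false alarms."""
    tokens = set(text.lower().split())
    for p in profiles:
        if p.name in truth and p.name not in predicted:
            delta = step       # the document was missed for this profile
        elif p.name in predicted and p.name not in truth:
            delta = -step      # the document was wrongly assigned to this profile
        else:
            continue
        for tok in tokens:
            p.weights[tok] = p.weights.get(tok, 0.0) + delta


def run_pass(profiles, stream):
    """One pass over the stream; returns the number of wrongly filtered documents."""
    errors = 0
    for text, truth in stream:
        predicted = assign(profiles, text)
        if predicted != truth:
            errors += 1
            learn_from_feedback(profiles, text, predicted, truth)
    return errors


# Hand-written toy stream (not INFILE data): (document text, correct profiles).
STREAM = [
    ("company announces merger", {"mergers"}),
    ("acquisition of rival firm", {"mergers"}),
    ("election results announced", {"elections"}),
    ("football match results", set()),
]


def demo():
    """Run two passes over the same stream and return the error count of each."""
    profiles = [
        Profile("mergers", {"merger": 1.0}),
        Profile("elections", {"election": 1.0}),
    ]
    return [run_pass(profiles, STREAM) for _ in range(2)]


if __name__ == "__main__":
    print(demo())
```

Repeating the pass mirrors the repeated sessions of the task: the second pass makes fewer errors on this toy stream because the feedback from the first pass has adjusted the profile weights.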

Batch filtering task

For the batch task, the whole corpus of documents and the set of profiles are provided to the participants, who are expected to return the filtering results produced by their systems.
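As a minimal sketch of what a batch run amounts to, the following Python function takes a whole corpus and a set of profiles and returns every (document, profile) assignment in one go. The keyword-overlap matching rule, the identifiers and the data layout are assumptions made for illustration; the campaign's actual input and submission formats are not described here.

```python
def batch_filter(corpus, profiles):
    """Assign every document of the corpus to the profiles it matches.

    corpus:   dict mapping a document id to its text (hypothetical layout)
    profiles: dict mapping a profile id to a set of lowercase keywords
    Returns a sorted list of (document id, profile id) pairs.
    """
    results = []
    for doc_id, text in sorted(corpus.items()):
        tokens = set(text.lower().split())
        for profile_id, keywords in sorted(profiles.items()):
            # Toy matching rule: at least one keyword appears in the document.
            if tokens & keywords:
                results.append((doc_id, profile_id))
    return results


if __name__ == "__main__":
    # Hand-written toy data, not taken from the INFILE corpus.
    corpus = {
        "d1": "Merger talks resume",
        "d2": "Election day",
        "d3": "Weather report",
    }
    profiles = {"p1": {"merger", "acquisition"}, "p2": {"election"}}
    for pair in batch_filter(corpus, profiles):
        print(*pair)
```

Unlike the interactive task, nothing is fed back between documents: the system sees everything at once and returns a single result set.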

Dissemination

The results will be computed and communicated to the participants for discussion. The results of the campaign and the new evaluation methods will be presented at a workshop at the end of the project and will be published anonymously.

At the end of the project, an evaluation kit will be made available to the community. With this evaluation kit, new teams will be able to assess and compare their system's results with those of the participants, in the same conditions as during the evaluation campaign.

Resources

Corpus

The InFile corpus is made of newswires provided by the Agence France Presse (AFP) for research purposes.

For InFile, we selected three languages (Arabic, English and French) and a three-year period (2004-2006), which yields a collection of about one and a half million newswires, or around 10 GB. The newswires are available in all three languages but are not translations of one another. News articles are encoded in XML and follow the News Markup Language (NewsML) specifications.
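To give a rough idea of how such a file can be read, the sketch below extracts the headline and body paragraphs from a small NewsML-style document using Python's standard library. The sample is hand-written and heavily simplified; the element layout of the actual AFP files is an assumption and may differ.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified NewsML-style sample (not an AFP newswire).
SAMPLE = """<NewsML>
  <NewsItem>
    <NewsComponent>
      <NewsLines>
        <HeadLine>Example headline</HeadLine>
      </NewsLines>
      <ContentItem>
        <DataContent>
          <p>First paragraph.</p>
          <p>Second paragraph.</p>
        </DataContent>
      </ContentItem>
    </NewsComponent>
  </NewsItem>
</NewsML>"""


def extract_text(xml_text):
    """Return (headline, body) from a NewsML-style document.

    Assumes the headline sits in a HeadLine element and the body in p
    elements, as in the simplified sample above.
    """
    root = ET.fromstring(xml_text)
    headline = root.findtext(".//HeadLine", default="")
    paragraphs = [(p.text or "").strip() for p in root.iter("p")]
    return headline, "\n".join(paragraphs)


if __name__ == "__main__":
    headline, body = extract_text(SAMPLE)
    print(headline)
    print(body)
```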

Here are some examples for Arabic, English and French.

Only 100 000 documents per language are used for the filtering test, in order to cope with the time constraints of an interactive filtering process. These documents consist of the documents relevant to the topics, supplemented by a set of non-relevant documents, as shown in the figure below.

Test collection construction

Development data

Schedule

  • January 2009: Registration opens.
  • April 1 to May 30, 2009: Batch Filtering session.
  • June 1 to June 30, 2009: Adaptive Filtering session.
  • July 15, 2009: Communication of individual results.
  • August 30, 2009: Submission of papers for CLEF.
  • September 30 to October 2, 2009: CLEF workshop in Corfu, Greece.

Contact

If you wish to participate in the INFILE campaign, or if you are interested in joining the project's group of experts, please contact us.

Publications

  • R. Besançon, D. Mostefa, I. Timimi, S. Chaudiron, M. Laib, K. Choukri, "Arabic, English and French: three Languages in a Filtering Systems Evaluation Project", MEDAR, Cairo (Egypt), April 2009.
  • Chaudiron S., Timimi I., Besançon R., Mostefa D., Laib M., Choukri K. (2009), « L'évaluation d'un système de filtrage automatique de l'information en contexte de veille : cadre réflexif et premiers retours », (to appear) In the Systèmes d'Information et Intelligence Economique (SII'2009), Hammamet, Tunisia, February 12-14 2009.
  • Besançon R., Chaudiron S., Mostefa D., Hamon O., Timimi I., Choukri K. (2008), Overview of the CLEF 2008 INFILE Pilot Track, In Working Notes of the Cross Language Evaluation Forum (CLEF 2008), Aarhus, Sept. 2008.
  • Chaudiron S., Timimi I. (2008), « Information Filtering as a Knowledge Organization process: techniques and evaluation » In the International Society for Knowledge Organization (ISKO'08), Montreal, August 5-8 2008.
  • Chaudiron S., Besançon R., Mostefa D., Timimi I., Laib M., Choukri K. (2008) InFile : une campagne d'évaluation des logiciels de filtrage d'information textuelle In Actes du Colloque International en traductologie et TAL, Oran, Algeria, June 2008.
  • Besançon R., Chaudiron S., Mostefa D., Timimi I., Choukri K. (2008), The InFile project: a crosslingual filtering systems evaluation campaign, In Proceedings of LREC 2008, Marrakech, May 2008.
 
start.txt · Last modified: 2009/06/02 11:27 by djamel
 