June 2, 2009 The Interactive Filtering task is about to start. Please download the new client software, infileClient-1.2.tgz, and install it on your side. Please also check the Interactive Filtering evaluation protocol.
April 2, 2009 The Batch Filtering evaluation task has just started. If you didn't receive the evaluation data, please contact the organizers.
If you want to join the Batch Filtering or Adaptive Filtering evaluation tasks, please register through the CLEF website.
REGISTRATION is now open. If you are interested in participating in INFILE 2009, please visit the CLEF Registration page for instructions and the registration form. Please remember to fill in the registration form and send 2 copies of the End-User agreement to the CLEF coordinator.
The 2009 call for participation is available.
INFILE welcomes participation from any institution, academic or industrial.
Participation is free of charge. After the evaluations, participants may keep the development and evaluation data and use them at no cost for research and development purposes.
INFILE (INformation, FILtering, Evaluation), sponsored by the French National Research Agency, is a cross-language adaptive filtering evaluation campaign organized by the CEA LIST (M. Laïb, R. Besançon), ELDA (D. Mostefa, K. Choukri), and the University of Lille 3 (S. Chaudiron, I. Timimi).
The goal of the INFILE project is to organize evaluation campaigns for monolingual and multilingual information filtering systems based on close-to-real-usage conditions for intelligence applications. Both methodology and metrics are discussed within a group of experts, set up at the beginning of the project.
The campaign is directed to R&D laboratories and software publishers that would like to evaluate their technology, in a real-use context and according to the needs of information routing for technology watch. This project is not limited to French participants.
The languages under consideration in the evaluation campaign are English, Arabic and French, either in monolingual (or multilingual) filtering, or in interlingual filtering.
The documents (input information) come from two different information domains: scientific and technical on the one hand, and journalistic on the other. The first corresponds to a context of technology watch, the second to a context of general intelligence (watching for political information or image information, or following operations such as mergers and acquisitions, etc.). The evaluation campaign takes into account the results of the discussions between the organizers, the research community and the software publishers, as well as the previous evaluation campaign. During the first phase, a dry run will be carried out to ensure that the evaluation system functions properly.
Two corpora are developed with profiles and the associated field results.
INFILE 2009 is a pilot track in CLEF (Cross-Language Evaluation Forum) and is scientifically endorsed by NIST TREC (Text REtrieval Conference).
In 2009 two evaluation tasks are considered: interactive filtering and batch filtering.
For the interactive task, the evaluating system automatically sends a set of documents to each of the systems being tested. Each system must automatically return the assignment of each document to zero, one or several profiles. The evaluating system then returns the errors, allowing the tested system to improve its performance.
This process will be repeated a number of times in order for any improvement in the system's performance to be visualized and assessed.
A client-server architecture has been developed to support this protocol.
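The interactive protocol described above amounts to a feedback loop between the evaluating system and each tested system. The following is a minimal in-process sketch of that loop, not the actual infileClient software: the `ToyFilter` class, its keyword-based profiles and its feedback rule are all hypothetical and only illustrate the exchange of documents, profile assignments and error reports.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFilter:
    """Hypothetical tested system: one set of keywords per profile."""
    keywords: dict = field(default_factory=dict)

    def assign(self, text):
        # Return the profiles (zero, one or several) whose keywords occur in the text.
        words = set(text.lower().split())
        return {p for p, kws in self.keywords.items() if kws & words}

    def learn(self, text, missed):
        # Toy feedback rule (an assumption): add the document's words
        # to every profile that should have been returned but was not.
        words = set(text.lower().split())
        for profile in missed:
            self.keywords.setdefault(profile, set()).update(words)


def run_interactive(system, stream):
    """Play the evaluator: send documents one by one, report errors back."""
    errors = []
    for text, relevant in stream:
        predicted = system.assign(text)
        missed = relevant - predicted      # profiles the system failed to return
        spurious = predicted - relevant    # profiles returned wrongly
        errors.append(len(missed) + len(spurious))
        system.learn(text, missed)         # error feedback from the evaluator
    return errors


# Invented documents with their reference profile assignments.
stream = [
    ("merger announced between two banks", {"finance"}),
    ("new merger talks in the energy sector", {"finance"}),
    ("football results", set()),
]
system = ToyFilter({"finance": {"acquisition"}})
errors = run_interactive(system, stream)
print(errors)
```

In this toy run the system misses the first document, learns from the reported error, and then handles the second document correctly, which is the kind of improvement the repeated process is meant to reveal.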
For the batch task, the whole corpus of documents and the set of profiles are provided to the participants and the systems are expected to give back the results of the filtering system.
The results will be computed and communicated to the participants for discussion. The results of the campaign and the new evaluation methods will be presented in the framework of a workshop at the end of the project and will be published anonymously.
At the end of the project, an evaluation kit will be made available to the community. With this evaluation kit, new teams will be able to assess and compare their system's results with those of the participants, in the same conditions as during the evaluation campaign.
The InFile corpus is made up of newswires provided by the Agence France Presse (AFP) for research purposes.
For InFile, we selected three languages (Arabic, English and French) and a three-year period (2004-2006), which represents a collection of about one and a half million newswires, or around 10 GB. The newswires are available in all three languages, but they are not translations from one language to another. News articles are encoded in XML and follow the News Markup Language (NewsML) specifications.
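As an illustration of how such XML newswires might be read, here is a minimal sketch using Python's standard library. The sample document is invented and heavily simplified; the element names (`NewsComponent`, `HeadLine`, `DataContent`, `p`) are assumed to follow NewsML 1.x and should be checked against the actual AFP files, which carry much more metadata.

```python
import xml.etree.ElementTree as ET

# Key under which ElementTree exposes the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented, simplified NewsML-like document (an assumption, not real AFP data).
SAMPLE = """<NewsML>
  <NewsItem>
    <NewsComponent xml:lang="en">
      <NewsLines><HeadLine>Example headline</HeadLine></NewsLines>
      <ContentItem>
        <DataContent>
          <p>First paragraph.</p>
          <p>Second paragraph.</p>
        </DataContent>
      </ContentItem>
    </NewsComponent>
  </NewsItem>
</NewsML>"""


def parse_newswire(xml_text):
    """Extract language, headline and body text from one newswire."""
    root = ET.fromstring(xml_text)
    component = root.find(".//NewsComponent")
    lang = component.get(XML_LANG) if component is not None else None
    headline = root.findtext(".//HeadLine", default="").strip()
    # Join all paragraph texts; an empty <p/> contributes an empty line.
    body = "\n".join((p.text or "") for p in root.iter("p"))
    return {"lang": lang, "headline": headline, "body": body}


doc = parse_newswire(SAMPLE)
print(doc["lang"], "|", doc["headline"])
```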
Here are some examples for Arabic, English and French.
Only 100,000 documents in each language are used for the filtering test, in order to cope with the time constraints of an interactive filtering process. These documents consist of the set of documents relevant to the topics, supplemented by a set of non-relevant documents, as shown in the figure below.
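One way to picture this construction is the sketch below. It is only an assumption about how such a collection could be assembled: the source does not say how the non-relevant documents were chosen, so the uniform random sampling, the function name `build_test_collection` and the document identifiers are all hypothetical.

```python
import random


def build_test_collection(all_ids, relevant_ids, target_size, seed=0):
    """Keep every relevant document and fill up with non-relevant ones."""
    relevant = set(relevant_ids)
    if len(relevant) > target_size:
        raise ValueError("more relevant documents than the target size")
    # Candidate filler: every document that is not relevant to any topic.
    pool = sorted(set(all_ids) - relevant)
    n_fill = min(target_size - len(relevant), len(pool))
    rng = random.Random(seed)           # fixed seed for reproducibility
    filler = rng.sample(pool, n_fill)   # assumption: uniform random choice
    collection = sorted(relevant) + filler
    rng.shuffle(collection)             # avoid grouping relevant documents together
    return collection


# Toy scale: 1,000 documents, 3 of them relevant, a test set of 100.
collection = build_test_collection(range(1000), [3, 14, 159], target_size=100)
print(len(collection))
```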
The development data is already available. It consists of :
If you wish to participate in the INFILE campaign, or if you are interested in joining the group of experts for this project, please contact us.