Evaluation of Information Filtering Systems in a Competitive Intelligence Framework Workshop 22 May 2010 Held in conjunction with LREC 2010, Malta
The constant growth of publicly available information has driven a corresponding growth in the research areas devoted to organizing this information automatically. In particular, Information Filtering systems have been developed to address many business intelligence applications, from mail categorization to news routing and technology watch. However, the theoretically challenging models behind these systems have seldom been evaluated in a real usage context. Indeed, standard evaluation benchmarks usually introduce artefacts that simplify the task. For instance, in the context of competitive intelligence, systems must filter documents without any global information on a collection, but can use feedback from the user, and must be efficient enough to deal with a real-time document stream. Furthermore, with increasingly globalized access to information, sources may be found in many languages, and the multilinguality issue must also be considered. The TREC Adaptive Filtering Track and the INFILE track at CLEF for multilingual information filtering have tried to propose evaluation frameworks closer to real usage.
The goal of this workshop is to study different aspects of Information Filtering evaluation and to bring together researchers from the Information Filtering community to develop new evaluation frameworks and confront current models with them. Submissions are expected to propose new insights on evaluation methodologies and resources for Information Filtering, or to present Information Filtering models that meet the requirements of a real usage evaluation. In particular, researchers who submit papers presenting Information Filtering models are encouraged to evaluate their model using an existing benchmark such as the INFILE benchmark, available at http://www.infile.org.
Both theoretical and practical research papers are welcome from both research and industrial communities addressing the main workshop theme, in any aspect including:
Papers will be submitted to the workshop via the START LREC Conference Manager, under https://www.softconf.com/lrec2010/InFile2010/. Authors should submit a PDF file of no more than 10 pages, following the LREC conference formatting details. Papers will be reviewed by three members of the Program Committee. Accepted papers will be published in the workshop Proceedings. When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or that are a new result of their research. For further information on this new initiative, please refer to http://www.lrec-conf.org/lrec2010/?LREC2010-Map-of-Language-Resources.
June 2, 2009 The Interactive Filtering task is about to start. Please download the new client software infileClient-1.2.tgz and install it on your side. Please check the Interactive Filtering evaluation protocol.
April 2, 2009 The Batch Filtering evaluation task has just started. If you didn't receive the evaluation data, please contact the organizers.
If you want to join the Batch Filtering or Adaptive Filtering evaluation tasks, please register through the CLEF website.
REGISTRATION is now open. If you are interested in participating in INFILE 2009, please visit the CLEF Registration page for instructions and registration form. Please remember to fill in the registration form and send 2 copies of the End-User agreement to the CLEF coordinator.
The 2009 call for participation is available.
INFILE welcomes participation from any institution, academic or industrial.
The participation is free of charge and participants can keep and use the development and evaluation data for free after the evaluations for research and development purposes.
INFILE (INformation, FILtering, Evaluation), sponsored by the French National Research Agency, is a cross-language adaptive filtering evaluation campaign organized by the CEA LIST (M. Laïb, R. Besançon), ELDA (D. Mostefa, K. Choukri), and the University of Lille 3 (S. Chaudiron, I. Timimi).
The goal of the INFILE project is to organize evaluation campaigns for monolingual and multilingual information filtering systems based on close-to-real-usage conditions for intelligence applications. Both methodology and metrics are discussed within a group of experts, set up at the beginning of the project.
The campaign is directed to R&D laboratories and software publishers that would like to evaluate their technology, in a real-use context and according to the needs of information routing for technology watch. This project is not limited to French participants.
The languages under consideration in the evaluation campaign are English, Arabic and French, either in monolingual (or multilingual) filtering, or in interlingual filtering.
The documents (input information) come from two different information domains: scientific and technical on the one hand, and journalistic on the other. The first corresponds to a context of technology watch, and the second to a context of general intelligence (watching for political information, for image information, or following operations such as mergers and acquisitions, etc.). The evaluation campaign takes into account the results of the discussions between the organizers, the community of researchers and the software publishers, as well as the previous evaluation campaign. During the first phase, a dry run will be carried out in order to ensure that the evaluation system works properly.
Two corpora are developed with profiles and the associated field results.
INFILE 2009 is a pilot track in CLEF (Cross-Language Evaluation Forum) and is scientifically endorsed by NIST TREC (Text REtrieval Conference).
In 2009 two evaluation tasks are considered: interactive filtering and batch filtering.
For the interactive task, the evaluating system automatically sends a set of documents to each of the systems being tested. Each tested system must automatically return the assignment of each document to 0, 1 or several profiles. The evaluating system then returns the errors, allowing the tested system to improve its performance.
This process will be repeated a number of times in order for any improvement in the system's performance to be visualized and assessed.
A client-server architecture has been developed to support this protocol.
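The interactive loop described above (receive a document, assign it to 0, 1 or several profiles, receive the errors, adapt) can be illustrated with a minimal sketch. It is not the INFILE client or its protocol: the `Profile` class, the keyword-weight model, the threshold and the update step are all hypothetical choices made for illustration, and the stream is simulated in memory rather than served by an evaluation server.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical profile: a name plus weighted keywords (an assumption)."""
    name: str
    weights: dict[str, float]
    threshold: float = 1.0

    def score(self, text: str) -> float:
        # Sum the weights of the known keywords found in the document.
        return sum(self.weights.get(tok, 0.0) for tok in text.lower().split())


def assign(doc: str, profiles: list[Profile]) -> set[str]:
    """Return the names of the 0, 1 or several profiles the document matches."""
    return {p.name for p in profiles if p.score(doc) >= p.threshold}


def apply_feedback(doc, profiles, predicted, truth, step=0.5):
    """Adjust keyword weights from the errors reported by the evaluator."""
    tokens = set(doc.lower().split())
    for p in profiles:
        if p.name in truth and p.name not in predicted:
            delta = step       # missed document: reinforce its words
        elif p.name in predicted and p.name not in truth:
            delta = -step      # false alarm: penalize its words
        else:
            continue           # correct decision: nothing to learn
        for tok in tokens:
            p.weights[tok] = p.weights.get(tok, 0.0) + delta


def run_stream(stream, profiles):
    """Filter documents one at a time; return the error count per document."""
    errors = []
    for doc, truth in stream:
        predicted = assign(doc, profiles)
        errors.append(len(predicted ^ truth))  # misses plus false alarms
        apply_feedback(doc, profiles, predicted, truth)
    return errors


if __name__ == "__main__":
    profiles = [Profile("mergers", {"merger": 1.0, "acquisition": 1.0})]
    stream = [
        ("bank announces merger", {"mergers"}),
        ("company takeover bid", {"mergers"}),
        ("hostile takeover bid", {"mergers"}),
    ]
    print(run_stream(stream, profiles))  # [0, 1, 0]
```

In this toy run the second document is missed, the feedback raises the weights of its words, and the third document is then filtered correctly, which is the kind of improvement over repeated rounds that the protocol is meant to make visible.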
For the batch task, the whole corpus of documents and the set of profiles are provided to the participants and the systems are expected to give back the results of the filtering system.
The results will be computed and communicated to the participants for discussion. The results of the campaign and the new evaluation methods will be presented in the framework of a workshop at the end of the project and will be published anonymously.
At the end of the project, an evaluation kit will be made available to the community. With this evaluation kit, new teams will be able to assess and compare their system's results with those of the participants, in the same conditions as during the evaluation campaign.
The InFile corpus is made up of newswires provided by the Agence France Presse (AFP) for research purposes.
For InFile, we selected 3 languages (Arabic, English and French) and a 3-year period (2004-2006), which represents a collection of about one and a half million newswires, or around 10 GB. Newswires are available in all three languages but are not translations from one language to another. News articles are encoded in XML format and follow the News Markup Language (NewsML) specifications.
Here are some examples for Arabic, English and French.
Only 100 000 documents of each language are used for the filtering test, in order to cope with the time constraints of an interactive filtering process. These documents correspond to the set of relevant documents for the topics, supplemented with a set of non-relevant documents, as shown in the figure below.
The development data is already available. It consists of :
If you wish to participate in the INFILE Campaign, or if you are interested in joining the group of experts of this project, please contact us.