GENIE stands for GENeric Information Extraction framework. It is an architectural proposal that implements a set of components whose objective is to make information extraction easier, working with easily accessible formats and using machine learning, AI techniques, natural language processing tools, and semantic methods.


Why GENIE?
The framework was developed to provide a general-purpose platform capable of integrating all kinds of processes related to information extraction. It is intended for applications that need components for tasks such as disambiguation, language analysis, text labeling and classification, summarization, and semantic search.

Information Extraction
Information extraction is a discipline that deals with searching for data in any kind of digital document collection in a pertinent and relevant way, and it is GENIE's main field of activity. Today, access to large amounts of information has become a regular part of our lives, and with it comes the need for tools to collect, organize, analyze, and distribute this information. These tasks require capabilities that are not trivial and can hardly be found in commercial products. For this reason, it is very useful to have software that assembles different elements in a common framework to tackle the problem from different angles, while also making it possible to automate many common processes related to information extraction. This can help limit the possibility of human error in these tasks, increase the productivity of organizations, and save the resources needed to achieve their goals.

Who are we?
GENIE is being developed at the University of Zaragoza by the SID Research Group, which has extensive experience in issues related to the Semantic Web. PhD holders, engineers, academics, scholars, and business professionals work together on the project. You can meet us on the Staff tab.


  • Creating a framework able to handle different languages and to integrate a large number of processes related to information extraction.

  • Integrating into this framework modular, generic, and open tools that can be used in other, external applications.

  • Developing an open framework that allows future expansion.

  • Facilitating experimentation and testing, allowing the improvement of current methods and the development of new tools that represent an innovation in the field of information extraction.


In addition to its considerable interest as a research platform, this software has many practical applications that are available almost immediately:

Search Engines:
GENIE can increase the performance of a standard search engine based on a term index, bringing its behavior closer to that of a semantic search engine. This makes it possible to retrieve results even when the gender (masculine or feminine), number (singular or plural), or verb forms differ from the keywords used in the query. It also considers synonyms and related words when retrieving data, which greatly enhances the user experience.
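
The idea of matching a query regardless of gender, number, verb form, or synonym can be illustrated with a minimal sketch. This is not GENIE's implementation: the function names, the tiny Spanish lemma table, and the synonym table below are all hypothetical stand-ins for the morphological analysis and lexical resources a real system would rely on.

```python
from collections import defaultdict

# Hypothetical toy resources (assumptions for illustration only); a real
# system would use a morphological analyzer and a thesaurus instead.
LEMMAS = {
    "gata": "gato",      # feminine -> base form
    "gatas": "gato",     # feminine plural -> base form
    "gatos": "gato",     # plural -> base form
    "corre": "correr",   # verb forms -> infinitive
    "corrió": "correr",
    "duermen": "dormir",
}
SYNONYMS = {
    "casa": {"hogar"},
    "hogar": {"casa"},
}


def normalize(word):
    """Lowercase a word, trim punctuation, and map it to a known base form."""
    word = word.lower().strip(".,;:!?")
    return LEMMAS.get(word, word)


def build_index(docs):
    """Map each normalized term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[normalize(word)].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents matching every query term or a synonym of it."""
    result = None
    for word in query.split():
        term = normalize(word)
        candidates = {term} | SYNONYMS.get(term, set())
        hits = set()
        for candidate in candidates:
            hits |= index.get(candidate, set())
        result = hits if result is None else result & hits
    return result or set()


docs = {
    1: "La gata corre.",
    2: "Los gatos duermen en casa.",
    3: "El perro ladra.",
}
index = build_index(docs)
```

With this toy index, `search(index, "gatas corrió")` matches document 1 even though neither word appears literally in it, and `search(index, "hogar")` matches document 2 through the synonym table. Normalizing at both indexing and query time keeps the index itself a plain term index, which is the sense in which a standard engine is brought closer to a semantic one.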

Documentation Archives:
GENIE can help improve the productivity of an administrative or documentation department by automating tedious tagging tasks, whether general or more detailed, since it has tools capable of performing semantic labeling with a high level of precision. GENIE can not only tag documents; through a suitable interface, it could also modify item details in the document database when necessary. Furthermore, GENIE is linked to geographical databases, so it can take into account the locations mentioned in tagged text and disambiguate them when needed. In addition, GENIE can produce text summaries of a specified length, which is very useful in many areas.

Information management:
With its ability to 'understand' a text, GENIE can extract information from it and then either translate that information into measurable values or use it to fill in forms. The system can be applied to the analysis of text data, to reporting, to the analysis of brand reputation, or to implementing filters.


Article (in Spanish) published in the newspaper "El Heraldo de Aragón" (April 3, 2016) explaining the AIS project, carried out in collaboration with the company Insynergy (ISYC) and dedicated to extracting information from legal documents using semantic techniques.

Review in the ISYC company blog

Article (in Spanish) published in the newspaper "El Heraldo de Aragón" (April 18, 2012) describing NASS, our work on categorising news using semantic techniques.

Interview on Aragón Radio (May 22, 2012)