Knowledge Graph Construction from Heterogeneous Data Sources exploiting Declarative Mapping Rules

Author: CHAVES FRAGA, David

Title: Knowledge Graph Construction from Heterogeneous Data Sources exploiting Declarative Mapping Rules

Date: 2021

Subject: No subject defined

School: E.T.S DE INGENIEROS INFORMÁTICOS

Departments: INTELIGENCIA ARTIFICIAL

Electronic access: http://oa.upm.es/67890/

First supervisor: CORCHO, Oscar

Abstract: Over the last few years, a large and constantly growing amount of data has been made available on the Web. These data are published in many different formats following several schemas. The Semantic Web, and more specifically Knowledge Graphs, have gained momentum as a result of this explosion of available data and the demand for expressive models to integrate factual knowledge spread across various data sources. Although these results endorse the success of Semantic Web technologies, they also call for the development of computational tools to scale up knowledge graphs to the large data growth expected in the coming years. Proposing robust methods able to integrate these data sources across the Web is the first step that has to be solved in order to start seeing the Web as an integrated overall database. This thesis addresses the problem of constructing knowledge graphs exploiting declarative mapping rules. The contributions presented in this document are:
• A complete evaluation framework for knowledge graph construction engines.
• The concept of mapping translation and its desirable properties.
• Optimizations and enhancements during the access to heterogeneous data sources in the construction of virtual knowledge graphs exploiting the mapping translation concept.
• Optimizations in the construction of materialized knowledge graphs over complex data integration scenarios translating mapping rules among different specifications.
The final conclusions of this thesis reflect that the optimization of the construction of knowledge graphs at scale has been approached for the first time using translations among mapping languages, a novel concept in the state of the art. This has been accompanied by a complete evaluation framework that allows the identification of the weaknesses and strengths of these engines.
Finally, the future lines of work reflect the need to continue researching new methods and techniques that ensure a wide adoption of this type of technology on a large scale in industry.

----------RESUMEN (English translation)----------

Nowadays there is a huge amount of data that has been stored on the Web. These data are represented in very diverse formats and make use of very diverse vocabularies and schemas. The Semantic Web, and more specifically Knowledge Graphs, have positioned themselves as a scalable solution capable of integrating this large amount of data following a common model (ontology). Although this reflects the success of these technologies, which many companies and projects use for data management, it also reflects a need for the development and conceptualization of systems capable of constructing these knowledge graphs in scenarios where characteristics such as the volume or variety of the data are complex. This thesis addresses the problem of constructing knowledge graphs through translation between declarative mapping rules. The contributions presented are:
• A complete evaluation system for knowledge graph construction tools.
• The concept of translation between mapping languages and its properties.
• Improvements in the access to heterogeneous data during the construction of virtual knowledge graphs exploiting translation between languages.
• Optimizations for the construction of materialized knowledge graphs in complex data integration scenarios by translating rules between different specifications.
The final conclusions of this thesis reflect that the optimization of knowledge graph construction at scale has been addressed for the first time using the translation of mapping languages, a novel concept in the state of the art. This has been accompanied by a complete system for the evaluation of these tools.
Finally, the lines of future work reflect the need to keep researching new methods and techniques that ensure a wide adoption of these technologies on a large scale in industry.