Reasoning on data semantics
We investigate different models and algorithms for querying data (or resources) through possibly heterogeneous and distributed ontologies.
List of participants: F. Jouanot (Associate Professor), M.-Ch. Rousset (Professor), A. Termier (Associate Professor), G. Vargas-Solar (CR1), Ch. Collet (Professor), R. Tournaire (PhD 2007- ), S. Tandabany (PhD 2007-)
The web has deeply changed the vision of modern data management systems and has forced the community to revisit the problem of querying data that are distributed, possibly heterogeneous, and ill-structured or semi-structured. This revolution will be amplified by the miniaturization of storage devices connected to the network. This opens new possibilities and raises new challenges for integrating heterogeneous, decentralized and context-sensitive data. Reasoning on context and data semantics is one of the keys to addressing those challenges in a principled way. The group's position is to investigate the algorithmic issues that determine the scalability of querying data through possibly heterogeneous and distributed ontologies.
We plan to extend our work on data semantics in two main directions:
- Models and algorithms for reasoning on the distributed semantics of Web data. In particular, we will investigate models for handling uncertainty and trust in peer-to-peer data management systems (Dataring project), and algorithms for automatic discovery of probabilistic mappings between taxonomies of classes.
- Models and algorithms for handling semantic and contextual descriptions of devices or services. In particular, we will investigate the problem of automatic discovery and composition of services based on the semantic description of their functionalities and of the context in which the devices supporting them are deployed (CONTINUUM project).
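As an illustration of the first direction above, the following minimal sketch estimates probabilistic mappings between two taxonomies from the instances their classes share. The source does not specify an estimation method, so the instance-overlap estimate with additive smoothing is an assumption made for illustration, and the names `mapping_probability` and `discover_mappings` are hypothetical.

```python
from itertools import product


def mapping_probability(instances_a, instances_b, smoothing=1.0):
    """Estimate P(x in B | x in A) from observed instances.

    Assumption: a smoothed overlap ratio stands in for the probabilistic
    model, which the source text does not describe.
    """
    a, b = set(instances_a), set(instances_b)
    return (len(a & b) + smoothing) / (len(a) + 2.0 * smoothing)


def discover_mappings(taxonomy_1, taxonomy_2, threshold=0.6):
    """Return candidate mappings (class_1, class_2, probability).

    Each taxonomy is a dict from class name to an iterable of instance
    identifiers (a hypothetical, flattened representation).
    """
    mappings = []
    for (c1, inst1), (c2, inst2) in product(taxonomy_1.items(), taxonomy_2.items()):
        p = mapping_probability(inst1, inst2)
        if p >= threshold:
            mappings.append((c1, c2, p))
    # Most probable mappings first.
    return sorted(mappings, key=lambda m: -m[2])


if __name__ == "__main__":
    peer_1 = {"Jazz": {1, 2, 3, 4}}
    peer_2 = {"Music": {1, 2, 3, 4, 5, 6}, "Rock": {5, 6}}
    for c1, c2, p in discover_mappings(peer_1, peer_2):
        print(f"{c1} -> {c2}: {p:.2f}")
```

The smoothing term keeps the estimate away from 0 and 1 when a class has few instances, which is one simple way to reflect uncertainty; a real system would also need to handle instances that are not shared between peers.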
Composing data services in a dynamic way
We investigate models, algorithms and tools for coordinating services with non-functional properties (contracts) and for providing access to heterogeneous data coming from services.
List of participants: Ch. Bobineau (Associate PR), Ch. Collet (PR), G. Vargas-Solar (CR1), Javier Alfonso Espinosa-Oviedo (PhD 2009 - , co-direction France - Mexico), Alberto Portilla-Flores (PhD 2005- 2009 co-direction, France - Mexico), Victor Cuesvas-Vicenttín (PhD 2006 - )
Composing services exported by different organisations is a key issue when building large-scale, data-intensive applications and systems. It becomes even more crucial when considering services within ubiquitous infrastructures made of heterogeneous devices, servers and applications connected through heterogeneous systems. Composition must take into account the characteristics of these ecosystems (e.g., memory, computing and network capabilities). The composition process uses this knowledge or semantics to dynamically discover and coordinate (ubiquitous) services, and then to adapt the coordination process according to the availability of services and to changes in them. Another important challenge is to consider non-functional aspects and QoS (quality of service) criteria such as availability, reliability, and temporal constraints, which are crucial when composing data services in a dynamic way. In the group, we investigate models, algorithms and tools for:
- merging data and control flows for describing service coordination;
- processing (accessing and integrating) heterogeneous data coming from services in a discrete and continuous way;
- reliable and adaptive data services composition;
- dealing with autonomic services and systems.
We plan to extend our work on reliable and autonomic data services composition in three directions:
- Providing reliability to services composition: we will define a language for specifying non-functional properties of service coordinations. In particular, it will be used for programming recovery actions with associated execution strategies for synchronizing recovery with the execution of the coordination. The semantics of the language will be formally described and its properties will be studied and demonstrated (ECOS-ANUIES ORCHESTRA project).
- Service-based query processing: we will investigate new query processing techniques that tackle classic, mobile and continuous queries at the same time by composing data service providers, which may be push or pull, static or nomadic (OPTIMACS ANR project, PhD). Conversely, we will investigate how to use declarative query expressions for integrating and composing services available in dynamic environments (e-CLOUDSS and redSHINE projects).
- Event flow management: research on events and rules will continue as a basis for the autonomic management of views for data spaces and service clouds. It will address event management through a flexible approach that enables programmers to build their own event composition/synthesis and management functions.
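To make the idea of programmer-defined event composition more concrete, here is a minimal sketch of an event manager that accepts pluggable composition functions. It is only an illustration under stated assumptions: the `Event`, `EventComposer` and `sequence` names are hypothetical, events are assumed to be published in timestamp order, and the sequence-within-a-window operator is just one example of a composition function a programmer might supply.

```python
from collections import namedtuple

# Hypothetical event representation: a type, a timestamp and a payload.
Event = namedtuple("Event", ["type", "timestamp", "payload"])


class EventComposer:
    """Event manager into which programmers plug their own composition functions."""

    def __init__(self):
        self.history = []  # events seen so far, assumed in timestamp order
        self.rules = []    # (composition function, handler) pairs

    def register(self, compose_fn, handler):
        self.rules.append((compose_fn, handler))

    def publish(self, event):
        self.history.append(event)
        for compose_fn, handler in self.rules:
            composite = compose_fn(self.history)
            if composite is not None:
                handler(composite)


def sequence(first_type, second_type, window):
    """Build a composition function detecting first_type followed by
    second_type within `window` time units."""

    def compose(history):
        last = history[-1]
        if last.type != second_type:
            return None
        # Scan backwards; stop as soon as events fall outside the window.
        for earlier in reversed(history[:-1]):
            if last.timestamp - earlier.timestamp > window:
                break
            if earlier.type == first_type:
                name = f"{first_type};{second_type}"
                return Event(name, last.timestamp, (earlier, last))
        return None

    return compose


if __name__ == "__main__":
    detected = []
    composer = EventComposer()
    composer.register(sequence("service_down", "timeout", window=10), detected.append)
    composer.publish(Event("service_down", 1, None))
    composer.publish(Event("timeout", 5, None))
    composer.publish(Event("timeout", 30, None))
    print([e.type for e in detected])
```

Keeping composition functions as plain callables over the event history is one possible design; it lets new operators (conjunction, negation, aggregation) be added without changing the manager itself.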
Accessing data in large-scale systems
List of participants: Ch. Collet (Professor), M.-Ch. Rousset (Professor), Ch. Bobineau (Associate Professor), F. Jouanot (Associate Professor), A. Termier (Associate Professor), Benjamen Negrevergne (PhD 2008- )
Query optimization in distributed and dynamic systems
Accessing data concerns several dimensions of large-scale systems: number of resources, data volume and data complexity. Current systems that are large-scale in number of resources include grids, peer-to-peer networks, sensor networks, and ambient and ubiquitous environments. The most popular method to access data within these systems in a convenient and efficient way is still to consider declarative queries that are optimized based on system characteristics. Due to the strong dynamicity of these systems, classical distributed query evaluation techniques are not applicable. Having a global view of the system is not possible: pertinent data sources cannot be known a priori, and useful metadata for query evaluation are not always available. In addition, the evaluation strategy for a query has to adapt dynamically to fluctuating conditions and to users with different needs. For example, some may want to maximize performance while others may need to minimize energy consumption. The HADAS group focuses on new approaches to efficient query evaluation with respect to the needs of applications running on large-scale systems, following our previous work on adaptive query processing. We plan to extend our work on efficient query evaluation in two main directions:
- Machine-learning-based adaptive query evaluation. In distributed environments where metadata are lacking, classical query evaluation techniques cannot be applied. We propose machine learning techniques exploiting measures taken during previous query executions to improve performance of future query evaluations (case-based reasoning).
- Data and network management in dynamic ad-hoc networks. In distributed environments, queries have to be decomposed into subqueries that have to be evaluated on different nodes of the network. In dynamic environments, there is no knowledge about data distribution (localization and volumes). We propose to combine network and data management by viewing the whole network as a dynamic distributed database system. This work has been started in collaboration with the LIAMA in China, and promising results have already been obtained.
These two directions will be explored in the setting of the ANR Blanc 2009 project UBIQUEST and in a collaboration with CEA-LETI.
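The case-based reasoning idea in the first direction above (reusing measures from previous executions when metadata are missing) can be sketched as follows. This is a simplified illustration, not the group's method: the `CaseBase` class, the numeric feature vectors, the plan labels and the k-nearest-neighbour selection rule are all assumptions introduced for the example.

```python
import math
from collections import defaultdict


class CaseBase:
    """Stores past query executions and suggests a plan for a new query."""

    def __init__(self):
        # Each case: (feature vector, plan label, measured cost).
        self.cases = []

    def record(self, features, plan, cost):
        """Remember the cost measured when `plan` was run for a query
        described by `features` (hypothetical numeric descriptors)."""
        self.cases.append((tuple(features), plan, float(cost)))

    def choose_plan(self, features, k=3, default=None):
        """Pick the plan with the lowest average cost among the k most
        similar past cases; fall back to `default` with no history."""
        if not self.cases:
            return default
        nearest = sorted(self.cases, key=lambda c: math.dist(c[0], features))[:k]
        costs = defaultdict(list)
        for _, plan, cost in nearest:
            costs[plan].append(cost)
        return min(costs, key=lambda p: sum(costs[p]) / len(costs[p]))


if __name__ == "__main__":
    cb = CaseBase()
    # Features here are (number of joins, estimated input size); illustrative only.
    cb.record((2, 1000), "hash_join", 1.2)
    cb.record((2, 1200), "nested_loop", 4.0)
    cb.record((5, 90000), "hash_join", 9.0)
    print(cb.choose_plan((2, 1100), k=2))
```

In practice the features would need to be normalized so that no single dimension dominates the distance, and the cost could be replaced by another objective such as energy consumption, in line with the differing user needs mentioned in this section.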
Mining large amounts of data to extract patterns of interest
Data mining is another way to access large quantities of data, by extracting interesting patterns from them. Such patterns provide meaningful abstractions of raw data, which are thus less numerous and more appropriate for data analysis. The group works on pattern mining in complex data such as sequences, trees or graphs, which are found in many applications in chemistry (e.g. graphs representing molecules) or in bioinformatics (e.g. gene regulation networks).
The focus will be on designing and deploying parallel pattern mining algorithms on multicore processors. The DAMOCLES project (supported by the MSTIC pole of UJF), which is just starting, will investigate “DAta Mining for On Chip Low Energy systems”. This project involves the HADAS and MESCAL teams of LIG and the machine architecture team of the TIMA laboratory in Grenoble (F. Petrot, SLS team).
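To give a concrete sense of what extracting frequent patterns means, the sketch below mines frequent itemsets with a classical level-wise (Apriori-style) search. Itemsets are a much simpler pattern class than the sequences, trees and graphs mentioned above, and this sequential code is a textbook technique chosen for illustration, not the group's parallel algorithms; the function name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return a dict mapping each frequent itemset (frozenset) to its support,
    i.e. the number of transactions that contain it."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}

    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s)
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Pruning: every (k-1)-subset of a frequent k-itemset must be frequent.
        level = {
            c
            for c in candidates
            if all(frozenset(sub) in result for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(
        frequent_itemsets(data, min_support=3).items(),
        key=lambda kv: (len(kv[0]), sorted(kv[0])),
    ):
        print(sorted(itemset), count)
```

Support counting for different candidates is independent, which is one reason pattern mining is a natural candidate for parallel execution on multicore processors.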