Projects

Big Data

The Big Data research programme focuses on efficient big data analysis, with special emphasis on identifying the limits of existing data analysis techniques and on guiding the selection of the most efficient technique for each specific problem setup. Besides benchmarks and comparative studies of existing data analysis techniques, new methods may be proposed to optimize the utilization of the infrastructure.

In the Big Data area, CERIT-SC aims to identify and validate the effectiveness and limitations of existing data analysis techniques when applied to different datasets under various setups. Extensive attention is paid to extremely large datasets, which often require very simple techniques that balance analysis feasibility (e.g. response time, amount of resources used) against achievable information value. Example domains include large networks of interconnected sensors (present, for instance, within the Internet of Things concept), cybersecurity assurance and (cyber)crime detection (dealing with vast amounts of heterogeneous data), as well as various bioinformatics data portals and analyses (e.g. genome DNA/RNA sequencing and analysis).

Research goals
The main aim of this research subprogramme is to identify and validate the effectiveness and limitations of existing data analysis techniques when applied to different datasets under various setups (e.g. given by the research question being answered for the specific dataset). This research, supported by numerous benchmarks on realistic data samples, should result in codified knowledge guiding the selection of the most effective technique for a specific problem setup, with a positive impact on the efficiency of CERIT-SC infrastructure utilization. Furthermore, our experience with software architecture design will be employed both to study the optimization of the infrastructure and to study the use of data analysis to optimize the software architectures of other relevant systems.
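At its core, such a benchmark can be as simple as running each candidate technique on the same realistic data sample and comparing resource cost against result quality. The following Python sketch (the harness and the two toy "techniques" are illustrative assumptions, not part of the programme) shows the idea of timing competing techniques to guide selection:

```python
import random
import time

def benchmark(techniques, dataset, repeats=3):
    """Time each candidate technique on the same dataset and report
    the mean wall-clock run time (a rough proxy for resource cost)."""
    results = {}
    for name, fn in techniques.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn(dataset)
            timings.append(time.perf_counter() - start)
        results[name] = sum(timings) / repeats
    return results

# Two toy "analysis techniques" over the same data: an exact mean
# and a cheaper, sampled (approximate) mean.
data = [random.random() for _ in range(100_000)]
techniques = {
    "exact_mean": lambda d: sum(d) / len(d),
    "sampled_mean": lambda d: sum(random.sample(d, 1_000)) / 1_000,
}
timings = benchmark(techniques, data)
best = min(timings, key=timings.get)  # fastest technique for this setup
```

A real benchmark would of course also record memory use and the error of approximate results, so that the trade-off, not just the run time, is codified.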

Besides studying various techniques for big data analysis, extensive attention will be paid to extremely large datasets, which often require very simple techniques that balance analysis feasibility (e.g. response time, amount of resources used) against achievable information value. With the help of the benchmarks discussed above, we aim to assess the level of simplification and approximation necessary for a feasible analysis of extremely large datasets. In our experience, the domains that belong to these extreme cases include large networks of interconnected sensors (for instance within the Internet of Things concept), cybersecurity assurance and (cyber)crime detection with their vast amounts of heterogeneous data, and various bioinformatics data portals and analyses (e.g. genome DNA/RNA sequencing and analysis).
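One classic example of trading exactness for feasibility is analysing a fixed-size random sample instead of the full dataset. As an illustration only (not a technique prescribed by the programme), reservoir sampling keeps a uniform random sample of a stream of unknown length in constant memory, so an approximate statistic can be computed in a single pass:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of up to k items from a stream of
    unknown length, using O(k) memory (Vitter's Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace an existing sample item with probability k / (i + 1).
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Approximate the mean of a million-item stream from a 1,000-item sample,
# without ever materializing the full stream in memory.
stream = (x * x for x in range(1_000_000))
sample = reservoir_sample(stream, 1_000)
approx_mean = sum(sample) / len(sample)
```

The sample size directly controls the balance discussed above: a larger reservoir improves the information value of the estimate at the cost of more memory and time.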



Achievements
