Knowledge discovery in Big Data

Context

We often say that we are immersed in the era of Big Data.

Growing digitalization and technological advances mean that ever more data is generated in our daily lives, and that this data is increasingly complex.

Knowledge Discovery, Exploratory Data Analysis (EDA) and Pattern Recognition are dedicated to identifying patterns in data. These patterns are of vital importance, since they can offer new insights into the reality behind the data (such as the causes of a disease or the patterns of climate change).

When the data is massive (Big Data), identifying these patterns becomes a genuinely complex task.

Challenge

Complex data is often affected by various factors that make it difficult to analyze:

▪︎ Missing records

▪︎ Various forms of noise

▪︎ Huge numbers of features

▪︎ Need to integrate different data sources

▪︎ Etc.

Furthermore, Big Data processing requires command of specialized infrastructure (high-performance servers, parallelization, containers, cloud platforms…) and the development of software and exploratory approaches that can handle the Vs of Big Data (Volume, Variety, Velocity, Veracity, etc.).

Our proposal

At Codas Lab we use advanced analysis methods and, where necessary, develop methods of our own to improve the interpretation of the data.

In most cases these methods are data factorization algorithms combined with computational statistics, which simplify visualization and inference on complex data. They are essential because they allow us to find hidden patterns and gain new insights.
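
As a rough illustration of what a data factorization can look like, the following minimal Python sketch extracts the leading component of a small, made-up data table by power iteration, in the spirit of principal component analysis. It is not Codas Lab's own method or code: the function names, the toy values and the number of iterations are all assumptions chosen for the example.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so the component describes variation, not offsets."""
    n_rows = len(rows)
    means = [sum(col) / n_rows for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_component(rows, iterations=200):
    """Power iteration on X^T X; returns (scores, loadings) of the leading component.

    `iterations` is a hypothetical fixed budget; a real pipeline would test convergence.
    """
    n_cols = len(rows[0])
    loadings = [1.0 / math.sqrt(n_cols)] * n_cols  # arbitrary unit-length start
    for _ in range(iterations):
        # Project every observation onto the current direction.
        scores = [sum(x * p for x, p in zip(row, loadings)) for row in rows]
        # Map the scores back to variable space (this equals X^T X applied to loadings).
        updated = [sum(t * row[j] for t, row in zip(scores, rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in updated))
        loadings = [v / norm for v in updated]
    scores = [sum(x * p for x, p in zip(row, loadings)) for row in rows]
    return scores, loadings


def explained_fraction(rows, scores):
    """Share of the total sum of squares captured by the component."""
    total = sum(x * x for row in rows for x in row)
    return sum(t * t for t in scores) / total


# Toy data (invented for illustration): 5 observations, 3 variables.
# The first two variables move together; the third is small noise.
data = [
    [1.0, 2.1, 0.3],
    [2.0, 3.9, 0.1],
    [3.0, 6.2, 0.2],
    [4.0, 7.8, 0.4],
    [5.0, 10.1, 0.2],
]

centered = center_columns(data)
scores, loadings = first_component(centered)
fraction = explained_fraction(centered, scores)

print("loadings:", [round(p, 3) for p in loadings])
print("explained fraction:", round(fraction, 4))
```

In this sketch the loadings indicate which variables drive the dominant pattern, and the scores place each observation along it, which is what makes this kind of factorization easy to visualize and interpret.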

Likewise, we use computational simulation and real data to validate and optimize our data pipelines for future problems.

Our vision

We are aware that many tools are well suited to leveraging data for the development of automated applications (Artificial Intelligence, Deep Learning, LLMs, etc.).

However, when the objective is to understand the data, at Codas Lab we rely on matrix factorization methods. We opt for simple, easy-to-interpret methods and build complex data pipelines around them.

This makes us the right choice in cases where the researcher needs to understand their data and optimize results.

If you want to receive more information about our Knowledge discovery in Big Data research line, do not hesitate to contact us.