Knowledge Discovery in Big Data

Knowledge Discovery, Exploratory Data Analysis (EDA) and Pattern Recognition generally refer to the identification of interpretable patterns in data. For that purpose, multivariate techniques have been extensively employed in many research fields, including the social sciences, education, medicine, chemistry and related areas. EDA based on multivariate models, also known as Multivariate EDA (MEDA), relies on a set of visualizations that simplify the understanding of complex data. An advantage of MEDA is that multivariate models are interpretable and can be used to interact with the data in order to investigate the underlying phenomena of interest.

Exploratory and Big Data Analysis

The MEDA tools are extremely powerful when applied to data sets of conventional size, as illustrated by hundreds of applications in a wide range of areas. However, they are hard to extend to the Big Data paradigm. The MEDA Toolbox in Matlab, a software initiative maintained by the CoDaS Lab, has been one of the first attempts to perform such an extension. The MEDA Toolbox is open software available on GitHub (https://github.com/codaslab/MEDA-Toolbox). It combines clustering and kernel computations to extend the MEDA visualization tools to unlimited numbers of observations or variables. The toolbox has been employed successfully in several research and development projects, showing its potential to handle very complex data of disparate nature: clinical, chemical, biological, computer network traffic and security data, etc.
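As a rough illustration of why kernel computations help with unlimited numbers of observations, the sketch below accumulates the cross-product matrix of the variables batch by batch. The matrix has a fixed size (variables by variables) no matter how many observations are processed, so a multivariate model can be fitted without holding all the data in memory. This is a minimal Python sketch under stated assumptions, not the MEDA Toolbox code or its API: all function names are hypothetical, the clustering step is omitted, and a simple power iteration stands in for a full eigendecomposition.

```python
import math


def new_state(n_vars):
    """Empty running totals: observation count, column sums, cross-products."""
    return 0, [0.0] * n_vars, [[0.0] * n_vars for _ in range(n_vars)]


def update_state(state, batch):
    """Fold one batch of observations (a list of rows) into the totals."""
    n, sums, xtx = state
    for row in batch:
        n += 1
        for i, xi in enumerate(row):
            sums[i] += xi
            for j, xj in enumerate(row):
                xtx[i][j] += xi * xj
    return n, sums, xtx


def covariance(state):
    """Sample covariance matrix recovered from the running totals."""
    n, sums, xtx = state
    m = len(sums)
    means = [s / n for s in sums]
    return [
        [(xtx[i][j] - n * means[i] * means[j]) / (n - 1) for j in range(m)]
        for i in range(m)
    ]


def leading_loading(cov, iterations=200):
    """First principal direction via power iteration (illustrative only)."""
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical usage: two batches arriving one after the other.
state = new_state(2)
for batch in ([[1.0, 2.0], [2.0, 4.0]], [[3.0, 6.0], [4.0, 8.0]]):
    state = update_state(state, batch)
loading = leading_loading(covariance(state))
```

Only the running totals are kept between batches, so memory use depends on the number of variables rather than on the number of observations.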

The CoDaS Lab organizes several courses on MEDA, including the Ph.D. course Multivariate Exploratory Data Analysis, taught by Prof. José Camacho, which gathers students at the University of Granada from very disparate areas (ICTs, astronomy, geology, biology, health, etc.).

Related references: