Lambda architecture applied to document clustering in Big Data contexts

November 5, 2015
Applying data mining techniques, and clustering in particular, to large volumes of data (such as Big Data) is a challenge in terms of scalability and response time, since more data means longer computation time. The Lambda architecture offers a set of general-purpose recommendations for designing systems in Big Data scenarios, with the aim of reducing latency and delivering results in real time. This paper presents a study on applying the Lambda architecture to the clustering of documents in a Big Data environment. We focus on the high latency that occurs when new documents are ingested into the clustering system. One of the key points of this architecture is the separation of processing into three layers: batch layer, serving layer and speed layer. A further problem when dealing with documents is the high dimensionality of their representation, which we address through dimensionality reduction using Latent Dirichlet Allocation. We found that the combination of layers proposed in the Lambda architecture allows large data volumes to be clustered and yields results updated in real time, with a clustering quality comparable to that of other approaches that do not work on Big Data.
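The three-layer split described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the class `LambdaClustering`, the helper names, the plain k-means in the batch layer and the in-memory views are all assumptions made for illustration, and the input vectors stand in for the low-dimensional topic vectors that an LDA step would produce.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]  # assumed: an LDA-style topic vector for one document


def nearest(centroids: List[List[float]], v: Vector) -> int:
    """Index of the centroid closest to v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], v))


def batch_kmeans(docs: List[Vector], k: int, iterations: int = 10) -> List[List[float]]:
    """Batch layer job: recompute centroids from the whole master dataset."""
    centroids = [list(d) for d in docs[:k]]  # naive init: the first k documents
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for d in docs:
            groups[nearest(centroids, d)].append(d)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


class LambdaClustering:
    """Hypothetical in-memory illustration of the three layers."""

    def __init__(self, k: int, initial_docs: List[Vector]) -> None:
        self.k = k
        self.master: List[Vector] = list(initial_docs)  # append-only master dataset
        self.batch_view: List[List[float]] = []         # serving layer: last batch result
        self.realtime_view: Dict[int, int] = {}         # speed layer: doc id -> cluster
        self.run_batch()

    def run_batch(self) -> None:
        """Slow path: full recomputation, after which the speed view is discarded."""
        self.batch_view = batch_kmeans(self.master, self.k)
        self.realtime_view.clear()

    def ingest(self, v: Vector) -> int:
        """Fast path: assign a new document to a cluster without re-clustering."""
        self.master.append(v)
        doc_id = len(self.master) - 1
        self.realtime_view[doc_id] = nearest(self.batch_view, v)
        return doc_id

    def query(self, doc_id: int) -> int:
        """Merge both views: prefer the speed layer, fall back to the batch view."""
        if doc_id in self.realtime_view:
            return self.realtime_view[doc_id]
        return nearest(self.batch_view, self.master[doc_id])


lc = LambdaClustering(k=2, initial_docs=[[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]])
new_id = lc.ingest([0.7, 0.3])  # answered at once by the speed layer
print(new_id, lc.query(new_id))
```

In this sketch a newly ingested document gets a cluster immediately from the last batch centroids, and the next call to `run_batch` absorbs it into the full recomputation. A real deployment would run the batch job on a distributed engine and persist both views.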
VALLEJO MARTÍNEZ, Alberto; MARTÍNEZ UNANUE, Raquel; RODRIGO YUSTE, Álvaro. Arquitectura lambda aplicada a clustering de documentos en contextos bigdata. 2015.


quecko in the III Concurso de Software Libre

November 22, 2008

quecko is a free software project that continues to take part in the III Concurso de Software Libre.

What is quecko?

November 16, 2007

quecko is a free software project, hosted on the RedIRIS forge, that is taking part in the II Concurso de Software Libre.

The goal is to build a web application, using an open implementation of Java, to form a social network.

I am Alberto Vallejo, the project administrator, and I am a student at UNED.

Hello world!

November 12, 2007

quecko begins...