Spatial Evolutionary Framework for Parallel Machine Learning

A recurring issue in the automated analysis of biological sequences, particularly in supervised classification, is the size of the dataset. Unlike controlled experiments, realistic classification settings can involve hundreds of thousands of sequences, a scale made possible by advances in high-throughput genome sequencing. Parallel machine learning approaches that address this issue while keeping computational cost under control rely predominantly on boosting.

We have recently proposed a novel approach that builds on spatially-structured evolutionary algorithms (SSEAs) to distribute training data among a number of classifiers in such a way that each classifier is presented with hard examples at the true boundary of the entire dataset. The result is a powerful framework that does not require changing the underlying classifier. This work has appeared in: Uday Kamath, Johan Kaers, Amarda Shehu, and Kenneth A. De Jong. "A Spatial EA Framework for Parallelizing Machine Learning Methods." Intl. Conf. on Parallel Problem Solving From Nature (PPSN), LNCS vol. 7491, pp. 206-215, Taormina, Italy, 2012.
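To make the idea in the paragraph above more concrete, here is a minimal sketch in Python. It is not the published algorithm: the toroidal grid, the von Neumann neighborhood, the "hardness" score, the replacement rule, and the names `train_stump`, `neighborhood`, and `run_spatial_ea` are all assumptions chosen for illustration. A one-dimensional threshold rule stands in for the underlying classifier, which the framework itself would leave unchanged.

```python
import random

def train_stump(samples):
    """Stand-in classifier (assumption): 1-D rule, predict 1 when x >= threshold."""
    candidates = [x for x, _ in samples] + [float("inf")]
    return min(candidates,
               key=lambda t: sum((x >= t) != bool(y) for x, y in samples))

def neighborhood(i, j, rows, cols):
    """Cell (i, j) plus its four von Neumann neighbors on a toroidal grid."""
    return [(i, j),
            ((i - 1) % rows, j), ((i + 1) % rows, j),
            (i, (j - 1) % cols), (i, (j + 1) % cols)]

def run_spatial_ea(data, rows, cols, epochs, seed=0):
    """Hypothetical loop: spread hard-to-classify samples across the grid."""
    rng = random.Random(seed)
    # Each cell starts with one randomly chosen (x, label) training sample.
    grid = [[rng.choice(data) for _ in range(cols)] for _ in range(rows)]
    for _ in range(epochs):
        # Train one local classifier per cell on the samples in its neighborhood.
        thresholds = [[None] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                local = [grid[a][b] for a, b in neighborhood(i, j, rows, cols)]
                thresholds[i][j] = train_stump(local)
        # Hardness of a sample: how many nearby classifiers misclassify it.
        hardness = [[0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                x, y = grid[i][j]
                hardness[i][j] = sum(
                    (x >= thresholds[a][b]) != bool(y)
                    for a, b in neighborhood(i, j, rows, cols))
        # Each cell adopts the hardest sample in its neighborhood (ties keep its own).
        new_grid = [[None] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                a, b = max(neighborhood(i, j, rows, cols),
                           key=lambda c: hardness[c[0]][c[1]])
                new_grid[i][j] = grid[a][b]
        grid = new_grid
    return grid

# Toy data (assumption): points in [0, 1) labeled 1 when x >= 0.5.
toy = [(k / 20, int(k >= 10)) for k in range(20)]
final_grid = run_spatial_ea(toy, rows=4, cols=4, epochs=3)
```

In this toy version every local classifier only ever sees five samples per epoch, which is the property that keeps the cost per classifier small; because the cells are independent within an epoch, the per-cell training loop is the part one would run in parallel. A greedy replacement rule like this one can quickly reduce sample diversity, so a practical variant would likely use a softer selection scheme.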

On this Project:

  • Uday Kamath
  • Johan Kaers
  • Amarda Shehu
  • Kenneth De Jong