Towards more accurate clustering method by using dynamic time warping

04/12/2013
by Khadoudja Ghanem, et al.

An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increase. For this reason, it is important to have efficient computational methods and algorithms that can be applied to large datasets, so that machine learning tasks can still be completed in reasonable time. In this context, we present a simple and more accurate process to speed up ML methods. An unsupervised clustering algorithm is combined with the Expectation-Maximization (EM) algorithm to obtain efficient Hidden Markov Model (HMM) training. The proposed process consists of two steps. In the first step, training instances with similar inputs are clustered, and a weight factor representing the frequency of these instances is assigned to each cluster representative. Dynamic Time Warping (DTW) is used as the dissimilarity function to cluster similar examples. In the second step, all formulas in the classical HMM training algorithm (EM) that depend on the number of training instances are modified to include the weight factor in the appropriate terms. This process significantly accelerates HMM training while yielding the same initial, transition and emission probability matrices as those obtained with the classical HMM training algorithm. Accordingly, the classification accuracy is preserved. The speedup depends on the size of the training set: speedups of up to 2200 times are possible when the training set contains about 100,000 instances. The proposed approach is not limited to training HMMs; it can be employed for a large variety of ML methods.
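The first step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, so the greedy single-pass grouping, the absolute-difference local cost, and the names `dtw_distance`, `cluster_with_weights`, and `threshold` are all assumptions made for illustration. It shows only how a DTW dissimilarity can group similar sequences and how a weight (the number of instances absorbed) can be attached to each representative.

```python
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping with |x - y| as the local cost (assumed)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b
                cost[i][j - 1],      # b[j-1] also matched to an earlier a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def cluster_with_weights(
    sequences: Sequence[Sequence[float]], threshold: float
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical greedy clustering: a sequence joins the first
    representative within `threshold` (DTW), else it starts a new cluster.
    Returns the representatives and their weights (instance counts)."""
    reps: List[List[float]] = []
    weights: List[int] = []
    for seq in sequences:
        for k, rep in enumerate(reps):
            if dtw_distance(seq, rep) <= threshold:
                weights[k] += 1
                break
        else:
            # No representative was close enough: open a new cluster.
            reps.append(list(seq))
            weights.append(1)
    return reps, weights


if __name__ == "__main__":
    data = [[1, 2, 3], [1, 2, 2, 3], [5, 5, 5], [1, 2, 3]]
    reps, weights = cluster_with_weights(data, threshold=0.5)
    print(reps, weights)
```

In the second step, a weighted trainer would process each representative once and multiply its contribution to the EM statistics by its weight, rather than iterating over every original instance. That part is not sketched here, because the abstract does not give the modified formulas.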

research
09/04/2019

Regression-clustering for Improved Accuracy and Training Cost with Molecular-Orbital-Based Machine Learning

Machine learning (ML) in the representation of molecular-orbital-based (...
research
03/25/2016

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

The present work proposes hybridization of Expectation-Maximization (EM)...
research
10/19/2012

The Information Bottleneck EM Algorithm

Learning with hidden variables is a central challenge in probabilistic g...
research
04/21/2022

Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space

We introduce an unsupervised clustering algorithm to improve training ef...
research
03/07/2018

Fast Dawid-Skene

Many real world problems can now be effectively solved using supervised ...
