A hybrid heuristic algorithm for evolving models in simultaneous scenarios of classification and clustering

Cerrada, Mariela; Aguilar, Jose; Altamiranda, Junior; Sánchez, René-Vinicio

doi:10.1007/s10115-019-01336-3

A hybrid heuristic algorithm for evolving models in simultaneous scenarios of classification and clustering

Regular Paper
Published: 09 February 2019

Volume 61, pages 755–798, (2019)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Mariela Cerrada¹,
Jose Aguilar ORCID: orcid.org/0000-0003-4194-6882²,
Junior Altamiranda² &
…
René-Vinicio Sánchez¹

434 Accesses
5 Citations
2 Altmetric
Explore all metrics

Abstract

Machine Learning is currently an important research field that attracts interest due to its importance for discovering hidden knowledge or patterns from big datasets. In this paper, we propose a heuristic algorithm which can solve problems related to only classification, only clustering, or classification with clustering by creating models with the ability to evolve to another class/cluster configuration without a retraining process for new incoming data. This algorithm combines supervised and unsupervised learning principles for the incremental construction of both classes and clusters, by using the main guidelines from two classical methods of classification based on distance and clustering based on prototypes, such as KNN and K-means. The algorithm is able to deal with labeled and unlabeled samples as inputs in order to create new groups (classes or clusters), merge or reconfigure existing ones. Basically, the creation of new groups follows three sequential steps: (i) locate the provisional group for an input sample using K-means, (ii) using 1NN, locate the nearest sample to the input sample, only considering the samples in the provisional group, and (iii) merge or reconfigure existing groups following specific guidelines. Several benchmarks, related to classification and clustering problems, were evaluated by our proposal; the results were compared with classical algorithms. On the other hand, artificial datasets with labeled and unlabeled samples have been created to show the ability of our algorithm in the hybrid context to solve classification and clustering combined. As a result, the algorithm is able to create clusters and classes, simultaneously, when required. Finally, a real case study of fault diagnosis in rotating machinery is presented for discovering new groups that might represent patterns from unknown data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Detection and diagnosis of process fault using unsupervised learning methods and unlabeled data

Article 10 February 2023

A Proposal of On-Line Detection of New Faults and Automatic Learning in Fault Diagnosis

A Dissimilarity Learning Approach by Evolutionary Computation for Faults Recognition in Smart Grids

Notes

References

Aggarwal CC, Han J, Wang J, Yu PS (2006) A framework for on-demand classification of evolving data streams. IEEE Trans Knowl Data Eng 18(5):577–589
Article Google Scholar
Amini A, Saboohi H, Herawan T, Wah TY (2016) Mudi-stream: a multi density clustering algorithm for evolving data stream. J Netw Comput Appl 59:370–385
Article Google Scholar
Basu S (2004) A comparison of inference techniques for semi-supervised clustering with Hidden Markov random fields. In: Proceedings of the ICML-2004 workshop on statistical relational learning and its connections to other fields
Bishop C (2006) Pattern recognition and machine learning. Springer, Cambridge
MATH Google Scholar
Bordignon F, Gomide F (2014) Uninorm based evolving neural networks and approximation capabilities. Neurocomputing 127:13–20
Article Google Scholar
Cai Q, He H, Man H (2014) Imbalanced evolving self-organizing learning. Neurocomputing 133:258–270
Article Google Scholar
Cara A, Herrera L, Pomares H, Rojas I (2013) New online self-evolving neuro fuzzy controller based on the TaSe-NF model. Inf Sci 220:226–243
Article MathSciNet Google Scholar
Chapelle O, Scholkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge
Book Google Scholar
Czarnowski I, Jdrzejowicz P (2014) Ensemble classifier for mining data streams. Procedia Comput Sci 35:397–406 (knowledge Based and Intelligent Information; Engineering Systems 18th Annual Conference, KES 2014 Gdynia, September 2014, Poland)
Article Google Scholar
de Andrade SJ, Hruschka ER, Gama J (2017) An evolutionary algorithm for clustering data streams with a variable number of clusters. Expert Syst Appl 67:228–238
Article Google Scholar
Djouadi A, Snorrason O, Garber F (1990) The quality of training-sample estimates of the Bhattacharyya coefficient. IEEE Trans Pattern Anal Mach Intell 12:92–97
Article Google Scholar
Dunham M (2003) Data mining: introductory and advanced topics. Prentice Hall, Upper Saddle River
Google Scholar
Ertoz L, Steinbach M, Kumar V (2004) Finding topics in collections of documents: a shared nearest neighbor approach. In: Wu W, Xiong H, Shekhar S (eds) Clustering and information retrieval, network theory and applications, vol 11. Springer, Boston, pp 83–103
Chapter Google Scholar
Geng X, Smith-Miles K (2009) Incremental learning. Springer, Boston, pp 731–735
Google Scholar
Ghesmoune M, Lebbah M, Azzag H (2015) Micro-batching growing neural gas for clustering data streams using spark streaming. Procedia Comput Sci 53:158–166 (INNS Conference on Big Data 2015 Program San Francisco, CA, August 2015, USA)
Article Google Scholar
Hartert L, Sayed Mouchaweh M, Billaudel P (2010) A semi-supervised dynamic version of fuzzy K-nearest neighbours to monitor evolving systems. Evol Syst 1(1):3–15
Article Google Scholar
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, New York
Book MATH Google Scholar
He H, Garcia E (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
Article Google Scholar
Hyde R, Angelov P, MacKenzie A (2017) Fully online clustering of evolving data streams into arbitrarily shaped clusters. Inf Sci 382383:96–114
Article Google Scholar
Klancar G, Skrjanc I (2015) Evolving principal component clustering with a low run-time complexity for LRF data mapping. Appl Soft Comput 35:349–358
Article Google Scholar
Kontaki M, Gounaris A, Papadopoulos AN, Tsichlas K, Manolopoulos Y (2016) Efficient and flexible algorithms for monitoring distance-based outliers over data streams. Inf Syst 55:37–53
Article Google Scholar
Krawczyk B, Woniak M (2015) Incremental weighted one-class classifier for mining stationary data streams. J Comput Sci 9:19–25
Article Google Scholar
Larose D (2004) k-nearest neighbor algorithm, in discovering knowledge in data: an introduction to data mining. Wiley, Hoboken
Book Google Scholar
Liu B, Xiao Y, Yu P, Cao L, Zhang Y, Hao Z (2014) Uncertain one-class learning and concept summarization learning on uncertain data streams. IEEE Trans Knowl Data Eng 26(2):468–484
Article Google Scholar
Lughofer E (2013) On-line assurance of interpretability criteria in evolving fuzzy systems achievements, new concepts and open issues. Inf Sci 251:22–46
Article MathSciNet Google Scholar
Lughofer E, Sayed-Mouchaweh M (2015) Autonomous data stream clustering implementing split-and-merge concepts towards a plug-and-play approach. Inf Sci 304:54–79
Article Google Scholar
Lughofer E, Weigl E, Heidl W, Eitzinger C, Radauer T (2015a) Integrating new classes on the fly in evolving fuzzy classifier designs and their application in visual inspection. Appl Soft Comput 35:558–582
Article Google Scholar
Lughofer E, Weigl E, Heidl W, Eitzinger C, Radauer T (2015b) Integrating new classes on the fly in evolving fuzzy classifier designs and their application in visual inspection. Appl Soft Comput 35:558–582
Article Google Scholar
Maslennikov OV, Nekorkin VI (2015) Evolving dynamical networks with transient cluster activity. Commun Nonlinear Sci Numer Simul 23(13):10–16
Article Google Scholar
Mythily R, Banu A, Raghunathan S (2015) Clustering models for data stream mining. Procedia Comput Sci 46:619–626 (proceedings of the International Conference on Information and Communication Technologies, ICICT, December 2014, India)
Article Google Scholar
Pacheco F, de Oliveira JV, Sánchez RV, Cerrada M, Cabrera D, Li C, Zurita G, Artés M (2016) A statistical comparison of neuroclassifiers and feature selection methods for gearbox fault diagnosis under realistic conditions. Neurocomputing 194:192–206
Article Google Scholar
Pacheco F, Cerrada M, Snchez RV, Cabrera D, Li C, de Oliveira JV (2017) Attribute clustering using rough set theory for feature selection in fault severity classification of rotating machinery. Expert Syst Appl 71:69–86
Article Google Scholar
PhridviRaj M, GuruRao C (2014) Data mining past, present and future a typical survey on data streams. Procedia Technol 12:255–263 (the 7th International Conference Interdisciplinarity in Engineering, INTER-ENG 2013, October 2013, Romania)
Article Google Scholar
Pimentel M, Clifton D, Clifton L, Tarassenko L (2014) A review of novelty detection. Signal Process 99:215–249
Article Google Scholar
Rehman M, Li T, Yang Y, Wang H (2014) Hyper-ellipsoidal clustering technique for evolving data stream. Knowl Based Syst 70:3–14
Article Google Scholar
Rosenberg A, Hirschberg J (2007) V-measure: a conditional entropy-based external cluster evaluation measure. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 410–420
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
Article MATH Google Scholar
Saffari A, Leistner C, Santner J, Godec M, Bischof H (2009) On-line random forests. In: 2009 IEEE 12th international conference on computer vision workshops, ICCV workshops, pp 1393–1400
Silva AM, Caminhas W, Lemos A, Gomide F (2014) A fast learning algorithm for evolving neo-fuzzy neuron. Appl Soft Comput Part B 14:194–209
Article Google Scholar
Skrjanc I, Dovzan D (2015) Evolving gustafson-kessel possibilistic c-means clustering. Procedia Computer Science 53:191–198
Article Google Scholar
Syed Z, Rubinfeld I (2010) Fast anomaly detection in dynamic clinical datasets using near-optimal hashing with concentric expansions. In: 2010 IEEE international conference on data mining workshops (ICDMW), pp 763–770
Tan P, Steinbach M, Kumar V (2005) Introduction to data mining. Addison Wesley, Boston
Google Scholar
Xia S, Xiong Z, Luo Y, WeiXu Zhang G (2015) Effectiveness of the euclidean distance in high dimensional spaces. Optik Int J Light Electron Opt 126(24):5614–5619
Article Google Scholar
Yang H, Fong S (2015) Countering the concept-drift problems in big data by an incrementally optimized stream mining model. J Syst Softw 102:158–166
Article Google Scholar
Zhang R, Rudnicky AI (2002) A large scale clustering scheme for kernel k-means. In: Object recognition supported by user interaction for service robots, vol 4, pp 289–292
Zhu X, Ding W, Yu P, Zhang C (2011) One-class learning and concept summarization for data streams. Knowl Inf Syst 28(3):523–553
Article Google Scholar

Download references

Acknowledgements

The authors want to acknowledge deeply M.Sc. Fannia Pacheco for her contribution in this work. The results and guidelines proposed in her Master Thesis (https://bit.ly/2EQtOgm) have sustained the development of this approach. This work was sponsored by the Prometeo Project of the Secretariat for Higher Education, Science, Technology and Innovation (SENESCYT) of the Republic of Ecuador. The experiment of the real case application was developed at the Vibration Lab of the GIDTEC research group, at the Universidad Politécnica Salesiana, Cuenca-Ecuador.

Author information

Authors and Affiliations

GIDTEC, Universidad Politécnica Salesiana, Cuenca, Ecuador
Mariela Cerrada & René-Vinicio Sánchez
Computing Sciences Department, Universidad de Los Andes, Mérida, Venezuela
Jose Aguilar & Junior Altamiranda

Authors

Mariela Cerrada
View author publications
You can also search for this author in PubMed Google Scholar
Jose Aguilar
View author publications
You can also search for this author in PubMed Google Scholar
Junior Altamiranda
View author publications
You can also search for this author in PubMed Google Scholar
René-Vinicio Sánchez
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mariela Cerrada.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Cerrada, M., Aguilar, J., Altamiranda, J. et al. A hybrid heuristic algorithm for evolving models in simultaneous scenarios of classification and clustering. Knowl Inf Syst 61, 755–798 (2019). https://doi.org/10.1007/s10115-019-01336-3

Download citation

Received: 07 June 2017
Revised: 14 January 2019
Accepted: 19 January 2019
Published: 09 February 2019
Issue Date: 01 November 2019
DOI: https://doi.org/10.1007/s10115-019-01336-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A hybrid heuristic algorithm for evolving models in simultaneous scenarios of classification and clustering

Abstract

Access this article

Similar content being viewed by others

Detection and diagnosis of process fault using unsupervised learning methods and unlabeled data

A Proposal of On-Line Detection of New Faults and Automatic Learning in Fault Diagnosis

A Dissimilarity Learning Approach by Evolutionary Computation for Faults Recognition in Smart Grids

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A hybrid heuristic algorithm for evolving models in simultaneous scenarios of classification and clustering

Abstract

Access this article

Similar content being viewed by others

Detection and diagnosis of process fault using unsupervised learning methods and unlabeled data

A Proposal of On-Line Detection of New Faults and Automatic Learning in Fault Diagnosis

A Dissimilarity Learning Approach by Evolutionary Computation for Faults Recognition in Smart Grids

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation