Skip to main content
Log in

A hybrid heuristic algorithm for evolving models in simultaneous scenarios of classification and clustering

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Machine Learning is currently an important research field that attracts interest due to its importance for discovering hidden knowledge or patterns from big datasets. In this paper, we propose a heuristic algorithm which can solve problems related to only classification, only clustering, or classification with clustering by creating models with the ability to evolve to another class/cluster configuration without a retraining process for new incoming data. This algorithm combines supervised and unsupervised learning principles for the incremental construction of both classes and clusters, by using the main guidelines from two classical methods of classification based on distance and clustering based on prototypes, such as KNN and K-means. The algorithm is able to deal with labeled and unlabeled samples as inputs in order to create new groups (classes or clusters), merge or reconfigure existing ones. Basically, the creation of new groups follows three sequential steps: (i) locate the provisional group for an input sample using K-means, (ii) using 1NN, locate the nearest sample to the input sample, only considering the samples in the provisional group, and (iii) merge or reconfigure existing groups following specific guidelines. Several benchmarks, related to classification and clustering problems, were evaluated by our proposal; the results were compared with classical algorithms. On the other hand, artificial datasets with labeled and unlabeled samples have been created to show the ability of our algorithm in the hybrid context to solve classification and clustering combined. As a result, the algorithm is able to create clusters and classes, simultaneously, when required. Finally, a real case study of fault diagnosis in rotating machinery is presented for discovering new groups that might represent patterns from unknown data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17

Similar content being viewed by others

Notes

  1. http://scikit-learn.org.

  2. http://www.uni-marburg.de/fb12/datenbionik/data?language_sync=1.

  3. http://archive.ics.uci.edu/ml.

  4. http://cs.joensuu.fi/sipu/datasets/.

References

  1. Aggarwal CC, Han J, Wang J, Yu PS (2006) A framework for on-demand classification of evolving data streams. IEEE Trans Knowl Data Eng 18(5):577–589

    Article  Google Scholar 

  2. Amini A, Saboohi H, Herawan T, Wah TY (2016) Mudi-stream: a multi density clustering algorithm for evolving data stream. J Netw Comput Appl 59:370–385

    Article  Google Scholar 

  3. Basu S (2004) A comparison of inference techniques for semi-supervised clustering with Hidden Markov random fields. In: Proceedings of the ICML-2004 workshop on statistical relational learning and its connections to other fields

  4. Bishop C (2006) Pattern recognition and machine learning. Springer, Cambridge

    MATH  Google Scholar 

  5. Bordignon F, Gomide F (2014) Uninorm based evolving neural networks and approximation capabilities. Neurocomputing 127:13–20

    Article  Google Scholar 

  6. Cai Q, He H, Man H (2014) Imbalanced evolving self-organizing learning. Neurocomputing 133:258–270

    Article  Google Scholar 

  7. Cara A, Herrera L, Pomares H, Rojas I (2013) New online self-evolving neuro fuzzy controller based on the TaSe-NF model. Inf Sci 220:226–243

    Article  MathSciNet  Google Scholar 

  8. Chapelle O, Scholkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge

    Book  Google Scholar 

  9. Czarnowski I, Jdrzejowicz P (2014) Ensemble classifier for mining data streams. Procedia Comput Sci 35:397–406 (knowledge Based and Intelligent Information; Engineering Systems 18th Annual Conference, KES 2014 Gdynia, September 2014, Poland)

    Article  Google Scholar 

  10. de Andrade SJ, Hruschka ER, Gama J (2017) An evolutionary algorithm for clustering data streams with a variable number of clusters. Expert Syst Appl 67:228–238

    Article  Google Scholar 

  11. Djouadi A, Snorrason O, Garber F (1990) The quality of training-sample estimates of the Bhattacharyya coefficient. IEEE Trans Pattern Anal Mach Intell 12:92–97

    Article  Google Scholar 

  12. Dunham M (2003) Data mining: introductory and advanced topics. Prentice Hall, Upper Saddle River

    Google Scholar 

  13. Ertoz L, Steinbach M, Kumar V (2004) Finding topics in collections of documents: a shared nearest neighbor approach. In: Wu W, Xiong H, Shekhar S (eds) Clustering and information retrieval, network theory and applications, vol 11. Springer, Boston, pp 83–103

    Chapter  Google Scholar 

  14. Geng X, Smith-Miles K (2009) Incremental learning. Springer, Boston, pp 731–735

    Google Scholar 

  15. Ghesmoune M, Lebbah M, Azzag H (2015) Micro-batching growing neural gas for clustering data streams using spark streaming. Procedia Comput Sci 53:158–166 (INNS Conference on Big Data 2015 Program San Francisco, CA, August 2015, USA)

    Article  Google Scholar 

  16. Hartert L, Sayed Mouchaweh M, Billaudel P (2010) A semi-supervised dynamic version of fuzzy K-nearest neighbours to monitor evolving systems. Evol Syst 1(1):3–15

    Article  Google Scholar 

  17. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, New York

    Book  MATH  Google Scholar 

  18. He H, Garcia E (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284

    Article  Google Scholar 

  19. Hyde R, Angelov P, MacKenzie A (2017) Fully online clustering of evolving data streams into arbitrarily shaped clusters. Inf Sci 382383:96–114

    Article  Google Scholar 

  20. Klancar G, Skrjanc I (2015) Evolving principal component clustering with a low run-time complexity for LRF data mapping. Appl Soft Comput 35:349–358

    Article  Google Scholar 

  21. Kontaki M, Gounaris A, Papadopoulos AN, Tsichlas K, Manolopoulos Y (2016) Efficient and flexible algorithms for monitoring distance-based outliers over data streams. Inf Syst 55:37–53

    Article  Google Scholar 

  22. Krawczyk B, Woniak M (2015) Incremental weighted one-class classifier for mining stationary data streams. J Comput Sci 9:19–25

    Article  Google Scholar 

  23. Larose D (2004) k-nearest neighbor algorithm, in discovering knowledge in data: an introduction to data mining. Wiley, Hoboken

    Book  Google Scholar 

  24. Liu B, Xiao Y, Yu P, Cao L, Zhang Y, Hao Z (2014) Uncertain one-class learning and concept summarization learning on uncertain data streams. IEEE Trans Knowl Data Eng 26(2):468–484

    Article  Google Scholar 

  25. Lughofer E (2013) On-line assurance of interpretability criteria in evolving fuzzy systems achievements, new concepts and open issues. Inf Sci 251:22–46

    Article  MathSciNet  Google Scholar 

  26. Lughofer E, Sayed-Mouchaweh M (2015) Autonomous data stream clustering implementing split-and-merge concepts towards a plug-and-play approach. Inf Sci 304:54–79

    Article  Google Scholar 

  27. Lughofer E, Weigl E, Heidl W, Eitzinger C, Radauer T (2015a) Integrating new classes on the fly in evolving fuzzy classifier designs and their application in visual inspection. Appl Soft Comput 35:558–582

    Article  Google Scholar 

  28. Lughofer E, Weigl E, Heidl W, Eitzinger C, Radauer T (2015b) Integrating new classes on the fly in evolving fuzzy classifier designs and their application in visual inspection. Appl Soft Comput 35:558–582

    Article  Google Scholar 

  29. Maslennikov OV, Nekorkin VI (2015) Evolving dynamical networks with transient cluster activity. Commun Nonlinear Sci Numer Simul 23(13):10–16

    Article  Google Scholar 

  30. Mythily R, Banu A, Raghunathan S (2015) Clustering models for data stream mining. Procedia Comput Sci 46:619–626 (proceedings of the International Conference on Information and Communication Technologies, ICICT, December 2014, India)

    Article  Google Scholar 

  31. Pacheco F, de Oliveira JV, Sánchez RV, Cerrada M, Cabrera D, Li C, Zurita G, Artés M (2016) A statistical comparison of neuroclassifiers and feature selection methods for gearbox fault diagnosis under realistic conditions. Neurocomputing 194:192–206

    Article  Google Scholar 

  32. Pacheco F, Cerrada M, Snchez RV, Cabrera D, Li C, de Oliveira JV (2017) Attribute clustering using rough set theory for feature selection in fault severity classification of rotating machinery. Expert Syst Appl 71:69–86

    Article  Google Scholar 

  33. PhridviRaj M, GuruRao C (2014) Data mining past, present and future a typical survey on data streams. Procedia Technol 12:255–263 (the 7th International Conference Interdisciplinarity in Engineering, INTER-ENG 2013, October 2013, Romania)

    Article  Google Scholar 

  34. Pimentel M, Clifton D, Clifton L, Tarassenko L (2014) A review of novelty detection. Signal Process 99:215–249

    Article  Google Scholar 

  35. Rehman M, Li T, Yang Y, Wang H (2014) Hyper-ellipsoidal clustering technique for evolving data stream. Knowl Based Syst 70:3–14

    Article  Google Scholar 

  36. Rosenberg A, Hirschberg J (2007) V-measure: a conditional entropy-based external cluster evaluation measure. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 410–420

  37. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65

    Article  MATH  Google Scholar 

  38. Saffari A, Leistner C, Santner J, Godec M, Bischof H (2009) On-line random forests. In: 2009 IEEE 12th international conference on computer vision workshops, ICCV workshops, pp 1393–1400

  39. Silva AM, Caminhas W, Lemos A, Gomide F (2014) A fast learning algorithm for evolving neo-fuzzy neuron. Appl Soft Comput Part B 14:194–209

    Article  Google Scholar 

  40. Skrjanc I, Dovzan D (2015) Evolving gustafson-kessel possibilistic c-means clustering. Procedia Computer Science 53:191–198

    Article  Google Scholar 

  41. Syed Z, Rubinfeld I (2010) Fast anomaly detection in dynamic clinical datasets using near-optimal hashing with concentric expansions. In: 2010 IEEE international conference on data mining workshops (ICDMW), pp 763–770

  42. Tan P, Steinbach M, Kumar V (2005) Introduction to data mining. Addison Wesley, Boston

    Google Scholar 

  43. Xia S, Xiong Z, Luo Y, WeiXu Zhang G (2015) Effectiveness of the euclidean distance in high dimensional spaces. Optik Int J Light Electron Opt 126(24):5614–5619

    Article  Google Scholar 

  44. Yang H, Fong S (2015) Countering the concept-drift problems in big data by an incrementally optimized stream mining model. J Syst Softw 102:158–166

    Article  Google Scholar 

  45. Zhang R, Rudnicky AI (2002) A large scale clustering scheme for kernel k-means. In: Object recognition supported by user interaction for service robots, vol 4, pp 289–292

  46. Zhu X, Ding W, Yu P, Zhang C (2011) One-class learning and concept summarization for data streams. Knowl Inf Syst 28(3):523–553

    Article  Google Scholar 

Download references

Acknowledgements

The authors want to acknowledge deeply M.Sc. Fannia Pacheco for her contribution in this work. The results and guidelines proposed in her Master Thesis (https://bit.ly/2EQtOgm) have sustained the development of this approach. This work was sponsored by the Prometeo Project of the Secretariat for Higher Education, Science, Technology and Innovation (SENESCYT) of the Republic of Ecuador. The experiment of the real case application was developed at the Vibration Lab of the GIDTEC research group, at the Universidad Politécnica Salesiana, Cuenca-Ecuador.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mariela Cerrada.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cerrada, M., Aguilar, J., Altamiranda, J. et al. A hybrid heuristic algorithm for evolving models in simultaneous scenarios of classification and clustering. Knowl Inf Syst 61, 755–798 (2019). https://doi.org/10.1007/s10115-019-01336-3

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-019-01336-3

Keywords

Navigation