CREDIBILISTIC FUZZY CLUSTERING BASED ON ANALYSIS OF DATA DISTRIBUTION DENSITY AND THEIR PEAKS
DOI:
https://doi.org/10.15588/1607-3274-2022-3-6
Keywords:
fuzzy clustering, credibilistic clustering, density peak of dataset.
Abstract
Context. Clustering, that is, the unsupervised classification of data arrays, occupies an important place in Data Mining. Many approaches to this problem have been proposed; they differ in their a priori assumptions about the arrays under study and in the mathematical apparatus underlying each method. Clustering is complicated by the high dimensionality of the observation vectors and by distortions of various types in them.
Objective. The purpose of the work is to introduce a fuzzy clustering procedure that combines the advantages of methods based on the analysis of data distribution densities and their peaks, which are characterized by high speed and can work effectively when classes overlap.
Method. A method of fuzzy clustering of data arrays is introduced, based on analysis of the distribution densities of the data, the peaks of these densities, and a credibilistic fuzzy approach. The advantage of the proposed approach is that it reduces the time spent on the optimization problems of finding attractors of the density functions, since the number of calls to the optimization block is determined not by the volume of the analyzed array but by the number of its density peaks.
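The pipeline described in the Method paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give formulas, so the density-peak score (local density times distance to the nearest denser point), the Gaussian mean-shift climb used as the "optimization block", and the Cauchy-type credibilistic membership are choices made here. The function names and the parameters d_c, h, and n_peaks are hypothetical.

```python
import numpy as np


def density_peaks(X, d_c, n_peaks):
    """Indices of the n_peaks points with the largest rho * delta (assumed score)."""
    # Pairwise Euclidean distances, shape (n, n).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density with a Gaussian kernel of width d_c; the self term is removed.
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0
    # delta: distance to the nearest point of strictly higher density.
    higher = rho[None, :] > rho[:, None]
    delta = np.where(higher, D, np.inf).min(axis=1)
    # The densest point has no denser neighbour; use its largest distance instead.
    no_higher = ~higher.any(axis=1)
    delta[no_higher] = D[no_higher].max(axis=1)
    return np.argsort(rho * delta)[::-1][:n_peaks]


def climb_to_attractor(X, start, h, n_iter=50, tol=1e-6):
    """Gaussian mean-shift ascent from one starting point (assumed optimizer)."""
    x = start.astype(float)
    for _ in range(n_iter):
        # Kernel weights of all observations relative to the current position.
        w = np.exp(-((X - x) ** 2).sum(axis=1) / (2.0 * h ** 2))
        x_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def credibilistic_memberships(X, centers):
    """Credibility levels, shape (n, m); requires m >= 2 clusters."""
    # Squared distances from every observation to every centre.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    mu = 1.0 / (1.0 + d2)                      # Cauchy-type membership (assumption)
    mu = mu / mu.max(axis=1, keepdims=True)    # normalise so the row maximum is 1
    others = np.empty_like(mu)
    for q in range(centers.shape[0]):
        # Largest normalised membership among all clusters except q.
        others[:, q] = np.delete(mu, q, axis=1).max(axis=1)
    # Credibility as the average of possibility and necessity.
    return 0.5 * (mu + 1.0 - others)
```

In this sketch climb_to_attractor is called once per detected peak, for example centers = np.array([climb_to_attractor(X, X[p], h) for p in density_peaks(X, d_c, n_peaks)]), so the number of optimizer runs depends on the number of peaks rather than on the number of observations. Any other hill-climbing routine could be substituted for the mean-shift step.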
Results. The method is simple to implement numerically and is not sensitive to the choice of the optimization procedure. The experimental results confirm the effectiveness of the proposed approach in clustering problems with overlapping clusters and allow us to recommend the method for practical use in the automatic clustering of large data volumes.
Conclusions. The method is simple to implement numerically and is not sensitive to the choice of the optimization procedure. Its advantage is that it reduces the time spent on the optimization problems of finding attractors of the density functions, since the number of calls to the optimization block is determined not by the volume of the analyzed array but by the number of its density peaks. The experimental results confirm the effectiveness of the proposed approach in clustering problems with overlapping clusters.
License
Copyright (c) 2022 Є. В. Бодянський, І. П. Плісс, А. Ю. Шафроненко, О. В. Калиниченко
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.