|Commenced in January 2007||Frequency: Monthly||Edition: International||Paper Count: 10|
Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.
In this paper a non-parametric statistical pattern recognition algorithm for the problem of credit scoring will be presented. The proposed algorithm is based on a clustering k- means algorithm and allows for the determination of subclasses of homogenous elements in the data. The algorithm will be tested on two benchmark datasets and its performance compared with other well known pattern recognition algorithm for credit scoring.
This paper presents an optimal and unsupervised satellite image segmentation approach based on Pearson system and k-Means Clustering Algorithm Initialization. Such method could be considered as original by the fact that it utilised K-Means clustering algorithm for an optimal initialisation of image class number on one hand and it exploited Pearson system for an optimal statistical distributions- affectation of each considered class on the other hand. Satellite image exploitation requires the use of different approaches, especially those founded on the unsupervised statistical segmentation principle. Such approaches necessitate definition of several parameters like image class number, class variables- estimation and generalised mixture distributions. Use of statistical images- attributes assured convincing and promoting results under the condition of having an optimal initialisation step with appropriated statistical distributions- affectation. Pearson system associated with a k-means clustering algorithm and Stochastic Expectation-Maximization 'SEM' algorithm could be adapted to such problem. For each image-s class, Pearson system attributes one distribution type according to different parameters and especially the Skewness 'β1' and the kurtosis 'β2'. The different adapted algorithms, K-Means clustering algorithm, SEM algorithm and Pearson system algorithm, are then applied to satellite image segmentation problem. Efficiency of those combined algorithms was firstly validated with the Mean Quadratic Error 'MQE' evaluation, and secondly with visual inspection along several comparisons of these unsupervised images- segmentation.
Data mining uses a variety of techniques each of which is useful for some particular task. It is important to have a deep understanding of each technique and be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process up to results visualization. A graphical representation helps the user to find out a strategy to optmize classification by adding, moving or delete a neuron in order to change the number of classes. The tool is also able to automatically suggest a strategy for number of classes optimization.The tool is used to classify macroeconomic data that report the most developed countries? import and export. It is possible to classify the countries based on their economic behaviour and use an ad hoc tool to characterize the commercial behaviour of a country in a selected class from the analysis of positive and negative features that contribute to classes formation.