Vörös A. (ed.): Fragmenta Mineralogica et Palaeontologica 14. 1989. (Budapest, 1989)

For a single rock group, the correlations between the objects tend to approach +1.0, so a large number of similar objects appears in the dendrogram (O. KOVÁCS 1987a). Although the main clusters can still be separated, the correlation coefficient is not recommended for a data set consisting of volcanic compositions. Consequently, we take the view that the Euclidean distance on raw data, as well as the cosine theta or theta coefficient, can be applied effectively to a data set consisting of major element oxides of volcanic rocks. Where possible, we recommend performing both techniques one after the other (Figs. 1, 2).

The final step of the cluster analysis is the production of a dendrogram by means of a linkage method. The choice of the linkage method is as important as the choice of the similarity measure. Although many methods exist, three are generally used in the geological sciences: single linkage, unweighted average, and weighted pair-group average. Comparisons of these techniques were reviewed by DAVIS (1973), LE MAITRE (1982) and O. KOVÁCS (1987a). According to them, the weighted pair-group average tends to be superior to the other methods. Testing the unweighted average and the weighted pair-group average methods, we found that different cluster structures were obtained with the same similarity measure, but the change in the contents of the groups was insignificant.

Finally, we have to note that the use of cluster analysis alone may lead to false interpretations. If the clusters of sample points in high-dimensional space are not compact spherical groups, it may be difficult to achieve optimal clustering. Several tests can be carried out to check the goodness of clustering. The ratio of the determinants of W (the within-group dispersion matrix; see HOWARTH (1983) for details) and T (the total dispersion matrix) is a useful guide to the partitioning.
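As an illustration of the determinant ratio mentioned above, the following minimal Python sketch computes det(W)/det(T) for a given partition. The function names (`scatter_matrix`, `determinant`, `wilks_lambda`) and the two-variable sample values are hypothetical and are not taken from the data set of this paper. Values of the ratio close to 0 indicate compact, well-separated groups; values close to 1 indicate a poor partition.

```python
def mean_vector(rows):
    """Column-wise mean of a list of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, center):
    """Sum of outer products of the deviations of each sample from `center`."""
    m = len(center)
    s = [[0.0] * m for _ in range(m)]
    for row in rows:
        d = [x - c for x, c in zip(row, center)]
        for i in range(m):
            for j in range(m):
                s[i][j] += d[i] * d[j]
    return s


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det
        det *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return det


def wilks_lambda(samples, labels):
    """Ratio det(W) / det(T) for the partition given by `labels`."""
    m = len(samples[0])
    total = scatter_matrix(samples, mean_vector(samples))  # T
    within = [[0.0] * m for _ in range(m)]                  # W
    for group in set(labels):
        members = [s for s, g in zip(samples, labels) if g == group]
        sg = scatter_matrix(members, mean_vector(members))
        for i in range(m):
            for j in range(m):
                within[i][j] += sg[i][j]
    return determinant(within) / determinant(total)


# Hypothetical two-variable compositions (illustrative values only).
samples = [[50.0, 1.0], [52.0, 1.2], [51.0, 2.0],
           [70.0, 4.0], [72.0, 4.2], [71.0, 5.0]]
good = wilks_lambda(samples, [0, 0, 0, 1, 1, 1])  # natural grouping
bad = wilks_lambda(samples, [0, 1, 0, 1, 0, 1])   # arbitrary grouping
print(good, bad)
```

The natural grouping gives a much smaller ratio than the arbitrary one. The sketch assumes that T is non-singular, i.e. that there are more samples than variables and that the variables are not linearly dependent.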
Minimizing the trace of W implies spherical clusters (EVERITT 1981). MARIOTT (1971) constructed a diagram using Wilks' Lambda to establish a successful cluster solution for a data set. DAVIS (1973) suggested a plot of cophenetic correlation values against the original correlations to check the structure of dendrograms. A simple way to expose possible distortion in the clustering is to compare it with the result of non-linear mapping.

NON-LINEAR MAPPING

Non-linear mapping (NLM), developed by SAMMON (1969), is a method that produces a two-dimensional picture of high-dimensional data while preserving the inherent natural structure of the samples. The resulting map can be appreciated easily by eye, revealing the inter-sample and inter-cluster relationships. It is comparable to the dendrograms obtained in cluster analysis, without the inherent distortions of clustering methods. Applying the two methods together has the additional benefit that each exposes the possible errors of the other: 'wild-shot points' in the NLM plot and 'chaining' in the dendrogram (O. KOVÁCS 1987b). HOWARTH (1973) compared NLM, cluster analysis and principal component analysis for several geological data sets. Although his initial tests suggested that the method is effective, it has still found only limited application in the geological sciences. Examples of applying NLM to geochemical data are given in GARRETT (1973), HOWARTH (1973), HOWARTH et al. (1977) and O. KOVÁCS (1987b).

Preservation of the inherent natural structure of the data is achieved by fitting the points in two-dimensional space so that their inter-point distances are similar to the distances in the high-dimensional data space. Although SAMMON (1969) suggested the Euclidean distance as the distance measure, other measures such as the cosine theta or the theta can be applied as well. The FORTRAN program used for our data set (written by O. KOVÁCS) can employ both measures. In this manner the results of the NLM (Fig. 3) can be compared directly with the dendrogram of the cluster analysis.

The initial configuration is obtained by placing the data points of the m-dimensional space (m is the number of variables) at random in a two-dimensional space. Next, a mapping error is defined (SAMMON 1969), and the configuration is adjusted iteratively so as to decrease this error. Convergence is commonly achieved after 20-40 iterations (HOWARTH 1973). ZAHN (1971) noted that 'wild-shot points' may appear in the plot because SAMMON's measure is based on the average distortion of individual points. A sample containing a variable
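The iterative procedure described above (random initial configuration, mapping error, stepwise adjustment) can be sketched in Python as follows. This is only an illustration and not the FORTRAN program used in this study: it uses the Euclidean distance and a plain gradient descent with step halving in place of SAMMON's original update rule, so the number of iterations needed may differ from the 20-40 quoted above. The function names, the default parameters and the four-variable sample values are hypothetical.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_error(target, points):
    """Sammon's mapping error: sum of (D - d)^2 / D over all pairs, divided by sum of D."""
    n = len(points)
    total = 0.0
    err = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(points[i], points[j])
            total += target[i][j]
            err += (target[i][j] - d) ** 2 / target[i][j]
    return err / total


def sammon_gradient(target, points):
    """Gradient of the mapping error with respect to every 2-D coordinate."""
    n = len(points)
    scale = sum(target[i][j] for i in range(n) for j in range(i + 1, n))
    grad = [[0.0, 0.0] for _ in range(n)]
    for p in range(n):
        for j in range(n):
            if j == p:
                continue
            d = max(euclidean(points[p], points[j]), 1e-12)  # avoid division by zero
            w = (target[p][j] - d) / (target[p][j] * d)
            for k in range(2):
                grad[p][k] -= 2.0 / scale * w * (points[p][k] - points[j][k])
    return grad


def sammon_map(samples, iterations=100, seed=0):
    """Map samples to 2-D; assumes all samples are distinct (no zero distances)."""
    rng = random.Random(seed)
    target = [[euclidean(a, b) for b in samples] for a in samples]
    points = [[rng.random(), rng.random()] for _ in samples]  # random start
    error = sammon_error(target, points)
    step = 1.0
    for _ in range(iterations):
        grad = sammon_gradient(target, points)
        trial = [[p[k] - step * g[k] for k in range(2)]
                 for p, g in zip(points, grad)]
        trial_error = sammon_error(target, trial)
        if trial_error < error:   # accept only steps that reduce the error
            points, error = trial, trial_error
            step *= 1.5
        else:                     # otherwise retry with a smaller step
            step *= 0.5
    return points, error


# Hypothetical four-variable compositions (illustrative values only).
samples = [[50.1, 15.2, 9.8, 1.0], [51.3, 14.9, 9.1, 1.2],
           [49.8, 15.6, 10.2, 0.9], [68.4, 14.1, 3.2, 4.1],
           [70.2, 13.8, 2.9, 4.5], [69.1, 14.3, 3.5, 4.0]]
points, error = sammon_map(samples)
print(error)
```

Because a step is accepted only when it lowers the mapping error, the final error can never exceed that of the random starting configuration. Another distance measure, such as one based on the cosine theta, could be substituted for `euclidean` when building `target`.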
