## CLASS-L@LISTS.SUNYSB.EDU

Subject:

Re: Question about the Wards clustering method

From:

Classification, clustering, and phylogeny estimation

Date:

Sun, 5 Aug 2007 16:38:13 -0400

Correction: the factor should read \frac{w_i w_j}{w_i+w_j}, not
\frac{w_i w_j}{w_i_j} as written in my message below. Sorry for any
inconvenience.

Thank you.

-Kejun

On 8/5/07, Kejun (Kevin) Mei <[log in to unmask]> wrote:
> Dear All,
>
> I am applying Ward's method to my clustering problem and have a couple
> of questions. The original method merges two clusters at a time while
> minimizing the increase in the sum of squared errors at each step of
> clustering. The error of a cluster element is the distance from the
> element to the cluster centroid. The objective function is a double
> summation: first compute the sum of within-cluster errors for each
> cluster, then add all of them together.
>
> The original Ward's method assumes all elements have equal weight, but in
> my application the element weights are unequal. This makes me unsure how
> to construct my initial dissimilarity matrix. Reference [1] suggests that
> the initial matrix entries be d_{ij}^2, the squared Euclidean distance
> between any two elements i and j. It also says, on page 145, that "it is
> unknown what properties the resulting clusters would have unless the
> similarity is the squared Euclidean distance".
>
> My first question is: what are these properties? I think minimizing the
> increase in the sum of squared errors is the most important one. What are
> the others?
>
> My second question is: should I use \frac{w_i w_j}{w_i_j} d_{ij}^2 as
> the matrix entries in order to keep those properties, where w_i and w_j
> are the weights of elements i and j, respectively? The factor
> \frac{w_i w_j}{w_i_j} is nonlinear, and it may change a squared Euclidean
> distance into something that is definitely not one. The factor does not
> matter for the original Ward's method because it equals one half when all
> weights are equal. On the other hand, if I think of my unequal-weight
> elements as clusters of equal-weight elements, then my initial
> dissimilarity matrix should be thought of as an intermediate matrix in
> the course of clustering equal-weight elements. For example, if my
> initial data set has three elements e1, e2, and e3, with weights 1, 1,
> and 2, respectively, then I should be able to think of e3 as a cluster of
> e6 and e7, each with weight 1.
>
> Is the first merging step questionable if I use d_{ij}^2 as the matrix
> entries? I ask because d_{ij}^2 being minimal does not imply that
> \frac{w_i w_j}{w_i_j} d_{ij}^2 is minimal.
>
> Thank you so much.
>
> -Kevin
>
> -------------
> [1]. M. R. Anderberg, Cluster Analysis for Applications, New York:
> Academic Press, 1973.

----------------------------------------------
CLASS-L list.
Instructions: http://www.classification-society.org/csna/lists.html#class-l
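
The three-element example in the message can be checked numerically. The sketch below is only an illustration, not the poster's code: the one-dimensional positions 0.0, 1.0, and 1.9 are hypothetical values chosen for the demonstration, and the helper names (`weighted_sse`, `ward_cost`) are made up here. It uses the corrected factor w_i w_j / (w_i + w_j) and compares it with the increase in the weighted sum of squared errors, and with the unweighted d_{ij}^2 ranking.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_sse(points, weights):
    """Weighted sum of squared distances to the weighted centroid."""
    total = sum(weights)
    dim = len(points[0])
    centroid = [sum(w * p[k] for p, w in zip(points, weights)) / total
                for k in range(dim)]
    return sum(w * sq_dist(p, centroid) for p, w in zip(points, weights))

def ward_cost(a, wa, b, wb):
    """Merge cost using the corrected factor w_i * w_j / (w_i + w_j)."""
    return wa * wb / (wa + wb) * sq_dist(a, b)

# Hypothetical positions (assumption); weights 1, 1, 2 as in the message.
points = {"e1": [0.0], "e2": [1.0], "e3": [1.9]}
weights = {"e1": 1.0, "e2": 1.0, "e3": 2.0}

d2 = {}
costs = {}
for i, j in combinations(points, 2):
    d2[(i, j)] = sq_dist(points[i], points[j])
    costs[(i, j)] = ward_cost(points[i], weights[i], points[j], weights[j])

# SSE increase when merging e2 with e3 (singletons have zero SSE).
sse_weighted = weighted_sse([points["e2"], points["e3"]], [1.0, 2.0])

# Same merge with e3 expanded into two unit-weight copies at its position.
sse_expanded = weighted_sse([points["e2"], points["e3"], points["e3"]],
                            [1.0, 1.0, 1.0])

closest_by_distance = min(d2, key=d2.get)
cheapest_merge = min(costs, key=costs.get)

print("closest pair by d^2:     ", closest_by_distance)
print("cheapest pair by Ward:   ", cheapest_merge)
print("cost(e2,e3), SSE weighted, SSE expanded:",
      costs[("e2", "e3")], sse_weighted, sse_expanded)
```

With these assumed positions, the pair with the smallest d_{ij}^2 is not the pair with the smallest weighted merge cost, which is the concern raised in the last question. The weighted cost for (e2, e3) also agrees with the SSE obtained when e3 is replaced by two unit-weight copies of itself.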