API Reference

pyminimax.minimax(dists, return_prototype=False)[source]

Perform minimax-linkage clustering using the nearest-neighbor chain algorithm.

Parameters
  • dists (ndarray) – The upper triangle of the distance matrix in condensed form, as returned by scipy.spatial.distance.pdist.

  • return_prototype (bool, default False) – Whether to return prototypes. When this is False, the returned linkage matrix Z has 4 columns, structured the same as the return value of the scipy.cluster.hierarchy.linkage function. When this is True, the returned linkage matrix has a 5th column that contains the index of the prototype for each merge.

Returns

Z – A linkage matrix containing the hierarchical clustering. The first 4 columns have the same structure as the return value of the scipy.cluster.hierarchy.linkage function. See the scipy.cluster.hierarchy.linkage documentation for more information on its structure. When return_prototype is True, there is an additional 5th column.

Return type

ndarray
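
To illustrate what the linkage matrix encodes, the following brute-force sketch evaluates the minimax linkage between two clusters directly from a condensed distance vector. It assumes the usual definition of minimax linkage: the smallest radius, over candidate prototypes in the merged cluster, that covers every member. The helper names condensed_index and minimax_linkage are hypothetical and not part of pyminimax, and the sketch does not use the nearest-neighbor chain algorithm.

```python
from itertools import combinations
from math import dist


def condensed_index(i, j, n):
    """Position of the pair (i, j), i != j, in a condensed distance vector of n points."""
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def minimax_linkage(dists, n, g, h):
    """Brute-force minimax linkage between clusters g and h (lists of point indices).

    Returns (radius, prototype): the prototype is the member of the merged
    cluster whose largest distance to any other member is smallest.
    This is an illustrative helper, not the library's implementation.
    """
    merged = list(g) + list(h)
    best = None
    for candidate in merged:
        radius = max(
            (dists[condensed_index(candidate, other, n)]
             for other in merged if other != candidate),
            default=0.0,
        )
        if best is None or radius < best[0]:
            best = (radius, candidate)
    return best


# Four points on a line. The condensed vector lists the pairs in the order
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3), which is the layout pdist uses.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
n = len(points)
dists = [dist(points[i], points[j]) for i, j in combinations(range(n), 2)]

radius, prototype = minimax_linkage(dists, n, [0, 1], [2])
print(radius, prototype)  # 1.0 1
```

With SciPy and pyminimax installed, the same condensed vector would come from scipy.spatial.distance.pdist and be passed as minimax(dists, return_prototype=True).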

pyminimax.fcluster_prototype(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None)[source]

Form flat clusters from the hierarchical clustering defined by the given linkage matrix, and return the prototype of each flat cluster.

Parameters
  • Z (ndarray) – The hierarchical clustering encoded with the matrix returned by the minimax function.

  • t (scalar) – For the ‘inconsistent’, ‘distance’ or ‘monocrit’ criteria, this is the threshold to apply when forming flat clusters. For the ‘maxclust’ or ‘maxclust_monocrit’ criteria, this is the maximum number of clusters requested.

  • criterion (str, optional) –

    The criterion to use in forming flat clusters. This can be any of the following values:

    inconsistent :

    If a cluster node and all its descendants have an inconsistent value less than or equal to t, then all its leaf descendants belong to the same flat cluster. When no non-singleton cluster meets this criterion, every node is assigned to its own cluster. (Default)

    distance :

    Forms flat clusters so that the original observations in each flat cluster have no greater a cophenetic distance than t.

    maxclust :

    Finds a minimum threshold r so that the cophenetic distance between any two original observations in the same flat cluster is no more than r and no more than t flat clusters are formed.

    monocrit :

    Forms a flat cluster from a cluster node c with index i when monocrit[i] <= t. For example, to threshold on the maximum mean distance as computed in the inconsistency matrix R with a threshold of 0.8 do:

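    from scipy.cluster.hierarchy import inconsistent, maxRstat
    R = inconsistent(Z[:, :4])  # inconsistency matrix of the first 4 columns
    MR = maxRstat(Z[:, :4], R, 3)
    fcluster_prototype(Z, t=0.8, criterion='monocrit', monocrit=MR)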
    
    maxclust_monocrit :

    Forms a flat cluster from a non-singleton cluster node c when monocrit[i] <= r for all cluster indices i below and including c. r is minimized such that no more than t flat clusters are formed. monocrit must be monotonic. For example, to minimize the threshold on maximum inconsistency values so that no more than 3 flat clusters are formed, do:

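    from scipy.cluster.hierarchy import inconsistent, maxinconsts
    R = inconsistent(Z[:, :4])  # inconsistency matrix of the first 4 columns
    MI = maxinconsts(Z[:, :4], R)
    fcluster_prototype(Z, t=3, criterion='maxclust_monocrit', monocrit=MI)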
    

  • depth (int, optional) – The maximum depth to perform the inconsistency calculation. It has no meaning for the other criteria. Default is 2.

  • R (ndarray, optional) – The inconsistency matrix to use for the ‘inconsistent’ criterion. This matrix is computed if not provided.

  • monocrit (ndarray, optional) – An array of length n-1. monocrit[i] is the statistic upon which non-singleton cluster i is thresholded. The monocrit vector must be monotonic, i.e., given a node c with index i, for all node indices j corresponding to nodes below c, monocrit[i] >= monocrit[j].
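
The monotonicity requirement on monocrit can be checked before calling the function. The sketch below is one way to do that, assuming the SciPy linkage convention in which a cluster index of n or more refers to the cluster formed in row (index - n) of Z. The helper name is_monotonic_monocrit and the matrix values are hypothetical.

```python
def is_monotonic_monocrit(Z, monocrit):
    """Return True if monocrit[i] >= monocrit[j] for every node j below node i.

    Z is a linkage matrix given as rows whose first two entries are the
    indices of the merged clusters. Comparing each node only with its direct
    non-singleton children is enough, because the inequality is transitive.
    """
    n = len(Z) + 1
    for i, row in enumerate(Z):
        for child in (int(row[0]), int(row[1])):
            if child >= n and monocrit[child - n] > monocrit[i]:
                return False
    return True


# Hand-written linkage matrix for 4 observations (illustrative values only).
Z = [
    [0, 1, 1.0, 2],
    [4, 2, 1.0, 3],
    [5, 3, 8.0, 4],
]

print(is_monotonic_monocrit(Z, [1.0, 1.0, 8.0]))  # True
print(is_monotonic_monocrit(Z, [1.0, 0.5, 8.0]))  # False: node 1 is below its child node 0
```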

Returns

fcluster_prototype – An array T of shape (n, 2). Row T[i] holds the flat cluster number to which original observation i belongs and the index of the prototype of that cluster.

Return type

ndarray
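
To make the shape of the result concrete, the following sketch cuts a 5-column linkage matrix at a fixed height, in the spirit of the ‘distance’ criterion. It is not the library's implementation. It assumes that merge heights never decrease from child to parent, that the prototype of a flat cluster is the 5th-column entry of the merge that formed it, and that a singleton is its own prototype. The helper names, the cluster numbering and the matrix values are hypothetical.

```python
def leaves(Z, node):
    """Collect the original observations below a node of the linkage tree."""
    n = len(Z) + 1
    stack, found = [node], []
    while stack:
        current = stack.pop()
        if current < n:
            found.append(current)
        else:
            row = Z[current - n]
            stack.extend((int(row[0]), int(row[1])))
    return found


def flat_clusters_with_prototypes(Z, t):
    """Cut a 5-column linkage matrix at height t.

    Returns one (cluster_number, prototype_index) pair per original
    observation. Cluster numbers start at 1 in order of discovery, which
    need not match the numbering SciPy or pyminimax would produce.
    """
    n = len(Z) + 1
    result = [None] * n
    next_label = 1
    stack = [2 * n - 2]  # index of the root node
    while stack:
        node = stack.pop()
        if node < n:
            # Assumption: a singleton cluster is its own prototype.
            result[node] = (next_label, node)
            next_label += 1
            continue
        left, right, height, _size, prototype = Z[node - n]
        if height <= t:
            # With monotone heights, the whole subtree is one flat cluster.
            for leaf in leaves(Z, node):
                result[leaf] = (next_label, int(prototype))
            next_label += 1
        else:
            # Push right first so the left child is visited first.
            stack.extend((int(right), int(left)))
    return result


# Hand-written 5-column linkage matrix for 4 observations (illustrative values).
Z = [
    [0, 1, 1.0, 2, 0],
    [4, 2, 1.0, 3, 1],
    [5, 3, 8.0, 4, 2],
]

print(flat_clusters_with_prototypes(Z, 2.0))
# [(1, 1), (1, 1), (1, 1), (2, 3)]
```

Each pair plays the role of one row of the (n, 2) array described above: observations 0, 1 and 2 share a cluster whose prototype is observation 1, and observation 3 forms its own cluster.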