starling.utility

Module contents

class starling.utility.ConcatDataset(datasets)[source]

Bases: Dataset

A dataset composed of datasets

Parameters:

datasets (list[Tensor]) – the datasets to concatenate; every dataset d must have the same number of rows, d.shape[0] == m
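As a rough illustration of what "a dataset composed of datasets" can mean, the sketch below pairs the i-th row of several equal-length datasets into one tuple. TupleDataset is a hypothetical stand-in written with plain Python lists, not the starling implementation, and the row-pairing behaviour is an assumption based on the d.shape[0] == m requirement.

```python
class TupleDataset:
    """Hypothetical stand-in: item i is the i-th row of every member dataset."""

    def __init__(self, datasets):
        # Every member must have the same number of rows (m).
        lengths = {len(d) for d in datasets}
        if len(lengths) != 1:
            raise ValueError("all datasets must have the same number of rows")
        self.datasets = list(datasets)

    def __len__(self):
        return len(self.datasets[0])

    def __getitem__(self, i):
        # One tuple per observation, with one entry per member dataset.
        return tuple(d[i] for d in self.datasets)


expression = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # m = 3 rows, 2 features
cell_size = [10.0, 12.5, 9.0]                      # m = 3 values
paired = TupleDataset([expression, cell_size])
```

Indexing paired[1] then yields the expression row and the cell size of the same observation, which is the shape of input a batching loader would consume.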

starling.utility.compute_p_s_given_gamma(S, Theta, dist_option)[source]
Returns:

A matrix of shape (number of observations × number of clusters × number of clusters) holding p(s_n | gamma_n = [c, c’])

starling.utility.compute_p_s_given_gamma_model_overlap(S, Theta)[source]
Returns:

A matrix of shape (number of observations × number of clusters × number of clusters) holding p(s_n | gamma_n = [c, c’])

starling.utility.compute_p_s_given_z(S, Theta, dist_option)[source]
Returns:

A matrix of shape (number of observations × number of clusters) holding p(s_n | z_n = c)

starling.utility.compute_p_y_given_gamma(Y, Theta, dist_option)[source]
Returns:

A matrix of shape (number of observations × number of clusters × number of clusters) holding p(y_n | gamma_n = [c, c’])

starling.utility.compute_p_y_given_z(Y, Theta, dist_option)[source]
Returns:

A matrix of shape (number of observations × number of clusters) holding p(y_n | z_n = c)
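To make the observations × clusters layout concrete, here is a minimal sketch that fills such a matrix with log-likelihoods. It is not the starling code: the function names, the separate means/variances arguments (standing in for Theta, whose structure is not documented here), the independence of features, and the use of log scale are all assumptions. The 'N' and 'T' options follow the dist_option values described for predict().

```python
import math


def log_pdf(x, mu, var, dist_option):
    """Log-density of one value under a Normal or a Student-t (df = 2)."""
    sigma = math.sqrt(var)
    z = (x - mu) / sigma
    if dist_option == "N":
        return -0.5 * math.log(2 * math.pi * var) - 0.5 * z * z
    if dist_option == "T":
        # Student-t with df = 2: density is (1 + z^2 / 2)^(-3/2) / (2 * sqrt(2) * sigma)
        return -math.log(sigma) - math.log(2 * math.sqrt(2)) - 1.5 * math.log(1 + z * z / 2)
    raise ValueError("dist_option must be 'N' or 'T'")


def log_p_y_given_z(Y, means, variances, dist_option="N"):
    """Return a (n_obs x n_clusters) nested list of log p(y_n | z_n = c).

    Assumes features are independent within a cluster, so per-feature
    log-densities are summed.
    """
    return [
        [
            sum(log_pdf(y, m, v, dist_option)
                for y, m, v in zip(row, means[c], variances[c]))
            for c in range(len(means))
        ]
        for row in Y
    ]


Y = [[0.0, 0.0], [5.0, 5.0]]            # 2 observations, 2 features
means = [[0.0, 0.0], [5.0, 5.0]]        # 2 clusters
variances = [[1.0, 1.0], [1.0, 1.0]]
log_lik = log_p_y_given_z(Y, means, variances, "N")
```

Row n of log_lik holds one entry per cluster, so the largest entry in a row marks the cluster under which that observation is most likely.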

starling.utility.compute_posteriors(Y, S, Theta, dist_option, model_overlap)[source]
starling.utility.init_clustering(initial_clustering_method, adata, k=None, labels=None)[source]

Compute initial cluster centroids, variances & labels

Parameters:
  • adata (AnnData) – The initial data to be analyzed

  • initial_clustering_method (Literal['User', 'KM', 'GMM', 'FS', 'PG']) – The method for computing the initial clusters, one of KM (KMeans), GMM (Gaussian Mixture Model), FS (FlowSOM), User (user-provided), or PG (PhenoGraph).

  • k (Optional[int]) – The number of clusters. It is passed as n_components when initial_clustering_method is GMM (required), as k when initial_clustering_method is KM (required), as k when initial_clustering_method is FS (required), and as ? when initial_clustering_method is PG (optional). It can be omitted when initial_clustering_method is “User”, because the user passes in their own labels.

  • labels (Optional[ndarray]) – optional, user-provided labels

Raises:

ValueError

Return type:

AnnData

Returns:

The annotated data with labels, centroids, and variances
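For the "User" case, where labels are supplied directly, the centroids and variances can be derived from the labels alone. The sketch below shows one way to do that with plain Python lists. The helper name centroids_and_variances is hypothetical, and dividing by n (population variance) is an assumption; the library may compute these differently and stores its results on the AnnData object rather than returning dictionaries.

```python
from collections import defaultdict


def centroids_and_variances(X, labels):
    """Per-cluster feature means and (population) variances from given labels."""
    groups = defaultdict(list)
    for row, label in zip(X, labels):
        groups[label].append(row)

    centroids, variances = {}, {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        mean = [sum(col) / n for col in columns]
        var = [sum((v - m) ** 2 for v in col) / n
               for col, m in zip(columns, mean)]
        centroids[label] = mean
        variances[label] = var
    return centroids, variances


X = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
labels = [0, 0, 1]
centroids, variances = centroids_and_variances(X, labels)
```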

starling.utility.is_non_negative_float(arg)[source]
Parameters:

arg (float)

starling.utility.model_parameters(adata, singlet_prop)[source]

Return initial model parameters

Parameters:
  • adata (AnnData) – The sample to be analyzed, with clusters and annotations from init_clustering()

  • singlet_prop (float) – The proportion of anticipated segmentation error free cells

Return type:

Dict[str, ndarray]

Returns:

the model parameters

starling.utility.predict(dataLoader, model_params, dist_option, model_cell_size, model_zplane_overlap, threshold=0.5)[source]

Return singlet/doublet probabilities, the singlet cluster assignment probability matrix, and the assignment labels

Parameters:
  • dataLoader (DataLoader) – the dataloader

  • model_params (Dict[str, Tensor]) – the model parameters

  • dist_option (str) – one of ‘T’ for Student-T (df=2) or ‘N’ for Normal (Gaussian)

  • model_cell_size (bool) – whether cell size is modeled

  • model_zplane_overlap (bool) – whether z-plane overlap is modeled

  • threshold (float)

Returns:

starling.utility.simulate_data(Y, S=None, model_overlap=True)[source]

Use real data to simulate singlets and doublets in equal proportions. Returns the same number of cells as in Y/S: half are singlets and the other half are doublets.

Parameters:
  • Y (Tensor) – data matrix of shape m x n

  • S (Optional[Tensor]) – data vector of length m

  • model_overlap (bool) – If cell size is modelled, whether STARLING should also model z-plane overlap

Return type:

Tuple[tensor]

Returns:

the simulated data
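The idea of building an equal mix of singlets and doublets from real cells can be sketched as follows. This is an illustration only: simulate_singlets_doublets is a hypothetical name, plain lists stand in for tensors, and forming a doublet as the unweighted average of two randomly chosen cells is an assumption. The library function may combine cells differently, for example by taking S into account.

```python
import random


def simulate_singlets_doublets(Y, seed=0):
    """Return len(Y) simulated cells (half singlets, half doublets) and a 0/1 flag per cell."""
    rng = random.Random(seed)
    m = len(Y)
    n_singlets = m // 2
    n_doublets = m - n_singlets

    rows, is_doublet = [], []
    # Singlets: real cells sampled with replacement.
    for _ in range(n_singlets):
        rows.append(list(rng.choice(Y)))
        is_doublet.append(0)
    # Doublets: feature-wise average of two sampled cells (assumed mixing rule).
    for _ in range(n_doublets):
        a, b = rng.choice(Y), rng.choice(Y)
        rows.append([(u + v) / 2 for u, v in zip(a, b)])
        is_doublet.append(1)
    return rows, is_doublet


Y = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0], [2.0, 6.0]]  # m = 4 cells, n = 2 features
sim_rows, sim_flags = simulate_singlets_doublets(Y)
```

The returned flags give a ground truth for which simulated cells are doublets, which is what makes such data useful for fitting or checking a singlet/doublet classifier.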

starling.utility.validate_starling_arguments(adata, dist_option, singlet_prop, model_cell_size, cell_size_col_name, model_zplane_overlap, model_regularizer, learning_rate)[source]
Parameters:
  • adata (AnnData)

  • dist_option (str)

  • singlet_prop (float)

  • model_cell_size (bool)

  • cell_size_col_name (str)

  • model_zplane_overlap (bool)

  • model_regularizer (float)

  • learning_rate (float)