starling.utility

Module contents

class starling.utility.ConcatDataset(datasets)[source]

Bases: Dataset

A dataset composed of datasets

Parameters: datasets (list[Tensor]) – the datasets to concatenate; every dataset d must satisfy d.shape[0] == m
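The index-aligned behaviour suggested by the shared first dimension m can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name ZippedDataset is hypothetical, plain Python lists stand in for tensors, and the assumption is that item i is a tuple holding row i of every underlying dataset.

```python
class ZippedDataset:
    """Hypothetical stand-in: an index-aligned view over several datasets."""

    def __init__(self, datasets):
        # Assumption: every dataset shares the same first dimension m.
        lengths = {len(d) for d in datasets}
        if len(lengths) != 1:
            raise ValueError("all datasets must have the same first dimension m")
        self.datasets = list(datasets)

    def __len__(self):
        return len(self.datasets[0])

    def __getitem__(self, i):
        # One item per underlying dataset, all taken at the same row index.
        return tuple(d[i] for d in self.datasets)


Y = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # m = 3 rows of expression values
S = [10.0, 12.5, 9.0]                      # m = 3 cell sizes
ds = ZippedDataset([Y, S])
```

Here ds[1] pairs the second expression row with the second cell size, which is the shape a data loader would need in order to batch Y and S together.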

starling.utility.compute_p_s_given_gamma(S, Theta, dist_option)[source]

Returns: matrix of shape (# of obs) x (# of clusters) x (# of clusters) holding p(s_n | gamma_n = [c, c’])

starling.utility.compute_p_s_given_gamma_model_overlap(S, Theta)[source]

Returns: matrix of shape (# of obs) x (# of clusters) x (# of clusters) holding p(s_n | gamma_n = [c, c’])

starling.utility.compute_p_s_given_z(S, Theta, dist_option)[source]

Returns: matrix of shape (# of obs) x (# of clusters) holding p(s_n | z_n = c)

starling.utility.compute_p_y_given_gamma(Y, Theta, dist_option)[source]

Returns: matrix of shape (# of obs) x (# of clusters) x (# of clusters) holding p(y_n | gamma_n = [c, c’])

starling.utility.compute_p_y_given_z(Y, Theta, dist_option)[source]

Returns: matrix of shape (# of obs) x (# of clusters) holding p(y_n | z_n = c)
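To make the (# of obs) x (# of clusters) layout concrete, the sketch below fills such a matrix for the Normal case. It is an illustration under stated assumptions, not the library's code: the function names are hypothetical, features are assumed independent given the cluster, and per-cluster means and variances are passed as plain lists because the layout of Theta is not documented here.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Normal(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def likelihood_matrix(Y, means, variances):
    """Return an n_obs x n_clusters nested list of p(y_n | z_n = c).

    Assumption: features are independent given the cluster, so the
    likelihood of a row is the product of its per-feature densities.
    """
    out = []
    for y in Y:
        row = []
        for mu_c, var_c in zip(means, variances):
            p = 1.0
            for x, m, v in zip(y, mu_c, var_c):
                p *= normal_pdf(x, m, v)
            row.append(p)
        out.append(row)
    return out


Y = [[0.0, 0.0], [3.0, 3.0]]                      # 2 observations, 2 features
means = [[0.0, 0.0], [3.0, 3.0], [10.0, 10.0]]    # 3 clusters
variances = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
P = likelihood_matrix(Y, means, variances)
```

P[n][c] is largest for the cluster whose mean is closest to observation n. A tensor implementation would more likely work with log-densities to avoid underflow; the product form is used here only to keep the shape easy to read.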

starling.utility.compute_posteriors(Y, S, Theta, dist_option, model_overlap)[source]

starling.utility.init_clustering(initial_clustering_method, adata, k=None, labels=None)[source]

Compute initial cluster centroids, variances & labels

Parameters:

adata (AnnData) – The initial data to be analyzed
initial_clustering_method (Literal['User', 'KM', 'GMM', 'FS', 'PG']) – The method for computing the initial clusters, one of KM (KMeans), GMM (Gaussian Mixture Model), FS (FlowSOM), User (user-provided), or PG (PhenoGraph).
k (Optional[int]) – The number of clusters. It is passed as n_components when initial_clustering_method is GMM (required), as k when it is KM (required), as k when it is FS (required), and as ? when it is PG (optional). It can be omitted when initial_clustering_method is “User”, because the user passes in their own labels.
labels (Optional[ndarray]) – optional, user-provided labels

Raises:

ValueError

Return type:

AnnData

Returns:

The annotated data with labels, centroids, and variances
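The rules for k and labels described above can be summarised as a small pre-check. This is a sketch only: check_init_clustering_args is a hypothetical helper, not part of starling.utility, and the requirement that labels be present for “User” is an assumption inferred from the parameter description.

```python
def check_init_clustering_args(method, k=None, labels=None):
    """Hypothetical pre-check mirroring the documented k/labels rules."""
    allowed = {"User", "KM", "GMM", "FS", "PG"}
    if method not in allowed:
        raise ValueError(f"unknown initial_clustering_method: {method!r}")
    # k is documented as required for KM, GMM and FS, optional for PG.
    if method in {"KM", "GMM", "FS"} and k is None:
        raise ValueError(f"k is required when initial_clustering_method is {method!r}")
    # Assumption: "User" needs labels, since the user supplies the clustering.
    if method == "User" and labels is None:
        raise ValueError("labels are required when initial_clustering_method is 'User'")
    return True
```

For example, check_init_clustering_args("KM", k=10) passes, while check_init_clustering_args("GMM") raises ValueError because k is missing.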

starling.utility.is_non_negative_float(arg)[source]

Parameters: arg (float)

starling.utility.model_parameters(adata, singlet_prop)[source]

Return initial model parameters

Parameters:

adata (AnnData) – The sample to be analyzed, with clusters and annotations from init_clustering()
singlet_prop (float) – The proportion of anticipated segmentation error free cells

Return type:

Dict[str, ndarray]

Returns:

the model parameters

starling.utility.predict(dataLoader, model_params, dist_option, model_cell_size, model_zplane_overlap, threshold=0.5)[source]

Return singlet/doublet probabilities, the singlet cluster assignment probability matrix, and assignment labels

Parameters:

dataLoader (DataLoader) – the dataloader
model_params (Dict[str, Tensor]) – the model parameters
dist_option (str) – one of ‘T’ for Student-T (df=2) or ‘N’ for Normal (Gaussian)
model_cell_size (bool) – whether cell size is modeled
model_zplane_overlap (bool) – whether z-plane overlap is modeled
threshold (float)

Returns:

starling.utility.simulate_data(Y, S=None, model_overlap=True)[source]

Use real data to simulate singlets and doublets in equal proportions. Returns the same number of cells as in Y/S: half are singlets and half are doublets.

Parameters:

Y (Tensor) – data matrix of shape m x n
S (Optional[Tensor]) – data matrix of shape m
model_overlap (bool) – whether STARLING should model z-plane overlap when cell size is modeled

Return type:

Tuple[tensor]

Returns:

the simulated data
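The half-singlet, half-doublet split can be illustrated with a small stand-alone sketch. It does not reproduce the library's procedure: simulate_half_doublets is a hypothetical name, plain lists replace tensors, cell sizes S are ignored, and the rule for combining two cells into a doublet (a feature-wise mean) is an assumption chosen only for illustration.

```python
import random


def simulate_half_doublets(Y, seed=0):
    """Return (rows, is_doublet) with as many rows as Y, half of them doublets.

    Assumption: a doublet is the feature-wise mean of two distinct real rows.
    Requires at least two rows in Y.
    """
    rng = random.Random(seed)
    m = len(Y)
    n_singlets = m // 2
    rows, is_doublet = [], []
    # Singlets: real rows drawn at random (with replacement).
    for _ in range(n_singlets):
        rows.append(list(rng.choice(Y)))
        is_doublet.append(False)
    # Doublets: combine two distinct real rows.
    for _ in range(m - n_singlets):
        a, b = rng.sample(Y, 2)
        rows.append([(x + y) / 2.0 for x, y in zip(a, b)])
        is_doublet.append(True)
    return rows, is_doublet


Y = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0], [0.0, 4.0], [3.0, 3.0]]
rows, is_doublet = simulate_half_doublets(Y)
```

The is_doublet flags give ground-truth labels for the simulated cells, which is what makes a data set of this kind usable for training or evaluating a singlet/doublet classifier.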

starling.utility.validate_starling_arguments(adata, dist_option, singlet_prop, model_cell_size, cell_size_col_name, model_zplane_overlap, model_regularizer, learning_rate)[source]

Parameters:

adata (AnnData)
dist_option (str)
singlet_prop (float)
model_cell_size (bool)
cell_size_col_name (str)
model_zplane_overlap (bool)
model_regularizer (float)
learning_rate (float)
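Since only the parameter types are listed, the sketch below shows the kind of checks such a validator might perform. Everything beyond the listed types is an assumption: the function name is hypothetical, the adata check is omitted to keep the example dependency-free, the allowed dist_option values are taken from the predict() description, singlet_prop is assumed to lie in [0, 1] because it is a proportion, and non-negativity of model_regularizer and learning_rate is a guess.

```python
def validate_arguments_sketch(dist_option, singlet_prop, model_cell_size,
                              cell_size_col_name, model_zplane_overlap,
                              model_regularizer, learning_rate):
    """Hypothetical validator; raises ValueError on the first bad argument."""
    # 'T' (Student-T) and 'N' (Normal) are the options documented for predict().
    if dist_option not in ("T", "N"):
        raise ValueError("dist_option must be 'T' or 'N'")
    # Assumption: a proportion must be a float in [0, 1].
    if not isinstance(singlet_prop, float) or not 0.0 <= singlet_prop <= 1.0:
        raise ValueError("singlet_prop must be a float in [0, 1]")
    for name, flag in (("model_cell_size", model_cell_size),
                       ("model_zplane_overlap", model_zplane_overlap)):
        if not isinstance(flag, bool):
            raise ValueError(f"{name} must be a bool")
    if not isinstance(cell_size_col_name, str):
        raise ValueError("cell_size_col_name must be a str")
    # Assumption: both values must be non-negative floats.
    for name, value in (("model_regularizer", model_regularizer),
                        ("learning_rate", learning_rate)):
        if not isinstance(value, float) or value < 0.0:
            raise ValueError(f"{name} must be a non-negative float")
```

A call such as validate_arguments_sketch("T", 0.6, True, "area", True, 1.0, 1e-3) returns without error, while passing dist_option="X" raises ValueError.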