ark.utils

Subpackages

ark.utils.data_utils

class ark.utils.data_utils.AnnCollectionKwargs[source]

Bases: TypedDict

convert: Optional[anndata.experimental.multi_files._anncollection.ConvertType]
harmonize_dtypes: bool
index_unique: Optional[str]
indices_strict: bool
join_obs: Optional[Literal['inner', 'outer']]
join_obsm: Optional[Literal['inner']]
join_vars: Optional[Literal['inner']]
label: Optional[str]
class ark.utils.data_utils.AnnDataIterDataPipe(*args: Any, **kwargs: Any)[source]

Bases: IterDataPipe

The TorchData Iterable-style DataPipe. Takes an AnnCollection and makes it iterable by FOV for easy and flexible data pipelines.

Parameters:

fovs (AnnCollection) – The AnnCollection containing the AnnData objects.

property fovs: anndata.experimental.AnnCollection
class ark.utils.data_utils.ClusterMaskData(data: pandas.DataFrame, fov_col: str, label_col: str, cluster_col: str)[source]

Bases: object

A class containing the cell labels, cluster labels, and segmentation labels for the whole cohort. Also contains the mapping from the segmentation label to the cluster label for each FOV.

__init__(data: pandas.DataFrame, fov_col: str, label_col: str, cluster_col: str) None[source]

A class containing the cell data, cell label column, cluster column and the mapping from a cell label to a cluster.

Parameters:
  • data (pd.DataFrame) – A cell table with the cell label column and the cluster column.

  • fov_col (str) – The name of the column in the cell table that contains the FOV ID.

  • label_col (str) – The name of the column in the cell table that contains the cell label.

  • cluster_col (str) – The name of the column in the cell table that contains the cluster label.

cluster_column: str
cluster_id_column: str
property cluster_names: List[str]

Returns the cluster names.

Returns:

The cluster names.

Return type:

List[str]

fov_column: str
fov_mapping(fov: str) pandas.DataFrame[source]

Returns the mapping for a specific FOV.

Parameters:

fov (str) – The FOV to get the mapping for.

Returns:

The mapping for the FOV.

Return type:

pd.DataFrame

label_column: str
mapping: pandas.DataFrame
n_clusters: int
unassigned_id: int
unique_fovs: List[str]
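As a rough illustration of the per-FOV mapping described above, the following sketch builds a segmentation-label-to-cluster-ID lookup from a toy cell table using only the standard library. The row layout, the helper name build_fov_mapping, and the choice to number clusters from 1 in sorted order are assumptions made for illustration; they are not taken from ark's implementation.

```python
from collections import defaultdict

# Toy cell table: one row per cell (hypothetical values).
cell_table = [
    {"fov": "fov0", "label": 1, "cell_meta_cluster": "tumor"},
    {"fov": "fov0", "label": 2, "cell_meta_cluster": "immune"},
    {"fov": "fov1", "label": 1, "cell_meta_cluster": "immune"},
]


def build_fov_mapping(rows, fov_col, label_col, cluster_col):
    """Return (cluster_ids, mapping) where mapping[fov][label] is a cluster ID."""
    # Assumption: cluster names are numbered 1..N in sorted order.
    names = sorted({row[cluster_col] for row in rows})
    cluster_ids = {name: idx for idx, name in enumerate(names, start=1)}
    mapping = defaultdict(dict)
    for row in rows:
        mapping[row[fov_col]][row[label_col]] = cluster_ids[row[cluster_col]]
    return cluster_ids, dict(mapping)


cluster_ids, mapping = build_fov_mapping(
    cell_table, "fov", "label", "cell_meta_cluster"
)
# Analogous in spirit to fov_mapping("fov0"): only the rows for one FOV.
fov0_mapping = mapping["fov0"]
```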
class ark.utils.data_utils.ConvertToAnnData(cell_table_path: PathLike, markers: Union[list[str], Literal['auto']] = 'auto', extra_obs_parameters: list[str] = None)[source]

Bases: object

A class which converts the Cell Table csv file to a series of AnnData objects, one object per FOV.

The default parameters stored in the obs slot include:

  • area

  • cell_meta_cluster

  • centroid_dif

  • convex_area

  • convex_hull_resid

  • eccentricity

  • fov

  • major_axis_equiv_diam_ratio

Visit the Data Types document to see the full list of parameters.

The default parameters stored in the obs slot include:

  • centroid_x

  • centroid_y

Parameters:
  • cell_table_path (os.PathLike) – The path to the cell table.

  • markers (list[str], “auto”) – The markers to extract and store in X. Defaults to “auto”, which will extract all markers.

  • extra_obs_parameters (list[str], optional) – Extra parameters to load in obs. Defaults to None.

convert_to_adata(save_dir: PathLike) dict[str, str][source]

Converts the cell table to a FOV-level AnnData object, and saves the results as a Zarr store to disk in the save_dir.

Parameters:

save_dir (os.PathLike) – The directory to save the AnnData objects to.

Returns:

A dictionary containing the names of the FOVs and the paths where they were saved.

Return type:

dict[str, str]

ark.utils.data_utils.erode_mask(seg_mask: numpy.ndarray, **kwargs) numpy.ndarray[source]

Erodes the edge labels of a segmentation mask. Other keyword arguments are passed to skimage.segmentation.find_boundaries.

Parameters:

seg_mask (np.ndarray) – The segmentation mask to erode.

Returns:

The eroded segmentation mask

Return type:

np.ndarray
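The idea behind erosion can be sketched with NumPy alone: find pixels that sit next to a differently labeled pixel and set them to 0. This is only an approximation written for illustration. The function name erode_mask_sketch is hypothetical, it checks 4-connected neighbors only, and it does not reproduce the options of skimage.segmentation.find_boundaries.

```python
import numpy as np


def erode_mask_sketch(seg_mask: np.ndarray) -> np.ndarray:
    """Zero out pixels that touch a differently labeled 4-neighbor."""
    boundary = np.zeros(seg_mask.shape, dtype=bool)
    # Compare vertically adjacent pixels and mark both sides of each edge.
    diff_rows = seg_mask[:-1, :] != seg_mask[1:, :]
    boundary[:-1, :] |= diff_rows
    boundary[1:, :] |= diff_rows
    # Compare horizontally adjacent pixels and mark both sides of each edge.
    diff_cols = seg_mask[:, :-1] != seg_mask[:, 1:]
    boundary[:, :-1] |= diff_cols
    boundary[:, 1:] |= diff_cols
    eroded = seg_mask.copy()
    eroded[boundary] = 0
    return eroded


seg_mask = np.array(
    [
        [1, 1, 2, 2],
        [1, 1, 2, 2],
        [1, 1, 2, 2],
    ]
)
eroded = erode_mask_sketch(seg_mask)
```

In this toy mask the two touching cells end up separated by a two-pixel band of background.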

ark.utils.data_utils.generate_and_save_cell_cluster_masks(fovs: List[str], save_dir: Union[Path, str], seg_dir: Union[Path, str], cell_data: pandas.DataFrame, cluster_id_to_name_path: Union[Path, str], fov_col: str = 'fov', label_col: str = 'label', cell_cluster_col: str = 'cell_meta_cluster', seg_suffix: str = '_whole_cell.tiff', sub_dir: str = None, name_suffix: str = '')[source]

Generates cell cluster masks and saves them for downstream analysis.

Parameters:
  • fovs (List[str]) – A list of fovs to generate and save cell cluster masks for.

  • save_dir (Union[pathlib.Path, str]) – The directory to save the generated cell cluster masks.

  • seg_dir (Union[pathlib.Path, str]) – The path to the segmentation data.

  • cell_data (pd.DataFrame) – The cell data with both cell SOM and meta cluster assignments.

  • cluster_id_to_name_path (Union[str, pathlib.Path]) – A path to a CSV that maps each cell cluster ID to its manually defined name. This file is output by the remapping visualization found in metacluster_remap_gui.

  • fov_col (str, optional) – The column name containing the FOV IDs. Defaults to settings.FOV_ID ("fov").

  • label_col (str, optional) – The column name containing the cell label. Defaults to settings.CELL_LABEL ("label").

  • cell_cluster_col (str, optional) – Whether to assign SOM or meta clusters. Needs to be "cell_som_cluster" or "cell_meta_cluster". Defaults to settings.CELL_TYPE ("cell_meta_cluster").

  • seg_suffix (str, optional) – The suffix that the segmentation images use. Defaults to "_whole_cell.tiff".

  • sub_dir (str, optional) – The subdirectory to save the images in. If specified, images are saved to "save_dir/sub_dir". If sub_dir = None, the images are saved to "save_dir". Defaults to None.

  • name_suffix (str, optional) – Specify what to append at the end of every cell mask. Defaults to "".

ark.utils.data_utils.generate_and_save_neighborhood_cluster_masks(fovs: List[str], save_dir: Union[Path, str], seg_dir: Union[Path, str], neighborhood_data: pandas.DataFrame, fov_col: str = 'fov', label_col: str = 'label', cluster_col: str = 'kmeans_neighborhood', seg_suffix: str = '_whole_cell.tiff', xr_channel_name='label', sub_dir: Union[Path, str] = None, name_suffix: str = '')[source]

Generates neighborhood cluster masks and saves them for downstream analysis.

Parameters:
  • fovs (List[str]) – A list of fovs to generate and save neighborhood masks for.

  • save_dir (Union[pathlib.Path, str]) – The directory to save the generated neighborhood cluster masks.

  • seg_dir (Union[pathlib.Path, str]) – The path to the segmentation data.

  • neighborhood_data (pd.DataFrame) – Contains the neighborhood cluster assignments for each cell.

  • fov_col (str, optional) – The column name containing the FOV IDs. Defaults to settings.FOV_ID ("fov").

  • label_col (str, optional) – The column name containing the cell label. Defaults to settings.CELL_LABEL ("label").

  • cluster_col (str, optional) – The column name containing the cluster label. Defaults to settings.KMEANS_CLUSTER ("kmeans_neighborhood").

  • seg_suffix (str, optional) – The suffix that the segmentation images use. Defaults to '_whole_cell.tiff'

  • xr_channel_name (str) – Channel name for segmented data array.

  • sub_dir (str, optional) – The subdirectory to save the images in. If specified, images are saved to "save_dir/sub_dir". If sub_dir = None, the images are saved to "save_dir". Defaults to None.

  • name_suffix (str, optional) – Specify what to append at the end of every neighborhood mask. Defaults to ''.

ark.utils.data_utils.generate_and_save_pixel_cluster_masks(fovs: List[str], base_dir: Union[Path, str], save_dir: Union[Path, str], tiff_dir: Union[Path, str], chan_file: Union[Path, str], pixel_data_dir: Union[Path, str], cluster_id_to_name_path: Union[Path, str], pixel_cluster_col: str = 'pixel_meta_cluster', sub_dir: str = None, name_suffix: str = '')[source]

Generates pixel cluster masks and saves them for downstream analysis.

Parameters:
  • fovs (List[str]) – A list of fovs to generate and save pixel masks for.

  • base_dir (Union[pathlib.Path, str]) – The path to the data directory.

  • save_dir (Union[pathlib.Path, str]) – The directory to save the generated pixel cluster masks.

  • tiff_dir (Union[pathlib.Path, str]) – The path to the directory with the tiff data.

  • chan_file (Union[pathlib.Path, str]) – The path to the channel file inside each FOV folder (FOV folder as root). Used to determine dimensions of the pixel mask.

  • pixel_data_dir (Union[pathlib.Path, str]) – The path to the data with full pixel data. This data should also have the SOM and meta cluster labels appended.

  • cluster_id_to_name_path (Union[str, pathlib.Path]) – A path to a CSV that maps each pixel cluster ID to its manually defined name. This file is output by the remapping visualization found in metacluster_remap_gui.

  • pixel_cluster_col (str, optional) – Whether to assign SOM or meta clusters. Needs to be "pixel_som_cluster" or "pixel_meta_cluster". Defaults to "pixel_meta_cluster".

  • sub_dir (str, optional) – The subdirectory to save the images in. If specified, images are saved to "save_dir/sub_dir". If sub_dir = None, the images are saved to "save_dir". Defaults to None.

  • name_suffix (str, optional) – Specify what to append at the end of every pixel mask. Defaults to ''.

ark.utils.data_utils.generate_cluster_mask(fov: str, seg_dir: Union[str, Path], cmd: ClusterMaskData, seg_suffix: str = '_whole_cell.tiff', erode: bool = True, **kwargs) numpy.ndarray[source]

For a fov, create a mask labeling each cell with their SOM or meta cluster label

Parameters:
  • fov (str) – The fov to relabel

  • seg_dir (str) – The path to the segmentation data

  • cmd (ClusterMaskData) – A dataclass containing the cell data, cell label column, cluster column and the mapping from the segmentation label to the cluster label for a given FOV.

  • seg_suffix (str) – The suffix that the segmentation images use. Defaults to '_whole_cell.tiff'.

  • erode (bool) – Whether to erode the edges of the segmentation mask. Defaults to True.

Returns:

The image where values represent cell cluster labels.

Return type:

numpy.ndarray

ark.utils.data_utils.generate_pixel_cluster_mask(fov, base_dir, tiff_dir, chan_file_path, pixel_data_dir, cluster_mapping, pixel_cluster_col='pixel_meta_cluster')[source]

For a fov, create a mask labeling each pixel with their SOM or meta cluster label

Parameters:
  • fov (str) – The fov to relabel

  • base_dir (str) – The path to the data directory

  • tiff_dir (str) – The path to the tiff data

  • chan_file_path (str) – The path to the sample channel file to load (tiff_dir as root). Used to determine dimensions of the pixel mask.

  • pixel_data_dir (str) – The path to the data with full pixel data. This data should also have the SOM and meta cluster labels appended.

  • cluster_mapping (pd.DataFrame) – Dataframe detailing which meta_cluster IDs map to which cluster_id

  • pixel_cluster_col (str) – Whether to assign SOM or meta clusters. Needs to be 'pixel_som_cluster' or 'pixel_meta_cluster'.

Returns:

The image overlaid with pixel cluster labels

Return type:

numpy.ndarray

ark.utils.data_utils.label_cells_by_cluster(fov: str, cmd: ClusterMaskData, label_map: Union[numpy.ndarray, xarray.DataArray]) numpy.ndarray[source]

Translates cell-ID labeled images according to the clustering assignment found in cell_cluster_mask_data.

Parameters:
  • fov (str) – The FOV to relabel

  • cmd (ClusterMaskData) – A dataclass containing the cell data, cell label column, cluster column and the mapping from the segmentation label to the cluster label for a given FOV.

  • label_map (xarray.DataArray) – label map for a single FOV.

Returns:

The image with new designated label assignments

Return type:

numpy.ndarray

ark.utils.data_utils.load_anndatas(anndata_dir: PathLike, **anncollection_kwargs: Unpack[AnnCollectionKwargs]) anndata.experimental.AnnCollection[source]

Lazily loads a directory of AnnData objects into an AnnCollection. The concatenation happens across the obs axis.

For AnnCollection kwargs, see https://anndata.readthedocs.io/en/latest/generated/anndata.experimental.AnnCollection.html

Parameters:

anndata_dir (os.PathLike) – The directory containing the AnnData objects.

Returns:

The AnnCollection containing the AnnData objects.

Return type:

AnnCollection

ark.utils.data_utils.map_segmentation_labels(labels: Union[pandas.Series, numpy.ndarray], values: Union[pandas.Series, numpy.ndarray], label_map: numpy.typing.ArrayLike, unassigned_id: float = 0) numpy.ndarray[source]

Maps an image consisting of segmentation labels to an image consisting of a particular type of statistic, metric, or value of interest.

Parameters:
  • labels (Union[pd.Series, np.ndarray]) – The segmentation labels.

  • values (Union[pd.Series, np.ndarray]) – The values to map to the segmentation labels.

  • label_map (ArrayLike) – The segmentation labels as an image to map to.

  • unassigned_id (int | float, optional) – A default value to assign when there exists no 1-to-1 mapping from a label in the label_map to a label in the labels argument. Defaults to 0.

Returns:

Returns the mapped image.

Return type:

np.ndarray
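One way to picture this mapping is as a lookup table indexed by segmentation label. The sketch below is a NumPy-only illustration, not the library's code: the function name map_labels_sketch is hypothetical, and it assumes labels are small non-negative integers so that they can be used directly as array indices.

```python
import numpy as np


def map_labels_sketch(labels, values, label_map, unassigned_id=0):
    """Replace each label in label_map with its value, else unassigned_id."""
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    label_map = np.asarray(label_map)
    # Lookup table indexed by label; labels without a value keep unassigned_id.
    size = int(max(labels.max(), label_map.max())) + 1
    lookup = np.full(size, unassigned_id, dtype=float)
    lookup[labels] = values
    return lookup[label_map]


label_map = np.array([[0, 1, 1], [2, 2, 3]])
# Labels 1 and 2 have a statistic; background (0) and label 3 do not.
mapped = map_labels_sketch(labels=[1, 2], values=[0.5, 7.0], label_map=label_map)
```

Pixels whose label has no entry in labels, here the background and label 3, receive unassigned_id.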

ark.utils.data_utils.relabel_segmentation(mapping: numba.typed.typeddict, unassigned_id: numpy.int32, labeled_image: numpy.ndarray, _dtype: numpy.typing.DTypeLike = numpy.float64) numpy.ndarray

Relabels a labeled segmentation image according to the provided values.

Parameters:
  • mapping (nb.typed.typeddict) – A Numba typed dictionary mapping segmentation labels to cluster labels.

  • unassigned_id (np.int32) – The label given to a pixel with no associated cluster.

  • labeled_image (np.ndarray) – The labeled segmentation image.

  • _dtype (DTypeLike, optional) – The data type of the relabeled image. Defaults to np.float64.

Returns:

The relabeled segmentation image.

Return type:

np.ndarray
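The relabeling logic amounts to a per-pixel dictionary lookup with a fallback. The following plain-Python sketch uses an ordinary dict in place of the Numba typed dictionary and a hypothetical function name, relabel_sketch; it is meant to show the behavior, not the compiled implementation or its performance.

```python
import numpy as np


def relabel_sketch(mapping, unassigned_id, labeled_image, dtype=np.float64):
    """Relabel pixel by pixel, using unassigned_id for unmapped labels."""
    relabeled = np.empty(labeled_image.shape, dtype=dtype)
    for idx, label in np.ndenumerate(labeled_image):
        # Labels missing from the mapping fall back to unassigned_id.
        relabeled[idx] = mapping.get(int(label), unassigned_id)
    return relabeled


labeled_image = np.array([[1, 1, 0], [2, 5, 5]])
relabeled = relabel_sketch(
    {1: 3, 2: 4}, unassigned_id=9, labeled_image=labeled_image
)
```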

ark.utils.data_utils.save_fov_mask(fov, data_dir, mask_data, sub_dir=None, name_suffix='')[source]

Saves a provided cluster label mask overlay for a FOV.

Parameters:
  • fov (str) – The FOV to save

  • data_dir (str) – The directory to save the cluster mask

  • mask_data (numpy.ndarray) – The cluster mask data for the FOV

  • sub_dir (Optional[str]) – The subdirectory to save the masks in. If specified images are saved to “data_dir/sub_dir”. If sub_dir = None the images are saved to "data_dir". Defaults to None.

  • name_suffix (str) – Specify what to append at the end of every fov.
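The effect of sub_dir and name_suffix on the output location can be sketched with pathlib. The helper name mask_save_path and the file naming pattern (<fov><name_suffix>.tiff) are assumptions made for illustration; check the source for the exact file name that is written.

```python
from pathlib import Path


def mask_save_path(fov, data_dir, sub_dir=None, name_suffix=""):
    """Compose a plausible output path for a FOV's mask file."""
    save_dir = Path(data_dir) if sub_dir is None else Path(data_dir) / sub_dir
    # Assumption: one TIFF per FOV, named <fov><name_suffix>.tiff.
    return save_dir / f"{fov}{name_suffix}.tiff"


default_path = mask_save_path("fov0", "out")
nested_path = mask_save_path(
    "fov0", "out", sub_dir="masks", name_suffix="_cell_mask"
)
```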

ark.utils.data_utils.split_img_stack(stack_dir, output_dir, stack_list, indices, names, channels_first=True)[source]

Splits the channels in a given directory of images into separate files

Images are saved in the output_dir

Parameters:
  • stack_dir (str) – where we read the input files

  • output_dir (str) – where we write the split channel data

  • stack_list (list) – the names of the files we want to read from stack_dir

  • indices (list) – the indices we want to pull data from

  • names (list) – the corresponding names of the channels

  • channels_first (bool) – whether we index at the beginning or end of the array
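The channels_first flag decides which axis of the stack is indexed. A minimal NumPy sketch of that choice follows; the function name split_channels_sketch and the channel name "CD45" are hypothetical, and file reading and writing are left out.

```python
import numpy as np


def split_channels_sketch(stack, indices, names, channels_first=True):
    """Return {channel name: 2D image} for the requested channel indices."""
    channels = {}
    for index, name in zip(indices, names):
        # Channels-first stacks are (C, H, W); channels-last are (H, W, C).
        channels[name] = stack[index, ...] if channels_first else stack[..., index]
    return channels


stack = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # (channels, rows, cols)
split = split_channels_sketch(stack, indices=[1], names=["CD45"])
# The same data with the channel axis moved to the end.
split_last = split_channels_sketch(
    np.moveaxis(stack, 0, -1), indices=[1], names=["CD45"], channels_first=False
)
```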

ark.utils.data_utils.stitch_images_by_shape(data_dir, stitched_dir, img_sub_folder=None, channels=None, segmentation=False, clustering=False)[source]

Creates stitched images for the specified channels based on the FOV folder names

Parameters:
  • data_dir (str) – path to directory containing images

  • stitched_dir (str) – path to directory to save stitched images to

  • img_sub_folder (str) – optional name of image sub-folder within each fov

  • channels (list) – optional list of imgs to load, otherwise loads all imgs

  • segmentation (bool) – if stitching images from the single segmentation dir

  • clustering (bool or str) – if stitching images from the single pixel or cell mask dir, specify ‘pixel’ / ‘cell’

ark.utils.deepcell_service_utils

ark.utils.deepcell_service_utils.create_deepcell_output(deepcell_input_dir, deepcell_output_dir, fovs=None, wc_suffix='_whole_cell', nuc_suffix='_nuclear', host='https://deepcell.org', job_type='mesmer', scale=1.0, timeout=300, zip_size=5)[source]

Handles all of the necessary data manipulation for running deepcell tasks. Creates .zip files (to be used as input for DeepCell), calls run_deepcell_task method, and extracts zipped output files to the specified output location

Parameters:
  • deepcell_input_dir (str) – Location of preprocessed files (assume deepcell_input_dir contains <fov>.tiff for each fov in fovs list). This should not be a GoogleDrivePath.

  • deepcell_output_dir (str) – Location to save DeepCell output (as .tiff)

  • fovs (list) – List of fovs in preprocessing pipeline. if None, all .tiff files in deepcell_input_dir will be considered as input fovs. Default: None

  • wc_suffix (str) – Suffix for whole cell DeepCell output filename. e.g. for fovX, DeepCell output should be <fovX>+suffix.tif. Whole cell DeepCell files by default get suffixed with 'feature_0', it will be renamed to this arg.

  • nuc_suffix (str) – Suffix for nuclear DeepCell output filename. e.g. for fovX, DeepCell output should be <fovX>+suffix.tif. Nuclear DeepCell files by default get suffixed with 'feature_1', it will be renamed to this arg.

  • host (str) – Hostname and port for the kiosk-frontend API server. Default: ‘https://deepcell.org’

  • job_type (str) – Name of job workflow (mesmer, segmentation, tracking). Default: ‘mesmer’

  • scale (float) – Value to rescale data by Default: 1.0

  • timeout (int) – Approximate seconds until timeout. Default: 5 minutes (300)

  • zip_size (int) – Maximum number of files to include in zip. Default: 5

Raises:

ValueError – Raised if there is some fov X (from fovs list) s.t. the file <deepcell_input_dir>/fovX.tiff does not exist
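The zip_size parameter implies that the FOV list is processed in batches. A minimal sketch of such batching is shown below; the helper name batch_fovs is hypothetical and the real function may group FOVs differently.

```python
def batch_fovs(fovs, zip_size=5):
    """Split a FOV list into consecutive groups of at most zip_size."""
    if zip_size < 1:
        raise ValueError("zip_size must be at least 1")
    return [fovs[i:i + zip_size] for i in range(0, len(fovs), zip_size)]


fovs = [f"fov{i}" for i in range(12)]
fov_groups = batch_fovs(fovs, zip_size=5)
```

With 12 FOVs and zip_size=5 this yields three groups, the last one holding the two remaining FOVs.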

ark.utils.deepcell_service_utils.extract_deepcell_response(deepcell_output_dir, fov_group, batch_num, wc_suffix, nuc_suffix)[source]

Helper function to extract the segmentation masks from the deepcell output zip file.

Parameters:
  • deepcell_output_dir (str) – path to where deepcell output zips are stored

  • fov_group (list) – list of fovs to process in this batch

  • batch_num (int) – the batch number

  • wc_suffix (str) – Suffix for whole cell DeepCell output filename. e.g. for fovX, DeepCell output should be <fovX>+suffix.tif. Whole cell DeepCell files by default get suffixed with 'feature_0', it will be renamed to this arg.

  • nuc_suffix (str) – Suffix for nuclear DeepCell output filename. e.g. for fovX, DeepCell output should be <fovX>+suffix.tif. Nuclear DeepCell files by default get suffixed with 'feature_1', it will be renamed to this arg.

ark.utils.deepcell_service_utils.generate_deepcell_input(data_dir, tiff_dir, nuc_channels, mem_channels, fovs, is_mibitiff=False, img_sub_folder='TIFs', dtype='int16')[source]

Saves nuclear and membrane channels into deepcell input format. Either nuc_channels or mem_channels should be specified.

Writes summed channel images out as multitiffs (channels first).

Parameters:
  • data_dir (str) – location to save deepcell input tifs

  • tiff_dir (str) – directory containing folders of images, is_mibitiff determines what type

  • nuc_channels (list) – nuclear channels to be summed over

  • mem_channels (list) – membrane channels to be summed over

  • fovs (list) – list of folders or MIBItiff files to load imgs from

  • is_mibitiff (bool) – if the images are of type MIBITiff

  • img_sub_folder (str) – if is_mibitiff is False, defines the image subfolder for each fov; ignored if is_mibitiff is True

  • dtype (str/type) – optional specifier of image type. Overwritten with warning for float images

Raises:

ValueError – Raised if nuc_channels and mem_channels are both None or empty
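The channel summing can be sketched with NumPy as follows. The function name sum_channels_sketch, the dictionary of in-memory images, the marker names, and the ordering of the two output planes (nuclear first, membrane second) are assumptions for illustration; image loading and TIFF writing are omitted.

```python
import numpy as np


def sum_channels_sketch(images, nuc_channels=None, mem_channels=None, dtype="int16"):
    """Stack summed nuclear and membrane images as a (2, H, W) array."""
    if not nuc_channels and not mem_channels:
        raise ValueError("Specify nuc_channels, mem_channels, or both")
    shape = next(iter(images.values())).shape
    out = np.zeros((2, *shape), dtype=dtype)
    # Plane 0 holds the nuclear sum, plane 1 the membrane sum (assumed order).
    for plane, chans in enumerate((nuc_channels, mem_channels)):
        for chan in chans or []:
            out[plane] += images[chan]
    return out


images = {
    "H3": np.full((2, 2), 1, dtype=np.int16),
    "CD45": np.full((2, 2), 2, dtype=np.int16),
    "CD14": np.full((2, 2), 3, dtype=np.int16),
}
summed = sum_channels_sketch(
    images, nuc_channels=["H3"], mem_channels=["CD45", "CD14"]
)
```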

ark.utils.deepcell_service_utils.run_deepcell_direct(input_dir, output_dir, host='https://deepcell.org', job_type='mesmer', scale=1.0, timeout=300)[source]

Uses direct calls to DeepCell API and saves output to output_dir.

Parameters:
  • input_dir (str) – location of .zip files

  • output_dir (str) – location to save deepcell output (as .zip)

  • host (str) – Hostname and port for the kiosk-frontend API server. Default: ‘https://deepcell.org’

  • job_type (str) – Name of job workflow (mesmer, segmentation, tracking).

  • scale (float) – Value to rescale data by Default: 1.0

  • timeout (int) – Approximate seconds until timeout. Default: 5 minutes (300)

ark.utils.deepcell_service_utils.zip_input_files(deepcell_input_dir, fov_group, batch_num)[source]

Helper function which handles zipping the batch fov images into a single zip file.

Parameters:
  • deepcell_input_dir (str) – path to where deepcell input image files are stored

  • fov_group (list) – list of fovs to process in this batch

  • batch_num (int) – the batch number

Returns:

path to deepcell input zip file

Return type:

str
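Zipping one batch of FOV images can be sketched with the standard library. The function name zip_fov_batch and the archive name fovs_batch_<batch_num>.zip are hypothetical; only the <fov>.tiff input naming comes from the description of deepcell_input_dir above.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_fov_batch(input_dir, fov_group, batch_num):
    """Zip <fov>.tiff for each FOV in the group and return the zip path."""
    input_dir = Path(input_dir)
    # Hypothetical archive name; the real naming scheme may differ.
    zip_path = input_dir / f"fovs_batch_{batch_num}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for fov in fov_group:
            tiff_path = input_dir / f"{fov}.tiff"
            # Store only the file name so the archive has no nested folders.
            zf.write(tiff_path, arcname=tiff_path.name)
    return str(zip_path)


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder files stand in for real TIFF images.
    for fov in ("fov0", "fov1"):
        (Path(tmp) / f"{fov}.tiff").write_bytes(b"placeholder")
    zip_path = zip_fov_batch(tmp, ["fov0", "fov1"], batch_num=1)
    with zipfile.ZipFile(zip_path) as zf:
        zipped_names = sorted(zf.namelist())
```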

ark.utils.example_dataset

class ark.utils.example_dataset.ExampleDataset(dataset: str, overwrite_existing: bool = True, cache_dir: str = None, revision: str = None)[source]

Bases: object

__init__(dataset: str, overwrite_existing: bool = True, cache_dir: str = None, revision: str = None) None[source]

Constructs a utility class for downloading and moving the dataset with respect to its various partitions on Hugging Face: https://huggingface.co/datasets/angelolab/ark_example.

Parameters:
  • dataset (str) –

    The name of the dataset to download. Can be one of

    • "segment_image_data"

    • "cluster_pixels"

    • "cluster_cells"

    • "post_clustering"

    • "fiber_segmentation"

    • "LDA_preprocessing"

    • "LDA_training_inference"

    • "neighborhood_analysis"

    • "pairwise_spatial_enrichment"

    • "ome_tiff"

    • "ez_seg_data"

  • overwrite_existing (bool) – A flag to overwrite existing data. Defaults to True.

  • cache_dir (str, optional) – The directory to save the cache dir. Defaults to None, which internally in Hugging Face defaults to cache/huggingface/datasets.

  • revision (str, optional) – The commit ID from Hugging Face for the dataset. Used for internal development only. Allows the user to fetch a commit from a particular revision (Hugging Face’s terminology for branch). Defaults to None, which resolves to the latest version in the main branch (https://huggingface.co/datasets/angelolab/ark_example/tree/main).

check_empty_dst(dst_path: Path) bool[source]

Checks to see if the folder for a dataset config already exists in the save_dir (i.e. dst_path is the specific folder for the config.). If the folder exists, and there are no contents, then it’ll return True, False otherwise.

Parameters:
  • dst_path (pathlib.Path) – The destination directory to check to see if files exist in it.

Returns:

Returns True if there are no files in the directory dst_path.

Returns False if there are files in that directory dst_path.

Return type:

bool
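Read literally, the description above says the check is True only for an existing folder with no contents. A pathlib sketch of that reading follows; the function name is_empty_dir is hypothetical, and how the real method treats a folder that does not exist is an assumption taken from the wording, not from the source code.

```python
import tempfile
from pathlib import Path


def is_empty_dir(dst_path: Path) -> bool:
    """True only if dst_path is an existing directory with no entries."""
    return dst_path.is_dir() and not any(dst_path.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    dst = Path(tmp) / "cluster_pixels"
    missing = is_empty_dir(dst)  # folder does not exist yet
    dst.mkdir()
    empty = is_empty_dir(dst)  # folder exists and has no contents
    (dst / "data.csv").write_text("fov\n")
    filled = is_empty_dir(dst)  # folder exists and holds a file
```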

download_example_dataset()[source]

Downloads the example dataset from Hugging Face Hub. The following is a link to the dataset used: https://huggingface.co/datasets/angelolab/ark_example

The dataset will be downloaded to the Hugging Face default cache cache/huggingface/datasets.

move_example_dataset(move_dir: Union[str, Path])[source]

Moves the downloaded example data from the cache_dir to the save_dir.

Parameters:

move_dir (Union[str, pathlib.Path]) – The path to save the dataset files in.

path_suffixes

Path suffixes for mapping each downloaded dataset partition to its appropriate relative save directory.

ark.utils.example_dataset.get_example_dataset(dataset: str, save_dir: Union[str, Path], overwrite_existing: bool = True)[source]

A user facing wrapper function which downloads a specified dataset from Hugging Face, and moves it to the specified save directory save_dir. The dataset may be found here: https://huggingface.co/datasets/angelolab/ark_example

Parameters:
  • dataset (str) –

    The name of the dataset to download. Can be one of

    • "segment_image_data"

    • "cluster_pixels"

    • "cluster_cells"

    • "post_clustering"

    • "fiber_segmentation"

    • "LDA_preprocessing"

    • "LDA_training_inference"

    • "neighborhood_analysis"

    • "pairwise_spatial_enrichment"

    • "ez_seg_data"

  • save_dir (Union[str, pathlib.Path]) – The path to save the dataset files in.

  • overwrite_existing (bool) – The option to overwrite existing configs of the dataset downloaded. Defaults to True.

ark.utils.masking_utils

ark.utils.masking_utils.create_cell_mask(seg_mask, cell_table, fov_name, cell_types, cluster_col='cell_meta_cluster', sigma=10, min_object_area=0, max_hole_area=1000)[source]

Generates a binary mask from the cells listed in cell_types

Parameters:
  • seg_mask (numpy.ndarray) – segmentation mask

  • cell_table (pandas.DataFrame) – cell table containing segmentation IDs and cell types

  • fov_name (str) – name of the fov to process

  • cell_types (list) – list of cell types to include in the mask

  • cluster_col (str) – column in cell table containing cell cluster

  • sigma (float) – sigma for gaussian smoothing

  • min_object_area (int) – minimum size of object to include, default 0

  • max_hole_area (int) – maximum size of a hole to leave without filling, default 1000

Returns:

binary mask

Return type:

numpy.ndarray
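The selection step, picking out the pixels of cells whose type is in cell_types, can be sketched with NumPy. The function name cell_type_mask_sketch and the list-of-dicts cell table are hypothetical, and the Gaussian smoothing, small-object removal, and hole filling controlled by sigma, min_object_area, and max_hole_area are deliberately left out.

```python
import numpy as np


def cell_type_mask_sketch(seg_mask, cell_rows, fov_name, cell_types,
                          cluster_col="cell_meta_cluster"):
    """Binary mask of pixels whose cell belongs to one of cell_types."""
    # Collect the segmentation labels of matching cells in this FOV.
    keep = [
        row["label"]
        for row in cell_rows
        if row["fov"] == fov_name and row[cluster_col] in cell_types
    ]
    return np.isin(seg_mask, keep).astype(np.uint8)


seg_mask = np.array([[0, 1, 1], [2, 2, 3]])
cell_rows = [
    {"fov": "fov0", "label": 1, "cell_meta_cluster": "tumor"},
    {"fov": "fov0", "label": 2, "cell_meta_cluster": "immune"},
    {"fov": "fov0", "label": 3, "cell_meta_cluster": "tumor"},
    {"fov": "fov1", "label": 2, "cell_meta_cluster": "tumor"},
]
mask = cell_type_mask_sketch(seg_mask, cell_rows, "fov0", ["tumor"])
```

The row for fov1 is ignored, so label 2 stays out of the mask even though a cell with that label is a tumor cell in another FOV.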

ark.utils.masking_utils.generate_cell_masks(seg_dir, mask_dir, cell_table, cell_types, mask_name, cluster_col='cell_meta_cluster', sigma=10, min_object_area=0, max_hole_area=1000)[source]

Creates a single cell mask for each FOV when given the cell types to aggregate.

Parameters:
  • seg_dir (str) – path to the cell segmentation tiff directory

  • mask_dir (str) – path where the masks will be saved

  • cell_table (pd.DataFrame) – Dataframe containing all cell labels and their cell type

  • cell_types (list) – list of cell phenotypes that will be used to create the mask

  • cluster_col (str) – column in cell table containing cell cluster

  • mask_name (str) – name for the new mask file created

  • sigma (float) – sigma for gaussian smoothing

  • min_object_area (int) – minimum size of objects to include, default 0

  • max_hole_area (int) – maximum size of a hole to leave without filling, default 1000

ark.utils.masking_utils.generate_signal_masks(img_dir, mask_dir, channels, mask_name, intensity_thresh_perc='auto', sigma=2, min_object_area=5000, max_hole_area=1000)[source]

Creates a single signal mask for each FOV when given the channels to aggregate.

Parameters:
  • img_dir (str) – path to the image tiff directory

  • mask_dir (str) – path where the masks will be saved

  • channels (list) – list of channels to combine to create a single mask for

  • mask_name (str) – name for the new mask file created

  • intensity_thresh_perc (int) – percentile to threshold intensity values in the image, defaults to “auto” which will calculate an appropriate percentile for the user

  • sigma (float) – sigma for gaussian blur

  • min_object_area (int) – minimum size of masked objects to include

  • max_hole_area (int) – maximum size of holes to leave in masked objects
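The core of a signal mask, combining channels and thresholding at an intensity percentile, can be sketched with NumPy. The function name signal_mask_sketch is hypothetical, and this version omits the Gaussian blur, the "auto" percentile selection, and the object and hole size filtering described by the parameters above.

```python
import numpy as np


def signal_mask_sketch(channel_imgs, intensity_thresh_perc=90):
    """Sum the channel images and keep pixels above the given percentile."""
    summed = np.sum(channel_imgs, axis=0)
    threshold = np.percentile(summed, intensity_thresh_perc)
    return (summed > threshold).astype(np.uint8)


chan_a = np.array([[0, 1], [1, 4]])
chan_b = np.array([[0, 0], [1, 6]])
# The summed image is [[0, 1], [2, 10]], whose 50th percentile is 1.5.
signal_mask = signal_mask_sketch([chan_a, chan_b], intensity_thresh_perc=50)
```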

ark.utils.plot_utils

class ark.utils.plot_utils.MetaclusterColormap(cluster_type: str, cluster_id_to_name_path: Union[str, Path], metacluster_colors: Dict)[source]

Bases: object

A dataclass which contains the colormap-related information for the metaclusters.

background_color: Tuple[float, ...]
cluster_id_to_name_path: Union[str, Path]
cluster_type: str
cmap: matplotlib.colors.ListedColormap
mc_colors: numpy.ndarray
metacluster_colors: Dict
metacluster_id_to_name: pandas.DataFrame
norm: matplotlib.colors.BoundaryNorm
unassigned_color: Tuple[float, ...]
unassigned_id: int
ark.utils.plot_utils.cohort_cluster_plot(fovs: List[str], seg_dir: Union[Path, str], save_dir: Union[Path, str], cell_data: pandas.DataFrame, fov_col: str = 'fov', label_col: str = 'label', cluster_col: str = 'cell_meta_cluster', seg_suffix: str = '_whole_cell.tiff', cmap: Union[str, pandas.DataFrame] = 'viridis', style: str = 'seaborn-v0_8-paper', erode: bool = False, display_fig: bool = False, fig_file_type: str = 'png', figsize: tuple = (10, 10), dpi: int = 300) None[source]

Saves the cluster masks for each FOV in the cohort as the following:

  • Cluster mask numbered 1-N, where N is the number of clusters (tiff)

  • Cluster mask colored by cluster with or without a colorbar (png)

  • Cluster mask colored by cluster (tiff)

Parameters:
  • fovs (List[str]) – A list of FOVs to generate cluster masks for.

  • seg_dir (Union[pathlib.Path, str]) – The directory containing the segmentation masks.

  • save_dir (Union[pathlib.Path, str]) – The directory to save the cluster masks to.

  • cell_data (pd.DataFrame) – The cell data table containing the cluster labels.

  • fov_col (str, optional) – The column containing the FOV name. Defaults to settings.FOV_ID.

  • label_col (str, optional) – The column containing the segmentation label. Defaults to settings.CELL_LABEL.

  • cluster_col (str, optional) – The column containing the cluster a segmentation label belongs to. Defaults to settings.CELL_TYPE.

  • seg_suffix (str, optional) – The kind of segmentation file to read. Defaults to “_whole_cell.tiff”.

  • cmap (str, pd.DataFrame, optional) – The colormap to generate clusters from, or a DataFrame, where the user can specify their own colors per cluster. The color column must be labeled “color”. Defaults to “viridis”.

  • style (str, optional) – The matplotlib style to use for the image. Defaults to “seaborn-v0_8-paper”. View the available styles here: https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html or run matplotlib.pyplot.style.available in a notebook to view all the styles.

  • erode (bool, optional) – Option to “thicken” the cell boundary via the segmentation label for visualization purposes. Defaults to False.

  • display_fig (bool, optional) – Option to display the cluster mask plots as they are generated. Defaults to False. Displaying each figure can use a lot of memory, so it’s best to try to visualize just a few FOVs, before generating the cluster masks for the entire cohort.

  • fig_file_type (str, optional) – The file type to save figures as. Defaults to ‘png’.

  • figsize (tuple, optional) – The size of the figure to display. Defaults to (10, 10).

  • dpi (int, optional) – The resolution of the image to use for saving. Defaults to 300.

ark.utils.plot_utils.color_segmentation_by_stat(fovs: List[str], data_table: pandas.DataFrame, seg_dir: Union[Path, str], save_dir: Union[Path, str], fov_col: str = 'fov', label_col: str = 'label', stat_name: str = 'cell_meta_cluster', cmap: str = 'viridis', reverse: bool = False, seg_suffix: str = '_whole_cell.tiff', cbar_visible: bool = True, style: str = 'seaborn-v0_8-paper', erode: bool = False, display_fig: bool = False, fig_file_type: str = 'png', figsize: tuple = (10, 10), dpi: int = 300)[source]

Colors segmentation masks by a given continuous statistic.

Parameters:
  • fovs (List[str]) – A list of FOVs to plot.

  • data_table (pd.DataFrame) –

    A DataFrame containing FOV and segmentation label identifiers as well as a collection of statistics for each label in a segmentation mask such as:

    • fov_id (identifier)

    • label (identifier)

    • area (statistic)

    • fiber (statistic)

    • etc…

  • seg_dir (Union[pathlib.Path, str]) – Path to the directory containing segmentation masks.

  • save_dir (Union[pathlib.Path, str]) – Path to the directory where the colored segmentation masks will be saved.

  • fov_col (str, optional) – The name of the column in data_table containing the FOV identifiers. Defaults to “fov”.

  • label_col (str, optional) – The name of the column in data_table containing the segmentation label identifiers. Defaults to “label”.

  • stat_name (str) – The name of the statistic to color the segmentation masks by. This should be a column in data_table.

  • seg_suffix (str, optional) – The suffix of the segmentation file and its file extension. Defaults to “_whole_cell.tiff”.

  • cmap (str, optional) – The colormap for plotting. Defaults to “viridis”.

  • reverse (bool, optional) – A flag to reverse the colormap provided. Defaults to False.

  • cbar_visible (bool, optional) – A flag to display the colorbar. Defaults to True.

  • erode (bool, optional) – Option to “thicken” the cell boundary via the segmentation label for visualization purposes. Defaults to False.

  • style (str, optional) – Sets the matplotlib style for the image. Defaults to “seaborn-v0_8-paper”. View the available styles at https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html or run matplotlib.pyplot.style.available in a notebook to list all the styles.

  • display_fig (bool, optional) – Option to display the cluster mask plots as they are generated. Defaults to False.

  • fig_file_type (str, optional) – The file type to save figures as. Defaults to ‘png’.

  • figsize (tuple, optional) – The size of the figure to display. Defaults to (10, 10).

  • dpi (int, optional) – The resolution of the image to use for saving. Defaults to 300.
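The core step behind coloring a mask by a statistic can be sketched as a lookup from segmentation label to statistic value. This is a minimal illustration using NumPy only, not the library's implementation; the helper name `map_labels_to_stat` and its `background` argument are hypothetical, and label 0 is assumed to be background.

```python
import numpy as np


def map_labels_to_stat(mask, labels, values, background=np.nan):
    """Replace each segmentation label in `mask` with its statistic value.

    Hypothetical helper: labels that have no entry (including the assumed
    background label 0) receive `background`.
    """
    mask = np.asarray(mask)
    labels = np.asarray(labels, dtype=int)
    values = np.asarray(values, dtype=float)

    # Lookup table indexed by label id, large enough for every label seen.
    size = int(max(mask.max(), labels.max())) + 1
    lookup = np.full(size, background, dtype=float)
    lookup[labels] = values

    # Fancy indexing maps every pixel's label to its statistic in one step.
    return lookup[mask]


# Toy 2x2 mask with two cells (labels 1 and 2) and one background pixel.
mask = np.array([[0, 1], [2, 2]])
stat_image = map_labels_to_stat(mask, labels=[1, 2], values=[0.5, 3.0])
```

The resulting float image can then be displayed with a continuous colormap such as “viridis”, with the background pixels left unset.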

ark.utils.plot_utils.create_cmap(cmap: Union[numpy.ndarray, list[str], str], n_clusters: int) tuple[matplotlib.colors.ListedColormap, matplotlib.colors.BoundaryNorm][source]

Creates a discrete colormap and a boundary norm from the provided colors.

Parameters:
  • cmap (Union[np.ndarray, list[str], str]) – The colormap, or set of colors to use.

  • n_clusters (int) – The number of clusters for the colormap.

Returns:

The generated colormap and boundary norm.

Return type:

tuple[colors.ListedColormap, colors.BoundaryNorm]
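As a rough sketch of what a discrete colormap paired with a boundary norm looks like, the following builds both directly with matplotlib. The helper `make_discrete_cmap` is hypothetical and only handles a list of color names; how create_cmap treats arrays or named colormaps is not shown here.

```python
import numpy as np
from matplotlib import colors


def make_discrete_cmap(color_list, n_clusters):
    """Hypothetical helper: one color per cluster id in 0..n_clusters-1."""
    cmap = colors.ListedColormap(color_list[:n_clusters])
    # Bin edges at -0.5, 0.5, 1.5, ... so integer cluster id i falls in bin i.
    bounds = np.arange(n_clusters + 1) - 0.5
    norm = colors.BoundaryNorm(bounds, cmap.N)
    return cmap, norm


cmap, norm = make_discrete_cmap(["black", "red", "green", "blue"], n_clusters=4)
# Each integer cluster id is mapped to the index of its own color.
color_indices = np.asarray(norm(np.array([0, 1, 2, 3]))).tolist()
```

Centering the bin edges on half-integers keeps each integer cluster id inside exactly one color bin.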

ark.utils.plot_utils.create_mantis_dir(fovs: List[str], mantis_project_path: Union[str, Path], img_data_path: Union[str, Path], mask_output_dir: Union[str, Path], mapping: Union[str, Path, pandas.DataFrame], seg_dir: Optional[Union[str, Path]], cluster_type='pixel', mask_suffix: str = '_mask', seg_suffix_name: Optional[str] = '_whole_cell.tiff', img_sub_folder: str = None, new_mask_suffix: str = None)[source]

Creates a mantis project directory so that it can be opened by the mantis viewer. Copies fovs, segmentation files, masks, and mapping CSVs into a new directory structure. The contents of the mantis project folder will look like this:

    mantis_project
    ├── fov0
    │   ├── cell_segmentation.tiff
    │   ├── chan0.tiff
    │   ├── chan1.tiff
    │   ├── chan2.tiff
    │   ├── ...
    │   ├── population_mask.csv
    │   └── population_mask.tiff
    ├── fov1
    │   ├── cell_segmentation.tiff
    │   ├── chan0.tiff
    │   ├── chan1.tiff
    │   ├── chan2.tiff
    │   ├── ...
    │   ├── population_mask.csv
    │   └── population_mask.tiff
    └── ...

Parameters:
  • fovs (List[str]) – A list of FOVs to create a Mantis Project for.

  • mantis_project_path (Union[str, pathlib.Path]) – The folder where the mantis project will be created.

  • img_data_path (Union[str, pathlib.Path]) – The location of all the fovs you wish to create a project from.

  • mask_output_dir (Union[str, pathlib.Path]) – The folder containing all the masks of the fovs.

  • mapping (Union[str, pathlib.Path, pd.DataFrame]) – The location of the mapping file, or the mapping Pandas DataFrame itself.

  • seg_dir (Union[str, pathlib.Path], optional) – The location of the segmentation directory for the fovs. If None, then the segmentation file will not be copied over.

  • cluster_type (str) – The type of clustering being done.

  • mask_suffix (str, optional) – The suffix used to find the mask tiffs. Defaults to “_mask”.

  • seg_suffix_name (str, optional) – The suffix of the segmentation file and its file extension. If None, then the segmentation file will not be copied over. Defaults to “_whole_cell.tiff”.

  • img_sub_folder (str, optional) – The subfolder where the channels exist within the img_data_path. Defaults to None.

  • new_mask_suffix (str, optional) – The new suffix added to the copied mask tiffs.

ark.utils.plot_utils.create_overlay(fov, segmentation_dir, data_dir, img_overlay_chans, seg_overlay_comp, alternate_segmentation=None)[source]

Takes in labeled contour data, along with an optional MIBI tif and a second contour, and overlays them for comparison. Generates the outline(s) of the mask(s) as well as the intensity from the plotting tif. Predicted contours are colored red, while alternate contours are colored white.

Parameters:
  • fov (str) – The name of the fov to overlay

  • segmentation_dir (str) – The path to the directory containing the segmentation data

  • data_dir (str) – The path to the directory containing the nuclear and whole cell image data

  • img_overlay_chans (list) – List of channels the user will overlay

  • seg_overlay_comp (str) – The segmented compartment the user will overlay

  • alternate_segmentation (numpy.ndarray) – 2D numpy array of labeled cell objects

Returns:

The image with the channel overlay

Return type:

numpy.ndarray

ark.utils.plot_utils.plot_cluster(image: numpy.ndarray, fov: str, cmap: matplotlib.colors.ListedColormap, norm: matplotlib.colors.BoundaryNorm, cbar_visible: bool = True, cbar_labels: list[str] = None, dpi: int = 300, figsize: tuple[int, int] = None) matplotlib.figure.Figure[source]

Plots the cluster image with the provided colormap and norm.

Parameters:
  • image (np.ndarray) – The cluster image to plot.

  • fov (str) – The name of the clustered FOV.

  • cmap (colors.ListedColormap) – A colormap to use for the cluster image.

  • norm (colors.BoundaryNorm) – A normalization to use for the cluster image.

  • cbar_visible (bool, optional) – Whether or not to display the colorbar. Defaults to True.

  • cbar_labels (list[str], optional) – Colorbar labels for the clusters. Defaults to None, where the labels will be automatically generated.

  • dpi (int, optional) – The resolution of the image to use for saving. Defaults to 300.

  • figsize (tuple, optional) – The size of the image to display. Defaults to (10, 10).

Returns:

Returns the cluster image as a matplotlib Figure.

Return type:

Figure

ark.utils.plot_utils.plot_continuous_variable(image: numpy.ndarray, name: str, stat_name: str, cmap: Union[matplotlib.colors.Colormap, str], norm: matplotlib.colors.Normalize = None, cbar_visible: bool = True, dpi: int = 300, figsize: tuple[int, int] = (10, 10)) matplotlib.figure.Figure[source]

Plots an image measuring some type of continuous variable with a user provided colormap.

Parameters:
  • image (np.ndarray) – An array representing an image to plot.

  • name (str) – The name of the image.

  • stat_name (str) – The name of the statistic to plot, this will be the colormap’s label.

  • cmap (colors.Colormap, str, optional) – A colormap to plot the array with. Defaults to “viridis”.

  • cbar_visible (bool, optional) – A flag for setting the colorbar on or not. Defaults to True.

  • norm (colors.Normalize, optional) – A normalization to apply to the colormap.

  • dpi (int, optional) – The resolution of the image. Defaults to 300.

  • figsize (tuple[int, int], optional) – The size of the image. Defaults to (10, 10).

Returns:

The Figure object of the image.

Return type:

Figure
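For orientation, an image with a continuous colormap and a labeled colorbar can be drawn with plain matplotlib roughly as follows. This is a sketch under assumptions, not the library's code: the helper name `sketch_continuous_plot` is hypothetical, and details such as hiding the axes and using the image name as the title are illustrative choices.

```python
import numpy as np
from matplotlib.figure import Figure


def sketch_continuous_plot(image, name, stat_name, cmap="viridis", norm=None,
                           cbar_visible=True, dpi=300, figsize=(10, 10)):
    """Hypothetical stand-in that mirrors the documented parameters."""
    fig = Figure(figsize=figsize, dpi=dpi)
    ax = fig.add_subplot(1, 1, 1)
    im = ax.imshow(image, cmap=cmap, norm=norm)
    ax.set_title(name)  # assumption: the image name is used as the title
    ax.axis("off")      # assumption: pixel ticks are not useful here
    if cbar_visible:
        # The statistic name becomes the colorbar's label.
        fig.colorbar(im, ax=ax, label=stat_name)
    return fig


image = np.random.default_rng(0).random((8, 8))
fig = sketch_continuous_plot(image, "fov0", "area", dpi=72, figsize=(4, 4))
fig_no_cbar = sketch_continuous_plot(image, "fov0", "area", cbar_visible=False,
                                     dpi=72, figsize=(4, 4))
```

Building the Figure object directly, rather than through pyplot, avoids opening a window, which matters when many FOVs are processed in a loop.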

ark.utils.plot_utils.plot_neighborhood_cluster_result(img_xr: xarray.DataArray, fovs: list[str], k: int, cmap_name: str = 'tab20', cbar_visible: bool = True, save_dir: Union[str, Path] = None, fov_col: str = 'fovs', dpi: int = 300, figsize=(10, 10)) None[source]

Plots the neighborhood clustering results for the provided FOVs.

Parameters:
  • img_xr (xr.DataArray) – DataArray containing labeled cells.

  • fovs (list[str]) – A list of FOVs to plot.

  • k (int) – The number of neighborhoods / clusters.

  • cmap_name (str, optional) – The Colormap to use for clustering results. Defaults to “tab20”.

  • cbar_visible (bool, optional) – Whether or not to display the colorbar. Defaults to True.

  • save_dir (Union[str, pathlib.Path], optional) – The image will be saved to this location if provided. Defaults to None.

  • fov_col (str, optional) – The column with the fov names in img_xr. Defaults to “fovs”.

  • dpi (int, optional) – The resolution of the image to use for saving. Defaults to 300.

  • figsize (tuple, optional) – The size of the image to display. Defaults to (10, 10).

ark.utils.plot_utils.plot_pixel_cell_cluster(img_xr: xarray.DataArray, fovs: list[str], cluster_id_to_name_path: Union[str, Path], metacluster_colors: Dict, cluster_type: Union[Literal['pixel'], Literal['cell']] = 'pixel', cbar_visible: bool = True, save_dir=None, fov_col: str = 'fovs', erode: bool = False, dpi=300, figsize=(10, 10))[source]

Overlays the pixel and cell clusters on an image.

Parameters:
  • img_xr (xr.DataArray) – DataArray containing labeled pixel or cell clusters

  • fovs (list[str]) – A list of FOVs to plot.

  • cluster_id_to_name_path (str) – A path to a CSV that maps each pixel/cell cluster to its manually defined name. This file is output by the remapping visualization found in metacluster_remap_gui.

  • metacluster_colors (dict) – Dictionary which maps each metacluster id to a color

  • cluster_type (“pixel” or “cell”) – The type of clustering being done.

  • cbar_visible (bool, optional) – Whether or not to display the colorbar. Defaults to True.

  • save_dir (str) – If provided, the image will be saved to this location.

  • fov_col (str) – The column with the fov names in img_xr. Defaults to “fovs”.

  • erode (bool) – Whether or not to erode the segmentation mask. Defaults to False.

  • dpi (int) – The resolution of the image to use for saving. Defaults to 300.

  • figsize (tuple) – Size of the image that will be displayed.

ark.utils.plot_utils.save_colored_mask(fov: str, save_dir: str, suffix: str, data: numpy.ndarray, cmap: matplotlib.colors.ListedColormap, norm: matplotlib.colors.BoundaryNorm) None[source]

Saves the colored mask to the provided save directory.

Parameters:
  • fov (str) – The name of the FOV.

  • save_dir (str) – The directory where the colored mask will be saved.

  • suffix (str) – The suffix to append to the FOV name.

  • data (np.ndarray) – The mask to save.

  • cmap (colors.ListedColormap) – The colormap to use for the mask.

  • norm (colors.BoundaryNorm) – The normalization to use for the mask.

ark.utils.plot_utils.save_colored_masks(fovs: List[str], mask_dir: Union[str, Path], save_dir: Union[str, Path], cluster_id_to_name_path: Union[str, Path], metacluster_colors: Dict, cluster_type: Literal['cell', 'pixel']) None[source]

Converts the pixie TIFF masks into colored TIFF masks using the provided colormap and saves them in the save_dir. Mainly used for visualization purposes.

Parameters:
  • fovs (List[str]) – A list of FOVs to save their associated color masks for.

  • mask_dir (Union[str, pathlib.Path]) – The directory where the pixie masks are stored.

  • save_dir (Union[str, pathlib.Path]) – The directory where the colored masks will be saved.

  • cluster_id_to_name_path (Union[str, pathlib.Path]) – A path to a CSV that maps each pixel/cell cluster to its manually defined name. This file is output by the remapping visualization found in metacluster_remap_gui.

  • metacluster_colors (Dict) – Maps each metacluster id to a color.

  • cluster_type (Literal[“cell”, “pixel”]) – The type of clustering being done.

ark.utils.plot_utils.set_minimum_color_for_colormap(cmap, default=(0, 0, 0, 1))[source]

Changes the minimum value in the provided colormap to black (#000000) or to a provided color.

This is useful for instances where zero-valued regions of an image should be distinct from positive regions (i.e., transparent or a non-colormap member color).

Parameters:
  • cmap (matplotlib.colors.Colormap) – matplotlib color map

  • default (Iterable) – RGBA color values for minimum color. Default is black, (0, 0, 0, 1).

Returns:

corrected colormap

Return type:

matplotlib.colors.Colormap
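One way to obtain this effect with plain matplotlib is to sample the colormap into an RGBA table, overwrite the first entry, and wrap the result in a new ListedColormap. The helper `with_minimum_color` below is a hypothetical sketch of that idea and may differ from how the library does it.

```python
import numpy as np
import matplotlib
from matplotlib import colors


def with_minimum_color(cmap, minimum=(0, 0, 0, 1)):
    """Hypothetical helper: copy `cmap` with its lowest entry replaced."""
    # Sample every entry of the colormap as an (N, 4) RGBA array.
    samples = cmap(np.linspace(0, 1, cmap.N))
    samples[0] = minimum
    return colors.ListedColormap(samples)


viridis = matplotlib.colormaps["viridis"]
adjusted = with_minimum_color(viridis)
lowest_color = adjusted(0.0)   # RGBA for the minimum value
highest_color = adjusted(1.0)  # the rest of the colormap is left as-is
```

Only the lowest entry changes, so zero-valued pixels stand out while every other value keeps its original color.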

ark.utils.plot_utils.tif_overlay_preprocess(segmentation_labels, plotting_tif)[source]

Validates plotting_tif and preprocesses it accordingly.

Parameters:
  • segmentation_labels (numpy.ndarray) – 2D numpy array of labeled cell objects

  • plotting_tif (numpy.ndarray) – 2D or 3D numpy array of imaging signal

Returns:

The preprocessed image

Return type:

numpy.ndarray

ark.utils.spatial_lda_utils

ark.utils.spatial_lda_utils.check_featurize_cell_table_args(cell_table, featurization, radius, cell_index)[source]

Checks the input arguments of the featurize_cell_table() function.

Parameters:
  • cell_table (dict) – A dictionary whose elements are the correctly formatted dataframes for each field of view.

  • featurization (str) – One of “cluster”, “marker”, “avg_marker”, or “count”.

  • radius (int) – Pixel radius corresponding to cellular neighborhood size.

  • cell_index (str) – Name of the column in each field of view pd.DataFrame indicating reference cells.

ark.utils.spatial_lda_utils.check_format_cell_table_args(cell_table, markers, clusters)[source]

Checks the input arguments of the format_cell_table() function.

Parameters:
  • cell_table (pandas.DataFrame) – A DataFrame containing the columns of cell marker frequencies and/or cluster ids.

  • markers (list) – A list of strings corresponding to marker names.

  • clusters (list) – A list of cell cluster names.

ark.utils.spatial_lda_utils.make_plot_fn(plot='adjacency', difference_matrices=None, topic_weights=None, cell_table=None, color_palette=palettable.colorbrewer.qualitative.Set3_12.mpl_colors)[source]

Helper function for making plots using the spatial-lda library.

Parameters:
  • plot (str) – Which plot function to return. One of “adjacency” or “topic_assignment”

  • difference_matrices (dict) – A dictionary of featurized difference matrices for each field of view.

  • topic_weights (pandas.DataFrame) – The data frame of cell topic weights from a fitted spatial-LDA model.

  • cell_table (dict) – A formatted cell table

  • color_palette (List[Tuple[float, float, float]]) – Color palette in mpl format (list of rgb tuples)

Returns:

A function for plotting spatial-LDA data.

Return type:

Callable

ark.utils.spatial_lda_utils.plot_fovs_with_topics(ax, fov_idx, topic_weights, cell_table, uncolor_subset=None, color_palette=palettable.colorbrewer.qualitative.Set3_12.mpl_colors)[source]

Helper function for plotting outputs from a fitted spatial-LDA model.

Parameters:
  • ax – Plot axis

  • fov_idx (int) – The index of the field of view to plot

  • topic_weights (pandas.DataFrame) – The data frame of cell topic weights from a fitted spatial-LDA model.

  • cell_table (dict) – A formatted cell table

  • uncolor_subset (str | None) – Name of cell type to leave uncolored

  • color_palette (List[Tuple[float, float, float]]) – Color palette in mpl format

ark.utils.spatial_lda_utils.plot_topics_heatmap(topics, features, normalizer=None, transpose=False, scale=0.4)[source]

Plots topic heatmap. Topics will be displayed on lower axis by default.

Parameters:
  • topics (pd.DataFrame | np.ndarray) – Topic assignments based on the trained featurization.

  • features (list | np.ndarray) – Feature names for display.

  • normalizer (Callable[(np.ndarray,), np.ndarray]) – Topic normalization for easier visualization. Default is standardization.

  • transpose (bool) – Swap the topic and feature axes. Helpful when the number of features is larger than the number of topics.

  • scale (float) – Plot-to-text size scaling. For smaller text/larger label gaps, increase this value.

ark.utils.spatial_lda_utils.read_spatial_lda_file(dir, file_name, format='pkl')[source]

Helper function reading spatial-LDA objects.

Parameters:
  • dir (str) – The directory where the data is located.

  • file_name (str) – Name of the data file.

  • format (str) – The designated file extension. Must be one of either ‘pkl’ or ‘csv’.

Returns:

Either an individual data frame, a dictionary, or a spatial_lda model.

Return type:

pd.DataFrame | dict | spatial_lda.online_lda.LatentDirichletAllocation

ark.utils.spatial_lda_utils.save_spatial_lda_file(data, dir, file_name, format='pkl')[source]

Helper function saving spatial-LDA objects.

Parameters:
  • data (dict, pandas.DataFrame) – A dictionary or data frame.

  • dir (str) – The directory where the data will be saved.

  • file_name (str) – Name of the data file.

  • format (str) – The designated file extension. Must be one of either ‘pkl’ or ‘csv’.

ark.utils.spatial_lda_utils.within_cluster_sums(data, labels)[source]

Computes the pooled within-cluster sum of squares for the gap statistic.

Parameters:
  • data (pandas.DataFrame) – A formatted and featurized cell table.

  • labels (numpy.ndarray) – A list of cluster labels corresponding to cluster assignments in data.

Returns:

The pooled within-cluster sum of squares for a given clustering iteration.

Return type:

float
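A common definition of the pooled within-cluster sum of squares is the sum, over clusters, of squared Euclidean distances from each point to its cluster mean. The sketch below implements that definition with NumPy. It is an illustration under that assumption, and the library's function may use a different but related formulation; the name `pooled_within_cluster_ss` is hypothetical.

```python
import numpy as np


def pooled_within_cluster_ss(data, labels):
    """Sum of squared distances of each point to its own cluster mean."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for cluster in np.unique(labels):
        members = data[labels == cluster]
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return float(total)


# Two tight clusters of two points each.
points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
assignments = np.array([0, 0, 1, 1])
w_k = pooled_within_cluster_ss(points, assignments)
```

In this toy case each point lies one unit from its cluster mean, so each cluster contributes 2 and the pooled value is 4.0.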