ark.segmentation

Subpackages

ark.segmentation.fiber_segmentation

ark.segmentation.fiber_segmentation.calculate_density(fov_fiber_table, total_pixels)[source]

Calculates both pixel-area-based and fiber-number-based densities.

  • pixel based = fiber pixel area / total image area

  • fiber number based = number of fibers / total image area

Parameters:
  • fov_fiber_table (pd.DataFrame) – the array representation of the fiber segmented mask for an image

  • total_pixels (int) – area of the image

Returns:

  • both densities, scaled up by 100

Return type:

tuple (float, float)
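
The two density formulas can be sketched in plain Python. This is a minimal illustration, not the ark implementation: `density_sketch` and its `fiber_areas` argument (one pixel area per segmented fiber) are hypothetical names standing in for the values that would be read from `fov_fiber_table`.

```python
def density_sketch(fiber_areas, total_pixels):
    """Return (pixel_density, fiber_density), each scaled up by 100.

    fiber_areas is a hypothetical list with one pixel area per fiber.
    """
    # Pixel-based density: fiber pixel area / total image area.
    pixel_density = sum(fiber_areas) / total_pixels * 100
    # Fiber-number-based density: number of fibers / total image area.
    fiber_density = len(fiber_areas) / total_pixels * 100
    return pixel_density, fiber_density


# Three fibers covering 256 of the 1024 pixels in a 32 x 32 image.
pixel_density, fiber_density = density_sketch([120, 80, 56], total_pixels=1024)
```

Here a quarter of the image is covered by fibers, so the pixel-based value comes out as 25.0, while the fiber-number-based value is much smaller because it counts objects rather than pixels.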

ark.segmentation.fiber_segmentation.calculate_fiber_alignment(fiber_object_table, k=4, axis_thresh=2)[source]

Calculates an alignment score for each fiber in an image, based on the angle difference between the fiber and its k nearest neighbors.

Parameters:
  • fiber_object_table (pd.DataFrame) – dataframe containing the fiber objects and their properties (fov, label, alignment, centroid-0, centroid-1, major_axis_length, minor_axis_length)

  • k (int) – number of neighbors to check alignment difference for

  • axis_thresh (int) – threshold for how many times longer than its width a fiber must be

Returns:

  • Dataframe with the alignment scores appended

Return type:

pd.DataFrame
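
The idea of a neighbor-based alignment score can be sketched as follows. This is an assumption-laden illustration rather than ark's actual computation: the function name, the dictionary keys, the use of the mean absolute angle difference, and the `>=` comparison against `axis_thresh` are all hypothetical choices made for the example.

```python
import math


def alignment_sketch(fibers, k=4, axis_thresh=2):
    """Mean angle difference (degrees) between each fiber and its k nearest neighbors.

    fibers is a hypothetical list of dicts with keys:
    label, centroid (row, col), angle (degrees), major, minor.
    """
    # Keep only elongated fibers (assumed rule: length >= axis_thresh * width).
    elongated = [f for f in fibers if f["major"] >= axis_thresh * f["minor"]]
    scores = {}
    for f in elongated:
        # Rank the other elongated fibers by centroid distance.
        others = [g for g in elongated if g is not f]
        others.sort(key=lambda g: math.dist(f["centroid"], g["centroid"]))
        neighbors = others[:k]
        if not neighbors:
            scores[f["label"]] = float("nan")
            continue
        diffs = []
        for g in neighbors:
            # Orientations are axial, so 170 and 10 degrees differ by 20, not 160.
            d = abs(f["angle"] - g["angle"]) % 180
            diffs.append(min(d, 180 - d))
        scores[f["label"]] = sum(diffs) / len(diffs)
    return scores


fibers = [
    {"label": 1, "centroid": (0, 0), "angle": 10, "major": 20, "minor": 4},
    {"label": 2, "centroid": (0, 10), "angle": 20, "major": 30, "minor": 5},
    {"label": 3, "centroid": (10, 0), "angle": 170, "major": 25, "minor": 5},
    {"label": 4, "centroid": (5, 5), "angle": 90, "major": 5, "minor": 4},  # too round
]
scores = alignment_sketch(fibers, k=2, axis_thresh=2)
```

In this sketch a smaller value means a fiber is closer to parallel with its neighbors; fiber 4 is excluded because its length is less than twice its width.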

ark.segmentation.fiber_segmentation.generate_summary_stats(fiber_object_table, fibseg_dir, tile_length=512, min_fiber_num=5, save_tiles=False)[source]

Calculates the fov level and tile level statistics for alignment, length, and density. Saves them to separate csvs.

Parameters:
  • fiber_object_table (pd.DataFrame) – dataframe containing the fiber objects and their properties (fov, label, alignment, centroid-0, centroid-1, major_axis_length, minor_axis_length)

  • fibseg_dir (string) – path to directory containing the fiber segmentation masks

  • tile_length (int) – length of tile size, must be a factor of the total image size (default 512)

  • min_fiber_num (int) – the number of fibers a tile must contain for its statistics to be calculated; tiles with fewer fibers get NaN (default 5)

  • save_tiles (bool) – whether to save cropped images (default False)

Returns:

  • both the fov-level and tile-level stats

Return type:

tuple (pd.DataFrame, pd.DataFrame)

ark.segmentation.fiber_segmentation.generate_tile_stats(fov_table, fov_fiber_img, fov_length, tile_length, min_fiber_num, save_dir, save_tiles)[source]

Calculates the tile level statistics for alignment, length, and density.

Parameters:
  • fov_table (pd.DataFrame) – dataframe containing the fiber objects and their properties (fov, label, alignment, centroid-0, centroid-1, major_axis_length, minor_axis_length)

  • fov_fiber_img (np.array) – represents the fiber mask

  • fov_length (int) – length of the image

  • tile_length (int) – length of tile size, must be a factor of the total image size (default 512)

  • min_fiber_num (int) – the number of fibers a tile must contain for its statistics to be calculated; tiles with fewer fibers get NaN (default 5)

  • save_dir (str) – directory in which to save the tiled image folder

  • save_tiles (bool) – whether to save cropped images (default False)

Returns:

  • a dataframe specifying each tile in the image and its calculated stats

Return type:

pd.DataFrame
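
The tiling rules described by the parameters above (tile_length must divide the image length, and sparse tiles report NaN) can be sketched as below. This is a simplified illustration, not the ark code: `tile_stats_sketch`, the `centroid` and `length` keys, the choice of mean length as the only statistic, and the "at least min_fiber_num" comparison are assumptions.

```python
import math


def tile_stats_sketch(fibers, fov_length, tile_length=512, min_fiber_num=5):
    """Return {(tile_row, tile_col): mean fiber length or NaN} for a square image."""
    if fov_length % tile_length != 0:
        raise ValueError("tile_length must be a factor of fov_length")
    tiles_per_side = fov_length // tile_length
    stats = {}
    for tile_row in range(tiles_per_side):
        for tile_col in range(tiles_per_side):
            # Assign a fiber to the tile that contains its centroid.
            in_tile = [
                f for f in fibers
                if f["centroid"][0] // tile_length == tile_row
                and f["centroid"][1] // tile_length == tile_col
            ]
            if len(in_tile) >= min_fiber_num:
                stats[(tile_row, tile_col)] = sum(f["length"] for f in in_tile) / len(in_tile)
            else:
                # Too few fibers for a meaningful statistic.
                stats[(tile_row, tile_col)] = float("nan")
    return stats


fibers = [
    {"centroid": (100.0, 100.0), "length": 10},
    {"centroid": (200.0, 300.0), "length": 20},
    {"centroid": (600.0, 700.0), "length": 30},
]
stats = tile_stats_sketch(fibers, fov_length=1024, tile_length=512, min_fiber_num=2)
```

With a 1024-pixel image and 512-pixel tiles there are four tiles; only the top-left tile holds two fibers, so it is the only one with a numeric value.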

ark.segmentation.fiber_segmentation.plot_fiber_segmentation_steps(data_dir, fov_name, fiber_channel, img_sub_folder=None, blur=2, contrast_scaling_divisor=128, fiber_widths=range(1, 10, 2), ridge_cutoff=0.1, sobel_blur=1, min_fiber_size=15, img_cmap=matplotlib.pyplot.cm.bone, labels_cmap=matplotlib.pyplot.cm.cool)[source]

Plots output from each fiber segmentation step for a single FoV

Parameters:
  • data_dir (str | PathLike) – Folder containing dataset

  • fov_name (str) – Name of test FoV

  • fiber_channel (str) – Channel for fiber segmentation, e.g. collagen

  • img_sub_folder (str | NoneType) – Name of the image subfolder in data_dir. If there is no subfolder, set this to None.

  • blur (float) – Preprocessing Gaussian blur radius

  • contrast_scaling_divisor (int) – Roughly speaking, the average side length of a fiber’s bounding box. This argument controls the local contrast enhancement operation, which helps differentiate dim fibers from only slightly dimmer backgrounds. This should always be a power of two.

  • fiber_widths (Iterable) – Widths of fibers to filter for. Be aware that adding larger fiber widths can join close, narrow branches into one thicker fiber.

  • ridge_cutoff (float) – Threshold for ridge inclusion after Frangi filtering.

  • sobel_blur (float) – Gaussian blur radius for Sobel-driven elevation map creation

  • min_fiber_size (int) – Minimum area of fiber object

  • img_cmap (matplotlib.cm.Colormap) – Matplotlib colormap to use for (non-labeled) images

  • labels_cmap (matplotlib.cm.Colormap) – Base matplotlib colormap to use for labeled images. This will only be applied to the non-zero labels, with the zero-region being colored black.

ark.segmentation.fiber_segmentation.run_fiber_segmentation(data_dir, fiber_channel, out_dir, img_sub_folder=None, csv_compression: Optional[Dict[str, str]] = None, **kwargs)[source]

Segments fibers one FOV at a time

Parameters:
  • data_dir (str | PathLike) – Folder containing dataset

  • fiber_channel (str) – Channel for fiber segmentation, e.g. collagen.

  • out_dir (str | PathLike) – Directory to save fiber object labels and table.

  • img_sub_folder (str | NoneType) – Image subfolder name in data_dir. If there is no subfolder, set this to None.

  • csv_compression (Optional[Dict[str, str]]) – Dictionary of compression arguments to pass when saving csvs. See to_csv for details.

  • **kwargs – Keyword arguments for segment_fibers

Returns:

  • Dataframe containing the fiber objects and their properties

Return type:

pd.DataFrame

ark.segmentation.fiber_segmentation.segment_fibers(data_xr, fiber_channel, out_dir, fov, blur=2, contrast_scaling_divisor=128, fiber_widths=range(1, 10, 2), ridge_cutoff=0.1, sobel_blur=1, min_fiber_size=15, object_properties=('label', 'centroid', 'major_axis_length', 'minor_axis_length', 'orientation', 'area', 'eccentricity', 'euler_number'), save_csv=True, debug=False)[source]

Segments fiber objects from image data

Parameters:
  • data_xr (xr.DataArray) – Multiplexed image data in (fov, x, y, channel) format

  • fiber_channel (str) – Channel for fiber segmentation, e.g. collagen.

  • out_dir (str | PathLike) – Directory to save fiber object labels and table.

  • fov (str) – name of the fov being processed

  • blur (float) – Preprocessing Gaussian blur radius

  • contrast_scaling_divisor (int) – Roughly speaking, the average side length of a fiber’s bounding box. This argument controls the local contrast enhancement operation, which helps differentiate dim fibers from only slightly dimmer backgrounds. This should always be a power of two.

  • fiber_widths (Iterable) – Widths of fibers to filter for. Be aware that adding larger fiber widths can join close, narrow branches into one thicker fiber.

  • ridge_cutoff (float) – Threshold for ridge inclusion after Frangi filtering.

  • sobel_blur (float) – Gaussian blur radius for Sobel-driven elevation map creation

  • min_fiber_size (int) – Minimum area of fiber object

  • object_properties (Iterable[str]) –

    Properties to compute; any property name accepted by regionprops may be used. Defaults are:
    • major_axis_length

    • minor_axis_length

    • orientation

    • centroid

    • label

    • eccentricity

    • euler_number

    • area

  • save_csv (bool) – Whether or not to save csv of fiber objects

  • debug (bool) – Save intermediate preprocessing steps

Returns:

  • Dataframe containing the fiber objects and their properties

Return type:

pd.DataFrame

ark.segmentation.marker_quantification

ark.segmentation.marker_quantification.assign_multi_compartment_features(marker_counts, regionprops_multi_comp, **kwargs)[source]

Assigns features to marker_counts that depend on multiple compartments

Parameters:
  • marker_counts (xarray.DataArray) – xarray containing segmented data of cells x markers

  • regionprops_multi_comp (list) – list of multi-compartment properties derived from regionprops to compute, each value should correspond to a value in REGIONPROPS_FUNCTION

  • **kwargs – arbitrary keyword arguments

Returns:

the updated marker_counts matrix with data for the specified cell_id and compartment

Return type:

xarray.DataArray

ark.segmentation.marker_quantification.assign_single_compartment_features(marker_counts, compartment, cell_props, cell_coords, cell_id, label_id, input_images, regionprops_names, extraction, **kwargs)[source]

Assign computed regionprops features and signal intensity to cell_id in marker_counts

Parameters:
  • marker_counts (xarray.DataArray) – xarray containing segmented data of cells x markers

  • compartment (str) – either ‘whole_cell’ or ‘nuclear’

  • cell_props (pandas.DataFrame) – regionprops information for each cell

  • cell_coords (numpy.ndarray) – values representing pixels within one cell

  • cell_id (int) – id of the cell

  • label_id (int) – id used to index into cell_props

  • input_images (xarray.DataArray) – rows x columns x channels matrix of imaging data

  • regionprops_names (list) – all of the regionprops features (including derived, except nuclear-specific)

  • extraction (str) – the extraction method to use for signal intensity calculation

  • **kwargs – arbitrary keyword arguments

Returns:

the updated marker_counts matrix with data for the specified cell_id and compartment

Return type:

xarray.DataArray

ark.segmentation.marker_quantification.compute_marker_counts(input_images, segmentation_labels, nuclear_counts=False, regionprops_base=['label', 'area', 'eccentricity', 'major_axis_length', 'minor_axis_length', 'perimeter', 'centroid', 'convex_area', 'equivalent_diameter'], regionprops_single_comp=['major_minor_axis_ratio', 'perim_square_over_area', 'major_axis_equiv_diam_ratio', 'convex_hull_resid', 'centroid_dif', 'num_concavities'], regionprops_multi_comp=['nc_ratio'], split_large_nuclei=False, extraction='total_intensity', fast_extraction=False, **kwargs)[source]

Extract single cell protein expression data from channel TIFs for a single fov

Parameters:
  • input_images (xarray.DataArray) – rows x columns x channels matrix of imaging data

  • segmentation_labels (numpy.ndarray) – rows x columns x compartment matrix of masks

  • nuclear_counts (bool) – boolean flag to determine whether nuclear counts are returned

  • regionprops_base (list) – base morphology features directly computed by regionprops to extract for each cell

  • regionprops_single_comp (list) – list of single compartment extra properties derived from regionprops to compute

  • regionprops_multi_comp (list) – list of multi compartment extra properties derived from regionprops to compute

  • split_large_nuclei (bool) – controls whether nuclei which have portions outside of the cell will get relabeled

  • extraction (str) – extraction function used to compute marker counts

  • fast_extraction (bool) – if set, skips custom regionprops and expensive base regionprops extraction steps regardless of other params set

  • **kwargs – arbitrary keyword arguments for get_cell_props

Returns:

xarray containing segmented data of cells x markers

Return type:

xarray.DataArray

ark.segmentation.marker_quantification.create_marker_count_matrices(segmentation_labels, image_data, nuclear_counts=False, split_large_nuclei=False, extraction='total_intensity', fast_extraction=False, **kwargs)[source]

Create a matrix of cells by channels with the total counts of each marker in each cell.

Parameters:
  • segmentation_labels (xarray.DataArray) – xarray of shape [fovs, rows, cols, compartment] containing segmentation masks for one FOV, potentially across multiple cell compartments

  • image_data (xarray.DataArray) – xarray containing all of the channel data across one FOV

  • nuclear_counts (bool) – boolean flag to determine whether nuclear counts are returned, note that if set to True, the compartments coordinate in segmentation_labels must contain ‘nuclear’

  • split_large_nuclei (bool) – boolean flag to determine whether nuclei which are larger than their assigned cell will get split into two different nuclear objects

  • extraction (str) – extraction function used to compute marker counts.

  • fast_extraction (bool) – if set, skips the custom regionprops and expensive base regionprops extraction steps

  • **kwargs – arbitrary keyword args for compute_marker_counts

Returns:

  • marker counts per cell normalized by cell size

  • arcsinh transformation of the above

Return type:

tuple (pandas.DataFrame, pandas.DataFrame)

ark.segmentation.marker_quantification.generate_cell_table(segmentation_dir, tiff_dir, img_sub_folder='TIFs', is_mibitiff=False, fovs=None, extraction='total_intensity', nuclear_counts=False, fast_extraction=False, mask_types=['whole_cell'], **kwargs)[source]

This function takes the segmented data and computes the expression matrices batch-wise while also validating inputs

Parameters:
  • segmentation_dir (str) – the path to the directory containing the segmentation labels generated by Mesmer

  • tiff_dir (str) – the name of the directory which contains the single_channel_inputs

  • img_sub_folder (str) – the name of the folder where the TIF images are located; ignored if is_mibitiff is True

  • fovs (list) – a list of fovs we wish to analyze, if None will default to all fovs

  • is_mibitiff (bool) – a flag to indicate whether or not the base images are MIBItiffs

  • extraction (str) – extraction function used to compute marker counts

  • nuclear_counts (bool) – boolean flag to determine whether nuclear counts are returned, note that if set to True, the compartments coordinate in segmentation_labels must contain ‘nuclear’

  • fast_extraction (bool) – if set, skips the custom regionprops and expensive base regionprops extraction steps

  • mask_types (list) – list of masks to extract data for, defaults to [‘whole_cell’]

  • **kwargs – arbitrary keyword arguments for signal and regionprops extraction

Returns:

  • size normalized data

  • arcsinh transformed data

Return type:

tuple (pandas.DataFrame, pandas.DataFrame)

ark.segmentation.marker_quantification.get_existing_mask_types(fov_names: List[str], mask_names: List[str]) → List[str][source]

Strips the fov name prefix and the ‘.tiff’ suffix from each entry in mask_names, removes any leading underscore, and returns the unique mask values (i.e. categories of masks).

Parameters:
  • fov_names (List[str]) – list of fov names. Matching fov names in mask names will be returned without fov prefix.

  • mask_names (List[str]) – list of mask names. Mask names will be returned without tiff suffix.

Returns:

Unique mask names (i.e. categories of masks)

Return type:

List[str]
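
The prefix and suffix stripping described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `mask_types_sketch` is a hypothetical name, file names are assumed to follow a `<fov>_<mask type>.tiff` pattern, and the result is sorted only to make the output deterministic.

```python
def mask_types_sketch(fov_names, mask_names):
    """Return the unique mask categories found in mask_names."""
    suffix = ".tiff"
    mask_types = set()
    for mask in mask_names:
        # Drop the file extension if present.
        stem = mask[: -len(suffix)] if mask.endswith(suffix) else mask
        for fov in fov_names:
            # Match the fov name plus its underscore so that "fov1"
            # does not accidentally match "fov10_whole_cell".
            prefix = fov + "_"
            if stem.startswith(prefix):
                stem = stem[len(prefix):]
                break
        mask_types.add(stem)
    return sorted(mask_types)


mask_types = mask_types_sketch(
    ["fov0", "fov1"],
    ["fov0_whole_cell.tiff", "fov0_nuclear.tiff", "fov1_whole_cell.tiff"],
)
```

Both fovs contribute a whole-cell mask, but the category appears only once in the result alongside the nuclear category.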

ark.segmentation.marker_quantification.get_single_compartment_props(segmentation_labels, regionprops_base, regionprops_single_comp, **kwargs)[source]

Gets regionprops features from the provided segmentation labels for a fov

Based on segmentation labels from a single compartment

Parameters:
  • segmentation_labels (numpy.ndarray) – rows x columns matrix of masks

  • regionprops_base (list) – base morphology features directly computed by regionprops to extract for each cell

  • regionprops_single_comp (list) – list of single compartment extra properties derived from regionprops to compute

  • **kwargs – Arbitrary keyword arguments for compute_extra_props

Returns:

Contains the regionprops info (base and derived) for each labeled cell

Return type:

pandas.DataFrame

ark.segmentation.regionprops_extraction

ark.segmentation.regionprops_extraction.centroid_dif(prop, **kwargs)[source]

Return the normalized Euclidean distance between the centroid of the cell and the centroid of the corresponding convex hull

Parameters:
  • prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops

  • **kwargs – Arbitrary keyword arguments

Returns:

The centroid shift for the cell

Return type:

float

ark.segmentation.regionprops_extraction.convex_hull_resid(prop, **kwargs)[source]

Return the ratio of the difference between convex area and area to convex area

Parameters:
  • prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops

  • **kwargs – Arbitrary keyword arguments

Returns:

(convex area - area) / convex area

Return type:

float
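
The formula (convex area - area) / convex area can be tried on a stand-in object. In this sketch `convex_hull_resid_sketch` is a hypothetical re-implementation and `SimpleNamespace` merely imitates the `area` and `convex_area` attributes of a regionprops result; it does not call scikit-image.

```python
from types import SimpleNamespace


def convex_hull_resid_sketch(prop):
    """Fraction of the convex hull that is not covered by the cell."""
    return (prop.convex_area - prop.area) / prop.convex_area


# A cell of 80 pixels whose convex hull spans 100 pixels.
concave_cell = SimpleNamespace(area=80, convex_area=100)
# A perfectly convex cell fills its hull completely.
convex_cell = SimpleNamespace(area=100, convex_area=100)

resid_concave = convex_hull_resid_sketch(concave_cell)
resid_convex = convex_hull_resid_sketch(convex_cell)
```

A value of zero indicates a fully convex shape, and the value grows toward one as more of the hull lies outside the cell.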

ark.segmentation.regionprops_extraction.major_axis_equiv_diam_ratio(prop, **kwargs)[source]

Return the ratio of the major axis length to the equivalent diameter

Parameters:
  • prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops

  • **kwargs – Arbitrary keyword arguments

Returns:

major axis length / equivalent diameter

Return type:

float

ark.segmentation.regionprops_extraction.major_minor_axis_ratio(prop, **kwargs)[source]

Return the ratio of the major axis length to the minor axis length

Parameters:
  • prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops

  • **kwargs – Arbitrary keyword arguments

Returns:

major axis length / minor axis length

Return type:

float

ark.segmentation.regionprops_extraction.nc_ratio(marker_counts, **kwargs)[source]

Return the ratio of the nuclear area to total cell area

Parameters:
  • marker_counts (xarray.DataArray) – xarray containing segmented data of cells x markers

  • **kwargs – Arbitrary keyword arguments

ark.segmentation.regionprops_extraction.num_concavities(prop, **kwargs)[source]

Return the number of concavities for a cell

Parameters:
  • prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops

  • **kwargs – Arbitrary keyword arguments

Returns:

The number of concavities for a cell

Return type:

int

ark.segmentation.regionprops_extraction.perim_square_over_area(prop, **kwargs)[source]

Return the ratio of the squared perimeter to the cell area

Parameters:
  • prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops

  • **kwargs – Arbitrary keyword arguments

Returns:

perimeter^2 / area

Return type:

float
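
To give a feel for the perimeter^2 / area ratio, the sketch below evaluates it for two ideal shapes. The helper name and the `SimpleNamespace` stand-ins for regionprops objects are assumptions made for the example; measured perimeters of pixelated masks will differ from these analytic values.

```python
import math
from types import SimpleNamespace


def perim_square_over_area_sketch(prop):
    """Squared perimeter divided by area; scale-invariant shape descriptor."""
    return prop.perimeter ** 2 / prop.area


radius = 5.0
circle = SimpleNamespace(perimeter=2 * math.pi * radius, area=math.pi * radius ** 2)
side = 4.0
square = SimpleNamespace(perimeter=4 * side, area=side ** 2)

circle_ratio = perim_square_over_area_sketch(circle)  # analytically 4 * pi
square_ratio = perim_square_over_area_sketch(square)  # analytically 16
```

The circle gives the smallest possible value, so larger ratios indicate less compact, more irregular outlines.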

ark.segmentation.segmentation_utils

ark.segmentation.segmentation_utils.concatenate_csv(base_dir, csv_files, column_name='fov', column_values=None)[source]

Takes a list of CSV paths and concatenates them together, adding in the identifier in column_values

Saves combined CSV file into the same folder

Parameters:
  • base_dir (str) – directory to read csv_files from and write the combined CSV to

  • csv_files (list) – a list of csv files

  • column_name (str) – optional column name, defaults to fov

  • column_values (list) – optional values to use for each CSV, defaults to csv name

ark.segmentation.segmentation_utils.find_nuclear_label_id(nuc_segmentation_labels, cell_coords)[source]

Get the ID of the nuclear mask which has the greatest amount of overlap with a given cell

Parameters:
  • nuc_segmentation_labels (numpy.ndarray) – predicted nuclear segmentations

  • cell_coords (list) – list of coords specifying pixels that belong to a cell

Returns:

Integer ID of the nuclear mask that overlaps most with cell. If no matches found, returns None.

Return type:

int or None
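
The "greatest overlap" rule can be sketched with a simple label count. This is an illustrative stand-in, not the ark implementation: the function name is hypothetical, the mask is a nested list rather than a numpy array, and label 0 is assumed to mark background.

```python
from collections import Counter


def find_nuclear_label_id_sketch(nuc_labels, cell_coords):
    """Return the nuclear label covering the most cell pixels, or None."""
    # Count how often each nuclear label appears under the cell's pixels.
    counts = Counter(nuc_labels[row][col] for row, col in cell_coords)
    # Assumption: 0 is background and never counts as a nucleus.
    counts.pop(0, None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


nuc_labels = [
    [0, 0, 3, 3],
    [0, 5, 5, 3],
    [0, 5, 5, 0],
]
# This cell covers three pixels of nucleus 5 and one pixel of nucleus 3.
best_match = find_nuclear_label_id_sketch(nuc_labels, [(1, 1), (1, 2), (2, 1), (1, 3)])
# This cell covers background only.
no_match = find_nuclear_label_id_sketch(nuc_labels, [(0, 0), (1, 0)])
```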

ark.segmentation.segmentation_utils.save_segmentation_labels(segmentation_dir, data_dir, output_dir, fovs, channels=None)[source]

For each fov, generates segmentation borders and, if channels are specified, overlays them on those channels.

Saves overlay images to output directory.

Parameters:
  • segmentation_dir (str) – Path to the directory containing segmentation labels

  • data_dir (str) – Path to the directory containing the image data

  • output_dir (str) – path to directory where the output will be saved

  • fovs (list) – list of FOVs to include

  • channels (list) – list of channels to include

ark.segmentation.segmentation_utils.split_large_nuclei(cell_segmentation_labels, nuc_segmentation_labels, cell_ids, min_size=15)[source]

Splits nuclei that are bigger than the corresponding cell into multiple pieces

Parameters:
  • cell_segmentation_labels (numpy.ndarray) – predicted cell segmentations

  • nuc_segmentation_labels (numpy.ndarray) – predicted nuclear segmentations

  • cell_ids (numpy.ndarray) – the unique cells in the segmentation mask

  • min_size (int) – number of pixels of the nucleus that must be outside of the cell in order for it to be classified as a new object. Nuclei with fewer than this many extra pixels will not be relabeled

Returns:

modified nuclear segmentation mask

Return type:

numpy.ndarray

ark.segmentation.segmentation_utils.transform_expression_matrix(cell_table, transform, transform_kwargs=None)[source]

Transform an xarray of marker counts with supplied transformation

Parameters:
  • cell_table (xarray.DataArray) – xarray containing marker expression values

  • transform (str) – the type of transform to apply. Must be one of [‘size_norm’, ‘arcsinh’]

  • transform_kwargs (dict) – optional dictionary with additional settings for the transforms

Returns:

xarray of counts per marker with the requested transform applied (normalized by cell size for ‘size_norm’)

Return type:

xarray.DataArray
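
The two transform names can be illustrated on a single cell's counts. This sketch only shows the general idea: the helper names are hypothetical, and `linear_factor` is an assumed scaling parameter of the kind that might be passed through `transform_kwargs`, not a documented setting.

```python
import math


def size_norm_sketch(counts, cell_size):
    """Divide each marker count by the cell's area in pixels."""
    return [count / cell_size for count in counts]


def arcsinh_sketch(values, linear_factor=100):
    """Apply an inverse hyperbolic sine after an assumed linear scaling."""
    return [math.asinh(value * linear_factor) for value in values]


raw_counts = [50, 0, 200]        # three markers measured in one cell
normalized = size_norm_sketch(raw_counts, cell_size=100)
transformed = arcsinh_sketch(normalized)
```

The arcsinh step compresses large values while leaving zero at zero, which is why it is commonly applied after size normalization.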

ark.segmentation.signal_extraction

ark.segmentation.signal_extraction.center_weighting_extraction(cell_coords, image_data, **kwargs)[source]

Extract channel counts by summing over weighted expression values based on distance from center.

Parameters:
  • cell_coords (numpy.ndarray) – values representing pixels within one cell

  • image_data (xarray.DataArray) – array containing channel counts

  • **kwargs – arbitrary keyword arguments

Returns:

Sums of counts for each channel

Return type:

numpy.ndarray

ark.segmentation.signal_extraction.positive_pixels_extraction(cell_coords, image_data, **kwargs)[source]

Extract channel counts by summing over the number of non-zero pixels in the cell.

Parameters:
  • cell_coords (numpy.ndarray) – values representing pixels within one cell

  • image_data (xarray.DataArray) – array containing channel counts

  • **kwargs – arbitrary keyword arguments

Returns:

Sums of counts for each channel

Return type:

numpy.ndarray

ark.segmentation.signal_extraction.total_intensity_extraction(cell_coords, image_data, **kwargs)[source]

Extract channel counts for an individual cell via basic summation for each channel

Parameters:
  • cell_coords (numpy.ndarray) – values representing pixels within one cell

  • image_data (xarray.DataArray) – array containing channel counts

  • **kwargs – arbitrary keyword arguments

Returns:

Sum of counts for each channel

Return type:

numpy.ndarray
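
The difference between summing intensities and counting positive pixels can be sketched on a toy image. This is a plain-Python illustration, not the ark functions: the helper names, the dictionary-of-nested-lists image layout, and the channel names are assumptions, and "positive" is taken to mean strictly greater than zero.

```python
def total_intensity_sketch(cell_coords, image):
    """Sum the pixel values inside the cell for each channel."""
    return {
        channel: sum(pixels[row][col] for row, col in cell_coords)
        for channel, pixels in image.items()
    }


def positive_pixels_sketch(cell_coords, image):
    """Count the non-zero pixels inside the cell for each channel."""
    return {
        channel: sum(1 for row, col in cell_coords if pixels[row][col] > 0)
        for channel, pixels in image.items()
    }


# Two hypothetical channels on a 2 x 2 image.
image = {
    "CD45": [[0, 2], [3, 0]],
    "CD3": [[1, 1], [0, 0]],
}
cell_coords = [(0, 0), (0, 1), (1, 0)]  # pixels belonging to one cell

totals = total_intensity_sketch(cell_coords, image)
positives = positive_pixels_sketch(cell_coords, image)
```

Both channels have two positive pixels inside the cell, yet their summed intensities differ, which shows how the choice of extraction method changes the resulting counts.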