ark.segmentation¶
Subpackages¶
ark.segmentation.fiber_segmentation¶
- ark.segmentation.fiber_segmentation.calculate_density(fov_fiber_table, total_pixels)[source]¶
Calculates both pixel-area-based and fiber-number-based densities.
pixel based = fiber pixel area / total image area
fiber number based = number of fibers / total image area
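The two density definitions above can be sketched in a few lines. This is only an illustration of the arithmetic: `density_sketch` and its `fiber_areas` input are hypothetical names, and the real function takes a fiber table rather than a plain list.

```python
def density_sketch(fiber_areas, total_pixels):
    """Return (pixel_density, fiber_density) for one image.

    fiber_areas: pixel area of each fiber (hypothetical input for illustration).
    total_pixels: total image area in pixels.
    """
    pixel_density = sum(fiber_areas) / total_pixels   # fiber pixel area / image area
    fiber_density = len(fiber_areas) / total_pixels   # number of fibers / image area
    return pixel_density, fiber_density


# Three fibers covering 250 pixels of a 1000-pixel image
pixel_density, fiber_density = density_sketch([120, 80, 50], total_pixels=1000)
```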
- ark.segmentation.fiber_segmentation.calculate_fiber_alignment(fiber_object_table, k=4, axis_thresh=2)[source]¶
Calculates an alignment score for each fiber in an image, based on the angle difference between the fiber and its k nearest neighbors.
- Parameters:
fiber_object_table (pd.DataFrame) – dataframe containing the fiber objects and their properties (fov, label, alignment, centroid-0, centroid-1, major_axis_length, minor_axis_length)
k (int) – number of neighbors to check alignment difference for
axis_thresh (int) – threshold for how much longer the length of the fiber must be compared to the width
- Returns:
Dataframe with the alignment scores appended
- Return type:
pd.DataFrame
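To make the idea of comparing a fiber's angle with its k nearest neighbors concrete, here is a rough sketch. It is not the scoring formula ark uses: it reports the mean angular difference in degrees (lower means better aligned), ignores `axis_thresh`, and the function name and tuple layout are assumptions made for illustration.

```python
import math


def mean_neighbor_angle_diff(fibers, k=4):
    """fibers: list of (centroid_row, centroid_col, angle_deg) tuples.

    Returns, for each fiber, the mean angular difference to its k nearest
    neighbors. Orientations are undirected, so differences wrap at 180 degrees.
    """
    scores = []
    for i, (r0, c0, a0) in enumerate(fibers):
        others = [f for j, f in enumerate(fibers) if j != i]
        # nearest neighbors by centroid distance
        others.sort(key=lambda f: math.hypot(f[0] - r0, f[1] - c0))
        diffs = []
        for _, _, a1 in others[:k]:
            d = abs(a0 - a1) % 180.0
            diffs.append(min(d, 180.0 - d))
        scores.append(sum(diffs) / len(diffs) if diffs else float("nan"))
    return scores


fibers = [(0, 0, 10.0), (0, 1, 10.0), (1, 0, 170.0), (50, 50, 100.0)]
scores = mean_neighbor_angle_diff(fibers, k=2)
```

Note that 10 and 170 degrees differ by only 20 degrees once the wrap-around is taken into account, which is why the first three fibers score as nearly parallel.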
- ark.segmentation.fiber_segmentation.generate_summary_stats(fiber_object_table, fibseg_dir, tile_length=512, min_fiber_num=5, save_tiles=False)[source]¶
Calculates the fov level and tile level statistics for alignment, length, and density. Saves them to separate csvs.
- Parameters:
fiber_object_table (pd.DataFrame) – dataframe containing the fiber objects and their properties (fov, label, alignment, centroid-0, centroid-1, major_axis_length, minor_axis_length)
fibseg_dir (string) – path to directory containing the fiber segmentation masks
tile_length (int) – length of tile size, must be a factor of the total image size (default 512)
min_fiber_num (int) – minimum number of fibers a tile must contain for its statistics to be calculated; otherwise the tile statistics are NaN (default 5)
save_tiles (bool) – whether to save cropped images (defaults to False)
- Returns:
the fov-level and tile-level statistics
- Return type:
tuple (pd.DataFrame, pd.DataFrame)
- ark.segmentation.fiber_segmentation.generate_tile_stats(fov_table, fov_fiber_img, fov_length, tile_length, min_fiber_num, save_dir, save_tiles)[source]¶
Calculates the tile level statistics for alignment, length, and density.
- Parameters:
fov_table (pd.DataFrame) – dataframe containing the fiber objects and their properties (fov, label, alignment, centroid-0, centroid-1, major_axis_length, minor_axis_length)
fov_fiber_img (np.array) – represents the fiber mask
fov_length (int) – length of the image
tile_length (int) – length of tile size, must be a factor of the total image size
min_fiber_num (int) – minimum number of fibers a tile must contain for its statistics to be calculated; otherwise the tile statistics are NaN
save_dir (str) – directory where to save tiled image folder to
save_tiles (bool) – whether to save cropped images
- Returns:
a dataframe specifying each tile in the image and its calculated stats
- Return type:
pd.DataFrame
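The tiling constraints described above (tile length must divide the image length, and tiles with too few fibers report NaN) can be sketched as follows. This is an assumption-laden illustration, not ark's implementation: `tile_mean_lengths` is a hypothetical helper, fibers are assigned to tiles by centroid, and only the mean length is computed.

```python
import math


def tile_mean_lengths(fibers, fov_length, tile_length, min_fiber_num):
    """fibers: list of (centroid_row, centroid_col, major_axis_length).

    Returns {(tile_row, tile_col): mean length or NaN}.
    Assumes centroids lie in [0, fov_length).
    """
    if fov_length % tile_length != 0:
        raise ValueError("tile_length must be a factor of fov_length")
    n = fov_length // tile_length
    buckets = {(r, c): [] for r in range(n) for c in range(n)}
    for row, col, length in fibers:
        tile = (int(row // tile_length), int(col // tile_length))
        buckets[tile].append(length)
    stats = {}
    for tile, lengths in buckets.items():
        # assumption: a tile needs at least min_fiber_num fibers
        if len(lengths) >= min_fiber_num:
            stats[tile] = sum(lengths) / len(lengths)
        else:
            stats[tile] = math.nan
    return stats


fibers = [(10, 10, 20.0), (100, 200, 30.0), (300, 40, 50.0)]
stats = tile_mean_lengths(fibers, fov_length=512, tile_length=256, min_fiber_num=2)
```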
- ark.segmentation.fiber_segmentation.plot_fiber_segmentation_steps(data_dir, fov_name, fiber_channel, img_sub_folder=None, blur=2, contrast_scaling_divisor=128, fiber_widths=range(1, 10, 2), ridge_cutoff=0.1, sobel_blur=1, min_fiber_size=15, img_cmap=matplotlib.pyplot.cm.bone, labels_cmap=matplotlib.pyplot.cm.cool)[source]¶
Plots output from each fiber segmentation step for a single FoV
- Parameters:
data_dir (str | PathLike) – Folder containing dataset
fov_name (str) – Name of test FoV
fiber_channel (str) – Channel for fiber segmentation, e.g. collagen
img_sub_folder (str | NoneType) – Name of the image subfolder in data_dir. If there is no subfolder, set to None.
blur (float) – Preprocessing gaussian blur radius
contrast_scaling_divisor (int) – Roughly speaking, the average side length of a fiber's bounding box. This argument controls the local contrast enhancement operation, which helps differentiate dim fibers from only slightly more dim backgrounds. This should always be a power of two.
fiber_widths (Iterable) – Widths of fibers to filter for. Be aware that adding larger fiber widths can join close, narrow branches into one thicker fiber.
ridge_cutoff (float) – Threshold for ridge inclusion post-frangi filtering.
sobel_blur (float) – Gaussian blur radius for sobel driven elevation map creation
min_fiber_size (int) – Minimum area of fiber object
img_cmap (matplotlib.cm.Colormap) – Matplotlib colormap to use for (non-labeled) images
labels_cmap (matplotlib.cm.Colormap) – Base matplotlib colormap to use for labeled images. This will only be applied to the non-zero labels, with the zero-region being colored black.
- ark.segmentation.fiber_segmentation.run_fiber_segmentation(data_dir, fiber_channel, out_dir, img_sub_folder=None, csv_compression: Optional[Dict[str, str]] = None, **kwargs)[source]¶
Segments fibers one FOV at a time
- Parameters:
data_dir (str | PathLike) – Folder containing dataset
fiber_channel (str) – Channel for fiber segmentation, e.g. collagen.
out_dir (str | PathLike) – Directory to save fiber object labels and table.
img_sub_folder (str | NoneType) – Image subfolder name in data_dir. If there is no subfolder, set this to None.
csv_compression (Optional[Dict[str, str]]) – Dictionary of compression arguments to pass when saving csvs. See to_csv for details.
**kwargs – Keyword arguments for segment_fibers
- Returns:
Dataframe containing the fiber objects and their properties
- Return type:
pd.DataFrame
- ark.segmentation.fiber_segmentation.segment_fibers(data_xr, fiber_channel, out_dir, fov, blur=2, contrast_scaling_divisor=128, fiber_widths=range(1, 10, 2), ridge_cutoff=0.1, sobel_blur=1, min_fiber_size=15, object_properties=('label', 'centroid', 'major_axis_length', 'minor_axis_length', 'orientation', 'area', 'eccentricity', 'euler_number'), save_csv=True, debug=False)[source]¶
Segments fiber objects from image data
- Parameters:
data_xr (xr.DataArray) – Multiplexed image data in (fov, x, y, channel) format
fiber_channel (str) – Channel for fiber segmentation, e.g. collagen.
out_dir (str | PathLike) – Directory to save fiber object labels and table.
fov (str) – name of the fov being processed
blur (float) – Preprocessing gaussian blur radius
contrast_scaling_divisor (int) – Roughly speaking, the average side length of a fiber's bounding box. This argument controls the local contrast enhancement operation, which helps differentiate dim fibers from only slightly more dim backgrounds. This should always be a power of two.
fiber_widths (Iterable) – Widths of fibers to filter for. Be aware that adding larger fiber widths can join close, narrow branches into one thicker fiber.
ridge_cutoff (float) – Threshold for ridge inclusion post-frangi filtering.
sobel_blur (float) – Gaussian blur radius for sobel driven elevation map creation
min_fiber_size (int) – Minimum area of fiber object
object_properties (Iterable[str]) – Properties to compute; any keyword accepted by regionprops may be used. Defaults are: label, centroid, major_axis_length, minor_axis_length, orientation, area, eccentricity, euler_number
save_csv (bool) – Whether or not to save csv of fiber objects
debug (bool) – Save intermediate preprocessing steps
- Returns:
Dataframe containing the fiber objects and their properties
- Return type:
pd.DataFrame
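Because contrast_scaling_divisor should be a power of two, it can be worth checking a candidate value before starting a long run. The helper below is a hypothetical convenience and is not part of ark; the last line simply shows what the default fiber_widths range expands to.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (positive integers with a single set bit)."""
    return isinstance(n, int) and n > 0 and (n & (n - 1)) == 0


ok_default = is_power_of_two(128)   # the documented default divisor
bad_value = is_power_of_two(100)    # not a power of two

# The default fiber_widths=range(1, 10, 2) covers the odd widths below 10
default_widths = list(range(1, 10, 2))
```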
ark.segmentation.marker_quantification¶
- ark.segmentation.marker_quantification.assign_multi_compartment_features(marker_counts, regionprops_multi_comp, **kwargs)[source]¶
Assigns features to marker_counts that depend on multiple compartments
- Parameters:
marker_counts (xarray.DataArray) – xarray containing segmented data of cells x markers
regionprops_multi_comp (list) – list of multi-compartment properties derived from regionprops to compute, each value should correspond to a value in REGIONPROPS_FUNCTION
**kwargs – arbitrary keyword arguments
- Returns:
the updated marker_counts matrix with data for the specified cell_id and compartment
- Return type:
xarray.DataArray
- ark.segmentation.marker_quantification.assign_single_compartment_features(marker_counts, compartment, cell_props, cell_coords, cell_id, label_id, input_images, regionprops_names, extraction, **kwargs)[source]¶
Assign computed regionprops features and signal intensity to cell_id in marker_counts
- Parameters:
marker_counts (xarray.DataArray) – xarray containing segmented data of cells x markers
compartment (str) – either ‘whole_cell’ or ‘nuclear’
cell_props (pandas.DataFrame) – regionprops information for each cell
cell_coords (numpy.ndarray) – values representing pixels within one cell
cell_id (int) – id of the cell
label_id (int) – id used to index into cell_props
input_images (xarray.DataArray) – rows x columns x channels matrix of imaging data
regionprops_names (list) – all of the regionprops features (including derived, except nuclear-specific)
extraction (str) – the extraction method to use for signal intensity calculation
**kwargs – arbitrary keyword arguments
- Returns:
the updated marker_counts matrix with data for the specified cell_id and compartment
- Return type:
xarray.DataArray
- ark.segmentation.marker_quantification.compute_marker_counts(input_images, segmentation_labels, nuclear_counts=False, regionprops_base=['label', 'area', 'eccentricity', 'major_axis_length', 'minor_axis_length', 'perimeter', 'centroid', 'convex_area', 'equivalent_diameter'], regionprops_single_comp=['major_minor_axis_ratio', 'perim_square_over_area', 'major_axis_equiv_diam_ratio', 'convex_hull_resid', 'centroid_dif', 'num_concavities'], regionprops_multi_comp=['nc_ratio'], split_large_nuclei=False, extraction='total_intensity', fast_extraction=False, **kwargs)[source]¶
Extract single cell protein expression data from channel TIFs for a single fov
- Parameters:
input_images (xarray.DataArray) – rows x columns x channels matrix of imaging data
segmentation_labels (numpy.ndarray) – rows x columns x compartment matrix of masks
nuclear_counts (bool) – boolean flag to determine whether nuclear counts are returned
regionprops_base (list) – base morphology features directly computed by regionprops to extract for each cell
regionprops_single_comp (list) – list of single compartment extra properties derived from regionprops to compute
regionprops_multi_comp (list) – list of multi compartment extra properties derived from regionprops to compute
split_large_nuclei (bool) – controls whether nuclei which have portions outside of the cell will get relabeled
extraction (str) – extraction function used to compute marker counts
fast_extraction (bool) – if set, skips custom regionprops and expensive base regionprops extraction steps regardless of other params set
**kwargs – arbitrary keyword arguments for get_cell_props
- Returns:
xarray containing segmented data of cells x markers
- Return type:
xarray.DataArray
- ark.segmentation.marker_quantification.create_marker_count_matrices(segmentation_labels, image_data, nuclear_counts=False, split_large_nuclei=False, extraction='total_intensity', fast_extraction=False, **kwargs)[source]¶
Create a matrix of cells by channels with the total counts of each marker in each cell.
- Parameters:
segmentation_labels (xarray.DataArray) – xarray of shape [fovs, rows, cols, compartment] containing segmentation masks for one FOV, potentially across multiple cell compartments
image_data (xarray.DataArray) – xarray containing all of the channel data across one FOV
nuclear_counts (bool) – boolean flag to determine whether nuclear counts are returned, note that if set to True, the compartments coordinate in segmentation_labels must contain ‘nuclear’
split_large_nuclei (bool) – boolean flag to determine whether nuclei which are larger than their assigned cell will get split into two different nuclear objects
extraction (str) – extraction function used to compute marker counts.
fast_extraction (bool) – if set, skips the custom regionprops and expensive base regionprops extraction steps
**kwargs – arbitrary keyword args for compute_marker_counts
- Returns:
marker counts per cell normalized by cell size
arcsinh transformation of the above
- Return type:
- ark.segmentation.marker_quantification.generate_cell_table(segmentation_dir, tiff_dir, img_sub_folder='TIFs', is_mibitiff=False, fovs=None, extraction='total_intensity', nuclear_counts=False, fast_extraction=False, mask_types=['whole_cell'], **kwargs)[source]¶
This function takes the segmented data and computes the expression matrices batch-wise while also validating inputs
- Parameters:
segmentation_dir (str) – the path to the directory containing the segmentation labels generated by Mesmer
tiff_dir (str) – the name of the directory which contains the single_channel_inputs
img_sub_folder (str) – the name of the folder where the TIF images are located; ignored if is_mibitiff is True
fovs (list) – a list of fovs we wish to analyze, if None will default to all fovs
is_mibitiff (bool) – a flag to indicate whether or not the base images are MIBItiffs
extraction (str) – extraction function used to compute marker counts
nuclear_counts (bool) – boolean flag to determine whether nuclear counts are returned, note that if set to True, the compartments coordinate in segmentation_labels must contain ‘nuclear’
fast_extraction (bool) – if set, skips the custom regionprops and expensive base regionprops extraction steps
mask_types (list) – list of masks to extract data for, defaults to [‘whole_cell’]
**kwargs – arbitrary keyword arguments for signal and regionprops extraction
- Returns:
size normalized data
arcsinh transformed data
- Return type:
- ark.segmentation.marker_quantification.get_existing_mask_types(fov_names: List[str], mask_names: List[str]) List[str] [source]¶
Strips the fov name prefixes (taken from fov_names) and the ‘.tiff’ suffix from each entry in mask_names, removes any leading underscores, and returns the unique mask values (i.e. categories of masks).
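The string handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `mask_types_sketch` is a hypothetical name, file names are assumed to look like `<fov>_<mask type>.tiff`, and the result is sorted only to make the output deterministic.

```python
import os


def mask_types_sketch(fov_names, mask_names):
    """Return the unique mask categories found in mask_names."""
    found = set()
    for mask in mask_names:
        stem = os.path.splitext(mask)[0]          # drop the '.tiff' suffix
        for fov in fov_names:
            if stem.startswith(fov):              # drop the fov prefix
                found.add(stem[len(fov):].lstrip("_"))
    return sorted(found)


types = mask_types_sketch(
    ["fov0", "fov1"],
    ["fov0_whole_cell.tiff", "fov1_whole_cell.tiff", "fov0_nuclear.tiff"],
)
```

A prefix match like this can misfire when one fov name is a prefix of another (for example fov1 and fov10), so the sketch should not be relied on for such naming schemes.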
- ark.segmentation.marker_quantification.get_single_compartment_props(segmentation_labels, regionprops_base, regionprops_single_comp, **kwargs)[source]¶
Gets regionprops features from the provided segmentation labels for a fov
Based on segmentation labels from a single compartment
- Parameters:
segmentation_labels (numpy.ndarray) – rows x columns matrix of masks
regionprops_base (list) – base morphology features directly computed by regionprops to extract for each cell
regionprops_single_comp (list) – list of single compartment extra properties derived from regionprops to compute
**kwargs – Arbitrary keyword arguments for compute_extra_props
- Returns:
Contains the regionprops info (base and derived) for each labeled cell
- Return type:
pandas.DataFrame
ark.segmentation.regionprops_extraction¶
- ark.segmentation.regionprops_extraction.centroid_dif(prop, **kwargs)[source]¶
Return the normalized Euclidean distance between the centroid of the cell and the centroid of the corresponding convex hull
- Parameters:
prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops
**kwargs – Arbitrary keyword arguments
- Returns:
The centroid shift for the cell
- Return type:
float
- ark.segmentation.regionprops_extraction.convex_hull_resid(prop, **kwargs)[source]¶
Return the ratio of the difference between convex area and area to convex area
- Parameters:
prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops
**kwargs – Arbitrary keyword arguments
- Returns:
(convex area - area) / convex area
- Return type:
float
- ark.segmentation.regionprops_extraction.major_axis_equiv_diam_ratio(prop, **kwargs)[source]¶
Return the ratio of the major axis length to the equivalent diameter
- Parameters:
prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops
**kwargs – Arbitrary keyword arguments
- Returns:
major axis length / equivalent diameter
- Return type:
float
- ark.segmentation.regionprops_extraction.major_minor_axis_ratio(prop, **kwargs)[source]¶
Return the ratio of the major axis length to the minor axis length
- Parameters:
prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops
**kwargs – Arbitrary keyword arguments
- Returns:
major axis length / minor axis length
- Return type:
float
- ark.segmentation.regionprops_extraction.nc_ratio(marker_counts, **kwargs)[source]¶
Return the ratio of the nuclear area to total cell area
- Parameters:
marker_counts (xarray.DataArray) – xarray containing segmented data of cells x markers
**kwargs – Arbitrary keyword arguments
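The ratio-style features in this module reduce to simple arithmetic on regionprops values. The sketch below restates the documented formulas on a plain dictionary. The helper names and example numbers are hypothetical (the numbers are not geometrically consistent), and zero denominators are not handled.

```python
def derived_shape_features(props):
    """props: dict of base regionprops values for one cell (illustrative)."""
    return {
        # major axis length / minor axis length
        "major_minor_axis_ratio": props["major_axis_length"] / props["minor_axis_length"],
        # perimeter^2 / area
        "perim_square_over_area": props["perimeter"] ** 2 / props["area"],
        # major axis length / equivalent diameter
        "major_axis_equiv_diam_ratio": props["major_axis_length"] / props["equivalent_diameter"],
        # (convex area - area) / convex area
        "convex_hull_resid": (props["convex_area"] - props["area"]) / props["convex_area"],
    }


def nc_ratio_sketch(nuclear_area, cell_area):
    """Nuclear area divided by total cell area."""
    return nuclear_area / cell_area


cell = {
    "major_axis_length": 20.0,
    "minor_axis_length": 10.0,
    "perimeter": 40.0,
    "area": 80.0,
    "convex_area": 100.0,
    "equivalent_diameter": 10.0,
}
features = derived_shape_features(cell)
ratio = nc_ratio_sketch(nuclear_area=30.0, cell_area=80.0)
```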
- ark.segmentation.regionprops_extraction.num_concavities(prop, **kwargs)[source]¶
Return the number of concavities for a cell
- Parameters:
prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops
**kwargs – Arbitrary keyword arguments
- Returns:
The number of concavities for a cell
- Return type:
int
- ark.segmentation.regionprops_extraction.perim_square_over_area(prop, **kwargs)[source]¶
Return the ratio of the squared perimeter to the cell area
- Parameters:
prop (skimage.measure.regionprops) – The property information for a cell returned by regionprops
**kwargs – Arbitrary keyword arguments
- Returns:
perimeter^2 / area
- Return type:
float
ark.segmentation.segmentation_utils¶
- ark.segmentation.segmentation_utils.concatenate_csv(base_dir, csv_files, column_name='fov', column_values=None)[source]¶
Takes a list of CSV paths and concatenates them together, adding in the identifier in column_values
Saves combined CSV file into the same folder
- ark.segmentation.segmentation_utils.find_nuclear_label_id(nuc_segmentation_labels, cell_coords)[source]¶
Get the ID of the nuclear mask which has the greatest amount of overlap with a given cell
- Parameters:
nuc_segmentation_labels (numpy.ndarray) – predicted nuclear segmentations
cell_coords (list) – list of coords specifying pixels that belong to a cell
- Returns:
Integer ID of the nuclear mask that overlaps most with cell. If no matches found, returns None.
- Return type:
int or None
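One way to picture the overlap lookup is to read the nuclear label under every pixel of the cell and take the most frequent one. The sketch below assumes that `cell_coords` holds (row, col) pairs and that label 0 marks background. Both are assumptions for illustration, and `most_overlapping_label` is a hypothetical name.

```python
import numpy as np


def most_overlapping_label(nuc_labels, cell_coords):
    """Return the nuclear label covering the most pixels of the cell, or None."""
    coords = np.asarray(cell_coords)
    values = nuc_labels[coords[:, 0], coords[:, 1]]
    values = values[values > 0]            # assumption: 0 is background
    if values.size == 0:
        return None
    ids, counts = np.unique(values, return_counts=True)
    return int(ids[np.argmax(counts)])


nuc = np.array([[0, 1, 1],
                [2, 2, 2],
                [0, 0, 0]])
best = most_overlapping_label(nuc, [(0, 1), (1, 0), (1, 1), (1, 2)])
none_found = most_overlapping_label(nuc, [(2, 0), (2, 1)])
```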
- ark.segmentation.segmentation_utils.save_segmentation_labels(segmentation_dir, data_dir, output_dir, fovs, channels=None)[source]¶
For each fov, generates segmentation borders and overlays over the channels if specified.
Saves overlay images to output directory.
- Parameters:
- ark.segmentation.segmentation_utils.split_large_nuclei(cell_segmentation_labels, nuc_segmentation_labels, cell_ids, min_size=15)[source]¶
Splits nuclei that are bigger than the corresponding cell into multiple pieces
- Parameters:
cell_segmentation_labels (numpy.ndarray) – predicted cell segmentations
nuc_segmentation_labels (numpy.ndarray) – predicted nuclear segmentations
cell_ids (numpy.ndarray) – the unique cells in the segmentation mask
min_size (int) – number of pixels of nucleus that must be outside of cell in order to be classified as a new object. Nuclei with fewer than this many extra pixels will not be relabeled
- Returns:
modified nuclear segmentation mask
- Return type:
numpy.ndarray
- ark.segmentation.segmentation_utils.transform_expression_matrix(cell_table, transform, transform_kwargs=None)[source]¶
Transform an xarray of marker counts with supplied transformation
- Parameters:
cell_table (xarray.DataArray) – xarray containing marker expression values
transform (str) – the type of transform to apply. Must be one of [‘size_norm’, ‘arcsinh’]
transform_kwargs (dict) – optional dictionary with additional settings for the transforms
- Returns:
xarray of counts per marker normalized by cell size
- Return type:
xarray.DataArray
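The two transforms named above can be illustrated on a plain NumPy array. This is a simplified sketch rather than the library's implementation: the real function operates on an xarray cell table, whereas here cell sizes are passed separately, and `transform_sketch` and its `linear_factor` argument are hypothetical.

```python
import numpy as np


def transform_sketch(counts, cell_sizes, transform, linear_factor=1.0):
    """counts: cells x markers array; cell_sizes: one size per cell."""
    counts = np.asarray(counts, dtype=float)
    if transform == "size_norm":
        # divide every marker count by the size of its cell
        return counts / np.asarray(cell_sizes, dtype=float)[:, None]
    if transform == "arcsinh":
        # hypothetical scaling factor applied before the arcsinh
        return np.arcsinh(counts * linear_factor)
    raise ValueError("transform must be 'size_norm' or 'arcsinh'")


counts = [[10.0, 0.0], [30.0, 6.0]]   # 2 cells x 2 markers
sizes = [5.0, 3.0]
normed = transform_sketch(counts, sizes, "size_norm")
arcsinh_normed = transform_sketch(normed, sizes, "arcsinh")
```

Chaining the two calls mirrors the pair of outputs described for create_marker_count_matrices: size-normalized counts, then the arcsinh transformation of those values.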
ark.segmentation.signal_extraction¶
- ark.segmentation.signal_extraction.center_weighting_extraction(cell_coords, image_data, **kwargs)[source]¶
Extract channel counts by summing over weighted expression values based on distance from center.
- Parameters:
cell_coords (numpy.ndarray) – values representing pixels within one cell
image_data (xarray.DataArray) – array containing channel counts
**kwargs – arbitrary keyword arguments
- Returns:
Sums of counts for each channel
- Return type:
numpy.ndarray
- ark.segmentation.signal_extraction.positive_pixels_extraction(cell_coords, image_data, **kwargs)[source]¶
Extract channel counts by summing over the number of non-zero pixels in the cell.
- Parameters:
cell_coords (numpy.ndarray) – values representing pixels within one cell
image_data (xarray.DataArray) – array containing channel counts
**kwargs – arbitrary keyword arguments
- Returns:
Sums of counts for each channel
- Return type:
numpy.ndarray
- ark.segmentation.signal_extraction.total_intensity_extraction(cell_coords, image_data, **kwargs)[source]¶
Extract channel counts for an individual cell via basic summation for each channel
- Parameters:
cell_coords (numpy.ndarray) – values representing pixels within one cell
image_data (xarray.DataArray) – array containing channel counts
**kwargs – arbitrary keyword arguments
- Returns:
Sum of counts for each channel
- Return type:
numpy.ndarray
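The difference between summing intensities and counting positive pixels is easiest to see on a small array. The sketch below works on a plain rows x columns x channels NumPy array with (row, col) coordinate pairs and non-negative counts. Those are assumptions for illustration, the function names are hypothetical, and the center-weighted variant is not reproduced.

```python
import numpy as np


def total_intensity(cell_coords, image):
    """Sum each channel over the pixels of one cell."""
    rows, cols = np.asarray(cell_coords).T
    return image[rows, cols, :].sum(axis=0)


def positive_pixels(cell_coords, image):
    """Count, per channel, the cell's pixels with a non-zero value."""
    rows, cols = np.asarray(cell_coords).T
    return (image[rows, cols, :] > 0).sum(axis=0)


image = np.zeros((3, 3, 2))
image[0, 0] = [4, 1]
image[0, 1] = [1, 2]
image[1, 1] = [0, 3]
coords = [(0, 0), (0, 1), (1, 1), (2, 2)]   # pixels belonging to one cell

totals = total_intensity(coords, image)
positives = positive_pixels(coords, image)
```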