Tiler

The tiler module implements different tile extraction strategies. The constructors of the three extractors, RandomTiler, GridTiler, and ScoreTiler, share a similar interface and common parameters that define the extraction design:

  1. tile_size: the tile size;

  2. level: the extraction level, from 0 to the number of available levels minus one; negative indexing is also supported, counting backward from the last available level (e.g. level=-1 selects the last available level);

  3. check_tissue: True if a minimum percentage of tissue over the total area of the tile is required to save the tiles, False otherwise;

  4. tissue_percent: number between 0.0 and 100.0 representing the minimum required percentage of tissue over the total area of the tile, considered only if check_tissue is True (default is 80.0);

  5. prefix: a prefix to be added at the beginning of the tiles’ filename (optional, default is the empty string);

  6. suffix: a suffix to be added to the end of the tiles’ filename (optional, default is .png).
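The check_tissue and tissue_percent parameters can be pictured as a threshold on the share of tissue pixels in a tile. The following minimal sketch assumes a precomputed boolean tissue mask for the tile (True where the pixel is tissue); the helper has_enough_tissue is hypothetical and not part of histolab, which derives the tissue mask internally with its own image filters. Counting a tile that sits exactly at the threshold as eligible is also an assumption of the sketch.

```python
from typing import Sequence


def has_enough_tissue(
    tissue_mask: Sequence[Sequence[bool]], tissue_percent: float = 80.0
) -> bool:
    """Hypothetical helper: True if tissue pixels cover at least
    `tissue_percent` percent of the tile area."""
    if not 0.0 <= tissue_percent <= 100.0:
        raise ValueError("tissue_percent must be between 0.0 and 100.0")
    total = sum(len(row) for row in tissue_mask)
    if total == 0:
        return False  # an empty tile contains no tissue
    tissue = sum(sum(1 for px in row if px) for row in tissue_mask)
    return 100.0 * tissue / total >= tissue_percent


# A 2x4 tile in which 6 of the 8 pixels are tissue (75%).
tile_mask = [[True, True, True, False], [True, True, True, False]]
print(has_enough_tissue(tile_mask, tissue_percent=80.0))  # False
print(has_enough_tissue(tile_mask, tissue_percent=70.0))  # True
```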

The general mechanism is to (i) create a tiler object, (ii) define a Slide object, used to identify the input image, and (iii) create a mask object to determine the area for tile extraction within the tissue. The extraction process starts when the tiler’s extract() method is called, with the slide and the mask passed as parameters.

RandomTiler

The RandomTiler extractor extracts tiles picked at random within the regions defined by the binary mask object. Since there is no intrinsic upper bound on the number of tiles that could be extracted (no overlap check is performed), the number of desired tiles must be specified.

In addition to parameters 1-6, the RandomTiler constructor takes two more parameters: the number of requested tiles (n_tiles) and the random seed (seed), which ensures reproducibility across different runs on the same WSI. n_tiles is interpreted as an upper bound on the number of extracted tiles: when check_tissue is set to True, fewer than n_tiles tiles may be extracted from a slide with a small tissue sample and a lot of background.

The extraction procedure will (i) find the regions to extract tiles from, defined by the binary mask object; (ii) generate n_tiles random tiles; (iii) save only the tiles with enough tissue if the attribute check_tissue was set to True, save all the generated tiles otherwise.
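The three RandomTiler steps above, including why n_tiles acts only as an upper bound, can be sketched as a bounded sampling loop. This is a simplified illustration, not histolab's implementation: the region is assumed to be a single rectangle (x_ul, y_ul, x_br, y_br) rather than a binary mask, the tissue check is abstracted as an is_eligible callback, the function name sample_random_tiles is hypothetical, and the standard library's random.Random stands in for the RandomState used by histolab. The max_iter >= n_tiles requirement mirrors the max_iter parameter documented for RandomTiler.

```python
import random
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_ul, y_ul, x_br, y_br)


def sample_random_tiles(
    region: Box,
    tile_size: Tuple[int, int],
    n_tiles: int,
    seed: int = 7,
    max_iter: int = 10_000,
    is_eligible: Callable[[Box], bool] = lambda box: True,
) -> List[Box]:
    """Hypothetical sketch: draw up to n_tiles random boxes inside `region`.

    Stops after `max_iter` attempts, so fewer than n_tiles boxes may be
    returned when few candidates pass `is_eligible` (the tissue check).
    """
    if max_iter < n_tiles:
        raise ValueError("max_iter must be greater than or equal to n_tiles")
    x0, y0, x1, y1 = region
    w, h = tile_size
    if w > x1 - x0 or h > y1 - y0:
        raise ValueError("tile does not fit in the region")
    rng = random.Random(seed)  # same seed -> same sequence of candidate boxes
    tiles: List[Box] = []
    for _ in range(max_iter):
        if len(tiles) == n_tiles:
            break
        # Upper-left corner; randint is inclusive, so the tile stays inside.
        x = rng.randint(x0, x1 - w)
        y = rng.randint(y0, y1 - h)
        box = (x, y, x + w, y + h)
        if is_eligible(box):
            tiles.append(box)
    return tiles
```

Because the loop stops after max_iter attempts, a slide with little tissue yields fewer than n_tiles boxes rather than looping indefinitely, and reusing the same seed reproduces the same boxes.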

GridTiler

A second basic approach consists of extracting all the tiles in the areas defined by the binary mask. This strategy is implemented in the GridTiler class. The additional pixel_overlap parameter specifies the number of overlapping pixels between two adjacent tiles, i.e. tiles are cropped by using a sliding window with horizontal stride \(s_x\) and vertical stride \(s_y\) defined as:

\[s_x = w - \mathrm{pixel\_overlap}, \qquad s_y = h - \mathrm{pixel\_overlap}\]

where w and h are the width and the height of the resulting tiles, as set by the tile_size parameter. Calling the extract method on the GridTiler instance will automatically (i) find the regions to extract tiles from, defined by the binary mask object; (ii) generate all the tiles according to the grid structure; (iii) save only the tiles with enough tissue if the attribute check_tissue was set to True, or all the generated tiles otherwise.
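The grid construction, with a sliding-window stride along each axis equal to the tile width or height minus pixel_overlap, can be illustrated with a short sketch. It assumes the extraction region is a single rectangle and that tiles which would cross the region's right or bottom edge are dropped; the function name grid_tiles is hypothetical and the sketch does not reproduce histolab's internal code.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_ul, y_ul, x_br, y_br)


def grid_tiles(
    region: Box, tile_size: Tuple[int, int], pixel_overlap: int = 0
) -> List[Box]:
    """Hypothetical sketch: every box of a sliding-window grid that fits in `region`."""
    x0, y0, x1, y1 = region
    w, h = tile_size
    # Per-axis stride; a negative pixel_overlap leaves a gap between tiles.
    stride_x, stride_y = w - pixel_overlap, h - pixel_overlap
    if stride_x <= 0 or stride_y <= 0:
        raise ValueError("pixel_overlap must be smaller than the tile width and height")
    return [
        (x, y, x + w, y + h)
        for y in range(y0, y1 - h + 1, stride_y)  # rows, top to bottom
        for x in range(x0, x1 - w + 1, stride_x)  # columns, left to right
    ]


print(len(grid_tiles((0, 0, 10, 10), (4, 4))))                    # 4  (stride 4)
print(len(grid_tiles((0, 0, 10, 10), (4, 4), pixel_overlap=2)))   # 16 (stride 2)
print(len(grid_tiles((0, 0, 10, 10), (4, 4), pixel_overlap=-2)))  # 4  (stride 6)
```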

ScoreTiler

Tiles extracted from the same WSI may not be equally informative; for example, if the goal is the detection of mitotic activity on H&E slides, tiles with no nuclei are of little interest. The ScoreTiler extractor ranks the tiles with respect to a scoring function, described in the scorer module. In particular, the ScoreTiler class extends the GridTiler extractor by sorting the extracted tiles in decreasing order of their computed score. Notably, the ScoreTiler is agnostic to the scoring function adopted, so a custom function can be used, provided that it takes a Tile object as input and returns a number. The additional parameter n_tiles controls the number of highest-ranked tiles to save; if n_tiles=0, all the tiles are kept.

As in the GridTiler extraction process, calling the extract method on the ScoreTiler instance will automatically (i) find the regions to extract tiles from, defined by the binary mask object (by default, the box enclosing the largest tissue area in the WSI); (ii) generate all the tiles according to the grid structure; (iii) retain only the tiles with enough tissue if the attribute check_tissue was set to True, or all the generated tiles otherwise; (iv) sort the tiles in decreasing order according to the scoring function defined in the scorer parameter; (v) save only the n_tiles highest-ranked tiles, if n_tiles>0; (vi) write a summary of the saved tiles and their scores to a CSV file, if report_path is specified in the extract method. For each tile t, the summary reports (i) the tile filename; (ii) its raw score \(s_t\); (iii) the normalized score, scaled to the interval [0,1], computed as:

\[\hat{s}_t = \frac{s_t-\displaystyle{\min_{s\in S}}(s)}{\displaystyle{\max_{s\in S}}(s)-\displaystyle{\min_{s\in S}}(s)}\ ,\]

where S is the set of the raw scores of all the extracted tiles.
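The ranking and the min-max normalization can be illustrated together. In this minimal sketch, scores are assumed to be already computed and stored in a dictionary keyed by tile filename, and the function rank_and_normalize is hypothetical. The normalization uses the raw scores of all the extracted tiles, as in the definition of S, before the n_tiles cut is applied.

```python
from typing import Dict, List, Tuple


def rank_and_normalize(
    scores: Dict[str, float], n_tiles: int = 0
) -> List[Tuple[str, float, float]]:
    """Hypothetical sketch: sort tiles by decreasing raw score and attach the
    min-max normalized score; keep only the top n_tiles when n_tiles > 0."""
    if n_tiles < 0:
        raise ValueError("n_tiles cannot be negative")
    if not scores:
        return []
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    result = [
        # The formula is undefined when all scores are equal (span == 0);
        # mapping every score to 0.0 in that case is an assumption of this sketch.
        (name, s, (s - lo) / span if span else 0.0)
        for name, s in ranked
    ]
    return result[:n_tiles] if n_tiles > 0 else result


print(rank_and_normalize({"a": 2.0, "b": 5.0, "c": 3.0}, n_tiles=2))
# [('b', 5.0, 1.0), ('c', 3.0, 0.3333333333333333)]
```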

class GridTiler(*args, **kwds)

Extractor of tiles arranged in a grid, at the given level, with the given size.

Parameters
  • tile_size (Tuple[int, int]) – (width, height) of the extracted tiles.

  • level (int, optional) – Level from which to extract the tiles. Default is 0. Superseded by mpp if the mpp argument is provided.

  • check_tissue (bool, optional) – Whether to check if the tile has enough tissue to be saved. Default is True.

  • tissue_percent (float, optional) – Number between 0.0 and 100.0 representing the minimum required percentage of tissue over the total area of the tile. Default is 80.0. This is considered only if check_tissue is True.

  • pixel_overlap (int, optional) – Number of overlapping pixels (for both height and width) between two adjacent tiles. If negative, adjacent tiles are separated by a gap of abs(pixel_overlap) pixels. Default is 0.

  • prefix (str, optional) – Prefix to be added to the tile filename. Default is an empty string.

  • suffix (str, optional) – Suffix to be added to the tile filename. Default is ‘.png’.

  • mpp (float, optional) – Microns per pixel resolution of extracted tiles. Takes precedence over level. Default is None.

extract(slide, extraction_mask=<histolab.masks.BiggestTissueBoxMask object>, log_level='INFO')

Extract tiles arranged in a grid and save them to disk, following this filename pattern: {prefix}tile_{tiles_counter}_level{level}_{x_ul_wsi}-{y_ul_wsi}-{x_br_wsi}-{y_br_wsi}{suffix}
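The filename pattern can be made concrete with a small helper. The function tile_filename is hypothetical and only fills in the documented placeholders; the coordinate values in the example are invented for illustration.

```python
from typing import Tuple


def tile_filename(
    tiles_counter: int,
    level: int,
    coords: Tuple[int, int, int, int],
    prefix: str = "",
    suffix: str = ".png",
) -> str:
    """Hypothetical helper filling in
    {prefix}tile_{tiles_counter}_level{level}_{x_ul_wsi}-{y_ul_wsi}-{x_br_wsi}-{y_br_wsi}{suffix}."""
    x_ul, y_ul, x_br, y_br = coords
    return f"{prefix}tile_{tiles_counter}_level{level}_{x_ul}-{y_ul}-{x_br}-{y_br}{suffix}"


print(tile_filename(3, 0, (1024, 2048, 1536, 2560), prefix="case1_"))
# case1_tile_3_level0_1024-2048-1536-2560.png
```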

Parameters
  • slide (Slide) – Slide from which to extract the tiles

  • extraction_mask (BinaryMask, optional) – BinaryMask object defining how to compute a binary mask from a Slide. Default BiggestTissueBoxMask.

  • log_level (str, {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}) – Threshold level for the log messages. Default “INFO”

Raises
  • TileSizeError – If the tile size is larger than the slide size

  • LevelError – If the level is not available for the slide

Return type

None
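The two errors listed above, together with negative level indexing, can be sketched as a single validation step. LevelError and TileSizeError are defined locally as stand-ins rather than imported from histolab, the level dimensions in the example are made-up values, and validate_extraction is a hypothetical name; the library may perform these checks differently.

```python
from typing import List, Tuple


class LevelError(Exception):
    """Local stand-in for the LevelError documented above."""


class TileSizeError(Exception):
    """Local stand-in for the TileSizeError documented above."""


def validate_extraction(
    level_dimensions: List[Tuple[int, int]], level: int, tile_size: Tuple[int, int]
) -> int:
    """Hypothetical sketch: resolve a possibly negative level, check that the
    tile fits in the slide at that level, and return the non-negative level."""
    n_levels = len(level_dimensions)
    if not -n_levels <= level < n_levels:
        raise LevelError(f"level {level} not available; slide has {n_levels} levels")
    resolved = level % n_levels  # e.g. -1 -> n_levels - 1, the last level
    width, height = level_dimensions[resolved]
    if tile_size[0] > width or tile_size[1] > height:
        raise TileSizeError(f"tile size {tile_size} is larger than slide size {(width, height)}")
    return resolved


dims = [(40000, 30000), (10000, 7500), (2500, 1875)]  # made-up level sizes
print(validate_extraction(dims, -1, (512, 512)))  # 2
```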

locate_tiles(slide, extraction_mask=<histolab.masks.BiggestTissueBoxMask object>, scale_factor=32, alpha=128, outline='red', linewidth=1, tiles=None)

Draw tile box references on a rescaled version of the slide

Parameters
  • slide (Slide) – Slide on which to place the tiles

  • extraction_mask (BinaryMask, optional) – BinaryMask object defining how to compute a binary mask from a Slide. Default BiggestTissueBoxMask

  • scale_factor (int, optional) – Scaling factor for the returned image. Default is 32.

  • alpha (int, optional) – The alpha level to be applied to the rescaled slide. Default is 128.

  • outline (Union[str, Iterable[str], Iterable[Tuple[int]]], optional) – The outline color for the tile annotations. Default is ‘red’. It can be a single matplotlib-compatible color string, or a list with one color per tile, in the same order as the tiles. Each color in the list can be a matplotlib-style color string or a tuple of three ints in the [0, 255] range giving the red, green and blue components. For example, to color two tiles yellow, you can pass any of the following: ‘yellow’, [‘yellow’, ‘yellow’], or [(255, 255, 0), (255, 255, 0)].

  • linewidth (int, optional) – Thickness of line used to draw tiles. Default is 1.

  • tiles (Optional[Iterable[Tile]], optional) – Tiles to visualize. If None, the tiles are extracted from the slide. Default is None. Provide this argument if you already have the tiles and do not want them to be re-extracted for visualization.

Returns

PIL Image of the rescaled slide with the extracted tiles outlined

Return type

PIL.Image.Image
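Two details of locate_tiles, the accepted forms of outline and the downscaling by scale_factor, can be sketched as follows. Both helpers are hypothetical; raising an error when the number of colors does not match the number of tiles, and mapping coordinates with integer division, are assumptions of this sketch rather than documented histolab behaviour.

```python
from typing import List, Sequence, Tuple, Union

Color = Union[str, Tuple[int, int, int]]


def per_tile_outlines(outline: Union[str, Sequence[Color]], n_tiles: int) -> List[Color]:
    """Hypothetical sketch: expand `outline` into one color per tile."""
    if isinstance(outline, str):
        return [outline] * n_tiles  # a single color shared by every tile
    colors = list(outline)
    if len(colors) != n_tiles:
        raise ValueError(f"expected {n_tiles} colors, got {len(colors)}")
    return colors


def scale_box(box: Tuple[int, int, int, int], scale_factor: int = 32) -> Tuple[int, ...]:
    """Hypothetical sketch: map a box in slide coordinates onto the image
    downscaled by scale_factor (integer division is a simplification)."""
    return tuple(c // scale_factor for c in box)


print(per_tile_outlines("yellow", 2))               # ['yellow', 'yellow']
print(scale_box((1024, 2048, 1536, 2560)))         # (32, 64, 48, 80)
```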

property tile_size: Tuple[int, int]

(width, height) of the extracted tiles.

class RandomTiler(*args, **kwds)

Extractor of random tiles from a Slide, at the given level, with the given size.

Parameters
  • tile_size (Tuple[int, int]) – (width, height) of the extracted tiles.

  • n_tiles (int) – Maximum number of tiles to extract.

  • level (int, optional) – Level from which to extract the tiles. Default is 0. Superseded by mpp if the mpp argument is provided.

  • seed (int, optional) – Seed for RandomState. Must be convertible to a 32-bit unsigned integer. Default is 7.

  • check_tissue (bool, optional) – Whether to check if the tile has enough tissue to be saved. Default is True.

  • tissue_percent (float, optional) – Number between 0.0 and 100.0 representing the minimum required percentage of tissue over the total area of the tile. Default is 80.0. This is considered only if check_tissue is True.

  • prefix (str, optional) – Prefix to be added to the tile filename. Default is an empty string.

  • suffix (str, optional) – Suffix to be added to the tile filename. Default is ‘.png’.

  • max_iter (int, optional) – Maximum number of iterations performed when searching for eligible (if check_tissue=True) tiles. Must be greater than or equal to n_tiles.

  • mpp (float, optional) – Microns per pixel resolution. If provided, takes precedence over level. Default is None.

extract(slide, extraction_mask=<histolab.masks.BiggestTissueBoxMask object>, log_level='INFO')

Extract random tiles and save them to disk, following this filename pattern: {prefix}tile_{tiles_counter}_level{level}_{x_ul_wsi}-{y_ul_wsi}-{x_br_wsi}-{y_br_wsi}{suffix}

Parameters
  • slide (Slide) – Slide from which to extract the tiles

  • extraction_mask (BinaryMask, optional) – BinaryMask object defining how to compute a binary mask from a Slide. Default BiggestTissueBoxMask.

  • log_level (str, {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}) – Threshold level for the log messages. Default “INFO”

Raises
  • TileSizeError – If the tile size is larger than the slide size

  • LevelError – If the level is not available for the slide

Return type

None

locate_tiles(slide, extraction_mask=<histolab.masks.BiggestTissueBoxMask object>, scale_factor=32, alpha=128, outline='red', linewidth=1, tiles=None)

Draw tile box references on a rescaled version of the slide

Parameters
  • slide (Slide) – Slide on which to place the tiles

  • extraction_mask (BinaryMask, optional) – BinaryMask object defining how to compute a binary mask from a Slide. Default BiggestTissueBoxMask

  • scale_factor (int, optional) – Scaling factor for the returned image. Default is 32.

  • alpha (int, optional) – The alpha level to be applied to the rescaled slide. Default is 128.

  • outline (Union[str, Iterable[str], Iterable[Tuple[int]]], optional) – The outline color for the tile annotations. Default is ‘red’. It can be a single matplotlib-compatible color string, or a list with one color per tile, in the same order as the tiles. Each color in the list can be a matplotlib-style color string or a tuple of three ints in the [0, 255] range giving the red, green and blue components. For example, to color two tiles yellow, you can pass any of the following: ‘yellow’, [‘yellow’, ‘yellow’], or [(255, 255, 0), (255, 255, 0)].

  • linewidth (int, optional) – Thickness of line used to draw tiles. Default is 1.

  • tiles (Optional[Iterable[Tile]], optional) – Tiles to visualize. If None, the tiles are extracted from the slide. Default is None. Provide this argument if you already have the tiles and do not want them to be re-extracted for visualization.

Returns

PIL Image of the rescaled slide with the extracted tiles outlined

Return type

PIL.Image.Image

class ScoreTiler(*args, **kwds)

Extractor of tiles arranged in a grid according to a scoring function.

The extraction procedure is the same as for the GridTiler extractor, but only the n_tiles tiles with the highest scores are saved.

Parameters
  • scorer (Scorer) – Scoring function used to score the tiles.

  • tile_size (Tuple[int, int]) – (width, height) of the extracted tiles.

  • n_tiles (int, optional) – The number of tiles to be saved. Default is 0, which means that all the tiles will be saved (exactly the same behaviour as a GridTiler). Cannot be negative.

  • level (int, optional) – Level from which to extract the tiles. Default is 0. Superseded by mpp if the mpp argument is provided.

  • check_tissue (bool, optional) – Whether to check if the tile has enough tissue to be saved. Default is True.

  • tissue_percent (float, optional) – Number between 0.0 and 100.0 representing the minimum required percentage of tissue over the total area of the tile. Default is 80.0. This is considered only if check_tissue is True.

  • pixel_overlap (int, optional) – Number of overlapping pixels (for both height and width) between two adjacent tiles. If negative, adjacent tiles are separated by a gap of abs(pixel_overlap) pixels. Default is 0.

  • prefix (str, optional) – Prefix to be added to the tile filename. Default is an empty string.

  • suffix (str, optional) – Suffix to be added to the tile filename. Default is ‘.png’.

  • mpp (float, optional) – Microns per pixel resolution. If provided, takes precedence over level. Default is None.

extract(slide, extraction_mask=<histolab.masks.BiggestTissueBoxMask object>, report_path=None, log_level='INFO')

Extract grid tiles and save them to disk, according to a scoring function and following this filename pattern: {prefix}tile_{tiles_counter}_level{level}_{x_ul_wsi}-{y_ul_wsi}-{x_br_wsi}-{y_br_wsi}{suffix}

If report_path is provided, also save a CSV report file listing the saved tiles and their associated scores.

Parameters
  • slide (Slide) – Slide from which to extract the tiles

  • extraction_mask (BinaryMask, optional) – BinaryMask object defining how to compute a binary mask from a Slide. Default BiggestTissueBoxMask.

  • report_path (str, optional) – Path to the CSV report. If None, no report will be saved

  • log_level (str, {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}) – Threshold level for the log messages. Default “INFO”

Raises
  • TileSizeError – If the tile size is larger than the slide size

  • LevelError – If the level is not available for the slide

Return type

None
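The CSV report written when report_path is given can be sketched with Python's csv module. The column names filename, score, and scaled_score are assumptions chosen to match the three quantities described for the summary, and write_score_report is a hypothetical helper; the exact header written by histolab may differ.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_score_report(rows: Iterable[Tuple[str, float, float]], out: TextIO) -> None:
    """Hypothetical sketch: write (filename, raw score, normalized score) rows as CSV.
    The header names are assumptions, not histolab's exact column names."""
    writer = csv.writer(out)
    writer.writerow(["filename", "score", "scaled_score"])
    for filename, score, scaled in rows:
        writer.writerow([filename, score, scaled])


buf = io.StringIO()
write_score_report([("tile_0_level0_0-0-512-512.png", 5.0, 1.0)], buf)
print(buf.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline=""), as recommended by the csv module documentation.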

locate_tiles(slide, extraction_mask=<histolab.masks.BiggestTissueBoxMask object>, scale_factor=32, alpha=128, outline='red', linewidth=1, tiles=None)

Draw tile box references on a rescaled version of the slide

Parameters
  • slide (Slide) – Slide on which to place the tiles

  • extraction_mask (BinaryMask, optional) – BinaryMask object defining how to compute a binary mask from a Slide. Default BiggestTissueBoxMask

  • scale_factor (int, optional) – Scaling factor for the returned image. Default is 32.

  • alpha (int, optional) – The alpha level to be applied to the rescaled slide. Default is 128.

  • outline (Union[str, Iterable[str], Iterable[Tuple[int]]], optional) – The outline color for the tile annotations. Default is ‘red’. It can be a single matplotlib-compatible color string, or a list with one color per tile, in the same order as the tiles. Each color in the list can be a matplotlib-style color string or a tuple of three ints in the [0, 255] range giving the red, green and blue components. For example, to color two tiles yellow, you can pass any of the following: ‘yellow’, [‘yellow’, ‘yellow’], or [(255, 255, 0), (255, 255, 0)].

  • linewidth (int, optional) – Thickness of line used to draw tiles. Default is 1.

  • tiles (Optional[Iterable[Tile]], optional) – Tiles to visualize. If None, the tiles are extracted from the slide. Default is None. Provide this argument if you already have the tiles and do not want them to be re-extracted for visualization.

Returns

PIL Image of the rescaled slide with the extracted tiles outlined

Return type

PIL.Image.Image

property tile_size: Tuple[int, int]

(width, height) of the extracted tiles.