--- jupytext: text_representation: extension: .md format_name: myst format_version: 0.13 jupytext_version: 1.18.1 kernelspec: display_name: Python 3 language: python name: python3 --- # 2D Outline Extraction from Images This tutorial shows how to extract outlines from images based on conventional image processing methods for use with ktch's Elliptic Fourier Analysis (EFA). ## Prerequisites ```{code-cell} ipython3 # !pip install ktch[data] opencv-python # Uncomment if needed ``` ## Step 1: Load Images We use the Passiflora leaf scan dataset bundled with ktch: 25 flatbed scan images of 10 *Passiflora* species spanning simple elliptical to deeply lobed leaf forms. Each image contains multiple leaves from one plant individual, arranged from tip (youngest) to base (oldest). This dataset is a subset of the *Passiflora* leaf scan data from [Chitwood & Otoni (2017)](https://doi.org/10.1093/gigascience/giw008) (data: [Chitwood & Otoni, 2016](https://doi.org/10.5524/100251)). See `data.DESCR` for full details. ```{code-cell} ipython3 import numpy as np import cv2 as cv import matplotlib.pyplot as plt from ktch.datasets import load_image_passiflora_leaves ``` ```{code-cell} ipython3 :tags: [hide-input] :mystnb: : code_prompt_show: "Show definition: plot_images()" : code_prompt_hide: "Hide definition: plot_images()" def plot_images(images, title_suffix="", cmap=None, outlines_per_image=None, labels=None, n_cols=2): """Plot images in a grid, optionally with outlines overlaid.""" n_images = len(images) n_rows = (n_images + n_cols - 1) // n_cols fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 4 * n_rows)) axes = axes.flatten() for i, ax in enumerate(axes): if i < n_images: label = labels[i] if labels is not None else f"Image {i+1}" if outlines_per_image is not None: ax.imshow(images[i]) outlines = outlines_per_image[i] colors = plt.cm.tab10(np.linspace(0, 1, max(len(outlines), 1))) for j, outline in enumerate(outlines): ax.plot(outline[:, 0], outline[:, 1], "-", linewidth=2, color=colors[j]) ax.set_title(f"{label}: {len(outlines)} leaves") else: ax.imshow(images[i], cmap=cmap) title = label if title_suffix: title += f" ({title_suffix})" ax.set_title(title) ax.axis("off") plt.tight_layout() ``` ```{code-cell} ipython3 data = load_image_passiflora_leaves(as_frame=True) print(f"Number of images: {len(data.images)}") print(f"Species: {data.meta['species'].nunique()}") print() print(data.meta[["abbreviation", "species"]].value_counts().sort_index()) ``` ```{code-cell} ipython3 labels = [ f"{row.abbreviation} ({row.species})" for _, row in data.meta.iterrows() ] plot_images(data.images, labels=labels) ``` ## Step 2: Convert to Grayscale ```{code-cell} ipython3 images_gray = [cv.cvtColor(img, cv.COLOR_RGB2GRAY) for img in data.images] ``` ```{code-cell} ipython3 plot_images(images_gray, "Grayscale", cmap="gray") ``` ## Step 3: Binarize the Image We combine two binarization methods with OR to capture leaves that are either darker than the background or green in color. ### Otsu's thresholding Automatically finds a threshold based on brightness histogram. ```{code-cell} ipython3 images_binary_otsu = [] for img_gray in images_gray: _, binary_otsu = cv.threshold( img_gray, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU ) images_binary_otsu.append(binary_otsu) ``` ```{code-cell} ipython3 plot_images(images_binary_otsu, "Otsu", cmap="gray") ``` ### Green mask in HSV Targets green hues with sufficient saturation and value. ```{code-cell} ipython3 hue_min, hue_max = 30, 90 sat_min, val_min = 40, 40 images_binary_green = [] for img_rgb in data.images: img_hsv = cv.cvtColor(img_rgb, cv.COLOR_RGB2HSV) lower_green = np.array([hue_min, sat_min, val_min]) upper_green = np.array([hue_max, 255, 255]) binary_green = cv.inRange(img_hsv, lower_green, upper_green) images_binary_green.append(binary_green) ``` ```{code-cell} ipython3 plot_images(images_binary_green, "Green Mask", cmap="gray") ``` ### Combine with OR ```{code-cell} ipython3 images_binary = [ cv.bitwise_or(otsu, green) for otsu, green in zip(images_binary_otsu, images_binary_green) ] ``` ```{code-cell} ipython3 plot_images(images_binary, "Combined (OR)", cmap="gray") ``` ## Step 4: Morphological Opening Remove small noise with morphological opening. ```{code-cell} ipython3 kernel_size = 5 kernel = cv.getStructuringElement(cv.MORPH_ELLIPSE, (kernel_size, kernel_size)) images_opened = [ cv.morphologyEx(img_binary, cv.MORPH_OPEN, kernel) for img_binary in images_binary ] ``` ```{code-cell} ipython3 plot_images(images_opened, "Opened", cmap="gray") ``` ## Step 5: Extract Outlines Extract contours and filter by area and color. :::{warning} This simple pipeline does not work perfectly for all images. For example, deeply lobed species like *P. cincinnata* may have individual leaflets detected as separate leaves, and pale-colored leaves in *P. edulis* scans may be partially merged with the background, producing jagged outlines. These cases require species-specific tuning or more advanced segmentation methods, but are left as-is here to illustrate typical failure cases. ::: ### Color filter function ```{code-cell} ipython3 def is_green_region(img_rgb, contour, hue_range=(30, 90), min_green_ratio=0.5): """Check if the contour region contains enough green pixels.""" mask = np.zeros(img_rgb.shape[:2], dtype=np.uint8) cv.drawContours(mask, [contour], 0, 255, -1) region_pixels = img_rgb[mask == 255] if len(region_pixels) == 0: return False region_rgb = region_pixels.reshape(-1, 1, 3) region_hsv = cv.cvtColor(region_rgb, cv.COLOR_RGB2HSV).reshape(-1, 3) h, s, v = region_hsv[:, 0], region_hsv[:, 1], region_hsv[:, 2] is_green = (h >= hue_range[0]) & (h <= hue_range[1]) & (s >= 40) & (v >= 40) return np.sum(is_green) / len(region_pixels) >= min_green_ratio ``` ### Filter parameters ```{code-cell} ipython3 min_area = 50 hue_range = (30, 90) min_green_ratio = 0.5 ``` ### Extract and filter contours ```{code-cell} ipython3 all_outlines = [] outlines_per_image = [] for idx, img_opened in enumerate(images_opened): img_rgb = data.images[idx] contours, _ = cv.findContours( img_opened, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_NONE ) contours_filtered = [ c for c in contours if cv.contourArea(c) >= min_area and is_green_region(img_rgb, c, hue_range, min_green_ratio) ] outlines = [c[:, 0, :].astype(np.float64) for c in contours_filtered] outlines_per_image.append(outlines) all_outlines.extend(outlines) print(f"Total: {len(all_outlines)} outlines") ``` ```{code-cell} ipython3 plot_images(data.images, outlines_per_image=outlines_per_image, labels=labels) ``` ```{code-cell} ipython3 if len(all_outlines) > 0: fig, ax = plt.subplots(figsize=(10, 10)) for outline in all_outlines: ax.plot(outline[:, 0], -outline[:, 1], alpha=0.5, linewidth=1) ax.set_aspect("equal") ax.set_title(f"All Extracted Outlines ({len(all_outlines)} leaves)") plt.tight_layout() else: print("No outlines to plot") ``` ## Step 6: Verify Output Format The output format is compatible with ktch's `EllipticFourierAnalysis`. ```{code-cell} ipython3 print(f"Type: {type(all_outlines)}") print(f"Number of specimens: {len(all_outlines)}") if len(all_outlines) == 0: print("\nWarning: No outlines detected. Try adjusting parameters:") print(" - Decrease min_area") print(" - Adjust hue_range for your object color") print(" - Decrease min_green_ratio") else: print(f"Shape of first outline: {all_outlines[0].shape}") ``` ```{code-cell} ipython3 from ktch.harmonic import EllipticFourierAnalysis if len(all_outlines) > 0: efa = EllipticFourierAnalysis(n_harmonics=20) coef = efa.fit_transform(all_outlines) print(f"EFA output shape: {coef.shape}") else: print("Skipping EFA: no outlines to process") ``` ## Summary This tutorial demonstrated a complete pipeline for extracting outlines from images: grayscale conversion, binarization (Otsu + green mask), morphological opening, and contour filtering. The extracted outlines are ready for EFA analysis with ktch. ## Next Steps - {doc}`../harmonic/elliptic_Fourier_analysis` - EFA with PCA visualization - {doc}`../../how-to/data/load_chc` - Save/load outline data - {doc}`../../explanation/harmonic` - Theory behind harmonic analysis ## Troubleshooting ### Too few contours - Decrease `min_area` to include smaller regions - Widen `hue_range` or lower `min_green_ratio` for less strict color filtering - Decrease `kernel_size` to preserve fine details ### Too many contours (noise) - Increase `min_area` to filter out small artifacts - Increase `kernel_size` for stronger noise removal - Raise `min_green_ratio` for stricter color filtering ### Inconsistent orientation EFA results depend on outline direction (clockwise vs counter-clockwise). Compute signed area and reverse outlines if negative to standardize. ### Holes or gaps in binary image - Apply morphological closing after opening: `cv.morphologyEx(img, cv.MORPH_CLOSE, kernel)` - Adjust binarization thresholds or HSV ranges