---
jupytext:
  text_representation:
    format_name: myst
kernelspec:
  name: python3
  display_name: Python 3
---

(how-to-use-with-pipeline)=

# Use with scikit-learn Pipeline

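ktch transformers follow the scikit-learn API: they expose `fit`, `transform`, and `fit_transform`, so they can be chained with scikit-learn estimators such as `PCA`, either by hand or inside a `Pipeline`.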
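
As a rough illustration of what following the scikit-learn API involves, the sketch below defines a small transformer and places it in a `Pipeline`. `CentroidCenterer` is a hypothetical stand-in written for this example, not a ktch class, and the random input is illustrative only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline


class CentroidCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in transformer (not part of ktch).

    Subtracts each specimen's centroid from its flattened landmarks.
    """

    def __init__(self, n_dim=2):
        self.n_dim = n_dim

    def fit(self, X, y=None):
        # Nothing is learned; returning self keeps the transformer chainable.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # (n_specimens, n_landmarks * n_dim) -> (n_specimens, n_landmarks, n_dim)
        coords = X.reshape(X.shape[0], -1, self.n_dim)
        centered = coords - coords.mean(axis=1, keepdims=True)
        return centered.reshape(X.shape[0], -1)


# Illustrative input: 6 specimens, 4 landmarks, 2D, already flattened
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 8))

demo_pipeline = Pipeline([
    ("center", CentroidCenterer(n_dim=2)),
    ("pca", PCA(n_components=2)),
])
demo_scores = demo_pipeline.fit_transform(X_demo)
print(demo_scores.shape)
```

Any object with this `fit`/`transform` contract can take the place of the stand-in step.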

## Basic usage with GPA

GPA expects flattened input of shape `(n_specimens, n_landmarks * n_dim)`:

```{code-cell} ipython3
import numpy as np
from sklearn.decomposition import PCA
from ktch.landmark import GeneralizedProcrustesAnalysis

# Minimal data: 5 specimens, 4 landmarks, 2D coordinates,
# stored as a 3D array of shape (n_specimens, n_landmarks, n_dim)
landmarks_3d = np.array([
    [[0, 0], [1, 0], [1, 1], [0, 1]],
    [[0.1, 0], [1.1, 0], [1, 1.1], [0, 1]],
    [[0, 0.1], [1, 0], [1.1, 1], [0, 1.1]],
    [[0.05, 0.05], [1.05, 0], [1, 1.05], [0, 1]],
    [[0, 0], [1.05, 0.05], [1, 1], [0.05, 1.05]],
], dtype=float)

# Flatten to (n_specimens, n_landmarks * n_dim)
n_specimens, n_landmarks, n_dim = landmarks_3d.shape
landmarks = landmarks_3d.reshape(n_specimens, n_landmarks * n_dim)

# GPA then PCA
gpa = GeneralizedProcrustesAnalysis()
shapes = gpa.fit_transform(landmarks)

pca = PCA(n_components=2)
pc_scores = pca.fit_transform(shapes)
print(f"PC scores shape: {pc_scores.shape}")
```
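
If the raw data always arrives as a 3D array, the flattening step can itself become a pipeline step. The following is a minimal sketch using scikit-learn's `FunctionTransformer`. The helper `flatten_landmarks` and the noisy-square data are assumptions made for this example, and the GPA step is left out so that the block runs with scikit-learn alone.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def flatten_landmarks(X):
    """Reshape (n_specimens, n_landmarks, n_dim) to (n_specimens, n_landmarks * n_dim)."""
    X = np.asarray(X, dtype=float)
    return X.reshape(X.shape[0], -1)


# Illustrative data: 5 noisy copies of a unit square
rng = np.random.default_rng(0)
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
raw = square + 0.05 * rng.normal(size=(5, 4, 2))

flatten_pipeline = Pipeline([
    # validate=False (the default) skips the 2D input check, so 3D arrays pass through
    ("flatten", FunctionTransformer(flatten_landmarks)),
    # An alignment step such as GPA would normally sit here
    ("pca", PCA(n_components=2)),
])
flat_scores = flatten_pipeline.fit_transform(raw)
print(flat_scores.shape)
```

Keeping the reshape inside the pipeline means new specimens pass through the same preprocessing at prediction time.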

## Basic usage with EFA

EFA can be the first step of a scikit-learn `Pipeline` for unsupervised transformations, where no `y` is passed:

```{code-cell} ipython3
from sklearn.pipeline import Pipeline
from ktch.harmonic import EllipticFourierAnalysis

# Minimal data: 3 elliptical outlines with variations
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
outlines = np.array([
    np.column_stack([1.0 * np.cos(theta), 0.8 * np.sin(theta)]),
    np.column_stack([1.2 * np.cos(theta), 0.7 * np.sin(theta)]),
    np.column_stack([0.9 * np.cos(theta), 1.0 * np.sin(theta)]),
])

# EFA + PCA pipeline (no y parameter)
pipeline = Pipeline([
    ('efa', EllipticFourierAnalysis(n_harmonics=10)),
    ('pca', PCA(n_components=2))
])

pc_scores = pipeline.fit_transform(outlines)
print(f"PC scores shape: {pc_scores.shape}")
```
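
After fitting, individual steps of a `Pipeline` remain accessible by name, which helps when checking how much variance the retained components explain. The sketch below shows the pattern with scikit-learn only. `StandardScaler` and the random `X_coef` matrix are stand-ins chosen for this example, not the EFA step or real coefficients.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in feature matrix (hypothetical): 10 specimens, 6 features
rng = np.random.default_rng(1)
X_coef = rng.normal(size=(10, 6))

inspect_pipeline = Pipeline([
    ("scale", StandardScaler()),  # placeholder for a feature-extraction step
    ("pca", PCA(n_components=2)),
])
inspect_pipeline.fit(X_coef)

# Look up a fitted step by the name given in the step list
fitted_pca = inspect_pipeline.named_steps["pca"]
explained = fitted_pca.explained_variance_ratio_
print(f"Explained variance ratio: {explained}")

# Slicing returns a sub-pipeline; here, every step except the last
intermediate = inspect_pipeline[:-1].transform(X_coef)
print(intermediate.shape)
```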

## Classification with EFA coefficients

For supervised tasks, apply EFA separately, then fit the classification pipeline on the resulting coefficients:

```{code-cell} ipython3
from sklearn.svm import SVC

# Synthetic data: 20 ellipses, labeled by whether the x-axis scale is below 1.0
np.random.seed(42)
outlines_more = []
labels = []
for i in range(20):
    scale = 1.0 + 0.1 * np.random.randn()
    outlines_more.append(np.column_stack([scale * np.cos(theta), np.sin(theta)]))
    labels.append(0 if scale < 1.0 else 1)
outlines_more = np.array(outlines_more)
labels = np.array(labels)

# Apply EFA first (unsupervised)
efa = EllipticFourierAnalysis(n_harmonics=10)
coefficients = efa.fit_transform(outlines_more)

# Then use PCA + SVC pipeline on coefficients
pipeline = Pipeline([
    ('pca', PCA(n_components=3)),
    ('svc', SVC())
])

pipeline.fit(coefficients, labels)
print(f"Training accuracy: {pipeline.score(coefficients, labels):.2f}")
```
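
Training accuracy tends to be optimistic because the model is scored on the data it was fitted to. A minimal sketch of a held-out evaluation follows. The random `X_feat` matrix is a hypothetical stand-in for EFA coefficients, and the split size and seeds are arbitrary choices for this example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Stand-in feature matrix (hypothetical; replace with EFA coefficients)
rng = np.random.default_rng(42)
X_feat = rng.normal(size=(40, 8))
y_feat = (X_feat[:, 0] > 0).astype(int)

# Hold out 25% of the specimens, keeping class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X_feat, y_feat, test_size=0.25, stratify=y_feat, random_state=0
)

clf = Pipeline([
    ("pca", PCA(n_components=3)),
    ("svc", SVC()),
])
clf.fit(X_train, y_train)  # PCA and SVC are fitted on the training split only

test_accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

Because PCA sits inside the pipeline, its components are estimated from the training split only, so no information from the held-out specimens leaks into the model.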

```{seealso}
- {doc}`cross_validation` for cross-validation examples
- {doc}`../../explanation/morphometrics` for background
```