---
jupytext:
  text_representation:
    format_name: myst
kernelspec:
  name: python3
  display_name: Python 3
---

(how-to-cross-validation)=

# Perform Cross-Validation

Use scikit-learn's cross-validation with ktch transformers.

## Cross-validation with EFA

Since EFA's `fit_transform` signature differs from sklearn's convention, apply EFA before cross-validation:

```{code-cell} ipython3
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from ktch.harmonic import EllipticFourierAnalysis

# Generate outline data
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
np.random.seed(42)
outlines = []
labels = []
for i in range(20):
    scale = 1.0 + 0.2 * np.random.randn()
    outlines.append(np.column_stack([scale * np.cos(theta), np.sin(theta)]))
    labels.append(0 if scale < 1.0 else 1)
outlines = np.array(outlines)
labels = np.array(labels)

# Apply EFA first (unsupervised transformation)
efa = EllipticFourierAnalysis(n_harmonics=10)
coefficients = efa.fit_transform(outlines)

# PCA + SVC pipeline for cross-validation
pipeline = Pipeline([
    ('pca', PCA(n_components=3)),
    ('svc', SVC())
])

scores = cross_val_score(pipeline, coefficients, labels, cv=3)
print(f"Accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```

## Cross-validation with GPA

GPA expects flattened input.
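
As a minimal NumPy-only sketch of what "flattened" means here, assuming the row-major layout that a plain `reshape` produces (each row reads x1, y1, x2, y2, ...). The array contents and the name `restored` are illustrative and not part of ktch:

```python
import numpy as np

# Three hypothetical specimens with four 2D landmarks each: shape (3, 4, 2)
rng = np.random.default_rng(0)
landmarks_3d = rng.normal(size=(3, 4, 2))

n_specimens, n_landmarks, n_dim = landmarks_3d.shape

# Row-major reshape: each row becomes x1, y1, x2, y2, ...
flat = landmarks_3d.reshape(n_specimens, n_landmarks * n_dim)

# The same reshape in reverse recovers the original array
restored = flat.reshape(n_specimens, n_landmarks, n_dim)

print(flat.shape)
print(np.array_equal(restored, landmarks_3d))
```

Because the reshape is lossless, aligned shapes in flattened form can be reshaped back to `(n_specimens, n_landmarks, n_dim)` for plotting, provided the same ordering is assumed throughout.
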
Apply GPA before cross-validation:

```{code-cell} ipython3
from sklearn.model_selection import StratifiedKFold
from ktch.landmark import GeneralizedProcrustesAnalysis

# Generate 2D landmark data as a 3D array (n_specimens, n_landmarks, n_dim)
np.random.seed(42)
landmarks_3d = np.random.randn(20, 4, 2) * 0.1
landmarks_3d += np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
labels = np.array([0] * 10 + [1] * 10)

# Flatten to (n_specimens, n_landmarks * n_dim)
n_specimens, n_landmarks, n_dim = landmarks_3d.shape
landmarks = landmarks_3d.reshape(n_specimens, n_landmarks * n_dim)

# Apply GPA (unsupervised)
gpa = GeneralizedProcrustesAnalysis()
shapes = gpa.fit_transform(landmarks)

# Cross-validation on aligned shapes
pipeline = Pipeline([
    ('pca', PCA(n_components=2)),
    ('svc', SVC())
])

# Explicit stratified folds (equivalent to cv=3 for a classifier)
cv = StratifiedKFold(n_splits=3)
scores = cross_val_score(pipeline, shapes, labels, cv=cv)
print(f"Accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```

```{seealso}
- {doc}`use_with_pipeline` for Pipeline examples
```