Use Built-in Datasets#
ktch includes example datasets for learning and testing. Datasets are divided into two categories: bundled datasets shipped with the package, and remote datasets downloaded on first use.
Available datasets#
Bundled datasets#
These are included in the package and require no extra dependencies.
| Function | Type | Description |
|---|---|---|
| load_landmark_mosquito_wings | Landmarks | 18 landmarks from 127 mosquito wings |
| | Landmarks + Curves | 16 landmarks and 4 curves from 301 trilobite cephala |
| load_outline_mosquito_wings | Outlines | 100-point outlines from 126 mosquito wings |
| | Outlines (3D) | 200-point 3D outlines from 60 simulated leaves |
Remote datasets#
These are downloaded and cached locally on first use.
Install the optional data extras first:
pip install "ktch[data]"
| Function | Type | Description |
|---|---|---|
| load_image_passiflora_leaves | Images | 25 leaf scan images of 10 Passiflora species |
Load landmark data#
from ktch.datasets import load_landmark_mosquito_wings
data = load_landmark_mosquito_wings()
print(f"Coordinates shape: {data.coords.shape}") # (n_specimens * n_landmarks, n_dim)
print(f"Metadata keys: {list(data.meta.keys())}")
Coordinates shape: (2286, 2)
Metadata keys: ['genus']
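The coordinates come back as one flat array with a row per landmark, so per-specimen work usually starts with a reshape. The sketch below is one way to do that, under two assumptions that the page does not state outright: that data.coords is a NumPy-compatible array, and that rows are ordered specimen by specimen. The helper name split_specimens is hypothetical, and a synthetic array stands in for data.coords so the example runs without ktch.

```python
import numpy as np

# Stand-in for data.coords: 127 specimens x 18 landmarks, 2 dimensions.
# In practice this would be the array from load_landmark_mosquito_wings().
n_specimens, n_landmarks, n_dim = 127, 18, 2
flat = np.arange(n_specimens * n_landmarks * n_dim, dtype=float).reshape(-1, n_dim)


def split_specimens(coords, n_points):
    """Reshape (n_specimens * n_points, n_dim) into (n_specimens, n_points, n_dim).

    Assumes rows are ordered specimen by specimen (hypothetical helper).
    """
    coords = np.asarray(coords)
    if coords.shape[0] % n_points != 0:
        raise ValueError("row count is not a multiple of n_points")
    return coords.reshape(-1, n_points, coords.shape[1])


stacked = split_specimens(flat, n_landmarks)
print(stacked.shape)  # (127, 18, 2)
```

The same helper would apply to outline data by passing the number of points per outline instead of the number of landmarks.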
Load as pandas DataFrame#
data = load_landmark_mosquito_wings(as_frame=True)
print("Coordinates DataFrame:")
print(data.coords.head())
Coordinates DataFrame:
                           x       y
specimen_id coord_id
1           0        -0.4933  0.0130
            1        -0.0777  0.0832
            2         0.2231  0.0861
            3         0.2641  0.0462
            4         0.2645  0.0261
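The printed frame is indexed by specimen_id and coord_id, which makes it easy to select by specimen or by landmark. The following sketch uses a small synthetic frame with that layout rather than the real dataset; the index and column names are taken from the output above, while the values and sizes are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with the same layout as the printed frame:
# a (specimen_id, coord_id) MultiIndex and x/y columns.
index = pd.MultiIndex.from_product(
    [[1, 2, 3], [0, 1, 2, 3]], names=["specimen_id", "coord_id"]
)
values = np.arange(24, dtype=float).reshape(12, 2)
coords = pd.DataFrame(values, index=index, columns=["x", "y"])

# All landmarks of one specimen: selecting on the first index level
# drops it and leaves coord_id as the index.
specimen_1 = coords.loc[1]

# One landmark across all specimens, via a cross-section on coord_id.
landmark_0 = coords.xs(0, level="coord_id")

# Back to a (n_specimens, n_landmarks, n_dim) NumPy array.
n_specimens = coords.index.get_level_values("specimen_id").nunique()
stacked = coords.to_numpy().reshape(n_specimens, -1, coords.shape[1])
print(stacked.shape)  # (3, 4, 2)
```

The final reshape relies on every specimen having the same number of rows, sorted by specimen_id and then coord_id.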
Load outline data#
from ktch.datasets import load_outline_mosquito_wings
data = load_outline_mosquito_wings()
print(f"Outlines shape: {data.coords.shape}") # (n_specimens * n_points, n_dim)
Outlines shape: (12600, 2)
Load remote datasets#
Remote datasets support version pinning through the version parameter.
If version is omitted, the default version for the current ktch release is used.
from ktch.datasets import load_image_passiflora_leaves
# Use the default version
data = load_image_passiflora_leaves()
# Pin to a specific version
data = load_image_passiflora_leaves(version="2")
The dataset is downloaded once and cached locally. Subsequent calls load from the cache.
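To illustrate the download-once behaviour in general terms, here is a minimal sketch of a cache-on-first-use pattern using only the standard library. It is not ktch's implementation: fetch_cached, fake_download, the file name, and the versioned directory layout are all hypothetical, and the way ktch actually stores its cache is not described on this page.

```python
import tempfile
from pathlib import Path


def fetch_cached(name, version, cache_dir, download):
    """Return the path to a cached file, calling download() only on a cache miss.

    `download` is a hypothetical callable that returns the file contents as bytes.
    """
    target = Path(cache_dir) / f"v{version}" / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download())
    return target


calls = []


def fake_download():
    # Stand-in for a network request; records each invocation.
    calls.append(1)
    return b"image bytes"


with tempfile.TemporaryDirectory() as cache_dir:
    first = fetch_cached("leaves.zip", "2", cache_dir, fake_download)
    second = fetch_cached("leaves.zip", "2", cache_dir, fake_download)
    same_path = first == second
    cached_bytes = second.read_bytes()

print(len(calls))  # 1: the second call was served from the cache
```

Keying the cache path on the version, as this sketch does, is one way to let a pinned version and the default version coexist locally.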
Dataset description#
Each dataset includes a description accessible via data.DESCR:
data = load_landmark_mosquito_wings()
print(data.DESCR[:200] + "...")
Mosquito Wing Landmark Dataset
======================================
...
See also
ktch.datasets API reference