Table 1 Pretrained vision transformer (ViT) models used for feature extraction

From: AI for rapid identification of major butyrate-producing bacteria in rhesus macaques (Macaca mulatta)

| Model | Dataset | Use case |
| --- | --- | --- |
| vit_tiny_patch16_224 [37] | ImageNet dataset | Lightweight vision tasks that need fast inference, typically applied to ImageNet-like datasets. |
| vit_large_patch16_224.augreg_in21k_ft_in1k [37, 38] | ImageNet dataset | A larger ViT model for general vision tasks, also trained on ImageNet. |
| vit_base_patch16_clip_224 [39] | OpenAI’s CLIP dataset | Uses CLIP’s dataset for vision tasks, with a focus on robustness to out-of-distribution data. |
| vit_base_patch16_224_dino [40] | ImageNet dataset | A self-supervised learning model, suited to unsupervised applications beyond ImageNet tasks. |
| resnetv2_50x1_bit_distilled [41, 42] | JFT-300M dataset | Optimized for transfer learning with large-scale datasets such as JFT-300M. |
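
The model identifiers in Table 1 look like names from the `timm` (PyTorch Image Models) registry; the table itself does not say so, so treat that as an assumption. Under that assumption, a minimal sketch of using one of these backbones as a feature extractor could look like the following. The helper names `build_feature_extractor` and `extract_features` are hypothetical, and whether a given identifier resolves depends on the installed `timm` version, since some of the older names have since been renamed.

```python
# Model identifiers copied from Table 1 (assumed to be timm registry names).
TABLE1_MODELS = [
    "vit_tiny_patch16_224",
    "vit_large_patch16_224.augreg_in21k_ft_in1k",
    "vit_base_patch16_clip_224",
    "vit_base_patch16_224_dino",
    "resnetv2_50x1_bit_distilled",
]


def build_feature_extractor(name, pretrained=True):
    """Hypothetical helper: return a timm model without its classifier head."""
    import timm  # imported lazily so the name list is usable without timm

    # num_classes=0 removes the classification head, so the forward pass
    # returns pooled embeddings instead of class logits.
    model = timm.create_model(name, pretrained=pretrained, num_classes=0)
    return model.eval()


def extract_features(model, images):
    """Hypothetical helper: embed a batch of preprocessed images.

    images: float tensor of shape (N, 3, 224, 224), already resized and
    normalized the way the chosen model expects.
    """
    import torch

    with torch.no_grad():  # inference only, no gradients needed
        return model(images)  # shape (N, model.num_features)
```

A call such as `extract_features(build_feature_extractor(TABLE1_MODELS[0]), batch)` would then yield one embedding vector per image, which can be passed to a downstream classifier. The embedding width differs between backbones, so features from different models are not directly interchangeable.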