Table 2 Assessment of variable importance and predictive accuracy in bacterial community clustering using Random Forest analysis

From: Characterization of bacterial communities of ewe’s vaginal tract and its potential impact on reproductive efficiency

¹Random Forest

| Variable | Mean decrease accuracy | Mean decrease Gini |
|---|---|---|
| Herd | 0.367 | 100.039 |
| Breed | 0.295 | 68.394 |
| Pregnancy | 0.003 | 3.371 |

²Cross-Validation

| Variable | Accuracy | SD |
|---|---|---|
| Herd | 0.8919ᵃ | 0.0177 |
| Breed | 0.7285ᵃ | 0.0706 |
| Pregnancy | 0.3879ᵇ | 0.0700 |

  1. Impact of herd, breed, and pregnancy on bacterial cluster categorization, evaluated by Random Forest importance scores under centered log-ratio (CLR) data transformation. Predictive accuracy was assessed through cross-validation.
  2. ¹Random Forest: results where the predicted values were the clusters (n = 4) and the variables evaluated were herd, breed, and pregnancy. Mean decrease accuracy: the average reduction in model accuracy when a variable is omitted. Mean decrease Gini: the reduction in the Gini coefficient when a variable is omitted, indicating variable importance.
  3. ²Cross-Validation: assesses the predictive accuracy of the Random Forest model via the k-fold method (k = 5). Accuracy: the proportion of correct predictions made by the model, estimated as the mean proportion of samples correctly assigned across folds. SD: standard deviation of the accuracy, a measure of its variation across the cross-validation folds. Significance between accuracy results is denoted by superscript letters: different letters (a, b, c) indicate statistically significant differences between groups, as determined by post-hoc Dunn's testing with FDR adjustment.
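
The cross-validation note above describes accuracy as a mean over k = 5 folds, with SD as its spread across those folds. The following is a minimal sketch of that calculation only, not the authors' pipeline. The classifier is a hypothetical lookup-table stand-in rather than a Random Forest, the toy data are synthetic, the names `k_fold_indices`, `fit_lookup`, and `cross_validate` are invented for illustration, and the use of the sample (n − 1) standard deviation is an assumption, since the source does not state which form was used.

```python
import random
import statistics
from collections import Counter, defaultdict


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_lookup(features, labels):
    """Stand-in classifier (NOT a Random Forest): most common label per feature value."""
    by_value = defaultdict(Counter)
    for x, y in zip(features, labels):
        by_value[x][y] += 1
    # Fall back to the overall majority label for unseen feature values.
    fallback = Counter(labels).most_common(1)[0][0]
    table = {x: counts.most_common(1)[0][0] for x, counts in by_value.items()}
    return lambda x: table.get(x, fallback)


def cross_validate(features, labels, k=5, seed=0):
    """Return per-fold accuracies, their mean, and their sample standard deviation."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predict = fit_lookup([features[i] for i in train_idx],
                             [labels[i] for i in train_idx])
        # Accuracy of this fold: proportion of held-out samples predicted correctly.
        correct = sum(predict(features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies, statistics.mean(accuracies), statistics.stdev(accuracies)


# Synthetic toy data (hypothetical, not from the study): cluster fully determined by herd.
herd = ["A", "B", "C", "D"] * 10
cluster = [{"A": 1, "B": 2, "C": 3, "D": 4}[h] for h in herd]
accs, mean_acc, sd_acc = cross_validate(herd, cluster, k=5)
print(accs, mean_acc, sd_acc)
```

Running the same routine once per predictor (herd, breed, pregnancy) would give one mean and one SD per variable, which is the shape of the Cross-Validation rows in the table.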