Last remembered highest accuracy: ~66%. The run also produced a bunch of other metrics that apparently didn't get pushed from the logger.
The README is busted; a bad version got uploaded. I'll run test sets on all the models and put together a proper model list with accuracies ASAP.
These currently beat the standard ViT-Beatrix in pure classification accuracy while leaving the two blocks nearly independent.
This enables efficient transfer learning without high-decay processes, but the system is still a bit janky.
Today I plan to shore up the repo's tracking so this sort of fault doesn't happen again, where I run something and lose the tracking information.
Additionally, the training manifests for all models will likely be stored in a separate repo for automated connection and linkage with the Hugging Face systems.
ViT-Beatrix Dual-Stream with Geometric Diversity
This system is a dual-block transformer model inspired by Flux's dual-block structure.
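For anyone unfamiliar with the Flux-style dual-block layout, here's a minimal sketch of what one such block could look like in PyTorch. This is illustrative only; the module and parameter names (`DualStreamBlock`, `img_*`, `geo_*`) are assumptions, not the actual names used in this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamBlock(nn.Module):
    """Hypothetical Flux-style dual block: per-stream projections and MLPs,
    with one joint attention step over the concatenated image + geometry tokens."""
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.heads = heads
        self.img_norm1, self.geo_norm1 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.img_norm2, self.geo_norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.img_qkv, self.geo_qkv = nn.Linear(dim, 3 * dim), nn.Linear(dim, 3 * dim)
        self.img_out, self.geo_out = nn.Linear(dim, dim), nn.Linear(dim, dim)
        mlp = lambda: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.img_mlp, self.geo_mlp = mlp(), mlp()

    def forward(self, img, geo):
        b, n_img, d = img.shape
        # Per-stream QKV projections, then concatenate so both streams attend to each other.
        qkv_i = self.img_qkv(self.img_norm1(img))
        qkv_g = self.geo_qkv(self.geo_norm1(geo))
        q, k, v = torch.cat([qkv_i, qkv_g], dim=1).chunk(3, dim=-1)
        def split(t):  # (b, n, d) -> (b, heads, n, d // heads)
            return t.view(b, -1, self.heads, d // self.heads).transpose(1, 2)
        attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
        attn = attn.transpose(1, 2).reshape(b, -1, d)
        img = img + self.img_out(attn[:, :n_img])
        geo = geo + self.geo_out(attn[:, n_img:])
        # Separate MLPs keep the two streams nearly independent.
        img = img + self.img_mlp(self.img_norm2(img))
        geo = geo + self.geo_mlp(self.geo_norm2(geo))
        return img, geo
```

The key point is that the two streams share a joint attention step but keep their own projections and MLPs, which is what allows them to stay nearly independent.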
Experimental Tests
One set of blocks is devoted to the geometry, while the other set is devoted to the ingested images.
The geometric stream can be completely decoupled, and the image portion zeroed out and retrained, if the system starts to decay.
This has shown robust capability across multiple training lineages.
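For context, "decoupling the geometry and zeroing the image portion" could be done roughly like this; a minimal sketch, assuming the checkpoint's parameter names are prefixed `geo_` and `img_` (hypothetical prefixes, substitute whatever the actual code uses):

```python
import torch.nn as nn

def decouple_and_reset(model: nn.Module, geo_prefix: str = "geo_", img_prefix: str = "img_"):
    """Freeze the geometric stream and re-initialize the image stream.

    The prefixes and function name are hypothetical illustrations only.
    """
    for name, param in model.named_parameters():
        if name.startswith(geo_prefix):
            param.requires_grad = False   # keep the learned geometry intact
    for name, module in model.named_modules():
        if name.startswith(img_prefix) and isinstance(module, (nn.Linear, nn.LayerNorm)):
            module.reset_parameters()     # "zero"/re-randomize the image portion for retraining
```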
Geometry imported from another version kept a cohesive shape even when the image portion completely exploded. The model learned quickly and with non-shallow variance, which suggests the potential of completely burning a model's shell with quick learning and then extracting the useful portions, thanks to the stubbornness of the simplex and Cayley-Menger formulas.
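For reference, the Cayley-Menger determinant referred to above gives the volume of a k-simplex from its pairwise distances. A small PyTorch sketch, assuming the simplex is passed as a (k+1, d) matrix of vertices:

```python
import math
import torch

def simplex_volume(points: torch.Tensor) -> torch.Tensor:
    """Volume of a k-simplex from its (k+1, d) vertex matrix via the
    Cayley-Menger determinant: V^2 = (-1)^(k+1) / (2^k (k!)^2) * det(CM)."""
    k = points.shape[0] - 1
    d2 = torch.cdist(points, points) ** 2                     # pairwise squared distances
    cm = torch.ones(k + 2, k + 2, dtype=points.dtype, device=points.device)
    cm[0, 0] = 0.0
    cm[1:, 1:] = d2
    coeff = (-1) ** (k + 1) / (2 ** k * math.factorial(k) ** 2)
    vol_sq = coeff * torch.det(cm)
    return torch.sqrt(torch.clamp(vol_sq, min=0.0))
```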
Leaving the geometry in a "frozen" state yields by far the worst outcomes. That said, I froze everything, including the geometric cross-attention and its subsystems, while leaving the image end of the cross-attention scrambled and learning, so more than likely it relearned incorrect math and got stuck at around 20%.
Importing the geometry from the burned model and leaving it learning alongside a scrambled, randomized image system yielded some VERY interesting results. I'll need to rerun everything because NONE of the TensorBoard logs got uploaded, which is annoying considering this was basically an afternoon of work, but it will be done.
Current Experiment: beatrix-dualstream-base
Model Path: weights/beatrix-dualstream-base/20251009_030219/
Architecture
- Visual Dimension: 768
- Geometric Dimension: 768
- Geometric Tokens: 32
- Dual Blocks: 8 layers
- k-simplex: 9
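For convenience, the same hyperparameters as a config dataclass; the field names are hypothetical, not necessarily what the training scripts use:

```python
from dataclasses import dataclass

@dataclass
class BeatrixDualStreamConfig:
    visual_dim: int = 768      # visual stream width
    geometric_dim: int = 768   # geometric stream width
    num_geo_tokens: int = 32   # learned geometric tokens
    num_dual_blocks: int = 8   # dual-block depth
    k_simplex: int = 9         # simplex order used by the geometric machinery
```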
Geometric Diversity Loss
This model uses a class-aware geometric diversity loss that encourages the following (a rough sketch follows the list):
- Intra-class compactness: Same-class samples cluster in geometric space
- Inter-class separation: Different classes maintain margin-based distance
- Volume diversity: Classes occupy diverse simplex volumes
- 3D complexity: Non-planar geometric structures
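Here is a rough sketch of what such a loss could look like in PyTorch, covering only the compactness and margin-separation terms. The function name, feature shapes, and margin value are assumptions; the volume-diversity and non-planarity terms (which would use simplex volumes, e.g. via the Cayley-Menger sketch earlier) are omitted here.

```python
import torch
import torch.nn.functional as F

def geometric_diversity_loss(geo_feats: torch.Tensor, labels: torch.Tensor,
                             margin: float = 1.0) -> torch.Tensor:
    """Class-aware diversity loss over pooled geometric-stream embeddings.

    geo_feats: (batch, dim) geometric embeddings (assumed shape).
    labels:    (batch,) class indices.
    """
    classes = labels.unique()
    centroids = torch.stack([geo_feats[labels == c].mean(dim=0) for c in classes])
    # Intra-class compactness: pull samples toward their class centroid.
    compact = torch.stack([
        (geo_feats[labels == c] - centroids[i]).pow(2).sum(dim=1).mean()
        for i, c in enumerate(classes)
    ]).mean()
    # Inter-class separation: push centroids at least `margin` apart.
    dists = torch.cdist(centroids, centroids)
    off_diag = dists[~torch.eye(len(classes), dtype=torch.bool, device=dists.device)]
    separation = F.relu(margin - off_diag).mean()
    return compact + separation
```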
Performance
- Best Accuracy: ~66% (from memory; exact figure pending the re-runs)
- Current Epoch: ~100, give or take; sorry about this, I'll get real data here ASAP.
- Dataset: CIFAR-100