Last remembered highest accuracy: ~66%. The run also produced a bunch of other metrics that apparently didn't get pushed from the logger.
The README is busted; a bad version got uploaded. I'll run test sets on all the models and put together a proper model list with accuracies ASAP.
These currently beat the standard ViT-Beatrix in pure classification accuracy while leaving the two blocks nearly independent.
This enables efficient transfer learning without high-decay processes, but the system is still a bit janky.
Today I plan to shore up the repo's tracking so this sort of fault doesn't happen again, where I run something and lose the tracking information.
Additionally, the training manifests for all models will likely be stored in a separate repo for automated connection and linkage with the Hugging Face systems.
ViT-Beatrix Dual-Stream with Geometric Diversity
This system is a dual-block transformer model inspired by Flux's dual-block structure.
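For anyone unfamiliar with the Flux-style dual-block layout, here's a minimal sketch of what one such block could look like in PyTorch. This is illustrative only; the module and parameter names (`DualStreamBlock`, `img_*`, `geo_*`) are assumptions, not the actual names used in this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamBlock(nn.Module):
    """Hypothetical Flux-style dual block: per-stream projections and MLPs,
    with one joint attention step over the concatenated image + geometry tokens."""
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.heads = heads
        self.img_norm1, self.geo_norm1 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.img_norm2, self.geo_norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.img_qkv, self.geo_qkv = nn.Linear(dim, 3 * dim), nn.Linear(dim, 3 * dim)
        self.img_out, self.geo_out = nn.Linear(dim, dim), nn.Linear(dim, dim)
        mlp = lambda: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.img_mlp, self.geo_mlp = mlp(), mlp()

    def forward(self, img, geo):
        b, n_img, d = img.shape
        # Per-stream QKV projections, then concatenate so both streams attend to each other.
        qkv_i = self.img_qkv(self.img_norm1(img))
        qkv_g = self.geo_qkv(self.geo_norm1(geo))
        q, k, v = torch.cat([qkv_i, qkv_g], dim=1).chunk(3, dim=-1)
        def split(t):  # (b, n, d) -> (b, heads, n, d // heads)
            return t.view(b, -1, self.heads, d // self.heads).transpose(1, 2)
        attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
        attn = attn.transpose(1, 2).reshape(b, -1, d)
        img = img + self.img_out(attn[:, :n_img])
        geo = geo + self.geo_out(attn[:, n_img:])
        # Separate MLPs keep the two streams nearly independent.
        img = img + self.img_mlp(self.img_norm2(img))
        geo = geo + self.geo_mlp(self.geo_norm2(geo))
        return img, geo
```

The key point is that the two streams share a joint attention step but keep their own projections and MLPs, which is what allows them to stay nearly independent.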
Experimental Tests
One set of blocks is devoted to the geometry, while the other set is devoted to the ingested images.
The geometric stream can be completely decoupled, and the image portion zeroed out and retrained, if the system starts to decay.
This has shown robust capability across multiple training lineages.
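For context, "decoupling the geometry and zeroing the image portion" could be done roughly like this; a minimal sketch, assuming the checkpoint's parameter names are prefixed `geo_` and `img_` (hypothetical prefixes, substitute whatever the actual code uses):

```python
import torch.nn as nn

def decouple_and_reset(model: nn.Module, geo_prefix: str = "geo_", img_prefix: str = "img_"):
    """Freeze the geometric stream and re-initialize the image stream.

    The prefixes and function name are hypothetical illustrations only.
    """
    for name, param in model.named_parameters():
        if name.startswith(geo_prefix):
            param.requires_grad = False   # keep the learned geometry intact
    for name, module in model.named_modules():
        if name.startswith(img_prefix) and isinstance(module, (nn.Linear, nn.LayerNorm)):
            module.reset_parameters()     # "zero"/re-randomize the image portion for retraining
```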
Geometry imported from another version kept a cohesive shape even when the image portion completely exploded. The model learned quickly and with non-shallow variance, which suggests the potential of completely burning a model's shell with quick learning and then extracting the useful portions, thanks to the stubbornness of the simplex and Cayley-Menger formulas.
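For reference, the Cayley-Menger determinant referred to above gives the volume of a k-simplex from its pairwise distances. A small PyTorch sketch, assuming the simplex is passed as a (k+1, d) matrix of vertices:

```python
import math
import torch

def simplex_volume(points: torch.Tensor) -> torch.Tensor:
    """Volume of a k-simplex from its (k+1, d) vertex matrix via the
    Cayley-Menger determinant: V^2 = (-1)^(k+1) / (2^k (k!)^2) * det(CM)."""
    k = points.shape[0] - 1
    d2 = torch.cdist(points, points) ** 2                     # pairwise squared distances
    cm = torch.ones(k + 2, k + 2, dtype=points.dtype, device=points.device)
    cm[0, 0] = 0.0
    cm[1:, 1:] = d2
    coeff = (-1) ** (k + 1) / (2 ** k * math.factorial(k) ** 2)
    vol_sq = coeff * torch.det(cm)
    return torch.sqrt(torch.clamp(vol_sq, min=0.0))
```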
Leaving the geometry in a "frozen" state yields by far the worst outcomes. That said, I froze everything, including the geometric cross-attention and its subsystems, while leaving the image end of the cross-attention scrambled and learning, so more than likely it relearned incorrect math and got stuck at around 20%.
Importing the geometry from the burned model and leaving it learning alongside a scrambled, randomized image system yielded some VERY interesting results. I'll need to rerun everything because NONE of the TensorBoard logs got uploaded, which is annoying considering this was basically an afternoon of work, but it will be done.
Current Experiment: beatrix-dualstream-base
Model Path: weights/beatrix-dualstream-base/20251009_030219/
Architecture
- Visual Dimension: 768
- Geometric Dimension: 768
- Geometric Tokens: 32
- Dual Blocks: 8 layers
- k-simplex: 9
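For convenience, the same hyperparameters as a config dataclass; the field names are hypothetical, not necessarily what the training scripts use:

```python
from dataclasses import dataclass

@dataclass
class BeatrixDualStreamConfig:
    visual_dim: int = 768      # visual stream width
    geometric_dim: int = 768   # geometric stream width
    num_geo_tokens: int = 32   # learned geometric tokens
    num_dual_blocks: int = 8   # dual-block depth
    k_simplex: int = 9         # simplex order used by the geometric machinery
```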
Geometric Diversity Loss
This model uses a class-aware geometric diversity loss that encourages the following (a rough sketch follows the list):
- Intra-class compactness: Same-class samples cluster in geometric space
- Inter-class separation: Different classes maintain margin-based distance
- Volume diversity: Classes occupy diverse simplex volumes
- 3D complexity: Non-planar geometric structures
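Here is a rough sketch of what such a loss could look like in PyTorch, covering only the compactness and margin-separation terms. The function name, feature shapes, and margin value are assumptions; the volume-diversity and non-planarity terms (which would use simplex volumes, e.g. via the Cayley-Menger sketch earlier) are omitted here.

```python
import torch
import torch.nn.functional as F

def geometric_diversity_loss(geo_feats: torch.Tensor, labels: torch.Tensor,
                             margin: float = 1.0) -> torch.Tensor:
    """Class-aware diversity loss over pooled geometric-stream embeddings.

    geo_feats: (batch, dim) geometric embeddings (assumed shape).
    labels:    (batch,) class indices.
    """
    classes = labels.unique()
    centroids = torch.stack([geo_feats[labels == c].mean(dim=0) for c in classes])
    # Intra-class compactness: pull samples toward their class centroid.
    compact = torch.stack([
        (geo_feats[labels == c] - centroids[i]).pow(2).sum(dim=1).mean()
        for i, c in enumerate(classes)
    ]).mean()
    # Inter-class separation: push centroids at least `margin` apart.
    dists = torch.cdist(centroids, centroids)
    off_diag = dists[~torch.eye(len(classes), dtype=torch.bool, device=dists.device)]
    separation = F.relu(margin - off_diag).mean()
    return compact + separation
```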
Performance
- Best Accuracy: ~66% (from memory; exact figure pending the re-runs)
- Current Epoch: ~100, give or take; sorry about this, I'll get real data here ASAP.
- Dataset: CIFAR-100