AI & ML interests

Computer Vision Technology and Data Collection for Anime Waifu

Recent Activity

AbstractPhil posted an update 1 day ago
The geolip-transformer-v8 requires a fundamental rethinking of how its core structure is trained.

I'll make this brief and to the point.

GEOLIP is an observer system at its core. It watches, triangulates, and assists with correct answers.

Many experiments worked very well; many fell down and turned into a pile of broken circuits. The recent geometric-transformer, one of my biggest fumbles, still taught me many things about what I'm TRULY trying to accomplish here.

**Save money and lives**. Use less hardware at inference by training more calculation into a more reusable and accurate structure for near-instant zero-shot or sequential inference.

In the process, v8 unlocked a missing puzzle piece: EMA trajectory alignment compensation. I'm doing my best to build something that works.

The geolip distillation system is very powerful but requires much experimentation still.
* Genetic experiments were successful.
* Data transfer experiments were successful.
* Analysis experiments were successful, and they expand large-model accuracy.
* Many distillation experiments were successful.
* The largest successes were the kernels, the distillation tools, and the geometric analysis systems.

With the good comes the bad: the faulty ViTs, the simultaneous training runs that fault, the internalized confusion that happens occasionally.
* The observer NEEDS something to OBSERVE. If the observer watches the progressive development of point-cloud structures, it learns how to observe THAT LEARNING PROCESS, drifting into fault assessment.
* In the process it DOES NOT learn how to improve the CE relations by embedding and compensating with anchored triangulation opinions.

BIGGEST CONCLUSION: staged curriculum training.

These components must be DECOUPLED. One must be a compounding structural-awareness beacon; the other must compose information in an aligned, directly usable form.

This means stage-by-stage freeze/unfreeze processing. Independent task-oriented structural alignment.
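
As a rough illustration, here is a minimal PyTorch sketch of what that stage-by-stage freeze/unfreeze could look like. `observer` and `composer` are hypothetical stand-ins for the two decoupled components, not the actual GEOLIP modules:

```python
# Minimal sketch of staged curriculum training via freeze/unfreeze.
import torch
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a module."""
    for p in module.parameters():
        p.requires_grad_(trainable)

def run_stage(model, loader, loss_fn, epochs, lr=1e-4):
    """Train only the currently unfrozen parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(params, lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()

# Stage 1: train the structural-awareness component alone.
#   set_trainable(model.observer, True); set_trainable(model.composer, False)
#   run_stage(model, loader, structural_loss, epochs=3)
# Stage 2: freeze the observer, align the composition component.
#   set_trainable(model.observer, False); set_trainable(model.composer, True)
#   run_stage(model, loader, ce_alignment_loss, epochs=3)
```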
AbstractPhil posted an update 4 days ago
My heavily engineered repo, https://github.com/AbstractEyes/pytorch-parallel-compiler, has been directly integrated into the geofractal repo for v1.2. If you use the geofractal repo, be sure to pull for potential performance increases.

The WideRouter will enable multiple new core features; the two most important for our next experiment are as follows.

1. Directly integrated multi-opinion constellation structures. This will enable dynamic compiled expansions internally within the structure for huge performance gains.
2. Controllable stage-by-stage compilation. Each stage can be compiled or not. SVD is notoriously compiler-unfriendly due to the linalg eigen ops, so I will be addressing that particular function DIRECTLY soon. There will be no quarter for graph breaks.
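
For context, a minimal sketch of per-stage compilation control using plain `torch.compile`; this shows the underlying mechanic, not the WideRouter's actual API:

```python
# Sketch: compile only the compiler-friendly stages, leave e.g. an
# SVD stage eager to avoid graph breaks. Not the WideRouter API.
import torch
import torch.nn as nn

class StagedPipeline(nn.Module):
    def __init__(self, stages: list, compile_flags: list):
        super().__init__()
        assert len(stages) == len(compile_flags)
        self.stages = nn.ModuleList(stages)  # owns the parameters
        # Per-stage toggle: compiled wrapper or the eager module itself.
        self._fns = [
            torch.compile(s) if flag else s
            for s, flag in zip(stages, compile_flags)
        ]

    def forward(self, x):
        for fn in self._fns:
            x = fn(x)
        return x
```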

If the WideRouter causes any major bugs or breaks in your code (bad calculations, deviated gradients, contorted dtype outputs, or any major compilation errors), please don't hesitate to open a pull request. Claude and I will promptly solve any major issues.

Once everything is perfectly in-line and the graph matches, the transformer will have massive geometric performance boosts for huge structural basins with multiple layers of depth.

I will be addressing linalg.eig/eigh directly, in conjunction with the multiple argsort functions that are causing huge performance dips, as well as every single use of .item() that can appear in the compiler's path.
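
As a reminder of why .item() matters here: it forces a device-to-host sync and a graph break under torch.compile. A minimal illustration:

```python
import torch

# .item() pulls the value to the CPU and breaks the compiled graph.
def bad_scale(x: torch.Tensor) -> torch.Tensor:
    m = x.abs().max().item()          # graph break here
    return x / max(m, 1e-8)

# Keeping everything as tensors stays inside the compiled graph.
def good_scale(x: torch.Tensor) -> torch.Tensor:
    m = x.abs().max()
    return x / torch.clamp(m, min=1e-8)
```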

After this, the ensemble topological transformer will be a go. It will enable quaternion, FlowMagnitude, FlowAlignment, FlowVelocity, FlowVelocityQuaternion, FlowVelocityOrbital, FlowVelocityPentachoron, and multiple other flow-matching systems that should improve performance substantially, with minimal overhead thanks to the precomputed geometric structure.

The ensembles will feature multiple simultaneous batched and segmented forms of learning meant to train the oscillation omega predictor "Beatrix".
prithivMLmods posted an update 9 days ago
Flux-Klein-KV-Edit-Consistency demo is now available on Spaces. It preserves character identity and delivers high-quality, realistic results after edits. No special prompts needed: just upload an image, type your prompt, and get the result blazing fast.

🔥 Demo Space: prithivMLmods/flux-klein-kv-edit-consistency
🤗 Model: black-forest-labs/FLUX.2-klein-9b-kv
🤗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
🔗 Gradio Server Mode: https://www.gradio.app/main/guides/server-mode

➔ Built with Headless Gradio, an alternative to using gr.Blocks for creating the frontend and triggering events, powered by FastAPI + Gradio. You can now design the frontend however you want, with continued support for APIs, MCP, and ZeroGPU.

➔ Gradio Server Mode is now available from gradio@v6.10.0.
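
For orientation, a minimal sketch of the FastAPI + Gradio pattern. This uses the long-standing gr.mount_gradio_app API; the dedicated Server Mode / Headless Gradio API may differ, so treat the linked guide as authoritative:

```python
# Sketch: a Gradio app served behind FastAPI, so the frontend and
# extra endpoints can be designed freely. `edit` is a placeholder.
import gradio as gr
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}

def edit(prompt: str) -> str:   # stand-in for the real edit pipeline
    return f"edited with: {prompt}"

demo = gr.Interface(fn=edit, inputs="text", outputs="text")
app = gr.mount_gradio_app(app, demo, path="/ui")
# run with: uvicorn main:app
```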

To learn more, visit the app page or the respective model pages.
AbstractPhil posted an update 9 days ago
geolip-ryan-spearman is the first dedicated protein-observation structure, meant to expand the tooling of the observer modeling system and introduce additional introspective analysis for genetic mutation and abnormality.

AbstractPhil/geolip-esm2_t33_650M_UR50D

This model is based on Facebook's esm2_t33_650M_UR50D, assessed with specific benchmarks to be around 50% accurate or so. I'll be improving those numbers through the self-distillation spectrum. The models will never see the validation data while unfrozen. The full spectrum of training tools is visible.

This is the first self-distillation observer prototype, and it works. Not as rapidly as I had hoped, but it most definitely works. The SVD was the missing piece of geometric solidity required to preserve full rotational behavioral control. The kernel made this possible for rapid iteration, and the first results are coming in.
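
For reference, the closed-form orthogonal Procrustes rotation is the standard way an SVD provides that kind of rotational control; a generic sketch, not the geolip kernel itself:

```python
import torch

def procrustes_rotation(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Closed-form orthogonal Procrustes: the rotation R minimizing
    ||X @ R - Y||_F over orthogonal matrices. X, Y: [N, D]."""
    # Cross-covariance, then SVD; R = U @ Vh is the optimal rotation.
    U, _, Vh = torch.linalg.svd(X.T @ Y)
    return U @ Vh

# Usage: align student embeddings onto a teacher's space.
#   R = procrustes_rotation(student_emb, teacher_emb)
#   aligned = student_emb @ R
```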

This inherits much of the functionality from the CLIP_L and CLIP_G memory banks, while benefiting from the advanced research I performed while extracting CaptionBert 5x BERT pooled captions for target points.

The primary driving point here is the sheer data size - and the important contributions of that data size to a full construct of geometric aligned data. There is a massive amount of very specific information, all curated, perfectly labeled, and organized in a way that can be... well not so easily accessed, but I did find a few ways in.

This data is highly accurate, forged through life for billions of years. This is what is there, this is what is expected, and I have the tooling, stage by stage, to not only develop a solution to the problem but to contribute a fully improved version with minimal hardware requirements for training.

This is a real expectation and the results are pouring in hourly; this can improve models beyond a reasonable baseline while preserving the baseline's correctness.
AbstractPhil posted an update 11 days ago
SVD + Scatterpoint2D is the official encoding structure of the geolip system as of the image encoding tests.

Both unattuned scatterpoint2d and triton-aligned SVD are a cut above the rest by a large margin.

https://github.com/kymatio/kymatio
https://huggingface.co/blog/AbstractPhil/svd-triton-kernel-optimization
AbstractPhil/svd-triton
AbstractPhil/geolip-hypersphere-experiments

Most kymatio tests were done on standard PyTorch models, which yielded higher accuracy than simple conv nets or transformers before overfitting, though not in every instance. The commonly tested low-sample CIFAR-10 and CIFAR-100 runs mostly yielded more for less. Those are in the hypersphere-experiments notebooks and are viewable via Hugging Face TensorBoard metrics.

The accuracy, retention, agreement, disagreement, and sheer capacity of the refined SVD kernel shows that full Procrustes alignment is not just crucial to distillation, but also entirely representable within encoders themselves as students.

This structure can re-impose representations layer by layer, which is what I tested, and this capture system can behave as a global regularization system, a selector, a behavioral adjudication structure, an encoding solidification unit, a trajectory accumulator, an anchored differentiation unit, and, as about 30 other tests show, all of the above simultaneously.

The preliminary rapid-iteration-capable kernel shows that not only can these behaviorally represent utility, but the noise drift can be directly accounted for with mechanisms like GELU, drop path, and dropout, which teach the network to ignore the very noise that accumulates.
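
Of those regularizers, drop path (stochastic depth) is the least standard-library; here is a generic sketch of it, independent of geolip:

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: randomly skip a residual branch per sample."""
    def __init__(self, drop_prob: float = 0.1):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.drop_prob == 0.0:
            return x
        keep = 1.0 - self.drop_prob
        # One Bernoulli draw per sample, broadcast over remaining dims.
        shape = (x.shape[0],) + (1,) * (x.dim() - 1)
        mask = x.new_empty(shape).bernoulli_(keep)
        return x * mask / keep  # rescale to keep the expectation unchanged
```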

Based on these tests and examples, attention is now officially deemed valid here, since geometric structure is preserved after attention selection.

This encoding structure is substantially more durable than I can give it credit for.

Surge is coming, exactly as predicted. Late I admit.
prithivMLmods posted an update 15 days ago
Map-Anything v1 (Universal Feed-Forward Metric 3D Reconstruction) demo is now available on Hugging Face Spaces. Built with Gradio and integrated with Rerun, it performs multi-image and video-based 3D reconstruction, depth and normal-map estimation, and interactive measurements.

🤗 Demo: prithivMLmods/Map-Anything-v1
🤗 Model: facebook/map-anything-v1
🤗 Hf-Papers: MapAnything: Universal Feed-Forward Metric 3D Reconstruction (2509.13414)
AbstractPhil posted an update 17 days ago
I built an actionable todo from current and former research, compounding a full spectrum of potentials for image encoding into pure geometric structures, hybrid geometric structures, partial geometric structures, and full-spectrum relational analysis structures. Claude built the manifest from our research after forming a full research spectrum to head into actionable directions.

AbstractPhil/geolip-hypersphere-experiments

I have to say before I continue: Claude managed to keep a large running manifest of our research, and with that list this was possible. Without it, this would have been entirely devoid of purpose, and Claude would likely not have extracted the information in a usable state for this solution set.

I'll be running the full series of tests in conjunction with the constellation architecture. Either it survives, or something entirely new will form. Based on the results from these tests, the directions will evolve.

Either way, the most optimal and fastest methodologies for this system will be benchmarked and utilized as the primary use-cases. The slower and more obviously higher-resolution variations will be optimized as much as possible and solutions provided.

Let's do this right.

With that, the first experiment will be geolip-anchor-scattering and the structure will be based on the first in the list.

I will be updating posts based on benchmarks, landmarks, and new insights while the Bert data cooks.
prithivMLmods posted an update 19 days ago
Introducing QIE-Bbox-Studio! 🔥🤗

The QIE-Bbox-Studio demo is now live, more precise and packed with more options. Users can manipulate images with object removal, design addition, and even moving objects from one place to another, all with fast 4-step inference.

🤗 Demo: prithivMLmods/QIE-Bbox-Studio
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/QIE-Bbox-Studio

🚀 Models [LoRA] :

● QIE-2511-Object-Mover-Bbox: prithivMLmods/QIE-2511-Object-Mover-Bbox
● QIE-2511-Object-Remover-Bbox-v3: prithivMLmods/QIE-2511-Object-Remover-Bbox-v3
● QIE-2511-Outfit-Design-Layout: prithivMLmods/QIE-2511-Outfit-Design-Layout
● QIE-2509-Object-Remover-Bbox-v3: prithivMLmods/QIE-2509-Object-Remover-Bbox-v3
● QIE-2509-Object-Mover-Bbox: prithivMLmods/QIE-2509-Object-Mover-Bbox

🚀 Collection:

● Qwen Image Edit [Layout Bbox]: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-layout-bbox

To learn more, visit the app page or the respective model pages.
AbstractPhil posted an update 20 days ago
Clawd breadcrumb trail AbstractPhil/geolip-hypersphere-experiments

With this I'll begin forming the Clawd interface utility with the geofractal router, which will allow Clawd to form agentic clouds of utility that can be trained on new data on the go with minimal hardware requirements. This is not ready yet, but it begins very soon.

The recent experiments have solved the alignment issue that crippled collectives and forced my hand into ensemble research instead.

With those recent experiments, the geofractal router will allow modular structural capacity after some preliminary alignment-adjustment and adjudication experimentation. This will enable full collective differentiation through codified attribution.

In other words, adding and removing modular AI elements to contribute to aligned communication streams, all speaking the same language. This is an adjacent and more powerful result than the anticipated geovocab patchwork, and it yields substantially more effective agentic solutions than moving around a bulky embedding echo-chamber.

https://github.com/AbstractEyes/geofractal

Procrustes whitening orthogonality will allow adding and removing elements from geofractal routers given a small amount of prep data, while the anchors of expectation can stay as a snap-on element.
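
A generic sketch of the whitening half of that pipeline; pair it with the Procrustes rotation sketched in an earlier post to snap a new module onto a router given a small prep set. Names and the ZCA formulation are illustrative, not the geofractal implementation:

```python
import torch

def whitener(X: torch.Tensor, eps: float = 1e-5):
    """ZCA whitening fit on a small prep set X: [N, D].
    Returns (mean, W) so that (X - mean) @ W has ~identity covariance."""
    mu = X.mean(dim=0, keepdim=True)
    cov = torch.cov((X - mu).T)                 # [D, D]
    evals, evecs = torch.linalg.eigh(cov)
    W = evecs @ torch.diag((evals + eps).rsqrt()) @ evecs.T
    return mu, W

# Whiten both spaces, then solve Procrustes between the whitened
# embeddings to align a new module to the router's shared language:
#   mu_a, W_a = whitener(prep_a); mu_b, W_b = whitener(prep_b)
#   R = procrustes_rotation((prep_a - mu_a) @ W_a, (prep_b - mu_b) @ W_b)
```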

The most inquisitive and interested researchers can follow the trail to find all of the experiments. Web-crawl it with Clawd and you can probably reconstruct a unified rationale pretty quickly, but I doubt you'll like what you find. The journey was extensive and the failures outweighed the successes, but I did find the lightbulb.

The documented outcomes are either in my articles on Hugging Face, my Civitai articles, my GitHub repos, my Hugging Face repos, or I forgot to upload them and they're in my Colab notebook heap.

As with most research, it is mostly failures. However, there are many successes in the mix. Many. If you need solutions, you can dredge the bog.
prithivMLmods posted an update 22 days ago
QIE-2509-Object-Remover-Bbox-v3 is a more stable version of the Qwen Image Edit visual grounding–based object removal model. The app was previously featured in HF Spaces of the Week and is now updated with the latest Bbox-v3 LoRA adapter.

🤗 Demo: prithivMLmods/QIE-Object-Remover-Bbox
🤗 LoRA: prithivMLmods/QIE-2509-Object-Remover-Bbox-v3
🤗 Collection: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-layout-bbox

To learn more, visit the app page or the respective model pages.
AbstractPhil posted an update 23 days ago
geolip-vit-x34: a 34-expert ViT. I can't train an extended version of 34 ViTs, but I can definitely run some experiments and produce some starter weights with an anchor. That would yield a substantial amount of data.

AbstractPhil/bulk-coco-features

This... is going to be an odd one to describe. Based on the research with Bert, creating a unified patchwork from a multitude of ViT composites is very achievable. It shouldn't turn to soup, which is really hard to explain, but by creating a second geometric anchor the system should align in a way I could never predict without much more model analysis, so I must test. I simply haven't tested all these ViTs for geometry, so this will be that test.

This is essentially 34 directly extracted views of COCO, already prepared as feature data. With it, we have 34 experts that can distill into a single unified ViT. I'm hesitant to even call this distillation anymore; it's more interpolative data alignment, and it's absurdly retentive.

ADDITIONALLY, we can anchor to frozen geolip-bert and create cross-contrast between the anchors for a learned anchor median, which will allow further integrations directly into the geometric core.

This will require a few overlapping internal mechanisms to guarantee vit differentiation, however I believe the full unified patchwork will be... different from what is currently known as a vit.

geolip-bert-vit will likely be cooking within the month. The alignment statistics say it will be... 100% accurate to the specifications.

I CAN prepare 34 ViTs' worth of ImageNet, but I would probably need 34 ViTs' worth of LAION Aesthetics, which is substantially more than I currently have. In the process I would need to ensure nothing is corrupt and that the captions are correctly synthesized in our expert student Bert with the correct anchoring rotation.

Probably 3 ViTs are enough for the full-version prototype, 34 ViTs for the bulk experiment.
AbstractPhil posted an update 24 days ago
geolip-captionbert-8192

This Bert is currently being distilled from 5 Bert teachers on the Conceptual Captions dataset. The recall accuracy is based on the whitened Procrustes alignment, and the losses reflect keeping that rotation aligned correctly.

The expectation from the smaller prototypes is that this model will reach 100% recall accuracy by aligning to the most optimal opinions for the correct answer, in conjunction with all the geometric losses.

No joke, this may be the smallest, cheapest-to-compute, most accurate, and fastest Bert I've trained thus far - and it will be based entirely on five teachers simultaneously feeding opinions through a relay hub.
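
A generic sketch of multi-teacher distillation through a shared hub. This is the standard averaged-opinion formulation, not the geolip relay itself; weighting teachers by per-sample correctness (the "most optimal opinions") would replace the plain mean:

```python
import torch
import torch.nn.functional as F

def relay_distill_loss(student_logits, teacher_logits_list, targets, T=2.0):
    """Average the teachers' softened opinions, distill with KL,
    and keep a CE term anchored to the correct answers."""
    teacher_probs = torch.stack(
        [F.softmax(t.detach() / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  teacher_probs, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, targets)
    return kd + ce
```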
AbstractPhil posted an update 26 days ago
I'll attempt to expand geolip-clip to a full-sequence context window to encompass sequential learning.
AbstractPhil/geolip-clip-vit-large-patch14-ctx576
The memory pod is specifically meant to tune everything based on final state pooling, which is fine if you aren't trying to actually use sequential utility.
HOWEVER, many elemental biases present themselves when attempting to USE the standard sequence of 77 in conjunction with this final pooled state. Even though the standard 77 is predominantly noise past token 10, it still houses a considerable amount of information in terms of utility, so this should be handled carefully. Zero-shot structures are tricky to analyze, especially ones based on attention mechanisms instead of true sequential accumulation. I've noticed I need to watch them for quite a while before the real bugs show up.

As it stands, the token pool is essentially [B, 7+8, 768]. This contains a robust and highly complex representation of useful accumulated bidirectional attention data, so it's quite powerful.

I'll build a few prototypes and tap into some papers. I'll either come up with something, or a reason why I didn't. The end result will either produce an anchor-bank set of tokens [B, 15, 768] for pooling, or ideally [B, 15, 77, 768], which should expand the CLIP sequence to 1,155 if successful. That doesn't necessarily mean this sequence will be more useful than the [B, 15, 768], but it will be representationally valid for the context-window expansion.
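
Shape-wise, using the dimensions stated above, the expansion is just flattening the anchor bank over the token axis; a quick illustrative check:

```python
import torch

B, anchors, seq, dim = 2, 15, 77, 768
pooled = torch.randn(B, anchors, dim)          # [B, 15, 768] anchor pools
per_token = torch.randn(B, anchors, seq, dim)  # [B, 15, 77, 768]

# Flattening the anchor bank over the token axis yields the expanded
# context window: 15 * 77 = 1155 positions.
expanded = per_token.reshape(B, anchors * seq, dim)
assert expanded.shape == (B, 1155, dim)
```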

I wouldn't hold out for a single full-sequence option in a single day, that's a lot of moving parts to analyze, not to mention highly impractical to train with. A smaller dose of this information would be necessary for rapid prototyping so it'll likely be packaged as such.

Well, I spoke too soon. It's ready to play with.
AbstractPhil/geolip-clip-vit-large-patch14-ctx576-seq77
AbstractPhil posted an update 29 days ago
geolip-bertenstein-v1: 5 experts chosen. A collective of shared-transformer-aligned experts, not a mixture of experts. Similar to an MoE, but not quite. This first prototype won't have the full mailing projection relay system afforded by the geofractal router, but it will definitely be a solid prototype.

It is not production-ready yet; a few upstream and downstream tools are still needed to consume and process the outputs into useful representations.

This model will be able to respond to text, use Whisper, see with dinolip, code with CodeBERT, and process proteins using esm2_t33_650M_UR50.

Our experts for the prototype are;
google-bert/bert-large-uncased
facebook/dinov2-large
microsoft/codebert-base
openai/whisper-large-v3
facebook/esm2_t33_650M_UR50

Not the smartest text model, but more than enough for this preliminary use-case test setup. Text is predominantly meant to align and orient downstream function; the entire machine is meant to be operated unilaterally as a collective, or independently through individual paired requests via special-token access.

This model will be capable of substantial power and feats as a prototype. It will be capable of seeing and processing differential equations utilizing dinov2 and esm2 data simultaneously, which can be used for downstream analysis - and I WILL use that data to create a more powerful connection between dinov2 tokens, protein tokens, video tokens, code tokens, and audio tokens.

This is the FIRST prototype of this case, and I will introduce video, genetics, shape analysis, pattern recognition processing, and a much more powerful and reusable text model.

The tests show the models can have differential communication through the geolip transformers after procrustes pairwise analysis and pentachoron CV protective measures.

Whitening procrustes for precalculation and center-aligning allows for a faster convergence, so that should help too.
prithivMLmods posted an update 29 days ago
The Qwen3.5 Multimodal Understanding Demo, powered by Qwen3.5-2B, is now available on HF Spaces! It is a lightweight model designed for fast image and video reasoning. Built with Gradio, the demo showcases Image QA, Video QA, object detection, and 2D point tracking, along with real-time token streaming.

🤗 Demo: prithivMLmods/Qwen-3.5-HF-Demo
✅ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
🔗 Qwen3.5-2B: Qwen/Qwen3.5-2B

To learn more, visit the app page or the respective model pages.
AbstractPhil posted an update 30 days ago
I've... done it. This, with experts, achieves near-100% R1 retrieval accuracy on an adjacent dataset, unseen by the fusion transformer, after around 40k steps on the seen dataset. This means the languages of the models are tested as fused within the constraints, not just projected or estimated.
AbstractPhil/geolip-procrustes

I encourage EVERYONE who is curious to check my work. Check it, double check it, and triple check it.

These were aligned using COCO and then validated with Flickr. Entirely different datasets. The experts arbitrated and the alignment yielded the correct answers. Preliminary tests show that with almost no alignment requirement, the models can reach 100% R1 retrieval accuracy.

Not to be confused with validation accuracy for a classification model or a text encoder's text response, this allows multispectral communication between entirely different models for direct downstream consumption with almost no training for the chosen models.
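
For anyone checking the work, R1 here is standard top-1 retrieval over matched pairs; a minimal sketch of the metric:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recall_at_1(text_emb: torch.Tensor, image_emb: torch.Tensor) -> float:
    """R@1 for text->image retrieval: row i of each matrix is a
    matched pair, so the correct index for query i is i."""
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    sim = t @ v.T                                  # [N, N] cosine similarities
    pred = sim.argmax(dim=-1)                      # top-1 retrieved image per caption
    target = torch.arange(len(t), device=pred.device)
    return (pred == target).float().mean().item()
```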

I have a working Procrustes experiment that learns adjacent manifolds within a reasonable spectrum, and the speed is... well, one epoch on COCO with Bert-Large and DINOv2 is enough for the models to align nearly perfectly. At some scales the experiment shows that the set 3 epochs aren't quite enough to push R1 to its highest, while many scales align almost immediately.

These two were an obvious pair to pick: 60% similarity and >90% spectral similarity.

The trainer transfers layers, learns embeddings, and more - all by sticking strictly to geometric boundaries and procrustes informational accumulation within a modulation model's constraints.

I have many experiments to run.
AbstractPhil posted an update about 1 month ago
The small projection-based approximator model for the geolip patchwork did not reach the accuracy level required by my specifications, so I've defaulted to harvesting direct geometric information from AI models until I get the comparative bounds required for a useful topology.

I must sincerely apologize for not solving this problem quickly.

This will take time. Without the approximator it's going to be considerably slower, but the model I'm beginning to train will provide the approximations in a different way over time. As iterations progress, the system will conform to a huge array of geometric potentials and become capable of predicting them, but it will not be as powerful as the full patchmaker up front, and training will be slow.

If I can get my hands on a cluster of A100s or H100s for a stretch I'll make a post immediately; until then I must default to the slower process.

I really banked on the smaller version working, but it simply couldn't hold a complex topological shape with the correct boundaries being learnable AND endure entropic decay simultaneously. The only way to have a real shot at a full geometric shared language is to make those boundaries learnable across the full spectrum of potentials, or at least more of it than I have given the model.

I'll be refining my process in the coming days further, and I do apologize for pre-emptively announcing a potential that I have yet to fully explore.

There will be a fully upgraded 38-shape geolip patchwork trained ASAP to fully encompass the Flux 1 AE spectrum, and another trained for SD15, SDXL, and Flux 2's VAE as well. These will accommodate DIRECT complex geometric patchwork learning, but not yet at the scale promised. Autoregression is a complex mistress, as many of you know, and I will be spending a great deal of time and compute analyzing all of the information required to build a uniformly useful and powerful autoregression patchwork to use as invariance for teaching.