Model card for Latent Zoning Networks
Model details
Model description
Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: Can a unified principle address all three? Such unification could simplify ML pipelines and foster greater synergy across tasks. We introduce Latent Zoning Network (LZN) as a step toward this goal. At its core, LZN creates a shared Gaussian latent space that encodes information across all tasks. Each data type (e.g., images, text, labels) is equipped with an encoder that maps samples to disjoint latent zones, and a decoder that maps latents back to data. ML tasks are expressed as compositions of these encoders and decoders: for example, label-conditional image generation uses a label encoder and image decoder; image embedding uses an image encoder; classification uses an image encoder and label decoder. We demonstrate the promise of LZN in three increasingly complex scenarios:
1. LZN can enhance existing models (image generation): When combined with the SoTA Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59, without modifying the training objective.
2. LZN can solve tasks independently (representation learning): LZN can implement unsupervised representation learning without auxiliary loss functions, outperforming the seminal MoCo and SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear classification on ImageNet.
3. LZN can solve multiple tasks simultaneously (joint generation and classification): With image and label encoders/decoders, LZN performs both tasks jointly by design, improving FID and achieving SoTA classification accuracy on CIFAR10.
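To make the composition idea concrete, the sketch below shows how per-data-type encoders and decoders sharing one latent space might be wired together for the three tasks above. This is a minimal illustration under our own assumptions, not the authors' implementation; all class, method, and component names are hypothetical.

```python
# Illustrative sketch only -- not the official LZN implementation.
# All component names are hypothetical placeholders.
import torch

class LZNSketch:
    """Tasks as compositions of per-data-type encoders and decoders
    that share one Gaussian latent space."""

    def __init__(self, image_encoder, image_decoder, label_encoder, label_decoder):
        self.image_encoder = image_encoder  # image -> latent (image zone)
        self.image_decoder = image_decoder  # latent -> image
        self.label_encoder = label_encoder  # label -> latent (label zone)
        self.label_decoder = label_decoder  # latent -> label

    def generate_from_label(self, label):
        # Label-conditional image generation: label encoder -> image decoder.
        return self.image_decoder(self.label_encoder(label))

    def embed(self, image):
        # Representation learning: the image encoder alone yields the embedding.
        return self.image_encoder(image)

    def classify(self, image):
        # Classification: image encoder -> label decoder.
        return self.label_decoder(self.image_encoder(image))

# Toy stand-ins so the sketch runs end to end; the real components are
# trained networks.
dim = 8
model = LZNSketch(
    image_encoder=lambda img: img.flatten(start_dim=1)[:, :dim],
    image_decoder=lambda z: z.reshape(-1, 2, 2, 2),
    label_encoder=lambda y: torch.nn.functional.one_hot(y, dim).float(),
    label_decoder=lambda z: z.argmax(dim=-1),
)
print(model.classify(torch.randn(4, 2, 2, 2)).shape)  # torch.Size([4])
```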
The released models are:
- Image generation on the AFHQ Cat dataset
- Image embedding on the ImageNet dataset
The models are trained from scratch.
Key information
Developed by: Zinan Lin
Model type: Image generation models and image embedding models
Language(s): The models do NOT have text input or output capabilities
License: MIT
Model sources
Model repository: https://huggingface.co/microsoft/latent-zoning-networks
Code repository: https://github.com/microsoft/latent-zoning-networks
Uses
Direct intended uses
Image generation models: We currently release only the unconditional image generation model for the AFHQ Cat dataset; it therefore takes no conditioning input such as class labels. The model generates new images similar to the training set.
Image embedding models: Given an image, the model produces its embedding (i.e., a vector of floating-point numbers).
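For orientation, here is a hedged sketch of the two interfaces described above; the functions are self-contained stand-ins (with illustrative shapes and resolutions), not the released API, whose actual entry points are documented in the GitHub repository.

```python
# Illustrative input/output sketch only; real usage is documented at
# https://github.com/microsoft/latent-zoning-networks.
import torch

def sample_images(batch_size: int) -> torch.Tensor:
    # Unconditional generation: no class condition is needed; the model
    # maps random latents to images. Stubbed here with random pixels
    # (resolution is illustrative).
    return torch.rand(batch_size, 3, 512, 512)

def embed_images(images: torch.Tensor) -> torch.Tensor:
    # Embedding: one float vector per input image. Stubbed with a fixed
    # random projection for illustration.
    flat = images.flatten(start_dim=1)
    torch.manual_seed(0)
    proj = torch.randn(flat.shape[1], 128)
    return flat @ proj

imgs = sample_images(batch_size=4)  # (4, 3, 512, 512) generated images
vecs = embed_images(imgs)           # (4, 128) embedding vectors
```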
The released models do not currently have real-world applications. They are being shared with the research community to facilitate reproduction of our results and to foster further research in this area.
Out-of-scope uses
These models do NOT have text-conditioned image generation capabilities and cannot generate anything beyond images. We do not recommend using the models in commercial or real-world applications without further testing and development; they are being released for research purposes only.
Risks, limitations, and mitigation
Generated images are not perfect and may contain artifacts such as blurry or unrecognizable objects. If users find failure cases of the models, please contact us and we will update the arXiv paper to report them. If the models have severe and unexpected issues, we will remove them from Hugging Face.
These models inherit any biases, errors, or omissions characteristic of their training data, which may be amplified by any AI-generated interpretations.
We used two specific datasets to demonstrate our technique for training image generation and embedding models. If users/developers wish to test our technique on other datasets, it is their responsibility to source those datasets legally and ethically. This could include securing appropriate rights, ensuring consent for the use of images, and/or anonymizing data prior to use. Users are reminded to be mindful of data privacy concerns and to comply with relevant data protection regulations and organizational policies.
How to get started with the model
Please see the GitHub repo for instructions: https://github.com/microsoft/latent-zoning-networks
Training details
Training data
Image generation: AFHQ Cat dataset https://github.com/clovaai/stargan-v2/blob/master/README.md#animal-faces-hq-dataset-afhq
Image embedding: ImageNet dataset http://www.image-net.org/
Training procedure
Preprocessing
Please see the paper for details: https://arxiv.org/abs/2509.15591
Training hyperparameters
Please see the paper for details: https://arxiv.org/abs/2509.15591
Speeds, sizes, times
Please see the paper for details: https://arxiv.org/abs/2509.15591
Evaluation
Testing data, factors, and metrics
Testing data
Image generation: AFHQ Cat dataset
Image embedding: ImageNet dataset
Metrics
Image generation: Image quality metrics, including FID, Inception Score, Precision, and Recall
Image embedding: Downstream image classification accuracy
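To illustrate how image-quality evaluation of this kind is typically set up, here is a minimal FID sketch using the torchmetrics package; torchmetrics is our assumption for illustration, and the paper's exact evaluation pipeline (see the arXiv link above) may differ.

```python
# Minimal FID sketch with torchmetrics (requires torch-fidelity).
# This is one possible implementation, not the paper's pipeline.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# torchmetrics expects uint8 batches of shape (N, 3, H, W) by default.
# Random tensors stand in for real and generated images; in practice,
# use thousands of samples for a stable estimate.
real = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real, real=True)
fid.update(fake, real=False)
print(f"FID: {fid.compute().item():.2f}")
```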
Evaluation results
Image generation: The image quality of Latent Zoning Network models is better than that of the baselines. For example, on the AFHQ Cat dataset, Latent Zoning Networks improve every metric over the baseline:

| Metric | Baseline | LZN |
| --- | --- | --- |
| FID (lower is better) | 6.08 | 5.68 |
| sFID (lower is better) | 49.60 | 49.32 |
| IS (higher is better) | 1.80 | 1.96 |
| Precision (higher is better) | 0.86 | 0.87 |
| Recall (higher is better) | 0.28 | 0.30 |
| Reconstruction (lower is better) | 17.92 | 10.29 |
Image embedding: The downstream image classification accuracy of Latent Zoning Networks is on par with state-of-the-art approaches. For example, we train a linear classifier on top of the embeddings and evaluate its accuracy on the ImageNet test set: Latent Zoning Networks reach 69.5% accuracy, beating the seminal MoCo method by 9.3% and SimCLR by 0.2%.
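The linear-probe protocol mentioned above can be sketched generically as follows; this example uses scikit-learn on random placeholder embeddings and labels, and is not the paper's exact training recipe.

```python
# Generic linear-probe sketch: train a linear classifier on frozen
# embeddings and report accuracy. Random placeholders stand in for
# real LZN embeddings of ImageNet images and their labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 128))  # placeholder embeddings
train_y = rng.integers(0, 10, size=1000)  # placeholder labels
test_emb = rng.normal(size=(200, 128))
test_y = rng.integers(0, 10, size=200)

# The embedding model stays frozen; only this linear layer is trained.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_emb, train_y)
acc = accuracy_score(test_y, probe.predict(test_emb))
print(f"linear-probe accuracy: {acc:.3f}")
```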
Summary
Overall, the results demonstrate that the Latent Zoning Network is a viable, unified framework to address multiple machine learning problems.
License
MIT
Nothing disclosed here, including the Out of Scope Uses section, should be interpreted as or deemed a restriction or modification to the license the code is released under.
Citation
- BibTeX:
@article{lin2025latent,
  title={Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification},
  author={Lin, Zinan and Liu, Enshu and Ning, Xuefei and Zhu, Junyi and Wang, Wenyu and Yekhanin, Sergey},
  journal={arXiv preprint arXiv:2509.15591},
  year={2025}
}
Model card contact
We welcome feedback and collaboration from our audience. If you have suggestions or questions, or if you observe unexpected/offensive behavior in our technology, please contact Zinan Lin at zinanlin@microsoft.com.
If the team receives reports of undesired behavior or identifies issues independently, we will update this repository with appropriate mitigations.