---
license: apache-2.0
language:
- en
- zh
base_model: stabilityai/stable-diffusion-xl-base-1.0
library_name: diffusers
tags:
- Text-to-Image
- SDXL
- Stable Diffusion
- ID Customization
pipeline_tag: text-to-image
---
[中文版Readme](./README_ZH.md)

EcomID generates customized images from a single reference ID image, maintaining strong semantic consistency while allowing control through facial keypoints.

This repository provides the EcomID method and model, which combine the strengths of [PuLID](https://github.com/ToTheBeginning/PuLID) and [InstantID](https://github.com/instantX-research/InstantID) to achieve better background consistency, facial keypoint control, and more realistic faces with improved similarity.

# EcomID Overview

## EcomID Structure

- **IP-Adapter of PuLID**: EcomID incorporates the ID-Encoder and cross-attention components from PuLID, trained with alignment loss. This reduces the interference of ID embeddings with text embeddings in the cross-attention layers, minimizing disruption to the underlying model's text-to-image capabilities.
- **InstantID's IdentityNet Architecture**: Trained on **a dataset of 2 million aesthetically pleasing portrait images**, IdentityNet enhances keypoint control and improves ID consistency and facial realism. During training, the IP-Adapter is frozen and only IdentityNet is trained. Facial landmarks serve as conditional inputs, while face embeddings are integrated into IdentityNet via cross-attention.

# Show Cases

## Comparison with Other Methods

### 1. Preserved Text-to-Image Capability

| Prompt | Reference Image | EcomID | InstantID |
| --- | --- | --- | --- |
| girl, white skin, black hair, long wavy hair, in European style living room, Retro tone, decorations, depth of field. | Reference image | EcomID image | InstantID image |


As shown above, EcomID ***preserves background generation abilities while minimizing stylization, greatly enhancing realism***. The visualizations show more authentic portraits with improved background semantic consistency, highlighting EcomID's advantage in generating realistic images.

### 2. Improved Facial Control and Consistency

| Prompt | Reference Image | EcomID | InstantID | PuLID |
| --- | --- | --- | --- | --- |
| A close-up portrait of a man standing in the library, holding two smiling toddlers next to him. | Reference image | EcomID image | InstantID image | PuLID image |


As shown above, EcomID uses keypoints as conditional inputs during training, ***allowing for precise adjustments of facial positions, sizes, and orientations***. This makes the generated portraits more controllable while further improving facial similarity and overall image quality.

### More Showcases

EcomID enhances portrait representation, delivering a more authentic and aesthetically pleasing appearance while ensuring semantic consistency and greater internal ID similarity (i.e., similarity in traits that do not vary with age, hairstyle, glasses, or other physical changes).

| Prompt | Reference Image | EcomID | InstantID | PuLID |
| --- | --- | --- | --- | --- |
| A close-up portrait of a little girl with double braids, wearing a white dress, standing on the beach during sunset. | Reference image | EcomID image | InstantID image | PuLID image |
| A close-up portrait of a very little girl with double braids, wearing a hat and white dress, standing on the beach during sunset. | Reference image | EcomID image | InstantID image | PuLID image |
| A grizzled detective, fedora casting a shadow over his square jaw, a cigar dangling from his lips, his trench coat evocative of film noir, in a rainy alley. | Reference image | EcomID image | InstantID image | PuLID image |
| A smiling girl with bangs and long hair in a school uniform stands under cherry trees, holding a book. | Reference image | EcomID image | InstantID image | PuLID image |
| A very old witch, wearing a black cloak, with a pointed hat, holding a magic wand, against a background of a misty forest. | Reference image | EcomID image | InstantID image | PuLID image |
| A man clad in cyberpunk fashion: neon accents, reflective sunglasses, and a leather jacket with glowing circuit patterns. He stands stoically amidst a soaked cityscape. | Reference image | EcomID image | InstantID image | PuLID image |

### More Base Models, Resolutions, and Styles

| SDXL Model | Prompt | Reference Image | EcomID | InstantID | PuLID |
| --- | --- | --- | --- | --- | --- |
| sd-xl-base-1.0 | girl, solo, brown hair, holding a little teddy bear on her hands, wearing a school uniform, standing in the library, cartoon style. | Reference image | EcomID image | InstantID image | PuLID image |
| EcomXL | A close-up portrait of a very little girl with double braids, wearing a hat and white dress, standing on the beach during sunset. | Reference image | EcomID image | InstantID image | PuLID image |
| DreamShaperXL | solo, looking_at_viewer, smile, brown_hair, upper_body, open_clothes, teeth, open_jacket, black_jacket, blurry_background, realistic | Reference image | EcomID image | InstantID image | PuLID image |
| leosam_xl_v7 | A close-up portrait of a girl, solo, dress, jewelry, beach and sea, pink_dress, realistic. | Reference image | EcomID image | InstantID image | PuLID image |


### Notes

- Unless otherwise specified, the showcases are generated with the base model EcomXL. EcomID is also highly compatible with other SDXL-based models, such as [leosams-helloworld-xl](https://civitai.com/models/43977/leosams-helloworld-xl), [dreamshaper-xl](https://civitai.com/models/112902/dreamshaper-xl), and [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
- It works very well with SDXL Turbo/Lightning, [EcomXL Inpainting ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_inpaint), and [EcomXL Softedge ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_softedge).

# How to use

## ComfyUI

- The EcomID ComfyUI node has been released: [SDXL_EcomID_ComfyUI](https://github.com/alimama-creative/SDXL_EcomID_ComfyUI)

# Training Details

The model is trained on 2M Taobao images in which the proportion of human faces is greater than 3%. The images have a resolution greater than 800 and an aesthetic score above 5.5.

- Mixed precision: fp16
- Learning rate: 1e-4
- Batch size: 2
- Image size: 1024x1024
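
The data selection criteria and hyperparameters listed in the training details can be sketched in Python as follows. This is only an illustrative sketch, not the authors' pipeline: the `ImageRecord` class, its field names, and the `keep_for_training` helper are hypothetical. It also assumes that "proportion of human faces" means the fraction of the image area covered by the face, and that "resolution greater than 800" refers to the shorter side in pixels; the source does not confirm either reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical per-image metadata; all field names are assumptions."""
    width: int
    height: int
    face_area_ratio: float  # assumed: face area / image area, in [0, 1]
    aesthetic_score: float


# Thresholds as stated in the training details.
MIN_FACE_RATIO = 0.03   # "proportion of human faces is greater than 3%"
MIN_RESOLUTION = 800    # "resolution greater than 800" (assumed: shorter side, px)
MIN_AESTHETIC = 5.5     # "aesthetic score is above 5.5"

# Hyperparameters as stated in the training details.
TRAIN_CONFIG = {
    "mixed_precision": "fp16",
    "learning_rate": 1e-4,
    "batch_size": 2,
    "image_size": (1024, 1024),
}


def keep_for_training(rec: ImageRecord) -> bool:
    """Return True if the record passes all three stated filters."""
    return (
        rec.face_area_ratio > MIN_FACE_RATIO
        and min(rec.width, rec.height) > MIN_RESOLUTION
        and rec.aesthetic_score > MIN_AESTHETIC
    )


if __name__ == "__main__":
    samples = [
        ImageRecord(1200, 1600, 0.10, 6.1),  # passes all filters
        ImageRecord(1200, 1600, 0.01, 6.1),  # face too small
        ImageRecord(640, 960, 0.10, 6.1),    # resolution too low
        ImageRecord(1200, 1600, 0.10, 5.0),  # aesthetic score too low
    ]
    kept = [r for r in samples if keep_for_training(r)]
    print(f"kept {len(kept)} of {len(samples)} images")
```

All three comparisons are strict, matching the "greater than" and "above" wording of the criteria.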