Tag: image-generation

35 discussions across 10 posts tagged "image-generation".

AI Signal - April 14, 2026

Free Open-Source Tool to Instantly Rig and Animate Your Illustrations (Also With Mesh Deform) r/StableDiffusion Score: 1226

The `see-through` model, released the week prior, decomposes a single static anime image into 23 separate layers for rigging. The author built an open-source tool on top of it that handles mesh deformation and animation, removing the need for expensive manual rigging. This makes professional-quality 2D character animation accessible without specialized software or large budgets. The post has a 0.98 upvote ratio and 81 comments.

#image-generation #open-source
Forget About VAEs? SenseNova's NEO-unify Achieves 31.5 PSNR Without an Encoder — Native Image Gen Is Coming r/StableDiffusion Score: 247

SenseNova's NEO-unify model operates directly on pixels, without the conventional CLIP + VAE + diffusion architecture that has defined image generation since the original Stable Diffusion releases. It reports a reconstruction PSNR of 31.5 dB, a strong score, while eliminating the VAE bottleneck that can cause color shift, detail loss, and latent space artifacts. If this architecture proves scalable, it could fundamentally change how image generation models are built.
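
For readers unfamiliar with the metric, PSNR is defined as 10 * log10(MAX^2 / MSE). The sketch below computes it for two flat pixel sequences and inverts it to show what error level a given PSNR implies. It assumes an 8-bit pixel range (`max_value=255`); the post does not state the range or the evaluation set, and the function names are illustrative only.

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    if not original:
        raise ValueError("inputs must not be empty")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical inputs: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)

def rmse_for_psnr(psnr_db, max_value=255.0):
    """Invert the PSNR formula: the RMS pixel error implied by a PSNR value."""
    return max_value / (10.0 ** (psnr_db / 20.0))

# Assumption: 8-bit pixels. The implied RMS error at 31.5 dB is under 7 levels out of 255.
implied_error = rmse_for_psnr(31.5)
```

Under that 8-bit assumption, 31.5 dB corresponds to an RMS error of roughly 6.8 intensity levels per pixel value.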

#image-generation
Update: Distilled v1.1 Is Live (LTX-2.3) r/StableDiffusion Score: 518

LTX-2.3's distilled model gets a v1.1 checkpoint with improved audio quality and refined visual aesthetics. Updated ComfyUI workflows are included. The 0.99 upvote ratio, alongside 115 comments, suggests a clean, uncontroversial improvement release. The companion post ([#29](/tags/29/)) provides a side-by-side before/after comparison showing that the audio mumbling issue from v1.0 is addressed.

#image-generation #open-source
ERNIE Image Released r/StableDiffusion Score: 168

Baidu released ERNIE Image and ERNIE Image Turbo on HuggingFace (baidu/ERNIE-Image and baidu/ERNIE-Image-Turbo). The score is low, but 88 comments and a 0.99 upvote ratio suggest genuine community interest. This is another Chinese lab entering the open image generation space, and it is worth tracking as a comparison point to FLUX and SD3.

#image-generation #open-source
LTX Distilled LoRA 1.1 vs. 1.0 Comparison r/StableDiffusion Score: 263

Side-by-side video comparison using identical settings and seeds, showing v1.1's improved audio output over v1.0's mumbling first-stage results. Provides the empirical before/after that complements the official release announcement ([#22](/tags/22/)). Useful for practitioners deciding whether to upgrade.

#image-generation

AI Signal - April 07, 2026

FLUX.2 [dev] works really well in ComfyUI now r/StableDiffusion Score: 254

ComfyUI's new low-VRAM optimizations let FLUX.2 [dev] run on consumer GPUs (RTX 4060 Ti 16GB). It is about 5x slower than Klein (75s vs 15s), but it delivers the best character consistency among open-weight image generation models.

#image-generation #open-source
Flux2Klein EXACT Preservation (No Lora needed) r/StableDiffusion Score: 254

The ComfyUI-Flux2Klein-Enhancer node pack achieves exact character preservation without LoRA training by improving prompt adherence and style consistency. It shows how far better node configuration can extend FLUX.2 Klein's capabilities.

#image-generation #open-source
Ace-step 1.5XL's already up r/StableDiffusion Score: 98

Ace-step v1.5 XL has been released, with ComfyUI support in nightly builds. Multiple variants are available (turbo, merge, SFT), each tuned for a different speed/quality tradeoff in music generation workflows.

#image-generation #open-source

AI Signal - March 24, 2026

daVinci-MagiHuman: This new opensource video model beats LTX 2.3 r/StableDiffusion Score: 359

GAIR released a new 15B open-source audio-video model that claims to beat LTX 2.3. It expands the options for local video generation with synchronized audio.

#image-generation #open-source

AI Signal - March 17, 2026

Showing real capability of LTX loras! Dispatch LTX 2.3 LORA with multiple characters + style r/StableDiffusion Score: 751

Impressive demonstration of LTX 2.3 LoRA training on 440 clips from the game Dispatch, preserving multiple characters and the game's style in text-to-video generation. The training covered 6+ characters with distinct voices as well as the game's aesthetics. It shows progress in controllable video generation with LoRA fine-tuning.

#image-generation #open-source

AI Signal - March 10, 2026

ComfyUI launches App Mode and ComfyHub r/StableDiffusion Score: 334

ComfyUI introduced App Mode (internally called "comfyui 1111"), which transforms complex workflows into simple, shareable UIs. Users can select input parameters and create web UI-like interfaces from any workflow. ComfyHub provides a centralized workflow repository, lowering the barrier to entry for non-technical users while preserving ComfyUI's node-based power for advanced users.

#image-generation #open-source

AI Signal - February 24, 2026

ZIB vs ZIT vs Flux 2 Klein r/StableDiffusion Score: 250

Comprehensive comparison of Z-image Base, Z-image Turbo, and Flux 2 Klein across different prompt complexities and qualities. Tests both high-quality long prompts (overall generation quality) and short/low-quality prompts (creative gap-filling ability). Provides detailed visual comparisons and analysis of each model's strengths and weaknesses.

#image-generation #open-source
Just with a single prompt and this result is insane for first attempt in Seedance 2.0 r/singularity Score: 2841

A user generated an impressive Transformers-style video (a plane transforming into a robot and attacking a city) using Seedance 2.0 with a single Chinese prompt. The video shows Hollywood-level visual effects, mechanical detail, physics simulation, and destruction effects, all from one text prompt. This demonstrates rapid progress in video generation quality and complexity.

#image-generation
I created this time travel short scene using Seedance 2.0 in just one day for under $200. r/ChatGPT Score: 2129

The creator produced a polished time travel short film using Seedance 2.0 in one day for under $200. It demonstrates how accessible high-quality video generation has become for independent creators, and how quickly they can iterate. The speed and cost represent an orders-of-magnitude improvement over traditional video production.

#image-generation

AI Signal - February 10, 2026

Qwen-Image-2.0 is out - 7B unified gen+edit model with native 2K and actual text rendering r/LocalLLaMA Score: 327

Qwen's new 7B image model combines generation and editing in a single pipeline with native 2K resolution and improved text rendering. It is currently API-only, but an open-weight release seems likely given Qwen's track record with v1.

#image-generation #local-models
Simple, Effective and Fast Z-Image Headswap for characters V1 r/StableDiffusion Score: 1257

Workflow for character head swapping with Z-Image, with minimal variables to adjust. Its simplicity and effectiveness make it accessible for users who want consistent character transfer across images.

#image-generation
Seedance 2.0 Generates Realistic 1v1 Basketball Against Lebron Video r/singularity Score: 1999

Video generation showing dramatic improvements in physics simulation, body dynamics, and cloth simulation. Marks a significant step forward from models that struggled with acrobatic movements and realistic physics.

#image-generation
I asked AI to remodel my ugly apartment kitchen, then did it in real life...(photos) r/ChatGPT Score: 6255

Practical application of AI image generation for real-world design decisions, followed through to actual implementation. Demonstrates the practical utility of AI tools for visualization and planning.

#image-generation
Coloring Book Qwen Image Edit LoRA r/StableDiffusion Score: 357

LoRA trained for Qwen-Image-Edit that converts photographic scenes into coloring book art with high precision. Created as part of a Tongyi Lab + ModelScope hackathon with full training walkthrough available.

#image-generation
Did creativity die with SD 1.5? r/StableDiffusion Score: 373

Discussion lamenting the shift from artistic experimentation in early Stable Diffusion to current focus on photorealism. Questions whether AI art has become over-trained and market-driven rather than exploratory.

#image-generation

AI Signal - February 03, 2026

Qwen-Image2512 is a severely underrated model (realism examples) r/StableDiffusion Score: 889

Qwen-Image2512 delivers exceptional realism and responds particularly well to LoRAs, yet receives less attention than ZIT or Klein in community discussions. Users report it excels at realistic image generation and general refining tasks, offering quality that rivals more hyped alternatives.

#image-generation #open-source
Z-Image Edit is basically already here, but it is called LongCat r/StableDiffusion Score: 123

While the community awaits Alibaba's Z-Image Edit, Meituan's LongCat ecosystem offers comparable image editing capabilities now. LongCat uses a larger vision-language encoder (Qwen 2.5-VL 7B vs Z-Image's Qwen 3 4B), enabling the model to actually see and understand images during editing tasks, not just text descriptions.

#image-generation #open-source
New fire just dropped: ComfyUI-CacheDiT ⚡ r/StableDiffusion Score: 286

ComfyUI-CacheDiT delivers 1.4-1.6x speedup for Diffusion Transformer models through intelligent residual caching with zero configuration required. The optimization works transparently across DiT models with minimal quality impact, representing the kind of practical performance optimization that compounds across the ecosystem.
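
The general idea behind residual caching can be sketched in a few lines. This is a toy illustration under stated assumptions, not the ComfyUI-CacheDiT implementation: the class name, the relative-change test, and the `threshold` value are all hypothetical. A block's output is treated as `x + f(x)`; when the input has barely moved since the last full computation, the cached `f(x)` is reused instead of being recomputed.

```python
class ResidualCache:
    """Toy residual cache for one block across denoising steps (illustrative only)."""

    def __init__(self, block_fn, threshold=0.05):
        self.block_fn = block_fn    # computes the block's residual f(x)
        self.threshold = threshold  # max relative input change that still allows reuse
        self.cached_input = None
        self.cached_residual = None
        self.hits = 0
        self.misses = 0

    def _relative_change(self, x):
        # L1 distance to the input seen at the last full computation, normalized.
        num = sum(abs(a - b) for a, b in zip(x, self.cached_input))
        den = sum(abs(b) for b in self.cached_input) or 1e-12
        return num / den

    def __call__(self, x):
        if self.cached_input is not None and self._relative_change(x) < self.threshold:
            self.hits += 1
            residual = self.cached_residual  # skip the expensive call
        else:
            self.misses += 1
            residual = self.block_fn(x)
            self.cached_input = list(x)
            self.cached_residual = residual
        return [a + r for a, r in zip(x, residual)]

def expensive_block(x):
    """Stand-in for a transformer block's residual branch."""
    return [0.1 * v for v in x]

cached = ResidualCache(expensive_block, threshold=0.05)
x = [1.0, 2.0, 3.0]
for _ in range(4):
    y = cached(x)
    x = [v * 1.01 for v in x]  # inputs drift slowly between steps
```

In this toy run only the first step pays for the block; the next three reuse the cached residual because the input has drifted by less than 5 percent. The threshold is where the speed versus quality tradeoff sits: a looser value skips more work but reuses staler residuals.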

#image-generation #development-tools
New Anime Model, Anima is Amazing. Can't wait for the full release r/StableDiffusion Score: 360

Anima, a new anime-focused image generation model, shows impressive artist style recognition that users prefer over established alternatives like Illustrious or Pony. The model demonstrates strong prompt adherence and authentic style reproduction, though it is currently just a preview, with the fully trained version pending release.

#image-generation #open-source

AI Signal - January 27, 2026

Z-Image Base Model Released by Alibaba r/StableDiffusion Score: 366

Alibaba's Tongyi-MAI released the Z-Image base model on HuggingFace, with official ComfyUI support merged within hours. The model represents a new generation of open image generation, and the community is rapidly integrating it into existing workflows.

#image-generation #open-source
LTX-2 Image-to-Video Adapter LoRA r/StableDiffusion Score: 275

High-rank LoRA adapter for LTX-2 that substantially improves image-to-video generation quality. It embeds the image directly, without complex workflows, preprocessing, or compression tricks, and addresses reliability issues with the base model's image-to-video capabilities.

#image-generation #open-source
Lazy weekend with flux2 klein edit - lighting experiments r/StableDiffusion Score: 876

A user tested Flux2 Klein's lighting capabilities by feeding the official prompting guide into an LLM to generate varied benchmark prompts. The finding: lighting has the single greatest impact on Klein output quality, and it responds best to photographer-style descriptions rather than generic terms.

#image-generation
Anyone else feel this way about StableDiffusion workflows? r/StableDiffusion Score: 589

Argument that output quality issues are about settings, not workflows. Good prompts + good settings + high resolution + patience = great output. Lock the seed and run a parameter search over CFG, model shift, and LoRA strength. ComfyUI isn't scary: build incrementally with clean, modular nodes.
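
The seed-locked parameter search described here can be sketched as a simple grid. This is a minimal sketch, not the poster's workflow: the value ranges are arbitrary examples, and `render` is a hypothetical stand-in that only builds a filename where a real setup would queue a ComfyUI workflow.

```python
import itertools

SEED = 1234  # locked so that only the swept parameters change between runs

def parameter_grid(cfg_values, shift_values, lora_strengths, seed=SEED):
    """Yield one settings dict per combination, all sharing the same seed."""
    for cfg, shift, lora in itertools.product(cfg_values, shift_values, lora_strengths):
        yield {"seed": seed, "cfg": cfg, "shift": shift, "lora_strength": lora}

def render(settings):
    """Hypothetical stand-in for queueing a workflow; returns an output filename."""
    return "seed{seed}_cfg{cfg}_shift{shift}_lora{lora_strength}.png".format(**settings)

# Example ranges only; sensible values depend on the model and sampler in use.
runs = list(parameter_grid([3.0, 4.5, 6.0], [1.0, 3.0], [0.6, 0.8, 1.0]))
filenames = [render(s) for s in runs]
```

Encoding the settings in each filename keeps the 18 outputs comparable side by side, and because the seed never changes, any visible difference can be attributed to the swept parameters.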

#image-generation

AI Signal - January 20, 2026

🧠💥 My HomeLab GPU Cluster – 12× RTX 5090, AI / K8s / Self-Hosted Everything r/StableDiffusion Score: 901

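An impressive self-hosted GPU cluster featuring 12 RTX 5090s across 6 machines running Kubernetes with GPU scheduling. At 32 GB per card that is 384 GB of VRAM in total, so the 1.5TB+ figure quoted for the build presumably refers to system memory rather than VRAM. Built for AI/LLM inference, training, image/video generation, and self-hosted APIs, it offers a glimpse into serious local AI infrastructure.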

#local-models #self-hosted #image-generation
LTX 2 is amazing : LTX-2 in ComfyUI on RTX 3060 12GB r/StableDiffusion Score: 956

LTX-2 video generation running successfully on modest consumer hardware (RTX 3060 12GB). The creator produced coherent spy story scenes with a cyberpunk aesthetic, demonstrating that high-quality video generation is accessible without datacenter GPUs.

#image-generation #local-models
LTX-2 Updates r/StableDiffusion Score: 848

The LTX-2 team releases improvements based on community feedback just two weeks after launch. The post highlights rapid iteration cycles, community engagement through configurations/LoRAs shared across Discord and Civitai, and the value of responsive open-source development.

#image-generation #open-source
How to generate proper Japanese in LTX-2 r/StableDiffusion Score: 484

A technical deep-dive into generating authentic Japanese audio with LTX-2 video generation. The author tests whether the model can produce real Japanese (not gibberish), shares successful workflows, and provides practical guidance for multilingual content generation.

#image-generation
Flux.2 Klein (Distilled)/ComfyUI - Use "File-Level" prompts to boost quality while maintaining max fidelity r/StableDiffusion Score: 195

A clever prompting technique for Flux 2 Klein: using "file-level" technical prompts (e.g., "sharpen edges," "increase local contrast") instead of descriptive prompts prevents the model from hallucinating new faces when upscaling/restoring old photos.

#image-generation
Flux Klein gives me SD3 vibes r/StableDiffusion Score: 113

A critique comparing Flux2 Klein's text-to-image quality unfavorably to Z Image Turbo, particularly for difficult poses which result in "body horror almost every time." While Flux2's editing ability is praised, this raises concerns about the distilled model's image generation quality.

#image-generation
Last week in Image & Video Generation r/StableDiffusion Score: 226

A curated weekly roundup of open-source image and video generation highlights, including the FLUX.2 Klein release, LTX-2 updates, and other multimodal AI developments. A useful digest for staying current without scrolling through everything.

#image-generation #open-source