IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models

1 MSU Institute for Artificial Intelligence
2 ISP RAS Research Center for Trusted Artificial Intelligence
3 Lomonosov Moscow State University
4 Yandex

Abstract

Diffusion-based models have recently transformed conditional image generation, achieving unprecedented fidelity in generating photorealistic and semantically accurate images. However, consistently generating high-quality images remains challenging, partly due to the lack of mechanisms for conditioning outputs on perceptual quality. In this work, we propose methods to integrate image quality assessment (IQA) models into diffusion-based generators, enabling quality-aware image generation.
First, we experiment with gradient-based guidance to optimize image quality directly and show this approach has limited generalizability. To address this, we introduce IQA-Adapter, a novel architecture that conditions generation on target quality levels by learning the relationship between images and quality scores. When conditioned on high target quality, IQA-Adapter shifts the distribution of generated images towards a higher-quality subdomain. This approach achieves up to a 10% improvement across multiple objective metrics, as confirmed by a subjective study, while preserving generative diversity and content. Additionally, IQA-Adapter can be used inversely as a degradation model, generating progressively more distorted images when conditioned on lower quality scores. Our quality-aware methods also provide insights into the adversarial robustness of IQA models, underscoring the potential of quality conditioning in generative modeling and the importance of robust IQA methods.

TLDR

IQA-Adapter is a tool that connects Image Quality/Aesthetics Assessment (IQA/IAA) models with image generation, enabling quality-aware generation with diffusion-based models. It conditions image generators on target quality/aesthetics scores and is based on the IP-Adapter architecture.

Architecture

IQA-Adapter builds on the IP-Adapter technique to condition the generative model on image quality, integrating visual quality scores without altering the base model's weights. The adapter projects these scores into tokens that are processed through decoupled cross-attention layers, enabling the generator to adjust its output to a specified quality level. It can combine multiple aspects of image fidelity and exposes a scaling parameter that controls the strength of the quality condition at inference.


Overall architecture of the proposed IQA-Adapter, which conditions the diffusion model on the outputs of an IQA model.
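Below is a minimal PyTorch sketch of this conditioning scheme. The class names (QualityProjector, DecoupledCrossAttention), the single-head attention, and the token count are illustrative assumptions, not the paper's implementation; frozen-weight handling and multi-head details are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QualityProjector(nn.Module):
    """Hypothetical projection of scalar IQA/IAA scores into conditioning tokens."""
    def __init__(self, num_scores=1, num_tokens=4, dim=768):
        super().__init__()
        self.num_tokens, self.dim = num_tokens, dim
        self.proj = nn.Linear(num_scores, num_tokens * dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, scores):  # scores: (batch, num_scores)
        tokens = self.proj(scores).reshape(-1, self.num_tokens, self.dim)
        return self.norm(tokens)  # (batch, num_tokens, dim)

class DecoupledCrossAttention(nn.Module):
    """IP-Adapter-style decoupled cross-attention: a second key/value pair
    attends to the quality tokens; `scale` controls the impact of the
    quality condition at inference."""
    def __init__(self, dim=768, scale=1.0):
        super().__init__()
        self.scale = scale
        self.to_q = nn.Linear(dim, dim)       # shared query projection
        self.to_k_text = nn.Linear(dim, dim)  # text keys/values (frozen in practice)
        self.to_v_text = nn.Linear(dim, dim)
        self.to_k_iqa = nn.Linear(dim, dim)   # trainable quality keys/values
        self.to_v_iqa = nn.Linear(dim, dim)

    def forward(self, x, text_ctx, iqa_ctx):
        q = self.to_q(x)
        out_text = F.scaled_dot_product_attention(
            q, self.to_k_text(text_ctx), self.to_v_text(text_ctx))
        out_iqa = F.scaled_dot_product_attention(
            q, self.to_k_iqa(iqa_ctx), self.to_v_iqa(iqa_ctx))
        return out_text + self.scale * out_iqa

# Usage: condition on a single normalized quality score of 0.95.
proj, attn = QualityProjector(), DecoupledCrossAttention(scale=0.8)
iqa_tokens = proj(torch.tensor([[0.95]]))
latents = torch.randn(1, 64, 768)   # image tokens inside a UNet block
text_ctx = torch.randn(1, 77, 768)  # text-encoder context
out = attn(latents, text_ctx, iqa_tokens)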

High-quality conditioning

The results in panel (a) show that IQA-Adapters trained on various IQA/IAA models consistently improve image quality over the base model, with average relative gains of 4-6%. Conditioning on a high target quality (99th percentile) improves multiple metrics at once, demonstrating cross-metric transferability. The subjective study in panel (b) confirms these results: participants preferred images generated with quality-conditioned IQA-Adapters over those from the base model, especially at higher quality levels. Pairwise win rates in panel (c) indicate that images conditioned on the highest quality levels outperform those conditioned on medium or low levels. IQA-Adapters can thus improve both objective and perceived image quality.


(a) IQA-Adapters conditioned on 99th-percentile quality targets achieve
consistent quality improvements across models and metrics.
(b, c) Subjective study results show higher win rates and preference
for images generated with higher quality conditioning.
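As a rough sketch of how percentile-based quality targets like the 99th percentile above can be derived (the synthetic score array and helper name are illustrative assumptions; in practice the scores would come from an IQA model run over a reference image set):

import numpy as np

# Stand-in for per-image IQA scores computed over a reference set.
scores = np.random.default_rng(0).normal(loc=0.6, scale=0.1, size=10_000)

def quality_target(scores, percentile):
    """Map a percentile of the empirical score distribution to the scalar
    condition fed to the IQA-Adapter (hypothetical helper)."""
    return float(np.percentile(scores, percentile))

high_q = quality_target(scores, 99)  # high-quality condition
low_q = quality_target(scores, 1)    # low-quality (degradation) condition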

Alignment with qualitative condition

The figure below illustrates the impact of varying quality levels on generated images, using an IQA-Adapter conditioned on different percentiles (1st to 99th) of the target quality metric. The distributions in (a) show progressively higher IQA scores as the target quality increases, while (b) provides example images, with sharper and more detailed visuals at higher quality levels. These results highlight the adapter's ability to align generated image quality with the specified condition.


Generated image quality improves with higher percentile conditioning,
showing sharper details and higher IQA scores.
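A sweep over such percentile conditions could look like the sketch below; sample_with_iqa_adapter is a hypothetical placeholder for the actual sampling pipeline, and the score array is synthetic.

import numpy as np

scores = np.random.default_rng(0).normal(0.6, 0.1, 10_000)  # stand-in IQA scores
targets = {p: float(np.percentile(scores, p)) for p in (1, 25, 50, 75, 99)}

for p, target in targets.items():
    # Higher percentiles shift outputs toward the high-quality subdomain;
    # low percentiles make the adapter act as a learned degradation model.
    print(f"percentile {p:>2d} -> quality target {target:.3f}")
    # image = sample_with_iqa_adapter("a photo of a cat", iqa_target=target)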

Artifacts and Biases

We probe IQA models at their limits, revealing hidden vulnerabilities and biases. Under high guidance scales, the gradient-based method drives the generator toward adversarial patterns unique to each IQA model, while IQA-Adapters expose subtle stylistic preferences, such as TOPIQ's bias toward sharpness or LAION-AES's affinity for vibrant colors. When pushed further with negative-quality guidance, the generation process inflates scores by exploiting these biases, producing over-stylized images whose scores rise without genuine quality gains. These findings show how IQA-Adapters can serve as a tool for probing the robustness of IQA models.


The gradient-based method produces adversarial artifacts
unique to each IQA model, while the IQA-Adapter reveals style biases,
such as sharpness or high saturation.
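For reference, a minimal sketch of gradient-based quality guidance of this kind, assuming a denoiser that predicts the clean image and a differentiable IQA model; the paper's exact guidance schedule and scaling may differ.

import torch

def iqa_guided_step(x, t, denoiser, iqa_model, guidance_scale=0.0):
    """One guided update (sketch): ascend the IQA score's gradient w.r.t.
    the noisy sample. At high guidance scales this is where model-specific
    adversarial patterns emerge."""
    x = x.detach().requires_grad_(True)
    x0_pred = denoiser(x, t)            # predicted clean image
    score = iqa_model(x0_pred).sum()    # differentiable quality score
    grad = torch.autograd.grad(score, x)[0]
    return (x + guidance_scale * grad).detach()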

BibTeX

@misc{iqaadapter,
  title={IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models}, 
  author={Khaled Abud and Sergey Lavrushkin and Alexey Kirillov and Dmitriy Vatolin},
  year={2024},
  eprint={2412.01794},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.01794}, 
}