The goal of the diffusers library is to make state-of-the-art pretrained diffusion models easy to use.
The core of diffusers is built around three components: pipelines, models, and schedulers.
pip install diffusers
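The StableDiffusionPipeline used below also relies on the transformers library for the CLIP text encoder and tokenizer, so in practice you will usually want to install it alongside diffusers (a suggested install line, not part of the original text):
pip install diffusers transformers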
Import the pipeline class and load a model with from_pretrained(). The model can be a local copy, or it will be downloaded automatically from the Hugging Face Hub.
from diffusers import StableDiffusionPipeline
image_pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
# 載入本地模型:
# image_pipe = StableDiffusionPipeline.from_pretrained("./models/Stablediffusion/stable-diffusion-v1-4")
image_pipe.to("cuda")
prompt = "a photograph of an astronaut riding a horse"
pipe_out = image_pipe(prompt)
image = pipe_out.images[0]
# you can save the image with
# image.save(f"astronaut_rides_horse.png")
Let's take a look at the contents of image_pipe:
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.10.2",
  "feature_extractor": [
    "transformers",
    "CLIPFeatureExtractor"
  ],
  "requires_safety_checker": true,
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
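Each entry above is a sub-module of the pipeline (the scheduler, the UNet, the VAE, the CLIP text encoder and tokenizer, and so on), and each one is exposed as an attribute on the pipeline object. A minimal sketch, reusing the image_pipe loaded above:
# The components listed in the printout are regular attributes of the pipeline
print(type(image_pipe.unet))       # UNet2DConditionModel
print(type(image_pipe.scheduler))  # PNDMScheduler
print(type(image_pipe.vae))        # AutoencoderKL
print(type(image_pipe.tokenizer))  # CLIPTokenizer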
Let's look at the structure of pipe_out:
StableDiffusionPipelineOutput(
images=[<PIL.Image.Image image mode=RGB size=512x512 at 0x1A14BDD7730>],
nsfw_content_detected=[False])
From this we can see that pipe_out contains two parts: the first is the list of generated images, and the second, nsfw_content_detected, indicates whether the safety checker flagged each image. If only one image was generated, pipe_out.images[0] retrieves it.
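As a quick illustration (a minimal sketch, reusing the pipe_out from above), the two fields can be used together when post-processing the results:
# Save every image that was not flagged by the safety checker
for i, (img, flagged) in enumerate(zip(pipe_out.images, pipe_out.nsfw_content_detected)):
    if flagged:
        print(f"image {i} was flagged by the safety checker")
    else:
        img.save(f"output_{i}.png")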
What if we want to generate several images at once? Just pass prompt as a list; its length determines how many images are generated, as in the code below.
from diffusers import StableDiffusionPipeline
image_pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
image_pipe.to("cuda")
prompt = ["a photograph of an astronaut riding a horse"] * 3
out_images = image_pipe(prompt).images
for i, out_image in enumerate(out_images):
    out_image.save("astronaut_rides_horse" + str(i) + ".png")
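Alternatively (an aside not in the original text), recent versions of StableDiffusionPipeline also accept a num_images_per_prompt argument, which produces several images from a single prompt:
# Sketch: three images from one prompt instead of repeating the prompt three times
out_images = image_pipe("a photograph of an astronaut riding a horse",
                        num_images_per_prompt=3).images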
When generating images with image_pipe, the model runs in float32 precision by default. If your GPU memory is insufficient, you may hit an Out of memory error; in that case, you can load the model in float16 precision instead.
Note: If you are limited by GPU memory and have less than 10GB of GPU RAM available, please make sure to load the StableDiffusionPipeline in float16 precision instead of the default float32 precision as done above. You can do so by loading the weights from the fp16 branch and by telling diffusers to expect the weights to be in float16 precision:
import torch
image_pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16)
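If memory is still tight, diffusers also provides attention slicing, which trades a little speed for a smaller peak memory footprint. A minimal sketch, assuming the fp16 pipeline loaded above:
# Compute attention in slices to reduce peak GPU memory usage
image_pipe.enable_attention_slicing()
image = image_pipe(prompt).images[0]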
Every pipeline has some settings of its own. For example, in addition to the required prompt argument, StableDiffusionPipeline also accepts the following parameters:
num_inference_steps: int = 50                 # number of denoising steps
guidance_scale: float = 7.5                   # strength of classifier-free guidance
generator: Optional[torch.Generator] = None   # random-number generator for reproducible results
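For instance (an illustrative sketch; the specific values are just examples), num_inference_steps trades speed for quality, and guidance_scale controls how strongly the image follows the prompt:
out_images = image_pipe(prompt,
                        num_inference_steps=30,    # fewer steps: faster, possibly lower quality
                        guidance_scale=9.0).images # higher value: follows the prompt more closely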
Example: if you want to get identical results on every run, set the same seed each time:
generator = torch.Generator("cuda").manual_seed(1024)
prompt = ["a photograph of an astronaut riding a horse"] * 3
out_images = image_pipe(prompt, generator=generator).images
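Note that the generator's internal state advances each time it is used, so to reproduce exactly the same batch later, re-seed it (or create a new generator with the same seed) before calling the pipeline again:
generator.manual_seed(1024)  # reset the seed before the next run to reproduce the same images
out_images = image_pipe(prompt, generator=generator).images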