The goal of the diffusers library is to make state-of-the-art pretrained diffusion models easy to use.
The core of diffusers is built around three components: pipelines, models, and schedulers.
pip install diffusers
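The StableDiffusionPipeline used below also relies on the transformers library for the CLIP text encoder and tokenizer, so in practice you will usually want to install it alongside diffusers (a suggested install line, not part of the original text):
pip install diffusers transformers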
Import the pipeline class and load a model with from_pretrained(). The model can be a local copy, or it will be downloaded automatically from the Hugging Face Hub.
from diffusers import StableDiffusionPipeline
image_pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
# 載入本地模型:
# image_pipe = StableDiffusionPipeline.from_pretrained("./models/Stablediffusion/stable-diffusion-v1-4")
image_pipe.to("cuda")
prompt = "a photograph of an astronaut riding a horse"
pipe_out = image_pipe(prompt)
image = pipe_out.images[0]
# you can save the image with
# image.save(f"astronaut_rides_horse.png")
Let's take a look at the contents of image_pipe:
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.10.2",
  "feature_extractor": [
    "transformers",
    "CLIPFeatureExtractor"
  ],
  "requires_safety_checker": true,
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
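Each entry above is a sub-module of the pipeline (the scheduler, the UNet, the VAE, the CLIP text encoder and tokenizer, and so on), and each one is exposed as an attribute on the pipeline object. A minimal sketch, reusing the image_pipe loaded above:
# The components listed in the printout are regular attributes of the pipeline
print(type(image_pipe.unet))       # UNet2DConditionModel
print(type(image_pipe.scheduler))  # PNDMScheduler
print(type(image_pipe.vae))        # AutoencoderKL
print(type(image_pipe.tokenizer))  # CLIPTokenizer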
Let's look at the structure of pipe_out:
StableDiffusionPipelineOutput(
images=[<PIL.Image.Image image mode=RGB size=512x512 at 0x1A14BDD7730>],
nsfw_content_detected=[False])
From this we can see that pipe_out contains two parts: the first is the list of generated images, and the second, nsfw_content_detected, indicates whether the safety checker flagged each image. If only one image was generated, pipe_out.images[0] retrieves it.
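As a quick illustration (a minimal sketch, reusing the pipe_out from above), the two fields can be used together when post-processing the results:
# Save every image that was not flagged by the safety checker
for i, (img, flagged) in enumerate(zip(pipe_out.images, pipe_out.nsfw_content_detected)):
    if flagged:
        print(f"image {i} was flagged by the safety checker")
    else:
        img.save(f"output_{i}.png")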
What if we want to generate several images at once? Just pass prompt as a list; its length determines how many images are generated, as in the code below.
from diffusers import StableDiffusionPipeline
image_pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
image_pipe.to("cuda")
prompt = ["a photograph of an astronaut riding a horse"] * 3
out_images = image_pipe(prompt).images
for i, out_image in enumerate(out_images):
    out_image.save("astronaut_rides_horse" + str(i) + ".png")
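Alternatively (an aside not in the original text), recent versions of StableDiffusionPipeline also accept a num_images_per_prompt argument, which produces several images from a single prompt:
# Sketch: three images from one prompt instead of repeating the prompt three times
out_images = image_pipe("a photograph of an astronaut riding a horse",
                        num_images_per_prompt=3).images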
When generating images with image_pipe, the model runs in float32 precision by default. If your GPU memory is insufficient, you may hit an Out of memory error; in that case, you can load the model in float16 precision instead.
Note: If you are limited by GPU memory and have less than 10GB of GPU RAM available, please make sure to load the StableDiffusionPipeline in float16 precision instead of the default float32 precision as done above. You can do so by loading the weights from the fp16 branch and by telling diffusers to expect the weights to be in float16 precision:
import torch
image_pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16)
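If memory is still tight, diffusers also provides attention slicing, which trades a little speed for a smaller peak memory footprint. A minimal sketch, assuming the fp16 pipeline loaded above:
# Compute attention in slices to reduce peak GPU memory usage
image_pipe.enable_attention_slicing()
image = image_pipe(prompt).images[0]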
Every pipeline has some settings of its own. For example, in addition to the required prompt argument, StableDiffusionPipeline also accepts the following parameters:
num_inference_steps: int = 50                 # number of denoising steps
guidance_scale: float = 7.5                   # strength of classifier-free guidance
generator: Optional[torch.Generator] = None   # random-number generator for reproducible results
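For instance (an illustrative sketch; the specific values are just examples), num_inference_steps trades speed for quality, and guidance_scale controls how strongly the image follows the prompt:
out_images = image_pipe(prompt,
                        num_inference_steps=30,    # fewer steps: faster, possibly lower quality
                        guidance_scale=9.0).images # higher value: follows the prompt more closely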
Example: if you want to get identical results on every run, set the same seed each time:
generator = torch.Generator("cuda").manual_seed(1024)
prompt = ["a photograph of an astronaut riding a horse"] * 3
out_images = image_pipe(prompt, generator=generator).images
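Note that the generator's internal state advances each time it is used, so to reproduce exactly the same batch later, re-seed it (or create a new generator with the same seed) before calling the pipeline again:
generator.manual_seed(1024)  # reset the seed before the next run to reproduce the same images
out_images = image_pipe(prompt, generator=generator).images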