
[Computer Vision] DreamBooth

🌈 This post is based on the content of https://dreambooth.github.io/.

 

Overview

Large text-to-image (T2I) models have made a remarkable leap in AI by synthesizing high-quality, diverse images from a given text prompt. However, they lack the ability to mimic the appearance of the subjects in a given reference set and to synthesize novel renditions of them in different contexts.

 

DreamBooth presents a new approach to the "personalization" of T2I diffusion models.

Given just a few images of a subject (3 to 5) as input, it fine-tunes a pretrained T2I model so that the model learns to bind a unique identifier to that specific subject.

With the unique identifier, the model can then synthesize entirely new photorealistic images of the subject, contextualized in different scenes.

By leveraging the semantic prior embedded in the model together with a new autogenous class-specific prior preservation loss, the subject can be synthesized in "diverse scenes, poses, views, and lighting conditions" that do not appear in the reference images.

 

DreamBooth applies to several tasks that were previously impossible, including subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering, all while "preserving the subject's key features".

 

 

Background

Source: https://dreambooth.github.io/

 

Given a particular subject such as the clock (shown under Input Images in the figure), it is very hard to generate that subject in different contexts with a state-of-the-art T2I model while faithfully preserving its key visual features.

 

Even after dozens of iterations over a text prompt that describes the clock's appearance in detail (e.g. "retro style yellow alarm clock with a white clock face and a yellow number three on the right part of the clock face in the jungle"), the Imagen model cannot reconstruct the clock's key visual features (Text-guided, Imagen).

Likewise, models whose text embedding lies in a shared language-vision space and that can create semantic variations of an image (e.g. DALL-E2) can neither reconstruct the appearance of the given subject nor modify its context (Image-guided, DALL-E2).

 

In contrast, DreamBooth's approach (Ours) can synthesize the clock with high fidelity in new contexts ("a [V] clock in the jungle").

 

 

Approach

Source: https://dreambooth.github.io/

 

The method takes as input a few images (3 to 5) of a subject (e.g. a specific dog) together with the corresponding class name (e.g. "dog"), and returns a fine-tuned, "personalized" T2I model that encodes a unique identifier referring to the subject.

At inference time, the unique identifier (e.g. "a [V] dog") can be implanted in different sentences to synthesize the subject in different contexts.
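The identifier-plus-class prompt pattern described above can be sketched in a few lines of Python. This is only an illustration: `personalize_prompt` and the `{subject}` slot are hypothetical names, and "[V]" is kept as a literal placeholder, whereas a real run would substitute an actual rare token for it.

```python
def personalize_prompt(template: str, identifier: str, class_name: str) -> str:
    """Fill the {subject} slot with '<identifier> <class name>'."""
    return template.format(subject=f"{identifier} {class_name}")


# Placeholder identifier and class noun, following the post's "a [V] dog" example.
IDENTIFIER = "[V]"
CLASS_NAME = "dog"

# Prompt that would be paired with the subject images during fine-tuning.
train_prompt = personalize_prompt("a photo of a {subject}", IDENTIFIER, CLASS_NAME)

# At inference time the same subject phrase is implanted into new sentences.
inference_prompts = [
    personalize_prompt(template, IDENTIFIER, CLASS_NAME)
    for template in (
        "a {subject} in the jungle",
        "a {subject} wearing a chef outfit",
    )
]

print(train_prompt)
for prompt in inference_prompts:
    print(prompt)
```

Keeping the class noun next to the identifier is what lets the model reuse what it already knows about the class when it renders the specific subject.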

 

Source: https://dreambooth.github.io/

 

Given 3 to 5 images of the subject, the T2I diffusion model is fine-tuned in two steps.

  1. Fine-tune the low-resolution T2I model on the input images paired with a text prompt that contains a unique identifier and the name of the class the subject belongs to (e.g. "a photo of a [T] dog"). In parallel, apply a class-specific prior preservation loss, which leverages the semantic prior the model already has on the class and encourages it to generate diverse instances of the subject's class by injecting the class name into the text prompt (e.g. "A photo of a dog").
  2. Fine-tune the super-resolution components on pairs of low-resolution and high-resolution images taken from the input image set, which keeps high fidelity to small details of the subject.
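How the two terms in step 1 combine can be illustrated with a deliberately simplified sketch. It assumes a plain squared-error reconstruction loss for both terms, uses flat Python lists in place of image tensors, and omits the timestep-dependent weighting a real diffusion objective would carry. The names `mse`, `dreambooth_loss`, and `prior_weight` are hypothetical, and the default weight of 1.0 is an assumption rather than a value taken from the source.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally sized flat vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(
    subject_pred: Sequence[float],
    subject_target: Sequence[float],
    prior_pred: Sequence[float],
    prior_target: Sequence[float],
    prior_weight: float = 1.0,  # assumed default, not from the source
) -> float:
    """Subject reconstruction term plus weighted prior preservation term.

    subject_*: model output vs. a subject image, conditioned on the
               identifier prompt (e.g. "a photo of a [T] dog").
    prior_*:   model output vs. a class image, conditioned on the plain
               class prompt (e.g. "A photo of a dog").
    """
    subject_term = mse(subject_pred, subject_target)
    prior_term = mse(prior_pred, prior_target)
    return subject_term + prior_weight * prior_term


# Toy numbers only, to show how the two terms add up.
loss = dreambooth_loss([1.0, 2.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0], prior_weight=0.5)
print(loss)
```

The second term is what discourages the fine-tuned model from collapsing every "dog" into the one subject: it keeps penalizing drift on ordinary class prompts while the first term fits the subject.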

 

 

Results - Image Re-Contextualization

The conditioning prompt is shown below each image.

Source: https://dreambooth.github.io/

 

 

Results - Art Rendition

Source: https://dreambooth.github.io/

 

The subject dog is rendered artistically in the styles of famous painters.

Many of the generated poses were not seen in the training set.

The renditions appear to imitate each painter's style faithfully, and even suggest a kind of creativity (extrapolation from prior knowledge).

 

 

Results - Text-Guided View Synthesis

Source: https://dreambooth.github.io/

 

The subject's pose can also be changed in the generated images. The basic appearance of the input subject is preserved (e.g. the pattern on the cat's forehead).

 

 

Results - Property Modification

Source: https://dreambooth.github.io/

 

First, in the car figure, the color is changed with the prompt "a [color] [V] car".

Second, the dog figure shows crosses between the specific dog and other animals, generated with the prompt "A cross of a [V] dog and a [target species]".

In both cases, the unique visual features that give the subject its identity or essence are preserved.
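The two prompt patterns quoted above are easy to expand over a list of values. The sketch below is only illustrative: the helper names `color_prompts` and `cross_prompts` are hypothetical, and the colors and species are example values chosen here, not ones taken from the figures.

```python
from typing import Iterable, List


def color_prompts(colors: Iterable[str], identifier: str = "[V]",
                  class_name: str = "car") -> List[str]:
    """Expand the 'a [color] [V] car' pattern over several colors."""
    return [f"a {color} {identifier} {class_name}" for color in colors]


def cross_prompts(species: Iterable[str], identifier: str = "[V]",
                  class_name: str = "dog") -> List[str]:
    """Expand the 'A cross of a [V] dog and a [target species]' pattern."""
    return [f"A cross of a {identifier} {class_name} and a {s}" for s in species]


# Illustrative values only.
car_prompts = color_prompts(["red", "blue"])
hybrid_prompts = cross_prompts(["bear", "panda"])

for prompt in car_prompts + hybrid_prompts:
    print(prompt)
```

In both patterns the "[V] class" phrase stays intact and only the surrounding words change, which matches the post's point that the edit targets one property while the subject's identity is kept.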

 

 

Results - Accessorization

Source: https://dreambooth.github.io/

 

The prompt "a [V] dog wearing a police/chef/witch outfit" dresses the dog in various outfits or accessories while preserving the subject's identity.

 

 

 

DreamBooth Paper

 

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference se

arxiv.org

 
