Inference of Text-Driven Stylized Synthesis #13

👋 
Thanks for the great work 👏
I have 2 questions:
1) From the paper, it's not quite clear what Text-Driven Stylized Synthesis means. AFAIU, it's essentially the same as applying IP-Adapter to the style blocks of the UNet (the only difference being that you have your own adapter model, trained specifically for this task), and ControlNet is not used at all in this case. Am I right?
2) In the [notebook](https://github.com/instantX-research/CSGO/blob/main/infer/infer_csgo.ipynb), I assume the cat example (cell 15) is Text-Driven Stylized Synthesis. However, I'm confused about why you still use a content image to generate it (albeit with a very small `controlnet_conditioning_scale`). Moreover, why does the `CSGO.generate` method take `pil_content_image` as a required parameter? The idea is to condition only on the style image and the text, with no content image provided. If that's not the case, then why do we need a content image at all?
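For reference, this is how I picture the two mechanisms involved. It's a minimal NumPy sketch that assumes CSGO follows the standard IP-Adapter decoupled cross-attention and the diffusers convention of multiplying ControlNet residuals by `controlnet_conditioning_scale`. It's not taken from the CSGO code, and the function names (`decoupled_cross_attention`, `add_controlnet_residual`) are my own:

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q is (n_q, d), k is (n_k, d), v is (n_k, d_v).
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def decoupled_cross_attention(q, k_text, v_text, k_style, v_style, ip_scale=1.0):
    # IP-Adapter-style injection (assumed for CSGO's style branch): text tokens and
    # style-image tokens are attended separately and the two outputs are summed.
    return attention(q, k_text, v_text) + ip_scale * attention(q, k_style, v_style)


def add_controlnet_residual(unet_feature, controlnet_residual, conditioning_scale):
    # Content branch (assumed diffusers convention): the ControlNet residual is scaled
    # by conditioning_scale before being added to the UNet skip feature.
    return unet_feature + conditioning_scale * controlnet_residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((4, 8))
    k_text, v_text = rng.standard_normal((6, 8)), rng.standard_normal((6, 8))
    k_style, v_style = rng.standard_normal((3, 8)), rng.standard_normal((3, 8))
    feature, residual = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))

    styled = decoupled_cross_attention(q, k_text, v_text, k_style, v_style, ip_scale=1.0)
    # With a tiny conditioning scale the content residual barely changes the feature.
    tiny = add_controlnet_residual(feature, residual, conditioning_scale=0.01)
    print("style contribution:", np.abs(styled - attention(q, k_text, v_text)).max())
    print("content contribution:", np.abs(tiny - feature).max())
```

If this picture is right, a near-zero `controlnet_conditioning_scale` makes the content branch almost a no-op, which is why a required `pil_content_image` seems unnecessary for the text-driven case.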
