Andrey Voynov

Google (Israel)

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Daniel Cohen‐Or

Tel Aviv University

Amir Hertz

Google (United States)

Vladimir Yu. Protasov

University of L'Aquila

Shlomi Fruchter

Google (Israel)

Ariel Shamir

Reichman University

Artem Babenko

Yandex (Russia)

Moab Arar

Tel Aviv University

Андрей Сергеевич Войнов

Kfir Aberman

Snap (United States)

Hadar Averbuch‐Elor

Tel Aviv University

Cooperative Institutions

Tel Aviv University

Google (United States)

Yandex (Russia)

Google (Israel)

Reichman University

National Research University Higher School of Economics

Lomonosov Moscow State University

Hebrew University of Jerusalem

Moscow Institute of Physics and Technology

Skolkovo Institute of Science and Technology

Research Field

PALP: Prompt Aligned Personalization of Text-to-Image Models

arXiv (Cornell University) (2024)

Moab Arar Andrey Voynov Amir Hertz Omri Avrahami Shlomi Fruchter

Content creators often aim to create personalized images using personal subjects that go beyond the capabilities of conventional text-to-image models. Additionally, they may want the resulting image to encompass a specific location, style, ambiance, and more. Existing personalization methods may compromise personalization ability or the alignment to complex textual prompts. This trade-off can impede the fulfillment of user prompts and subject fidelity. We propose a new approach focusing on personalization methods for a single prompt to address this issue. We term our approach prompt-aligned personalization. While this may seem restrictive, our method excels in improving text alignment, enabling the creation of images with complex and intricate prompts, which may pose a challenge for current techniques. In particular, our method keeps the personalized model aligned with a target prompt using an additional score distillation sampling term. We demonstrate the versatility of our method in multi- and single-shot settings and further show that it can compose multiple subjects or use inspiration from reference images, such as artworks. We compare our approach quantitatively and qualitatively with existing baselines and state-of-the-art techniques.

Compromise

Modalities

10.48550/arxiv.2401.06105

Citations (0)

AnyLens: A Generative Diffusion Model with Any Rendering Lens

arXiv (Cornell University) (2023)

Andrey Voynov Amir Hertz Moab Arar Shlomi Fruchter Daniel Cohen‐Or

State-of-the-art diffusion models can generate highly realistic images based on various conditioning like text, segmentation, and depth. However, an essential aspect often overlooked is the specific camera geometry used during image capture. The influence of different optical systems on the final scene appearance is frequently overlooked. This study introduces a framework that intimately integrates a text-to-image diffusion model with the particular lens geometry used in image rendering. Our method is based on a per-pixel coordinate conditioning method, enabling the control over the rendering geometry. Notably, we demonstrate the manipulation of curvature properties, achieving diverse visual effects, such as fish-eye, panoramic views, and spherical texturing using a single diffusion model.

10.48550/arxiv.2311.17609

Citations (1)
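
The AnyLens abstract describes conditioning a diffusion model on per-pixel coordinates. The sketch below only illustrates what such a coordinate map might look like; it is not the paper's implementation. The function names `coordinate_map` and `radial_warp`, the normalization to [-1, 1], and the quadratic radial warp are all assumptions made for illustration.

```python
def coordinate_map(height, width, warp=None):
    """Per-pixel (x, y) coordinates of pixel centers, normalized to [-1, 1].

    If `warp` is given, each coordinate pair is passed through it, so the map
    describes a distorted (for example fisheye-like) rendering geometry.
    """
    grid = []
    for row in range(height):
        y = 2.0 * (row + 0.5) / height - 1.0
        line = []
        for col in range(width):
            x = 2.0 * (col + 0.5) / width - 1.0
            line.append(warp(x, y) if warp else (x, y))
        grid.append(line)
    return grid


def radial_warp(strength):
    """Hypothetical radial warp: scales each point by 1 + strength * r^2."""
    def warp(x, y):
        scale = 1.0 + strength * (x * x + y * y)
        return (x * scale, y * scale)
    return warp


if __name__ == "__main__":
    plain = coordinate_map(2, 2)
    warped = coordinate_map(2, 2, radial_warp(1.0))
    print(plain[0][0], warped[0][0])
```

A map like this has the same spatial size as the image and two channels, so it could in principle be fed to a network alongside the noisy image; how the paper actually injects it is not stated in the abstract.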

Concept Decomposition for Visual Exploration and Inspiration

ACM Transactions on Graphics (2023)

Yael Vinker Andrey Voynov Daniel Cohen‐Or Ariel Shamir

A creative idea is often born from transforming, combining, and modifying ideas from existing visual examples capturing various concepts. However, one cannot simply copy the concept as a whole, and inspiration is achieved by examining certain aspects of the concept. Hence, it is often necessary to separate a concept into different aspects to provide new perspectives. In this paper, we propose a method to decompose a visual concept, represented as a set of images, into different visual aspects encoded in a hierarchical tree structure. We utilize large vision-language models and their rich latent space for concept decomposition and generation. Each node in the tree represents a sub-concept using a learned vector embedding injected into the latent space of a pretrained text-to-image model. We use a set of regularizations to guide the optimization of the embedding vectors encoded in the nodes to follow the hierarchical structure of the tree. Our method allows to explore and discover new concepts derived from the original one. The tree provides the possibility of endless visual sampling at each node, allowing the user to explore the hidden sub-concepts of the object of interest. The learned aspects in each node can be combined within and across trees to create new visual ideas, and can be used in natural language sentences to apply such aspects to new designs. Project page: https://inspirationtree.github.io/inspirationtree/

Tree (set theory)

10.1145/3618315

Citations (13)

ReNoise: Real Image Inversion Through Iterative Noising

Lecture notes in computer science (2024)

Daniel Garibi Or Patashnik Andrey Voynov Hadar Averbuch‐Elor Daniel Cohen‐Or

10.1007/978-3-031-72630-9_23

Citations (0)

On the structure of self-affine convex bodies

Sbornik Mathematics (2013)

Andrey Voynov

We study the structure of convex bodies in $\mathbb{R}^d$ that can be represented as a union of their affine images with no common interior points. Such bodies are called self-affine. Valette's conjecture on the structure of self-affine bodies was proved for $d = 2$ by Richter in 2011. In the present paper we disprove the conjecture for all $d \ge 3$ and derive a detailed description of self-affine bodies in $\mathbb{R}^3$. We also consider the relation between properties of self-affine bodies and functional equations with a contraction of an argument. Bibliography: 10 titles.

Mixed volume

10.1070/sm2013v204n08abeh004332

Citations (4)
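
As a simple illustration of the definition used in the abstract of "On the structure of self-affine convex bodies" (this example is standard and is not taken from the paper), the unit cube is self-affine: it is the union of $2^d$ half-scale copies of itself whose interiors are pairwise disjoint. The maps $A_v$ are notation introduced here for illustration.

```latex
\[
  [0,1]^d \;=\; \bigcup_{v \in \{0,1\}^d} A_v\bigl([0,1]^d\bigr),
  \qquad A_v(x) = \tfrac{1}{2}\,(x + v).
\]
% Each image A_v([0,1]^d) is the cube v/2 + [0, 1/2]^d of volume 2^{-d}.
% The 2^d images have pairwise disjoint interiors, and their volumes sum to 1.
```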

A counterexample to Valette’s conjecture

Proceedings of the Steklov Institute of Mathematics (2011)

Andrey Voynov

Counterexample

Convex polytope

10.1134/s0081543811080207

Citations (3)

On the structure of self-affine convex bodies (Russian original: К вопросу о структуре самоаффинных выпуклых тел)

Математический сборник (2013)

Андрей Сергеевич Войнов Andrey Voynov

We study the structure of convex bodies in $\mathbb{R}^d$ that can be represented as a union of finitely many of their affine copies with disjoint interiors. Such bodies are called self-affine. A conjecture on their general form was formulated by G. Valette in 1991. This conjecture was proved for $d = 2$ in 2011.

10.4213/sm8169

Citations (4)

P+: Extended Textual Conditioning in Text-to-Image Generation

arXiv (Cornell University) (2023)

Andrey Voynov Qinghao Chu Daniel Cohen‐Or Kfir Aberman

We introduce an Extended Textual Conditioning space in text-to-image models, referred to as $P+$. This space consists of multiple textual conditions, derived from per-layer prompts, each corresponding to a layer of the denoising U-net of the diffusion model. We show that the extended space provides greater disentangling and control over image synthesis. We further introduce Extended Textual Inversion (XTI), where the images are inverted into $P+$, and represented by per-layer tokens. We show that XTI is more expressive and precise, and converges faster than the original Textual Inversion (TI) space. The extended inversion method does not involve any noticeable trade-off between reconstruction and editability and induces more regular inversions. We conduct a series of extensive experiments to analyze and understand the properties of the new space, and to showcase the effectiveness of our method for personalizing text-to-image models. Furthermore, we utilize the unique properties of this space to achieve previously unattainable results in object-style mixing using text-to-image models. Project page: https://prompt-plus.github.io

10.48550/arxiv.2303.09522

Citations (27)

Sketch-Guided Text-to-Image Diffusion Models

Andrey Voynov Kfir Aberman Daniel Cohen‐Or

Text-to-Image models have introduced a remarkable leap in the evolution of machine learning, demonstrating high-quality synthesis of images from a given text-prompt. However, these powerful pretrained models still lack control handles that can guide spatial properties of the synthesized images. In this work, we introduce a universal approach to guide a pretrained text-to-image diffusion model, with a spatial map from another domain (e.g., sketch) during inference time. Unlike previous works, our method does not require to train a dedicated model or a specialized encoder for the task. Our key idea is to train a Latent Guidance Predictor (LGP) - a small, per-pixel, Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network. The LGP is trained only on a few thousand images and constitutes a differential guiding map predictor, over which the loss is computed and propagated back to push the intermediate images to agree with the spatial map. The per-pixel training offers flexibility and locality which allows the technique to perform well on out-of-domain sketches, including free-hand style drawings. We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain.

Sketch

Image translation

Image editing

Perceptron

10.1145/3588432.3591560

Citations (83)

Compact noncontraction semigroups of affine operators

Sbornik Mathematics (2015)

Andrey Voynov Vladimir Yu. Protasov

We analyze compact multiplicative semigroups of affine operators acting in a finite-dimensional space. The main result states that every such semigroup is either contracting, that is, contains elements of arbitrarily small operator norm, or all its operators share a common invariant affine subspace on which this semigroup is contracting. The proof uses functional difference equations with contraction of the argument. We look at applications to self-affine partitions of convex sets, the investigation of finite affine semigroups and the proof of a criterion of primitivity for nonnegative matrix families. Bibliography: 32 titles.

10.1070/sm2015v206n07abeh004483

Citations (8)
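
The abstract of "Compact noncontraction semigroups of affine operators" mentions a criterion of primitivity for nonnegative matrix families. The sketch below does not implement that criterion; it is only a brute-force check of the property being characterized, under the assumed working definition that a family is primitive when some product of its members is entrywise positive. The function names and the `max_length` cutoff are hypothetical.

```python
def support(matrix):
    """Boolean pattern of positive entries of a nonnegative matrix."""
    return tuple(tuple(x > 0 for x in row) for row in matrix)


def bool_mul(a, b):
    """Boolean matrix product; equals the support of the product of nonnegative matrices."""
    n = len(a)
    return tuple(
        tuple(any(a[i][k] and b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def is_primitive_family(matrices, max_length):
    """True if some product of at most `max_length` members is entrywise positive."""
    patterns = {support(m) for m in matrices}
    frontier = set(patterns)  # supports of all products of the current length
    for _ in range(max_length):
        if any(all(all(row) for row in p) for p in frontier):
            return True
        frontier = {bool_mul(p, q) for p in frontier for q in patterns}
    return False


if __name__ == "__main__":
    a = [[0, 1], [1, 1]]  # its square is entrywise positive
    p = [[0, 1], [1, 0]]  # a permutation matrix, no power is positive
    print(is_primitive_family([a], 2), is_primitive_family([p], 10))
```

Working with Boolean supports instead of numeric entries is enough here because positivity of a product of nonnegative matrices depends only on where the positive entries are.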