Unified image-to-image generation via multimodal flow matching.
FlowInOne introduces a unified framework for image-to-image generation by encoding diverse multimodal inputs—such as sketches, text, layout primitives, and symbolic instructions—into a shared 2D visual latent space. This enables a single flow matching model to generate photorealistic images conditioned on fused visual prompts, eliminating the need for modality-specific decoders or alignment losses.
By learning an isomorphic mapping from non-visual semantics (e.g., "remove grass") into denoisable visual representations, FlowInOne achieves semantic-preserving visual grounding and geometry-aware flow propagation. The system advances research in unified visual representation learning, offering a foundation for future multimodal generative models.
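The conditional flow-matching objective underlying such models can be sketched in a few lines. This is an illustrative, framework-agnostic toy (NumPy, straight-line probability path); it is not FlowInOne's actual training code, and `velocity_model`, `x0`, `x1` are placeholder names:

```python
import numpy as np

def flow_matching_loss(velocity_model, x0, x1, rng):
    """Conditional flow matching: regress the model's velocity field
    toward the straight-line velocity (x1 - x0) at a random time t.

    x0: noise sample, x1: data (target image latent), same shape.
    velocity_model(x_t, t) -> predicted velocity, same shape as x_t.
    """
    # One random time per sample, broadcast over spatial dims.
    t = rng.uniform(size=(x0.shape[0],) + (1,) * (x0.ndim - 1))
    x_t = (1.0 - t) * x0 + t * x1   # point on the linear interpolation path
    target_v = x1 - x0              # constant velocity along that path
    pred_v = velocity_model(x_t, t)
    return np.mean((pred_v - target_v) ** 2)

# Toy check with an "oracle" model that already knows the target velocity.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8, 8))    # noise latents
x1 = rng.normal(size=(4, 8, 8))    # data latents
oracle = lambda x_t, t: x1 - x0
print(flow_matching_loss(oracle, x0, x1, rng))  # → 0.0
```

The oracle drives the loss to exactly zero, which is a quick sanity check that the regression target is the path's velocity rather than the endpoint itself.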
pip install flowinone

from PIL import GifImagePlugin, AvifImagePlugin

# Load an animated GIF
gif = GifImagePlugin.GifImageFile("animation.gif")
print(f"Frames: {gif.n_frames}, Animated: {gif.is_animated}")  # properties, not methods

# Read an AVIF image
avif = AvifImagePlugin.AvifImageFile("image.avif")
avif.load()

Encode heterogeneous inputs (sketches, text, layout) into a shared visual latent space using PIL-based decoders and custom flow propagation.
from PIL import BmpImagePlugin, GbrImagePlugin

# Load BMP and GBR files as visual primitives
bmp = BmpImagePlugin.BmpImageFile("sketch.bmp")
gbr = GbrImagePlugin.GbrImageFile("brush.gbr")
# These are processed into the shared latent space

Generate photorealistic images from fused visual prompts using a single flow matching model, without modality-specific pipelines.
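Generation from a learned velocity field reduces to numerically integrating an ODE from noise to image. The sketch below uses plain explicit Euler steps and an analytic "oracle" field; it is an illustration of flow-matching sampling in general, not FlowInOne's published inference code:

```python
import numpy as np

def sample(velocity_model, shape, steps=50, rng=None):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (image)."""
    rng = rng or np.random.default_rng()
    x = rng.normal(size=shape)             # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity_model(x, t)  # explicit Euler step
    return x

# With a field that always points at a fixed target, sampling converges
# to that target regardless of the starting noise.
target = np.full((1, 8, 8), 3.0)
field = lambda x, t: (target - x) / max(1.0 - t, 1e-3)
out = sample(field, target.shape, steps=200, rng=np.random.default_rng(0))
print(np.allclose(out, target, atol=1e-2))  # → True
```

In practice the velocity field is a trained network and higher-order or adaptive ODE solvers replace the fixed-step Euler loop, but the structure of the sampler is the same.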
from PIL import DcxImagePlugin

# Multi-page DCX file as layout input
dcx = DcxImagePlugin.DcxImageFile("layout.dcx")
frame_count = dcx.n_frames  # total number of layout pages; tell() returns the current frame
dcx.seek(1)                 # navigate to the second layout frame

FlowInOne uses PIL’s modular image plugin system to ingest and decode diverse input formats into a unified tensor representation. Each input modality (e.g., sketch, text, layout) is processed through its respective ImageFile subclass (e.g., GifImageFile, AvifImageFile) into a common 2D latent space.
This latent space is then used to condition a single flow matching model that generates target images. The architecture avoids modality-specific decoders by projecting all inputs into a shared, denoisable visual domain where flow propagation respects both geometry and semantics.
graph LR
A[Sketch] -->|BmpImageFile| D[Visual Latent Space]
B[Text] -->|GdImageFile| D
C[Layout] -->|DcxImageFile| D
D --> E[Flow Matching Model]
E --> F[Photorealistic Output]
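The routing shown in the diagram can be expressed as a plain dispatch table from input modality to its PIL decoder class. The modality names and the `decode` helper below are illustrative assumptions, not part of FlowInOne's public API:

```python
from PIL import BmpImagePlugin, DcxImagePlugin, GdImageFile

# Modality -> PIL ImageFile subclass, mirroring the diagram above.
DECODERS = {
    "sketch": BmpImagePlugin.BmpImageFile,
    "text":   GdImageFile.GdImageFile,
    "layout": DcxImagePlugin.DcxImageFile,
}

def decode(modality: str, path: str):
    """Look up and apply the decoder for a given input modality."""
    try:
        decoder = DECODERS[modality]
    except KeyError:
        raise ValueError(f"unsupported modality: {modality!r}") from None
    return decoder(path)

print(sorted(DECODERS))  # → ['layout', 'sketch', 'text']
```

Keeping the mapping in data rather than in per-modality branches is what lets the rest of the pipeline stay modality-agnostic: adding an input type is one dictionary entry.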
Key classes from the PIL plugin ecosystem used in FlowInOne (note that n_frames and is_animated are properties, not methods, and GdImageFile.open is a module-level function):

class GifImagePlugin.GifImageFile(ImageFile.ImageFile)
    n_frames: int          # property
    is_animated: bool      # property
    def data(self) -> bytes | None

class AvifImagePlugin.AvifImageFile(ImageFile.ImageFile)
    def load(self) -> Image.core.PixelAccess | None
    def seek(self, frame: int) -> None

class DcxImagePlugin.DcxImageFile(PcxImageFile)
    def seek(self, frame: int) -> None
    def tell(self) -> int

class GdImageFile.GdImageFile(ImageFile.ImageFile)

def GdImageFile.open(fp: StrOrBytesPath | IO[bytes], mode: str = "r") -> GdImageFile

class ContainerIO.ContainerIO(IO[AnyStr])
    def read(self, n: int = -1) -> AnyStr
    def seek(self, offset: int, mode: int = io.SEEK_SET) -> int
    def write(self, b: AnyStr) -> NoReturn   # read-only; always raises

FlowInOne is inspired by recent advances in flow matching and multimodal representation learning. It builds on the idea of semantic-to-visual isomorphism, where non-visual instructions are mapped into a denoisable visual latent space compatible with diffusion-like generation.
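PIL's ContainerIO, listed in the API summary above, exposes a byte range of a larger file as its own read-only file object, which is convenient when several encoded inputs are packed into one container file. A minimal usage sketch with a temporary file:

```python
import tempfile
from PIL import ContainerIO

# Pack two "records" into one file, then expose the second record
# as its own file object via ContainerIO(file, offset, length).
with tempfile.TemporaryFile() as fh:
    fh.write(b"HEADERpayload-bytes")
    fh.flush()
    region = ContainerIO.ContainerIO(fh, 6, 13)  # skip the 6-byte header
    full = region.read()    # → b'payload-bytes'
    region.seek(0)          # offsets are relative to the region start
    head = region.read(7)   # → b'payload'
print(full, head)
```

Because the returned object supports read/seek/tell, any PIL ImageFile subclass can decode directly from a region of a container without copying bytes out first.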
While similar in spirit to models like FLUX and Stable Diffusion, FlowInOne eliminates modality-specific components by using visual encoding as a universal interface. This approach draws from research on perceptual alignment, neural rendering, and unified latent spaces.
FlowInOne includes 1195 test files ensuring robustness across input modalities and edge cases in image decoding and latent projection. Tests are located in the GitHub repository under /tests.
Run tests locally:

pytest tests/

Contributions are welcome! Please open issues or PRs on GitHub. Ensure all new code includes tests and adheres to the existing API patterns.
@software{young_flowinone_2024,
author = {Young, Andrew},
title = {FlowInOne: Unified Image-to-Image Generation via Multimodal Flow Matching},
url = {https://github.com/Lumi-node/flowinone},
year = {2024},
publisher = {Automate Capture Research}
}

MIT – see LICENSE for details.
