
FlowInOne


Unified image-to-image generation via multimodal flow matching.



FlowInOne introduces a unified framework for image-to-image generation by encoding diverse multimodal inputs—such as sketches, text, layout primitives, and symbolic instructions—into a shared 2D visual latent space. This enables a single flow matching model to generate photorealistic images conditioned on fused visual prompts, eliminating the need for modality-specific decoders or alignment losses.

By learning an isomorphic mapping from non-visual semantics (e.g., "remove grass") into denoisable visual representations, FlowInOne achieves semantic-preserving visual grounding and geometry-aware flow propagation. The system advances research in unified visual representation learning, offering a foundation for future multimodal generative models.


Quick Start

```shell
pip install flowinone
```

```python
from PIL import GifImagePlugin, AvifImagePlugin

# Load an animated GIF (n_frames and is_animated are properties, not methods)
gif = GifImagePlugin.GifImageFile("animation.gif")
print(f"Frames: {gif.n_frames}, Animated: {gif.is_animated}")

# Read an AVIF image
avif = AvifImagePlugin.AvifImageFile("image.avif")
avif.load()
```
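The plugin classes can also be reached through Pillow's generic loader. A minimal, self-contained sketch using the standard `Image.open` / `ImageSequence` API (the in-memory two-frame GIF is illustrative only):

```python
import io
from PIL import Image, ImageSequence

# Build a tiny two-frame GIF in memory so the example is self-contained
buf = io.BytesIO()
frames = [Image.new("RGB", (8, 8), c) for c in ("red", "blue")]
frames[0].save(buf, format="GIF", save_all=True, append_images=frames[1:])
buf.seek(0)

# Image.open dispatches to the matching plugin (here GifImageFile)
gif = Image.open(buf)
print(f"Frames: {gif.n_frames}, Animated: {gif.is_animated}")
rgb_frames = [f.convert("RGB") for f in ImageSequence.Iterator(gif)]
```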

What Can You Do?

Feature 1: Multimodal Input Encoding

Encode heterogeneous inputs (sketches, text, layout) into a shared visual latent space using PIL-based decoders and custom flow propagation.

```python
from PIL import BmpImagePlugin, GbrImagePlugin

# Load BMP and GBR files as visual primitives
bmp = BmpImagePlugin.BmpImageFile("sketch.bmp")
gbr = GbrImagePlugin.GbrImageFile("brush.gbr")
# These are processed into the shared latent space
```
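What the projection into the shared latent space might look like is sketched below; `encode_to_latent` is a hypothetical helper for illustration, not part of the released API:

```python
import numpy as np
from PIL import Image

def encode_to_latent(img: Image.Image, size=(64, 64)) -> np.ndarray:
    """Hypothetical projection: grayscale, resize, scale to [0, 1]."""
    arr = np.asarray(img.convert("L").resize(size), dtype=np.float32)
    return arr / 255.0

# Stand-in for a decoded sketch.bmp
sketch = Image.new("L", (32, 32), 128)
latent = encode_to_latent(sketch)
```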

Feature 2: Flow Matching on Unified Visual Prompts

Generate photorealistic images from fused visual prompts using a single flow matching model, without modality-specific pipelines.

```python
from PIL import DcxImagePlugin

# Multi-page DCX file as layout input
dcx = DcxImagePlugin.DcxImageFile("layout.dcx")
frame_count = dcx.n_frames  # tell() returns the current frame, not the count
dcx.seek(1)  # Navigate layout frames
```
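At sampling time, flow matching reduces to integrating a learned velocity field from a noise latent toward an image latent. A toy fixed-step Euler sketch; the closed-form `v` below is a stand-in for the trained model, not FlowInOne's actual network:

```python
import numpy as np

def euler_sample(v_field, x0, steps=100):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * v_field(x, t)
    return x

# Toy velocity field: straight-line flow toward a target latent x1
x1 = np.ones((4, 4))
v = lambda x, t: (x1 - x) / max(1.0 - t, 1e-6)
x = euler_sample(v, np.zeros((4, 4)))
```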

Architecture

FlowInOne uses PIL’s modular image plugin system to ingest and decode diverse input formats into a unified tensor representation. Each input modality (e.g., sketch, text, layout) is processed through its respective ImageFile subclass (e.g., GifImageFile, AvifImageFile) into a common 2D latent space.

This latent space is then used to condition a single flow matching model that generates target images. The architecture avoids modality-specific decoders by projecting all inputs into a shared, denoisable visual domain where flow propagation respects both geometry and semantics.

```mermaid
graph LR
    A[Sketch] -->|BmpImageFile| D[Visual Latent Space]
    B[Text] -->|GdImageFile| D
    C[Layout] -->|DcxImageFile| D
    D --> E[Flow Matching Model]
    E --> F[Photorealistic Output]
```
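The fusion step can be pictured as stacking each decoded modality's latent into one conditioning tensor. A minimal sketch, assuming every modality has already been projected to the same HxW latent (the shapes and the channel-stacking choice are illustrative, not the shipped architecture):

```python
import numpy as np

# Each decoded modality becomes a 64x64 latent (random stand-ins here)
sketch_lat = np.random.rand(64, 64).astype(np.float32)
text_lat = np.random.rand(64, 64).astype(np.float32)
layout_lat = np.random.rand(64, 64).astype(np.float32)

# The fused visual prompt that conditions the flow matching model
prompt = np.stack([sketch_lat, text_lat, layout_lat])
```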

API Reference

Key classes from the PIL plugin ecosystem used in FlowInOne:

```
class GifImagePlugin.GifImageFile(ImageFile.ImageFile)
    @property
    def n_frames(self) -> int
    @property
    def is_animated(self) -> bool
    def data(self) -> bytes | None

class AvifImagePlugin.AvifImageFile(ImageFile.ImageFile)
    def load(self) -> Image.core.PixelAccess | None
    def seek(self, frame: int) -> None

class DcxImagePlugin.DcxImageFile(PcxImageFile)
    def seek(self, frame: int) -> None
    def tell(self) -> int

class GdImageFile.GdImageFile(ImageFile.ImageFile)
# opened via a module-level function, not a static method:
GdImageFile.open(fp: StrOrBytesPath | IO[bytes], mode: str = "r") -> GdImageFile

class ContainerIO.ContainerIO(IO[AnyStr])
    def read(self, n: int = -1) -> AnyStr
    def seek(self, offset: int, mode: int = io.SEEK_SET) -> int
    def write(self, b: AnyStr) -> NoReturn  # always raises NotImplementedError
```
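`ContainerIO` treats a byte range of a larger stream as an independent file-like object, which is useful for container formats that embed sub-images. A small self-contained sketch with an in-memory stream:

```python
import io
from PIL import ContainerIO

# Wrap a region of a larger stream as an independent file-like object
raw = io.BytesIO(b"HEADERpayload-bytesFOOTER")
region = ContainerIO.ContainerIO(raw, offset=6, length=13)
data = region.read()  # reads only the 13-byte window after the header
```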

Research Background

FlowInOne is inspired by recent advances in flow matching and multimodal representation learning. It builds on the idea of semantic-to-visual isomorphism, where non-visual instructions are mapped into a denoisable visual latent space compatible with diffusion-like generation.

While similar in spirit to models like FLUX and Stable Diffusion, FlowInOne eliminates modality-specific components by using visual encoding as a universal interface. This approach draws from research on perceptual alignment, neural rendering, and unified latent spaces.

Testing

FlowInOne includes 1195 test files ensuring robustness across input modalities and edge cases in image decoding and latent projection. Tests are located in the GitHub repository under /tests.

Run tests locally:

```shell
pytest tests/
```

Contributing

Contributions are welcome! Please open issues or PRs on GitHub. Ensure all new code includes tests and adheres to the existing API patterns.

Citation

```bibtex
@software{young_flowinone_2024,
  author = {Young, Andrew},
  title = {FlowInOne: Unified Image-to-Image Generation via Multimodal Flow Matching},
  url = {https://github.com/Lumi-node/flowinone},
  year = {2024},
  publisher = {Automate Capture Research}
}
```

License

MIT – see LICENSE for details.