Update walrus post for formatting

mikemccabe210 · mikemccabe210 · commit 1c8da502cb75 · 2025-11-19T21:19:44.000-05:00
diff --git a/_posts/2025-11-19-walrus.md b/_posts/2025-11-19-walrus.md
@@ -2,7 +2,7 @@
 layout: post
 title: "Walrus: A Cross-domain Foundation Model for Continuum Dynamics"
 authors: Michael McCabe, Payel Mukhopadhyay, Tanya Marwah, Bruno Regaldo-Saint Blancard, Francois Rozet, Cristiana Diaconu, Lucas Meyer, Kaze W. K. Wong, Hadi Sotoudeh, Alberto Bietti, Irina Espejo, Rio Fear, Siavash Golkar, Tom Hehir, Keiya Hirashima, Geraud Krawezik, Francois Lanusse, Rudy Morel, Ruben Ohana, Liam Parker, Mariel Pettee, Jeff Shen, Kyunghyun Cho, Miles Cranmer, Shirley Ho
-shorttitle: "Walrus Foundation Model"
+shorttitle: "Walrus: A Cross-domain Foundation Model for Continuum Dynamics"
 date: 2025-11-19 11:00
 smallimage: walrus-splash.jpg
 image: walrus-splash.jpg
@@ -19,10 +19,10 @@ Over the last few years, researchers have sought to side-step this data dependen
 
 ---
 
-#### Why is it so Hard to Build a Foundation Model for Physical Simulation?
-
+## Why is it so Hard to Build a Foundation Model for Physical Simulation?
+<br/>
 Physical simulation is an enormously broad field. It can be difficult to even define what we mean by “physical simulation” since there are many varieties and scales that are regularly simulated. Here, we’re mostly speaking of “continuum-level” simulation where we’re simulating macroscopic objects as though they were coherent objects rather than astronomically large collections of molecules crashing into each other. But even at this particular level, there is still an enormous amount of diversity that must be accounted for:
-<div style="float: right; width: 45%; margin: 0 0 1rem 1rem;">
+<div style="float: right; width: 30%; margin: 0 0 1rem 1rem;">
   <video style="width: 100%;" controls>
     <source src="/images/blog/walrus/Walrus_Example_rayleigh_benard.mp4" type="video/mp4">
   </video>
@@ -36,80 +36,88 @@ Overcoming these challenges requires rethinking training strategies and developi
 
 ---
 
-#### Introducing Walrus
+## Introducing Walrus  
+<br/> 
 <p align="center">
   <img src="/images/blog/walrus/ArchitectureWIP.png" alt="Walrus Architecture" width="95%" style="mix-blend-mode: darken;">
 </p>
 Walrus is a transformer-based model designed specifically to learn across diverse physical systems. It contains **1.3 billion parameters** and is trained on a dataset of unprecedented scale and variety: **19 scenarios**, encompassing **63 physical fields**, drawn from areas including **acoustics, classical fluids, non-Newtonian flows, plasma physics, active matter**, and several **high-resolution astrophysical regimes**. Walrus is one of the largest, most broadly pretrained models yet for physical emulation.
 
 ---
 
-#### The Anatomy of a Walrus
-<p align="center">
-  <img src="/images/blog/walrus/walrus_closeup.jpg" alt="Comparisons on downstream 2D challenges" width="95%" style="mix-blend-mode: darken;">
-</p>
+### The Anatomy of a Walrus
+<br/>
 Walrus learns by watching large amounts of simulation data— movies of physical systems evolving over time. **Walrus takes a short trajectory of system snapshots and predicts the next state in the system. Rather than being explicitly provided information about the equations or system coefficients, Walrus must infer this information in-context from the provided history.** This allows Walrus to be used on experimental data or settings where there may not be a clean equation that models the system. 
 
 To make this possible over so many types of data, we had to build in a few ideas that help the model learn efficiently and stay accurate over long sequences: 
 
-**Stabilizing the model**
-<img 
+#### Stabilizing the model
+<br/>
+<p align="center">
+<video width="90%" controls>
+  <source src="/images/blog/walrus/Walrus_Example_euler_multi_quadrants_periodicBC.mp4" type="video/mp4">
+</video>
+</p>
+<!-- <img 
   src="/images/blog/walrus/jitter_example.png" 
   alt="Patching jittering stabilizes longer rollouts by reducing accumulation of grid modes." 
   width="40%" 
-  style="float: right; margin-left: 1rem; margin-bottom: 1rem; mix-blend-mode: darken;">
-Physical systems are sensitive: if you make a tiny mistake early on, that mistake can grow and completely change what happens later. Machine learning models can amplify this error due to architectural choices. For example, the “patching” or “tokenization” procedure used for compression in higher dimensional transformer models can break translation equivariance, the property of physical dynamics that says that outside of boundary effects physics should not depend on the location inside a domain. Walrus avoids this by randomizing the compression process. Before downsampling, Walrus randomly **“jitters”** the data, so that it reads the data slightly differently each step. **These tiny shifts prevent the model from locking onto grid patterns or numerical artifacts. The result is that Walrus stays stable for far longer than earlier models. **
+  style="float: right; margin-left: 1rem; margin-bottom: 1rem; mix-blend-mode: darken;"> -->
 
+Many physical systems are sensitive: a tiny error early on can be magnified by the dynamics and completely change what happens later.
+Machine learning models can amplify this error due to architectural choices. 
+For example, the “patching” or “tokenization” procedure used for compression in higher-dimensional transformer models can break translation equivariance, 
+the property that, away from boundary effects, the physics should not depend on location within the domain. Walrus avoids this by randomizing the compression process. 
+Before downsampling, Walrus randomly **“jitters”** the data, so that it reads the data slightly differently each step. **These tiny shifts prevent the model from locking onto grid patterns or numerical artifacts. The result is that Walrus stays stable for far longer than earlier models.**
 
-**Adaptive Compute Patching**
-<p align="center">
-<video width="95%" controls>
-  <source src="/images/blog/walrus/Walrus_Example_euler_multi_quadrants_periodicBC.mp4" type="video/mp4">
-</video>
-</p>
-Not all systems require the same amount of compute to emulate. Walrus is built with this in mind and **uses recently developed [compute-adaptive patching](https://arxiv.org/pdf/2507.09264) techniques to apply different levels of compression to different inputs**. Walrus can, for instance, apply less compression to already coarse-grained data while applying more to higher resolution data to scale each problem to the available compute to maximize accuracy.
+This isn't just a heuristic. We can root this method in solid analysis of the operations used in these
+models, but that's beyond the scope of this blog post, so read the paper if you want to know more. 
+
+#### Adaptive Compute Patching
 
-**Dimensional Augmentation**
+Not all systems require the same amount of compute to emulate. 
+Walrus is built with this in mind and **uses recently developed [compute-adaptive patching](https://arxiv.org/pdf/2507.09264) techniques to apply different levels of compression to different inputs**. 
+Walrus can, for instance, apply less compression to already coarse-grained data while applying more to higher resolution data to scale each problem to the available compute to maximize accuracy.
+
+This helps us avoid some of the limitations of fixed-resolution models, particularly for 3D data, where the trade-off between accuracy 
+and compute cost is especially impactful.
+
+#### Dimensional Augmentation
 <img 
   src="/images/blog/walrus/DimensionPadding.png" 
   alt="Treating 2D data as 3D for joint augmentation." 
-  width="40%" 
+  width="35%" 
   style="float: right; margin-left: 1rem; margin-bottom: 1rem; mix-blend-mode: darken;">
 Another key idea is **treating 2D and 3D data in a unified way through shared augmentation**. The presence or absence of certain fields can allow models to easily learn to predict entirely different dynamics for 2D and 3D systems, but this defeats the purpose of joint pretraining. In training Walrus, we avoid this by an aggressive augmentation strategy in which all 2D data is embedded in a 3D space, sort of like placing a sheet of paper inside a thin box, and then randomly transformed with tensor law aware transformations in time and space so that the 2D data corresponds to a random 2D plane in the 3D space. 
 These design choices, that are described in further detail in the paper, let Walrus do something that hasn’t been possible before: **learn from extremely different kinds of physical systems— waves, fluids, plasmas, turbulence, and make predictions that stay coherent over time**. The result is a model that understands enough underlying structure to perform well across many domains. 
 
 ---
 
-#### Walrus in Action
-
-<p align="center">
-  <img src="/images/blog/walrus/Walrus_3d_examples.png" alt="Forecasts on complex 3D systems." width="95%" style="mix-blend-mode: darken;">
-</p>
-
-
+## Walrus in Action
+<br/>
 Starting from a Walrus checkpoint can speed up learning for the emulation of 2D and 3D physics across an unprecedented number of equations, boundary conditions, physical parameterizations, resolutions, and aspect ratios, offering higher accuracy on downstream tasks than prior foundation models.
-
+<br/>
 <p align="center">
   <img src="/images/blog/walrus/walrus_downstream_2d.png" alt="Comparisons on downstream 2D challenges" width="95%" style="mix-blend-mode: darken;">
-</p>
+</p> 
+<br/>
 
-And this is just one of the things we demonstrate in the Walrus release. Check out the paper for more detailed experiments, or check out more rollout videos [here](https://youtube.com/playlist?list=PLqs9qkDO7oREx4_kus5671l7G-x64RXGn&si=ruyWrGjA1HDGJ8aV). 
+See the paper for more experiments, baselines, and ideas, or watch more rollout videos [here](https://youtube.com/playlist?list=PLqs9qkDO7oREx4_kus5671l7G-x64RXGn&si=ruyWrGjA1HDGJ8aV). 
 
 ---
 
-#### Into the Future
-
+### Into the Future
+<br/>
 The path to fully validated, production-ready machine-learned simulators will require further research, careful testing, and deeper integration with traditional methods. But Walrus shows that the foundational ideas work. It suggests a future where simulation is faster, more flexible, and more universally accessible, accelerating research across disciplines that depend on understanding the physical world. 
 
 ---
 
-#### Open Source Release & Getting Started
-
-AION-1 is free and easy to use. Everything you need to run the model:
-
+### Open Source Resources  
+<br/>
+Walrus is fully open: both the model and the training code are available. You can get started here:
 * **API & code:** [Walrus Code](https://github.com/PolymathicAI/walrus)
 * **Model weights:** [Hugging Face](https://huggingface.co/polymathic-ai/walrus)
-* **Tutorial:** [Walrus Tutorial](https://github.com/PolymathicAI/walrus/tree/release_cleanup/demo_notebooks)
+* **Tutorial:** [Walrus Tutorial](https://github.com/PolymathicAI/walrus/tree/main/demo_notebooks)
 
 *-- Sophie Barstein, Michael McCabe*
 
@@ -119,7 +127,6 @@ These types of projects wouldn't be possible without the generous support of the
 division of the Flatiron Institute, a division of the Simons Foundation and from the National AI Research Resource Pilot, including support from NVIDIA
 and NVIDIA’s DGX Cloud product which includes the NVIDIA AI Enterprise Software Platform.
 
-Multi-walrus image licensed from Getty Images via Unsplash+.
-Walrus in water by <a href="https://unsplash.com/@brewbottle?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Bob Brewer</a> on <a href="https://unsplash.com/photos/two-walins-playing-in-the-water-at-the-beach-gbxL20LPg84?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>.
+Walrus title splash licensed from Getty Images via Unsplash+.