Add fancy Metrics plot to the together-py by artek0chumak · Pull Request #344 · togethercomputer/together-py

artek0chumak · 2026-04-27T19:09:05Z

This PR adds Metrics Plot for the Fine-tuning Jobs. This PR includes:

New CLI command to retrieve metrics: list_metrics;
Library to plot any linear* ascii graphs (time-axis like);
Tests for the new functionality.

Example of the output:

together-py> uv run together fine-tuning list-metrics ft-b7...

  train/loss  (1 – 105)  0.5725 → 0.3647
   0.573┼──╮
   0.521┼  ╰──╮
   0.469┼     ╰───╮ ╭────╮
   0.418┼         ╰─╯    ╰──╮╭╮ ╭────╮    ╭╮  ╭╮  ╭─╮╭─╮      ╭─╮           ╭╮
   0.366┼                   ╰╯╰─╯    ╰────╯╰──╯╰──╯ ╰╯ ╰──────╯ ╰───╮╭──────╯╰╮╭───────╮╭─╮╭─────╮ ╭───────╮ ╭──────╮    ╭────╮  ╭──╮ ╭──╮ ╭─────╮    ╭─────
   0.315┼                                                           ╰╯        ╰╯       ╰╯ ╰╯     ╰─╯       ╰─╯      ╰────╯    ╰──╯  ╰─╯  ╰─╯     ╰────╯
        └┬─────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────┬
         1                                                                        53                                                                      105

  train/grad_norm  (1 – 105)  1.575 → 0.2752
    1.65┼──╮
    1.37┼  ╰╮
    1.09┼   ╰───╮
   0.808┼       ╰─╮
   0.529┼         ╰──────╮
    0.25┼                ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        └┬─────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────┬
         1                                                                        53                                                                      105

  train/learning_rate  (1 – 105)  1e-05 → 2.238e-09
   1e-05┼────────────────────────────────────────────────────────────────────────────────╮
1.86e-06┼                                                                                ╰────────────────────────────────────────╮
3.47e-07┼                                                                                                                         ╰──────────────╮
6.46e-08┼                                                                                                                                        ╰──────╮
 1.2e-08┼                                                                                                                                               ╰──╮
2.24e-09┼                                                                                                                                                  ╰
        └┬─────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────┬
         1                                                                        53                                                                      105

  eval/loss  (21 – 105)  5.939 → 5.824
    5.94┼────╮
    5.92┼    ╰───────╮
    5.89┼            ╰───────╮
    5.87┼                    ╰───────╮
    5.85┼                            ╰───────╮
    5.82┼                                    ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────
        └┬─────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────┬
        21                                                                        63                                                                      105

together-py> uv run together fine-tuning retrieve ft-b71e...

...
Lr Scheduler:                Lr Scheduler Type: cosine
                             Lr Scheduler Args: Min Lr Ratio: n/a
                                                Num Cycles:   0.5
Multimodal Params:           Train Vision: n/a
Progress:                    Estimate Available: False
                             Seconds Remaining:  0
Training Method:             Method:          sft
                             Train On Inputs: auto
Training Type:               Lora Alpha:             128
                             Lora R:                 64
                             Type:                   Lora
                             Lora Dropout:           n/a
                             Lora Trainable Modules: k_proj,up_proj,o_proj,q_proj,down_proj,v_proj,gate_proj
Checkpoints:

Training metrics:
           train/loss  ██▇▇▇▆▅▅▅▅▃▃▄▅▄▄▄▄▃▂▂▃▂▂▄▄▃▃▃▂▁▂▂▂▃▂▂▂▃▂▁▁▂▃▂▃▂▂▂▁▁▂▁▂▃▃▂▂▂▁▁▁▁▂▂▁▁▂▂▁▁▁▂▂▂▂▂▁▁▁▁▁ ▂▂▂▂▂▁ ▁▁▁▁▁▁▁▂▂  ▁▂▁▂▂▁▁▁ ▁ ▁▂▂▂▂▂▁ ▁▁▁▁▁▁▁▂▁  ▁▂▁▂▂▁▁  ▁ ▁▂▂▂▂▂  0.5725 → 0.3647
      train/grad_norm  ███▇▅▄▄▅▄▃▂▂▂▂▁▁▁▁▁               ▁                                                                                                                   1.575 → 0.2752
  train/learning_rate  ██████████████████████████████████████████████████████████████████▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▅▅▅▅▅▅▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▂▂▁   1e-05 → 2.238e-09
            eval/loss  ███▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁                                                                                                       5.939 → 5.824


FT Events:
  Total events: 26
  To see event log...
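For readers curious how a sparkline row like the ones above can be produced, here is a minimal sketch. It is not the PR's `render_sparklines` implementation: the name `sparkline_sketch`, the block-character ramp, and the min-max scaling are all assumptions chosen to resemble the output shown.

```python
import math

# Assumed ramp: a space for the lowest level, then eight block heights.
BLOCKS = " ▁▂▃▄▅▆▇█"


def sparkline_sketch(values: list[float]) -> str:
    """Map each value to a block character using min-max scaling."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return ""
    lo, hi = min(finite), max(finite)
    span = hi - lo
    top = len(BLOCKS) - 1
    chars = []
    for v in values:
        if not math.isfinite(v):
            # Non-finite points are drawn as a gap in this sketch.
            chars.append(" ")
            continue
        level = top if span == 0 else round((v - lo) / span * top)
        chars.append(BLOCKS[level])
    return "".join(chars)


print(sparkline_sketch([0, 4, 8]))
```

A real implementation would likely also resample the series to the terminal width before mapping values to characters.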

blainekasten · 2026-05-14T14:39:28Z

do we need this file?

blainekasten · 2026-05-14T14:42:31Z

+    global_step_from: Annotated[int | Omit, Parameter(help="Filter metrics from this global step (inclusive).")] = omit,
+    global_step_to: Annotated[int | Omit, Parameter(help="Filter metrics to this global step (inclusive).")] = omit,
+    logged_at_from: Annotated[datetime | Omit, Parameter(help="Filter metrics logged at or after this time.")] = omit,
+    logged_at_to: Annotated[datetime | Omit, Parameter(help="Filter metrics logged at or before this time.")] = omit,
+    resolution: Annotated[int | Omit, Parameter(help="Number of data points to return (used for JSON output).")] = omit,


Can you make these default to None instead? Then you don't need the union on Omit either. Currently the help docs look like this:

I could fix this to not look so bad for omit, but I think the cyclopts internals handle None better for us.
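One way to keep `None` defaults on the CLI side while still omitting unset filters from the request is to drop `None` values just before the client call. This is only a sketch: `build_metric_filters` is a hypothetical helper, not part of the PR or the SDK, and it assumes the client treats a missing keyword the same way as `omit`.

```python
from datetime import datetime
from typing import Any, Optional


def build_metric_filters(
    global_step_from: Optional[int] = None,
    global_step_to: Optional[int] = None,
    logged_at_from: Optional[datetime] = None,
    logged_at_to: Optional[datetime] = None,
    resolution: Optional[int] = None,
) -> dict[str, Any]:
    """Return only the filters the user actually set (hypothetical helper)."""
    candidates = {
        "global_step_from": global_step_from,
        "global_step_to": global_step_to,
        "logged_at_from": logged_at_from,
        "logged_at_to": logged_at_to,
        "resolution": resolution,
    }
    # None means "not provided", so it is left out of the request kwargs.
    return {name: value for name, value in candidates.items() if value is not None}


print(build_metric_filters(global_step_from=10, resolution=100))
```

The resulting dict could then be splatted into the call, e.g. `list_metrics(fine_tune_id, **filters)`, assuming the keyword names match.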

blainekasten · 2026-05-14T14:44:23Z

+    if config.json:
+        response = await show_loading_status(
+            "Fetching metrics...",
+            config.client.fine_tuning.list_metrics(
+                fine_tune_id,
+                global_step_from=global_step_from,
+                global_step_to=global_step_to,
+                logged_at_from=logged_at_from,
+                logged_at_to=logged_at_to,
+                resolution=resolution,
+            ),
+        )
+        console.print_json(openapi_dumps(response.metrics or []).decode("utf-8"))
+        return
+
+    # For the ASCII chart always fetch at terminal width resolution for best fidelity.
+    response = await show_loading_status(
+        "Fetching metrics...",
+        config.client.fine_tuning.list_metrics(
+            fine_tune_id,
+            global_step_from=global_step_from,
+            global_step_to=global_step_to,
+            logged_at_from=logged_at_from,
+            logged_at_to=logged_at_to,
+            resolution=console.width - METRICS_WIDTH_PADDING,
+        ),
+    )


Suggested change

    # For the ASCII chart always fetch at terminal width resolution for best fidelity.
    resolution_value = resolution if config.json else console.width - METRICS_WIDTH_PADDING
    response = await show_loading_status(
        "Fetching metrics...",
        config.client.fine_tuning.list_metrics(
            fine_tune_id,
            global_step_from=global_step_from,
            global_step_to=global_step_to,
            logged_at_from=logged_at_from,
            logged_at_to=logged_at_to,
            resolution=resolution_value,
        ),
    )
    if config.json:
        console.print_json(openapi_dumps(response.metrics or []).decode("utf-8"))
        return

a bit cleaner I think? not a big deal

Yes, cleaner, applied
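The resolution choice discussed in this thread can be isolated into a small pure function, which makes the two cases easy to test. This is a sketch following the behaviour of the original two-branch code (JSON output uses the requested resolution, the chart uses the terminal width); `pick_resolution` is a hypothetical name, and the padding value and the lower bound of 1 are assumptions.

```python
from typing import Optional

# Assumed value for illustration; the real constant lives in the PR's plotting module.
METRICS_WIDTH_PADDING = 12


def pick_resolution(
    json_output: bool,
    requested: Optional[int],
    console_width: int,
    padding: int = METRICS_WIDTH_PADDING,
) -> Optional[int]:
    """Choose how many points to request from the API (hypothetical helper)."""
    if json_output:
        # JSON consumers get exactly what they asked for (None means "all").
        return requested
    # The chart cannot show more points than it has columns.
    return max(1, console_width - padding)


print(pick_resolution(False, None, 120))
```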

blainekasten · 2026-05-14T14:51:47Z

    fine_tune_id: str,
    *,
    config: CLIConfigParameter,
+    plots: Annotated[bool, Parameter(help="Print training metric sparklines.")] = True,


Cyclopts converts bools to an auto negative version. So if you do --help on this you'll see you can pass either --plots or --no-plots.

Since --plots defaults to True, passing --plots does nothing, and the user would have to pass --no-plots to change anything. Given the default of True, I would say we should change this to a negative-only param that defaults to False, like this:

Suggested change

-    plots: Annotated[bool, Parameter(help="Print training metric sparklines.")] = True,
+    no_plots: Annotated[bool, Parameter(help="Print training metric sparklines.", negative=())] = False,
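The same "negative-only flag" design can be illustrated with the standard library's `argparse`, which is used here only as an analogy and is not what the PR uses (the PR uses cyclopts). The parser name and help text below are made up for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single opt-out flag (illustrative only)."""
    parser = argparse.ArgumentParser(prog="retrieve-sketch")
    parser.add_argument("fine_tune_id")
    # Only the negative form exists; plots stay on unless the user opts out.
    parser.add_argument(
        "--no-plots",
        action="store_true",
        help="Do not print training metric sparklines.",
    )
    return parser


args = build_parser().parse_args(["ft-example", "--no-plots"])
show_plots = not args.no_plots
print(show_plots)
```

With this shape, `--help` lists one flag instead of a redundant positive/negative pair.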

blainekasten · 2026-05-14T14:56:36Z

+    "render_line_chart",
+    "render_sparklines",
+    "should_log",
+]


can we move this from utils to components? I'm trying to put presentational utilities in the components folder

blainekasten · 2026-05-14T14:57:20Z

    help_epilogue=FINE_TUNING_DOWNLOAD_HELP_EXAMPLES,
 )
 fine_tuning_app.command((f"{_CLI}.fine_tuning.delete:delete"), alias="-d", help="Delete a fine-tuning job")
+fine_tuning_app.command(


I'd suggest adding some examples to the usage like we do with other commands that have different parameters

blainekasten

Looking great! Several minor nits but nothing too mind bending

timofeev1995 · 2026-05-14T15:59:41Z

+    global_step_to: Annotated[int | Omit, Parameter(help="Filter metrics to this global step (inclusive).")] = omit,
+    logged_at_from: Annotated[datetime | Omit, Parameter(help="Filter metrics logged at or after this time.")] = omit,
+    logged_at_to: Annotated[datetime | Omit, Parameter(help="Filter metrics logged at or before this time.")] = omit,
+    resolution: Annotated[int | Omit, Parameter(help="Number of data points to return (used for JSON output).")] = omit,


I'm not sure we want to expose this parameter at all. We can simply use resolution=None for any request: we rarely have jobs with more than 20k steps, so it's not that much data. Originally the motivation for it was the UI (so it can quickly fetch, let's say, 100 metrics and render them quickly too).

timofeev1995 · 2026-05-14T16:01:54Z

+                resolution=resolution,
+            ),
+        )
+        console.print_json(openapi_dumps(response.metrics or []).decode("utf-8"))


Are we sure we want to print them all?

Probably we can separate the list into two commands - one is download (as we do for files, either output or stdout) and the second is plot.

Added --save option to save metrics into the file

timofeev1995 · 2026-05-14T16:04:24Z

+            global_step_to=global_step_to,
+            logged_at_from=logged_at_from,
+            logged_at_to=logged_at_to,
+            resolution=console.width - METRICS_WIDTH_PADDING,


What happens if we set, let's say, resolution 1000 or no resolution at all? Will the plot stay the same size with just some "grouping" of the metrics, or will the plot size explode?

The _engine has a logic to compress the plot to the specific width, so it's not a big deal. However, if we can request fewer points to render -- it's better to do so.
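To make the "compress to a specific width" idea concrete, here is one possible approach: bucket the series into `width` contiguous groups and average each group. This is an assumption for illustration only; the thread does not say how `_engine` actually compresses the data, and `compress_to_width` is a hypothetical name.

```python
def compress_to_width(values: list[float], width: int) -> list[float]:
    """Reduce a series to at most `width` points by averaging contiguous buckets."""
    if width <= 0:
        raise ValueError("width must be positive")
    n = len(values)
    if n <= width:
        # Already fits, nothing to compress.
        return list(values)
    compressed = []
    for i in range(width):
        # Integer bucket boundaries; every bucket is non-empty because n > width.
        start = i * n // width
        end = (i + 1) * n // width
        bucket = values[start:end]
        compressed.append(sum(bucket) / len(bucket))
    return compressed


print(compress_to_width([1, 2, 3, 4], 2))
```

Requesting fewer points from the API, as suggested above, simply means this step has less work to do.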

timofeev1995 · 2026-05-14T16:05:31Z

    fine_tune_id: str,
    *,
    config: CLIConfigParameter,
+    plots: Annotated[bool, Parameter(help="Print training metric sparklines.")] = True,


plot_training_curves as an alternative

timofeev1995 · 2026-05-14T16:05:59Z

+            )
+            metrics = metrics_response.metrics or []
+        except Exception:
+            # Metrics are optional; silently skip if unavailable.


Are we sure we want to skip silently here?

timofeev1995 · 2026-05-14T16:11:26Z

+
+def _get_step(row: dict[str, Any], fallback: int) -> int:
+    """Extract global step, trying several field names before falling back to index."""
+    gs = row.get("global_step", row.get("train/global_step", row.get("step")))


this nested get is a bit hard to read. can we probably loop over the keys here?

timofeev1995 · 2026-05-14T16:12:06Z

+    to ``-inf`` so the rendering engine plots them at the very bottom of the
+    chart rather than silently dropping them.
+    """
+    series: dict[str, tuple[list[float], list[float]]] = {}


can be defaultdict
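With `defaultdict`, the per-metric `(steps, values)` pair no longer needs an explicit "create if missing" branch. The sketch below is a guess at the surrounding logic, not the PR's code: `collect_series` is a hypothetical name, rows are assumed to be flat dicts with a `global_step` field, and NaN is mapped to `-inf` as the docstring in the diff describes.

```python
import math
from collections import defaultdict
from typing import Any


def collect_series(rows: list[dict[str, Any]]) -> dict[str, tuple[list[float], list[float]]]:
    """Group numeric fields into per-metric (steps, values) lists."""
    series: defaultdict[str, tuple[list[float], list[float]]] = defaultdict(lambda: ([], []))
    for index, row in enumerate(rows):
        step = float(row.get("global_step", index))
        for name, value in row.items():
            if name == "global_step":
                continue
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue  # skip non-numeric fields such as timestamps
            steps, values = series[name]  # created on first access
            steps.append(step)
            # NaN becomes -inf so it can be drawn at the bottom of the chart.
            values.append(-math.inf if math.isnan(value) else float(value))
    return dict(series)


rows = [
    {"global_step": 1, "train/loss": 0.5},
    {"global_step": 2, "train/loss": float("nan")},
]
print(collect_series(rows))
```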

timofeev1995 · 2026-05-15T12:09:51Z

+
+def should_log(vals: list[float]) -> bool:
+    """Return True when values span more than 100×, suggesting log scale."""
+    nz = [v for v in vals if v > 0]


nz = non zero? if so it's not quite accurate. I'd suggest positive_vals
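With the suggested rename, the function could read as follows. Only the signature, docstring, and first line come from the diff; the rest of the body (requiring at least two positive values and comparing max to min) is an assumption about how "span more than 100×" is checked.

```python
def should_log(vals: list[float]) -> bool:
    """Return True when values span more than 100×, suggesting log scale."""
    positive_vals = [v for v in vals if v > 0]
    if len(positive_vals) < 2:
        # A ratio needs at least two positive values.
        return False
    return max(positive_vals) / min(positive_vals) > 100


print(should_log([1e-05, 2.238e-09]))
```

Under this reading, the learning-rate series from the example output (1e-05 down to 2.238e-09) would use a log scale, while the loss series (0.5725 down to 0.3647) would not.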

timofeev1995 · 2026-05-15T12:10:12Z

+    Non-finite values (e.g. the -inf sentinel used for NaN data points) are
+    excluded from the range computation so they don't corrupt the grid.
+    """
+    finite = [v for v in vals if math.isfinite(v)]


finite_vals as an alternative
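As a small illustration of why the filter matters, the sketch below computes a plotting range while ignoring non-finite values. `finite_range` is a hypothetical helper and returning `None` for an all-non-finite series is an assumption; the PR's function may handle that case differently.

```python
import math
from typing import Optional


def finite_range(vals: list[float]) -> Optional[tuple[float, float]]:
    """Return (min, max) over finite values only, or None if there are none."""
    finite_vals = [v for v in vals if math.isfinite(v)]
    if not finite_vals:
        return None
    return min(finite_vals), max(finite_vals)


print(finite_range([1.0, -math.inf, 3.0, math.nan]))
```

Without the filter, a single `-inf` sentinel would make the lower bound infinite and the grid unusable.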

timofeev1995 · 2026-05-15T13:27:12Z

+    logged_at_to: Annotated[Optional[datetime], Parameter(help="Filter metrics logged at or before this time.")] = None,
+    resolution: Annotated[
+        Optional[int],
+        Parameter(help="Number of training metric points to return. Does not limit the number of eval metric points."),


Please add the "uniformly sampled training metrics" or something

timofeev1995 · 2026-05-15T13:28:12Z

+from together.lib.cli.components.plot_finetune_metrics import METRICS_WIDTH_PADDING, metrics_ascii_charts
+
+
+async def list_metrics(


I'm still not sure we want to keep it as one command. Just as an example, let's say I only want to save metrics to a file, but using this command I would also get a bunch of plots I may not be interested in.

Wouldn't it be more convenient for us to have save_metrics and plot_metrics? What do you think?

artek0chumak force-pushed the artekchumak/mosh-2342-metric-plots-in-the-client branch 3 times, most recently from 5f57c82 to 1c34308 Compare May 1, 2026 16:05

feat(plots): add fine-tuning metrics plots with ASCII charts and sparklines

7aa34f9

artek0chumak force-pushed the artekchumak/mosh-2342-metric-plots-in-the-client branch from 948bbfd to 7aa34f9 Compare May 14, 2026 09:51

artek0chumak added 2 commits May 14, 2026 14:07

fix comment

950baf7

revert

37259db

artek0chumak changed the title ~~[WIP] Fix FT Service issue~~ Add fancy Metrics plot to the together-py May 14, 2026

artek0chumak requested review from blainekasten and timofeev1995 May 14, 2026 12:15

artek0chumak marked this pull request as ready for review May 14, 2026 12:15

blainekasten reviewed May 14, 2026

Comment thread src/together/lib/cli/api/fine_tuning/__init__.py Outdated

blainekasten reviewed May 14, 2026

timofeev1995 reviewed May 14, 2026

artek0chumak added 4 commits May 15, 2026 09:53

first pack of fixes

d753f70

fixes

ecd5c7b

revert

ad7f85c

final fixes

6af5596

artek0chumak requested review from blainekasten and timofeev1995 May 15, 2026 08:15

timofeev1995 reviewed May 15, 2026

artek0chumak added 2 commits May 15, 2026 15:46

remove save

54583a7

feedback fixes

0cee80a

artek0chumak requested a review from timofeev1995 May 18, 2026 08:15
