
Add fancy Metrics plot to the together-py#344

Open
artek0chumak wants to merge 9 commits into next from artekchumak/mosh-2342-metric-plots-in-the-client

Conversation


@artek0chumak artek0chumak commented Apr 27, 2026

This PR adds metrics plots for fine-tuning jobs. It includes:

  • A new CLI command to retrieve metrics: list-metrics;
  • A library for plotting linear* ASCII graphs (time-axis-like);
  • Tests for the new functionality.

Example of the output:

together-py> uv run together fine-tuning list-metrics ft-b7...

  train/loss  (1 – 105)  0.5725 → 0.3647
   0.573┼──╮
   0.521┼  ╰──╮
   0.469┼     ╰───╮ ╭────╮
   0.418┼         ╰─╯    ╰──╮╭╮ ╭────╮    ╭╮  ╭╮  ╭─╮╭─╮      ╭─╮           ╭╮
   0.366┼                   ╰╯╰─╯    ╰────╯╰──╯╰──╯ ╰╯ ╰──────╯ ╰───╮╭──────╯╰╮╭───────╮╭─╮╭─────╮ ╭───────╮ ╭──────╮    ╭────╮  ╭──╮ ╭──╮ ╭─────╮    ╭─────
   0.315┼                                                           ╰╯        ╰╯       ╰╯ ╰╯     ╰─╯       ╰─╯      ╰────╯    ╰──╯  ╰─╯  ╰─╯     ╰────╯
        └┬─────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────┬
         1                                                                        53                                                                      105

  train/grad_norm  (1 – 105)  1.575 → 0.2752
    1.65┼──╮
    1.37┼  ╰╮
    1.09┼   ╰───╮
   0.808┼       ╰─╮
   0.529┼         ╰──────╮
    0.25┼                ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        └┬─────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────┬
         1                                                                        53                                                                      105

  train/learning_rate  (1 – 105)  1e-05 → 2.238e-09
   1e-05┼────────────────────────────────────────────────────────────────────────────────╮
1.86e-06┼                                                                                ╰────────────────────────────────────────╮
3.47e-07┼                                                                                                                         ╰──────────────╮
6.46e-08┼                                                                                                                                        ╰──────╮
 1.2e-08┼                                                                                                                                               ╰──╮
2.24e-09┼                                                                                                                                                  ╰
        └┬─────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────┬
         1                                                                        53                                                                      105

  eval/loss  (21 – 105)  5.939 → 5.824
    5.94┼────╮
    5.92┼    ╰───────╮
    5.89┼            ╰───────╮
    5.87┼                    ╰───────╮
    5.85┼                            ╰───────╮
    5.82┼                                    ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────
        └┬─────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────┬
        21                                                                        63                                                                      105
together-py> uv run together fine-tuning retrieve ft-b71e...

...
Lr Scheduler:                Lr Scheduler Type: cosine
                             Lr Scheduler Args: Min Lr Ratio: n/a
                                                Num Cycles:   0.5
Multimodal Params:           Train Vision: n/a
Progress:                    Estimate Available: False
                             Seconds Remaining:  0
Training Method:             Method:          sft
                             Train On Inputs: auto
Training Type:               Lora Alpha:             128
                             Lora R:                 64
                             Type:                   Lora
                             Lora Dropout:           n/a
                             Lora Trainable Modules: k_proj,up_proj,o_proj,q_proj,down_proj,v_proj,gate_proj
Checkpoints:

Training metrics:
           train/loss  ██▇▇▇▆▅▅▅▅▃▃▄▅▄▄▄▄▃▂▂▃▂▂▄▄▃▃▃▂▁▂▂▂▃▂▂▂▃▂▁▁▂▃▂▃▂▂▂▁▁▂▁▂▃▃▂▂▂▁▁▁▁▂▂▁▁▂▂▁▁▁▂▂▂▂▂▁▁▁▁▁ ▂▂▂▂▂▁ ▁▁▁▁▁▁▁▂▂  ▁▂▁▂▂▁▁▁ ▁ ▁▂▂▂▂▂▁ ▁▁▁▁▁▁▁▂▁  ▁▂▁▂▂▁▁  ▁ ▁▂▂▂▂▂  0.5725 → 0.3647
      train/grad_norm  ███▇▅▄▄▅▄▃▂▂▂▂▁▁▁▁▁               ▁                                                                                                                   1.575 → 0.2752
  train/learning_rate  ██████████████████████████████████████████████████████████████████▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▅▅▅▅▅▅▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▂▂▁   1e-05 → 2.238e-09
            eval/loss  ███▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁                                                                                                       5.939 → 5.824


FT Events:
  Total events: 26
  To see event log...
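
The sparkline rows in the output above map each value onto one of a few block characters. A minimal sketch of that idea follows; it is not the PR's render_sparklines implementation. The function name sparkline, the BLOCKS table, and the choice to draw the minimum as a blank are assumptions made for illustration.

```python
from typing import List

# Nine levels: a blank for the minimum, then eight block heights (assumed palette).
BLOCKS = " ▁▂▃▄▅▆▇█"


def sparkline(values: List[float]) -> str:
    """Render values as a one-line sparkline, scaled to the series' own range."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range; the middle block is an arbitrary choice.
        return BLOCKS[len(BLOCKS) // 2] * len(values)
    scale = (len(BLOCKS) - 1) / (hi - lo)
    # Round to the nearest level rather than truncating.
    return "".join(BLOCKS[int((v - lo) * scale + 0.5)] for v in values)


print(sparkline([0.5725, 0.52, 0.47, 0.42, 0.3647]))
```

Scaling each series to its own minimum and maximum is what lets train/loss and train/learning_rate share one layout despite very different magnitudes.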

@artek0chumak artek0chumak force-pushed the artekchumak/mosh-2342-metric-plots-in-the-client branch 3 times, most recently from 5f57c82 to 1c34308, May 1, 2026 16:05
@artek0chumak artek0chumak force-pushed the artekchumak/mosh-2342-metric-plots-in-the-client branch from 948bbfd to 7aa34f9, May 14, 2026 09:51
@artek0chumak artek0chumak changed the title from "[WIP] Fix FT Service issue" to "Add fancy Metrics plot to the together-py", May 14, 2026
@artek0chumak artek0chumak marked this pull request as ready for review May 14, 2026 12:15
Contributor

do we need this file?

Comment on lines +20 to +24
global_step_from: Annotated[int | Omit, Parameter(help="Filter metrics from this global step (inclusive).")] = omit,
global_step_to: Annotated[int | Omit, Parameter(help="Filter metrics to this global step (inclusive).")] = omit,
logged_at_from: Annotated[datetime | Omit, Parameter(help="Filter metrics logged at or after this time.")] = omit,
logged_at_to: Annotated[datetime | Omit, Parameter(help="Filter metrics logged at or before this time.")] = omit,
resolution: Annotated[int | Omit, Parameter(help="Number of data points to return (used for JSON output).")] = omit,
Contributor

Can you make these default to None instead? Then you don't need the union on Omit either. Currently the help docs look like this:

[Image: screenshot of the help output]

I could fix this so that omit doesn't look so bad, but I think the cyclopts internals handle None better for us.
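
One way to keep None as the CLI-facing default while still forwarding only the filters the user supplied is sketched below. drop_none is a hypothetical helper, not part of together-py or cyclopts, and the SDK may well accept None directly, in which case no such helper is needed.

```python
from typing import Any, Dict, Optional


def drop_none(**kwargs: Optional[Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that were actually supplied (not None)."""
    return {name: value for name, value in kwargs.items() if value is not None}


# CLI parameters default to None; only the supplied filters are forwarded.
filters = drop_none(global_step_from=10, global_step_to=None, resolution=None)
print(filters)  # {'global_step_from': 10}
```

Filtering on `is not None` rather than truthiness keeps legitimate values such as a global step of 0.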

Comment on lines +28 to +54
if config.json:
    response = await show_loading_status(
        "Fetching metrics...",
        config.client.fine_tuning.list_metrics(
            fine_tune_id,
            global_step_from=global_step_from,
            global_step_to=global_step_to,
            logged_at_from=logged_at_from,
            logged_at_to=logged_at_to,
            resolution=resolution,
        ),
    )
    console.print_json(openapi_dumps(response.metrics or []).decode("utf-8"))
    return

# For the ASCII chart always fetch at terminal width resolution for best fidelity.
response = await show_loading_status(
    "Fetching metrics...",
    config.client.fine_tuning.list_metrics(
        fine_tune_id,
        global_step_from=global_step_from,
        global_step_to=global_step_to,
        logged_at_from=logged_at_from,
        logged_at_to=logged_at_to,
        resolution=console.width - METRICS_WIDTH_PADDING,
    ),
)
Contributor

Suggested change
-if config.json:
-    response = await show_loading_status(
-        "Fetching metrics...",
-        config.client.fine_tuning.list_metrics(
-            fine_tune_id,
-            global_step_from=global_step_from,
-            global_step_to=global_step_to,
-            logged_at_from=logged_at_from,
-            logged_at_to=logged_at_to,
-            resolution=resolution,
-        ),
-    )
-    console.print_json(openapi_dumps(response.metrics or []).decode("utf-8"))
-    return
-# For the ASCII chart always fetch at terminal width resolution for best fidelity.
-response = await show_loading_status(
-    "Fetching metrics...",
-    config.client.fine_tuning.list_metrics(
-        fine_tune_id,
-        global_step_from=global_step_from,
-        global_step_to=global_step_to,
-        logged_at_from=logged_at_from,
-        logged_at_to=logged_at_to,
-        resolution=console.width - METRICS_WIDTH_PADDING,
-    ),
-)
+# For the ASCII chart always fetch at terminal width resolution for best fidelity.
+resolution_value = resolution if config.json else console.width - METRICS_WIDTH_PADDING
+response = await show_loading_status(
+    "Fetching metrics...",
+    config.client.fine_tuning.list_metrics(
+        fine_tune_id,
+        global_step_from=global_step_from,
+        global_step_to=global_step_to,
+        logged_at_from=logged_at_from,
+        logged_at_to=logged_at_to,
+        resolution=resolution_value,
+    ),
+)
+if config.json:
+    console.print_json(openapi_dumps(response.metrics or []).decode("utf-8"))
+    return

Contributor

a bit cleaner I think? not a big deal

Contributor Author

Yes, cleaner, applied
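
The resolution choice discussed in this thread can be isolated into a small pure function, which makes the direction of the conditional easy to test. This is a sketch under assumptions: pick_resolution is a hypothetical name, the value of METRICS_WIDTH_PADDING is invented here, and the floor of 1 for very narrow terminals is an addition that the PR code does not show.

```python
from typing import Optional

METRICS_WIDTH_PADDING = 12  # assumed value; the real constant is defined in the PR


def pick_resolution(json_output: bool, requested: Optional[int], console_width: int) -> Optional[int]:
    """Return the resolution to request from the metrics endpoint."""
    if json_output:
        # JSON output honors whatever the user asked for (possibly None).
        return requested
    # The ASCII chart is fetched at terminal width; never request fewer than 1 point.
    return max(1, console_width - METRICS_WIDTH_PADDING)


print(pick_resolution(False, None, 120))  # 108
```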

fine_tune_id: str,
*,
config: CLIConfigParameter,
plots: Annotated[bool, Parameter(help="Print training metric sparklines.")] = True,
Contributor

Cyclopts automatically generates a negative version of each bool flag. So if you run --help on this you'll see you can pass either --plots or --no-plots.

Since --plots defaults to True, passing --plots does nothing, and the user would have to pass --no-plots to change the behavior. Given the default of True, I would suggest exposing only a negative param that defaults to False, like this:

Suggested change
-plots: Annotated[bool, Parameter(help="Print training metric sparklines.")] = True,
+no_plots: Annotated[bool, Parameter(help="Do not print training metric sparklines.", negative=())] = False,
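
The same "negative-only flag" shape can be shown with the standard library. The sketch below uses argparse as a stand-in, not cyclopts, so it illustrates the CLI behavior rather than the PR's actual parameter declaration; the parser name and wants_plots helper are hypothetical.

```python
import argparse
from typing import List

parser = argparse.ArgumentParser(prog="retrieve")
# Only the negative flag exists; omitting it keeps the default of showing plots.
parser.add_argument("--no-plots", action="store_true", help="Do not print training metric sparklines.")


def wants_plots(argv: List[str]) -> bool:
    """Return True unless the user explicitly passed --no-plots."""
    return not parser.parse_args(argv).no_plots


print(wants_plots([]))              # True
print(wants_plots(["--no-plots"]))  # False
```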

"render_line_chart",
"render_sparklines",
"should_log",
]
Contributor

can we move this from utils to components? I'm trying to put presentational utilities in the components folder

help_epilogue=FINE_TUNING_DOWNLOAD_HELP_EXAMPLES,
)
fine_tuning_app.command((f"{_CLI}.fine_tuning.delete:delete"), alias="-d", help="Delete a fine-tuning job")
fine_tuning_app.command(
Contributor

I'd suggest adding some examples to the usage like we do with other commands that have different parameters

Contributor Author

Added help

Contributor

@blainekasten blainekasten left a comment

Looking great! Several minor nits but nothing too mind bending

global_step_to: Annotated[int | Omit, Parameter(help="Filter metrics to this global step (inclusive).")] = omit,
logged_at_from: Annotated[datetime | Omit, Parameter(help="Filter metrics logged at or after this time.")] = omit,
logged_at_to: Annotated[datetime | Omit, Parameter(help="Filter metrics logged at or before this time.")] = omit,
resolution: Annotated[int | Omit, Parameter(help="Number of data points to return (used for JSON output).")] = omit,
Contributor

I'm not sure we want to expose this parameter at all. We can simply use resolution=None for any request; we rarely have jobs with more than 20k steps, so it's not that much data. The original motivation for it was the UI (so it can quickly fetch, let's say, 100 metrics and render them quickly too).

resolution=resolution,
),
)
console.print_json(openapi_dumps(response.metrics or []).decode("utf-8"))
Contributor

Are we sure we want to print them all?

Contributor

Probably we can separate the list into two commands - one is download (as we do for files, either output or stdout) and the second is plot.

Contributor Author

Added a --save option to save metrics to a file

global_step_to=global_step_to,
logged_at_from=logged_at_from,
logged_at_to=logged_at_to,
resolution=console.width - METRICS_WIDTH_PADDING,
Contributor

What happens if we set, let's say, resolution 1000 or no resolution at all? Will the plot stay the same size with just some "grouping" of the metrics, or will the plot size explode?

Contributor Author

The _engine has logic to compress the plot to a specific width, so it's not a big deal. However, if we can request fewer points to render, it's better to do so.
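
As a rough illustration of what "compressing to a specific width" can look like, the sketch below averages the series into one bucket per output column. It is not the PR's _engine code; compress_to_width and the bucket-mean strategy are assumptions chosen for simplicity.

```python
from typing import List


def compress_to_width(values: List[float], width: int) -> List[float]:
    """Reduce a series to at most `width` points by averaging contiguous buckets."""
    if width <= 0:
        raise ValueError("width must be positive")
    n = len(values)
    if n <= width:
        # Already fits: nothing to compress.
        return list(values)
    compressed = []
    for col in range(width):
        # Integer arithmetic splits n points into `width` non-empty buckets.
        start = col * n // width
        end = (col + 1) * n // width
        bucket = values[start:end]
        compressed.append(sum(bucket) / len(bucket))
    return compressed


print(compress_to_width([1.0, 2.0, 3.0, 4.0], 2))  # [1.5, 3.5]
```

With this kind of step in place the chart width stays fixed regardless of how many points arrive; requesting fewer points only saves transfer and rendering work.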

fine_tune_id: str,
*,
config: CLIConfigParameter,
plots: Annotated[bool, Parameter(help="Print training metric sparklines.")] = True,
Contributor

plot_training_curves as an alternative

)
metrics = metrics_response.metrics or []
except Exception:
# Metrics are optional; silently skip if unavailable.
Contributor

Are we sure we want to skip silently here?
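
One alternative to skipping silently is to keep the command working but leave a trace of the failure. The sketch below logs a warning and returns an empty list; fetch_metrics_or_warn and the logger name are hypothetical, and whether a warning, a debug message, or a visible console note is appropriate is a design decision for the PR authors.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger("finetune.metrics")  # assumed logger name


def fetch_metrics_or_warn(fetch: Callable[[], Optional[List[Dict[str, Any]]]]) -> List[Dict[str, Any]]:
    """Call `fetch`; on failure, log a warning and fall back to no metrics."""
    try:
        return fetch() or []
    except Exception as exc:
        # Metrics stay optional, but the failure is no longer invisible.
        logger.warning("Could not fetch training metrics: %s", exc)
        return []


print(fetch_metrics_or_warn(lambda: [{"train/loss": 0.57}]))
```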


def _get_step(row: dict[str, Any], fallback: int) -> int:
    """Extract global step, trying several field names before falling back to index."""
    gs = row.get("global_step", row.get("train/global_step", row.get("step")))
Contributor

this nested get is a bit hard to read. can we loop over the keys here instead?
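
A loop-based version might look like the sketch below. Only the three key names come from the diff; the STEP_KEYS constant and the handling of the found value are assumptions, since the rest of the original function body is not shown. Note one behavioral difference: this version skips a key whose value is None, whereas the nested get would return that None.

```python
from typing import Any, Dict

# Candidate field names, in priority order (taken from the nested get in the diff).
STEP_KEYS = ("global_step", "train/global_step", "step")


def get_step(row: Dict[str, Any], fallback: int) -> int:
    """Return the first usable step value found in `row`, else `fallback`."""
    for key in STEP_KEYS:
        value = row.get(key)
        if value is not None:
            return int(value)
    return fallback


print(get_step({"train/global_step": 21}, fallback=0))  # 21
```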

to ``-inf`` so the rendering engine plots them at the very bottom of the
chart rather than silently dropping them.
"""
series: dict[str, tuple[list[float], list[float]]] = {}
Contributor

can be defaultdict
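
With defaultdict, the per-metric (xs, ys) pair is created on first access, so no "is this key new?" branch is needed. The sketch below also applies the NaN to -inf mapping described in the docstring above; build_series, the row layout, and the use of "global_step" as the x value are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Tuple

Series = Tuple[List[float], List[float]]


def build_series(rows: List[Dict[str, Any]]) -> Dict[str, Series]:
    """Group numeric fields of each row into per-metric (steps, values) lists."""
    series: DefaultDict[str, Series] = defaultdict(lambda: ([], []))
    for index, row in enumerate(rows):
        step = float(row.get("global_step", index))
        for name, value in row.items():
            # Skip the step field itself and anything non-numeric (bool is not a metric).
            if name == "global_step" or isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            y = float(value)
            if math.isnan(y):
                y = -math.inf  # sentinel: plotted at the very bottom of the chart
            xs, ys = series[name]
            xs.append(step)
            ys.append(y)
    return dict(series)


rows = [{"global_step": 1, "train/loss": 0.57}, {"global_step": 2, "train/loss": float("nan")}]
print(build_series(rows))
```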

Comment thread tests/test_plots_engine.py Outdated

def should_log(vals: list[float]) -> bool:
    """Return True when values span more than 100×, suggesting log scale."""
    nz = [v for v in vals if v > 0]
Contributor

nz = non zero? if so it's not quite accurate. I'd suggest positive_vals
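
With the suggested rename, the function could read as in the sketch below. Only the signature, docstring, and filter line come from the diff; the guard for fewer than two positive values and the max/min ratio test are assumptions about the remainder of the body.

```python
from typing import List


def should_log(vals: List[float]) -> bool:
    """Return True when values span more than 100x, suggesting log scale."""
    positive_vals = [v for v in vals if v > 0]
    if len(positive_vals) < 2:
        # A ratio needs two positive values; log scale cannot show zero or negatives.
        return False
    return max(positive_vals) / min(positive_vals) > 100


print(should_log([1e-05, 2.238e-09]))  # True: learning rate spans several decades
print(should_log([0.5725, 0.3647]))    # False: loss stays within one decade
```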

Non-finite values (e.g. the -inf sentinel used for NaN data points) are
excluded from the range computation so they don't corrupt the grid.
"""
finite = [v for v in vals if math.isfinite(v)]
Contributor

finite_vals as an alternative
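
The range computation described in that docstring might be sketched as follows, using the suggested finite_vals name. finite_range is a hypothetical helper, and the (0.0, 1.0) fallback for a series with no finite values is an arbitrary placeholder, not something the PR specifies.

```python
import math
from typing import List, Tuple


def finite_range(vals: List[float]) -> Tuple[float, float]:
    """Return (min, max) over finite values only, ignoring NaN and +/-inf."""
    finite_vals = [v for v in vals if math.isfinite(v)]
    if not finite_vals:
        return 0.0, 1.0  # assumed placeholder so the grid still has a range
    return min(finite_vals), max(finite_vals)


print(finite_range([0.57, -math.inf, 0.36]))  # (0.36, 0.57)
```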

logged_at_to: Annotated[Optional[datetime], Parameter(help="Filter metrics logged at or before this time.")] = None,
resolution: Annotated[
Optional[int],
Parameter(help="Number of training metric points to return. Does not limit the number of eval metric points."),
Contributor

Please add the "uniformly sampled training metrics" or something
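
For readers of the help text, "uniformly sampled" could mean something like the sketch below: evenly spaced indices that always include the first and last point. This is only an illustration of the term; how the service actually samples is not shown in this PR, and sample_uniform is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_uniform(points: Sequence[T], resolution: int) -> List[T]:
    """Pick `resolution` evenly spaced points, keeping both endpoints."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    n = len(points)
    if n <= resolution:
        return list(points)
    if resolution == 1:
        return [points[-1]]
    # Indices run from 0 to n - 1 in equal integer steps.
    return [points[i * (n - 1) // (resolution - 1)] for i in range(resolution)]


print(sample_uniform(list(range(1, 106)), 3))  # [1, 53, 105]
```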

from together.lib.cli.components.plot_finetune_metrics import METRICS_WIDTH_PADDING, metrics_ascii_charts


async def list_metrics(
Contributor

I'm still not sure we want to keep it as one command. As an example, let's say I only want to save metrics to a file, but with this command I would also get a bunch of plots I may not be interested in.

Wouldn't it be more convenient for us to have save_metrics and plot_metrics? What do you think?
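
If the commands were split, the save half could be as small as the sketch below: write JSON to a file when a path is given, otherwise to stdout. The function name follows the comment above, but its signature, the output and stream parameters, and the JSON layout are all assumptions.

```python
import json
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, TextIO


def save_metrics(
    metrics: List[Dict[str, Any]],
    output: Optional[Path] = None,
    stream: TextIO = sys.stdout,
) -> None:
    """Serialize metrics as JSON to `output` if given, otherwise to `stream`."""
    payload = json.dumps(metrics, indent=2) + "\n"
    if output is None:
        stream.write(payload)
    else:
        output.write_text(payload, encoding="utf-8")


save_metrics([{"global_step": 1, "train/loss": 0.5725}])
```

A separate plot command would then take the same rows and only render them, so neither command produces output the user did not ask for.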

@artek0chumak artek0chumak requested a review from timofeev1995 May 18, 2026 08:15

3 participants