Skip to content

invalid reduced-logits token -1 #27

@l3ateman

Description

@l3ateman

Not too sure if it's a bug but looks like it?

using 0.2.0 pre-compiled releases for windows and the following parameters (gotten from quickstart).

I have a 4090 using CUDA 12 version. I was reserving about 22-23 GB of VRAM

.\llama-server.exe `
  -m "models\gemma-4-31B-it-Q4_K_S.gguf" `
  --spec-draft-model "models\gemma4-31b-it-dflash-Q4_K_M.gguf" `
  --spec-type dflash `
  --spec-dflash-cross-ctx 1024 `
  --port 8082 `
  -np 1 `
  --kv-unified `
  -ngl all `
  --spec-draft-ngl all `
  -b 2048 -ub 512 `
  --ctx-size 32768 `
  --cache-type-k q5_0 --cache-type-v q4_1 `
  --flash-attn on `
  --cache-ram 0 `
  --jinja `
  --no-mmap --mlock `
  --no-host `
  --reasoning on `
  --temp 1.0 --top-k 64 --top-p 0.95 --min-p 0.0

After about two-three generation I get the following:

�[0mslot  operator (): id  0 | task 467 | adaptive dm profit: cur=0 recommended=16 score=15.4 action=apply
sched_reserve: reserving ...
sched_reserve:      CUDA0 compute buffer size =   139.88 MiB
sched_reserve:  CUDA_Host compute buffer size =     5.44 MiB
sched_reserve: graph nodes  = 185
sched_reserve: graph splits = 2
sched_reserve: reserve took 1.99 ms, sched copies = 1
slot  operator (): id  0 | task 467 | adaptive dm profit: cur=16 recommended=14 score=22.0 action=apply
slot  operator (): id  0 | task 467 | adaptive dm profit: cur=14 recommended=12 score=24.4 action=apply
slot  operator (): id  0 | task 467 | adaptive dm profit: cur=12 recommended=10 score=24.6 action=apply
sched_reserve: reserving ...
sched_reserve:      CUDA0 compute buffer size =   139.88 MiB
sched_reserve:  CUDA_Host compute buffer size =     5.50 MiB
sched_reserve: graph nodes  = 185
sched_reserve: graph splits = 2
sched_reserve: reserve took 2.31 ms, sched copies = 1
slot  operator (): id  0 | task 467 | adaptive dm profit: cur=10 recommended=8 score=22.8 action=apply
slot  operator (): id  0 | task 467 | adaptive dm profit: cur=8 recommended=7 score=23.2 action=apply
slot  operator (): id  0 | task 467 | adaptive dm profit: cur=7 recommended=6 score=25.4 action=apply
slot  operator (): id  0 | task 467 | adaptive dm profit: cur=6 recommended=5 score=25.5 action=apply
slot  operator (): id  0 | task 467 | adaptive dm profit: cur=5 recommended=4 score=23.6 action=apply
sched_reserve: reserving ...
sched_reserve:      CUDA0 compute buffer size =   139.88 MiB
sched_reserve:  CUDA_Host compute buffer size =     5.57 MiB
sched_reserve: graph nodes  = 185
sched_reserve: graph splits = 2
sched_reserve: reserve took 2.05 ms, sched copies = 1
dflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=386 cross_len=386)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=387 cross_len=387)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=388 cross_len=388)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=389 cross_len=389)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=390 cross_len=390)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=391 cross_len=391)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/6 (top_k=1 committed=392 cross_len=392)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=393 cross_len=393)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=394 cross_len=394)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=395 cross_len=395)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=396 cross_len=396)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=397 cross_len=397)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=398 cross_len=398)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=399 cross_len=399)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=400 cross_len=400)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=401 cross_len=401)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=402 cross_len=402)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=403 cross_len=403)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/6 (top_k=1 committed=404 cross_len=404)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=405 cross_len=405)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=406 cross_len=406)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=407 cross_len=407)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=408 cross_len=408)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=409 cross_len=409)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=410 cross_len=410)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=411 cross_len=411)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=412 cross_len=412)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=413 cross_len=413)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=414 cross_len=414)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=415 cross_len=415)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/16 (top_k=1 committed=416 cross_len=416)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=417 cross_len=417)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=418 cross_len=418)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=419 cross_len=419)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=420 cross_len=420)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=421 cross_len=421)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=422 cross_len=422)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=423 cross_len=423)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=424 cross_len=424)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=425 cross_len=425)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=426 cross_len=426)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=427 cross_len=427)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/6 (top_k=1 committed=428 cross_len=428)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=429 cross_len=429)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=430 cross_len=430)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=431 cross_len=431)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=432 cross_len=432)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=433 cross_len=433)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=434 cross_len=434)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=435 cross_len=435)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=436 cross_len=436)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=437 cross_len=437)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=438 cross_len=438)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=439 cross_len=439)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/6 (top_k=1 committed=440 cross_len=440)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=441 cross_len=441)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=442 cross_len=442)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=443 cross_len=443)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=444 cross_len=444)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=445 cross_len=445)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=446 cross_len=446)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=447 cross_len=447)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=448 cross_len=448)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=449 cross_len=449)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=450 cross_len=450)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=451 cross_len=451)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/6 (top_k=1 committed=452 cross_len=452)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=453 cross_len=453)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=454 cross_len=454)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=455 cross_len=455)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=456 cross_len=456)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=457 cross_len=457)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=458 cross_len=458)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=459 cross_len=459)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=460 cross_len=460)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=461 cross_len=461)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=462 cross_len=462)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=463 cross_len=463)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/16 (top_k=1 committed=464 cross_len=464)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=465 cross_len=465)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=466 cross_len=466)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=467 cross_len=467)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=468 cross_len=468)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=469 cross_len=469)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=470 cross_len=470)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=471 cross_len=471)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=472 cross_len=472)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=473 cross_len=473)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=474 cross_len=474)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=475 cross_len=475)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/6 (top_k=1 committed=476 cross_len=476)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=477 cross_len=477)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=478 cross_len=478)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=479 cross_len=479)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=480 cross_len=480)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=481 cross_len=481)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=482 cross_len=482)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=483 cross_len=483)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=484 cross_len=484)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=485 cross_len=485)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=486 cross_len=486)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=487 cross_len=487)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/6 (top_k=1 committed=488 cross_len=488)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=489 cross_len=489)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=490 cross_len=490)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=491 cross_len=491)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=492 cross_len=492)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=493 cross_len=493)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=494 cross_len=494)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=495 cross_len=495)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=496 cross_len=496)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=497 cross_len=497)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=498 cross_len=498)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=499 cross_len=499)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/6 (top_k=1 committed=500 cross_len=500)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=501 cross_len=501)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=502 cross_len=502)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=503 cross_len=503)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=504 cross_len=504)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=505 cross_len=505)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=506 cross_len=506)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=507 cross_len=507)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=508 cross_len=508)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=509 cross_len=509)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=510 cross_len=510)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/5 (top_k=1 committed=511 cross_len=511)
�[0mdflash: invalid reduced-logits token -1 in draft at row=1/16 (top_k=1 committed=512 cross_len=512)
�[0msched_reserve: reserving ...
sched_reserve:      CUDA0 compute buffer size =   139.88 MiB
sched_reserve:  CUDA_Host compute buffer size =     5.63 MiB
sched_reserve: graph nodes  = 185
sched_reserve: graph splits = 2
sched_reserve: reserve took 1.91 ms, sched copies = 1

Is this normal behavior ? The last part repeats 3x

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions