rookieC511/README.md

常城 | LLM Post-training / Agentic RL / Reward Modeling

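Master's student in Computer Science and Technology at East China Normal University. Current areas of focus:

  • LLM post-training and preference optimization
  • Safety-boundary verification for tool agents
  • Process Reward Model / Verifier-Ranker
  • Agentic RL evaluation and data construction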

Featured Project

BoundaryVerifier: Action-Level Verifier and PRM-Style Ranker for Tool Agents

Public showcase: boundaryverifier-agent-prm

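This project studies how a tool agent, before executing an action such as a tool call, an external message, a state update, or access to a protected resource, can decide whether the next action should proceed, be blocked, or be sent back for clarification. It then extends the verifier into a PRM-style ranker over multiple candidate actions proposed from the same state.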
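The verify-then-rank idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Verdict` labels, the `Candidate` fields, and the functions `rank_candidates` and `choose_action` are all hypothetical names, and the verdicts and scores are assumed to come from some upstream verifier model that is not shown here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence


class Verdict(Enum):
    """Hypothetical three-way action-level decision."""
    PROCEED = "proceed"
    BLOCK = "block"
    CLARIFY = "clarify"


@dataclass(frozen=True)
class Candidate:
    action: str       # e.g. a serialized tool call (illustrative)
    verdict: Verdict  # assumed output of an action-level verifier
    score: float      # assumed PRM-style step score, higher is better


def rank_candidates(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Put PROCEED candidates first, then sort by descending score."""
    return sorted(
        candidates,
        key=lambda c: (c.verdict is not Verdict.PROCEED, -c.score),
    )


def choose_action(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the best executable candidate, or None if nothing may proceed."""
    ranked = rank_candidates(candidates)
    if not ranked or ranked[0].verdict is not Verdict.PROCEED:
        return None
    return ranked[0]
```

In this sketch the safety decision acts as a hard filter ahead of the score, so a blocked action can never win on score alone. Whether the real ranker combines the two signals this way is an assumption.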

Latest controlled replay result:

35 groups / 210 candidates
top1 oracle-best: 33/35
top1 allowed: 35/35
unsafe top1: 0
safety macro-F1: 0.9934
ESCALATE recall: 1.0000
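
Metrics of this kind could be computed along the following lines. This is a hedged sketch under stated assumptions: the record keys `score`, `allowed`, and `oracle_best`, and the function names, are hypothetical, and "unsafe top1" is assumed here to mean a top-ranked candidate that is not allowed. The project's exact definitions may differ.

```python
from typing import Dict, List, Sequence


def top1_metrics(groups: Sequence[Sequence[dict]]) -> Dict[str, int]:
    """Count top-1 outcomes over groups of scored candidates.

    Each candidate is a dict with hypothetical keys:
    'score' (float), 'allowed' (bool), 'oracle_best' (bool).
    """
    oracle_best = allowed = unsafe = 0
    for group in groups:
        top = max(group, key=lambda c: c["score"])  # highest-scored candidate
        oracle_best += int(top["oracle_best"])
        allowed += int(top["allowed"])
        unsafe += int(not top["allowed"])           # assumed definition
    return {
        "groups": len(groups),
        "top1_oracle_best": oracle_best,
        "top1_allowed": allowed,
        "unsafe_top1": unsafe,
    }


def recall(y_true: List[str], y_pred: List[str], label: str) -> float:
    """Recall for one label: TP / (TP + FN), 0.0 if the label never occurs."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-label F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Macro-averaging weights every label equally, so a rare label such as ESCALATE counts as much as the common ones.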

Technical Interests

SFT / LoRA / DPO / GRPO
Reward Model / PRM / verifier-guided optimization
Agent routing / verification / stopping / calibration
policy-aware evaluation and error slicing

Popular repositories

  1. -Yarn- Public

    Shell 1

  2. FactWeaver-Agent Public

    Python 1

  3. github-slideshow Public

    A robot powered training repository 🤖

    Ruby

  4. ostep-typos Public

    Forked from remzi-arpacidusseau/ostep-typos

  5. rookieC511 Public

    Config files for my GitHub profile.

  6. standard-readme Public

    Forked from RichardLitt/standard-readme

    A standard style for README files

    Shell