  • UESTC
daoluzixin/README.md

Hi, I'm daoluzixin 👋

Backend developer focused on building AI Agent applications. Deeply interested in LLMs and reinforcement learning, and currently moving from the engineering side toward the algorithm side.

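Areas of interest:

  • AI Agent: tool calling, multi-agent collaboration, retrieval-augmented generation (RAG), memory systems, Skills, harness engineering
  • Data engineering: MinHash deduplication, perplexity (PPL), Self-Instruct / Evol-Instruct data synthesis, data flywheels
  • LLM training: the full Pretrain → SFT → RLHF pipeline, parameter-efficient fine-tuning with LoRA / QLoRA
  • Reinforcement learning: GRPO, process reward shaping, on-policy self-distillation (OPD)
  • Backend engineering: Spring Boot, MySQL, Redis, high-availability service design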

Pinned

  1. prompt-flywheel Public

    🔄 Prompt-Data Co-Evolution Flywheel | An evaluation-driven framework for iterating prompts and data together, with a human-review loop and versioned ground truth (GT)

    Python 4

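  2. MiniResearcher Public

    🔍 Systematic improvements to DeepResearcher on verl: replaces the sparse final-answer F1 reward with dense PBRS process rewards, and uses LoRA + Dr.GRPO to make RL training of small models controllable and reproducible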

    Python 2