Professional Experience
TikTok Monetization GenAI – ByteDance
Oct. 2023 – Present
Role Overview: Led post-training of TikTok Monetization VLMs across 7B, 14B, and 30B-A3B scales through SFT, RL, and Agentic RL. The models power multimodal ad creative generation and understanding, including video/image remixing, copywriting, voiceover, and creative optimization. Continuously leveraged ad delivery feedback signals to optimize model behavior and improve advertising ROI.
SFT Post-Training for VLM Foundation Capabilities
- Objective: Built a general-purpose VLM for TikTok monetization, targeting top-tier open-source benchmark performance and the strongest in-house model performance at the same size.
- Approach: Standardized capability taxonomy and quality evaluation; built an automated data synthesis and annotation pipeline; delivered a 35B high-quality training dataset; established capability-mixture and data-scaling curves; designed task-specific data pies and business-oriented benchmarks for downstream teams.
- Impact: Injected reasoning and thinking capabilities into the base model; improved the overall general benchmark score from 64.58 to 74.21, a 9.63-point gain; deployed the 14B model across multiple business scenarios including video remixing and script generation, achieving in-house SOTA at the same model size.
RL Post-Training for Reasoning, Alignment, and Business Optimization
- Objective: Improved model reasoning, alignment, and agentic capabilities while using RL with ad-delivery posterior signals to continuously optimize VLM performance for monetization scenarios.
- Challenges: VLM RL remained technically immature, with heterogeneous multimodal and text spaces causing training instability; ad delivery systems were black-box environments, making it difficult to optimize generative models from posterior delivery data; Agentic RL was still exploratory and required end-to-end system design.
- Approach: Ran dual-track RL training, using GRPO for verifiable tasks and DPO for open-ended tasks, followed by staged model merging; optimized reward signals for open-ended generation; built offline rollout and evaluation pipelines for RL data; cleaned business-side delivery signals through statistical denoising and feature anchoring, then trained delivery reward models on them while avoiding spurious-feature learning; optimized the generator against these reward models with DPO and GRPO; proposed Robust Reward Normalization to address reward concentration in GRPO+RM training; in recent research, improved the exact-match (EM) accuracy of Search-R1-style deep research systems from 49% to 71%.
- Impact: Improved the overall metric from 63.44 to 66.03 after DPO deployment and raised MMMU from 51.66 to 53.11; further improved the Thinking RL benchmark to 75.21 with GRPO; deployed delivery-signal-based RM and RL training, driving +3% AdvV business gain; achieved a 10% validation improvement over the original model with GRPO+RM training.
Advertising Delivery + Large Models
- Objective and Challenges: Integrated AIGC-generated copy into the fine-ranking stage to enable personalized ad creatives and improve advertiser value. Key challenges included the prohibitively large user-creative Cartesian product in large-scale personalization and the lack of mature modeling approaches for user-aware generation beyond plain-text feature injection.
- Approach and Results: Designed a staged optimization framework from creative derivation to personalized creative generation; constructed CTR preference pairs to train reward models, improving RM AUC from 0.78 to 0.81; compressed user features with a VQ-VAE-based RM and projected them into the LLM for joint training; improved offline generation diversity from 38 to 55.
TikTok Monetization Ads – ByteDance
Feb. 2022 – Oct. 2023
Role Overview: Built user matching and positive-sample construction systems for ad conversion events under overseas privacy-compliance constraints, providing critical training signals for CVR/CTR models. Improved high-value user identification through deterministic matching and probabilistic recall.
- Objective and Approach: Improved user match rate and ad delivery performance under sparse-feature conditions. Led end-to-end optimization across recall, features, and models; built configurable recall and experimentation frameworks supporting high-concurrency multi-experiment execution; added and optimized features for recalled users to improve the prediction performance of the base DNN model.
- Impact: Increased ID match rate by approximately 10%, significantly improving positive-sample coverage and downstream model training quality.
Strengths
- End-to-end SFT and RL post-training experience: Led the full SFT/RL alignment lifecycle for small- and mid-sized multimodal foundation models. Designed data strategies, DPO/GRPO algorithms, Agentic RL workflows, and end-to-end training pipelines tailored to smaller models with lower fault tolerance and higher data-efficiency requirements. Achieved continuous gains in reasoning capability and overall benchmark performance under limited parameter budgets, validating the generalizability of the VLM RL design across model sizes. Strong at designing solutions that balance algorithmic iteration with business delivery needs.
- Strong ownership and delivery track record: Promoted twice within four years at ByteDance, with multiple high-performance ratings including M+ and E. Consistently exceeded core model-iteration goals in early-stage projects with limited resources and immature infrastructure, earning recognition as a core contributor.
Education
Nanyang Technological University
Ph.D. Program, Computer Science and Technology, discontinued
Aug. 2021 – Feb. 2022
- Publication: Gao, Guanyu, Chengru Song, Asela Bandara, Meng Shen, Fan Yang, Wolf Posdorfer, Dacheng Tao, and Yonggang Wen. “FogChain: a Blockchain-based Peer-to-Peer Solar Power Trading System Powered by Fog AI.” IEEE Internet of Things Journal, 2021.
Beijing University of Posts and Telecommunications
M.S., Cyberspace Security
Sept. 2018 – Jun. 2021
- Honors: Outstanding Graduate of Beijing University of Posts and Telecommunications, 2021; National Second Prize in the Graduate Mathematical Contest in Modeling, top 13%.
Chongqing University
B.S., Information Security
Sept. 2014 – Jun. 2018