Weijia Zhang Resume Materials Audit
Fit Verdict
Strong fit for LLM agent / multimodal reasoning / AI research internship roles.
Evidence in files/ strongly supports a profile around LLM agents, multimodal reasoning, CUA debugging, benchmark/evaluation design, SFT/RL data construction, and multi-agent systems. The main risk is that several industry metrics in the original resume are not backed by files in this folder, so the polished resume either states them conservatively or frames them clearly as internal validation.
Papers Fully Read
- SeeingEye_Agentic_Information_Flow_Unlocks_Multimodal_Reasoning_in_Text_Only_LLMs.pdf - 18 pages.
- How_Much_Vision_Does_Multimodal_Reasoning_Need.pdf - 25 pages.
- Every_Act_Has_Its_Price.pdf - 15 pages.
- CUADebug_Diagnosing_and_Repairing_Computer_Use_Agent_Failures.pdf - 23 pages.
- 12843_COGENT_Multiagent_Large_.pdf - 14 pages.
Strong Evidence Used In Resume
| Resume claim | Evidence | Status |
|---|---|---|
| CUADebug taxonomy has 4 top-level modules and 29 subtypes | CUADebug paper, Table 1 and Appendix A.6 | Can write |
| CUAErrorBench has 184 human-annotated OSWorld failure trajectories | CUADebug abstract, Section 2, Appendix A.4 | Can write |
| CUADebugger uses paired before/after screenshot inspection and structured RCA submission | CUADebug Section 3.1 and Appendix A.1 | Can write |
| Continual re-rollout improves from 12.20% (direct continuation) to 25.86%; human RCA reaches 29.21% | CUADebug Table 4 | Can write |
| SeeingEye uses translator/reasoner split, SIR, and multi-round feedback | SeeingEye Sections 3-5 and Appendix A | Can write |
| Vision-Stripping Test profiles P0/T0/T1/V1/V2/VST-Full evidence paths | Vision-Stripping Sections 3-5 | Can write |
| Moral Composition uses two-stage blind ELO benchmark and 29,134 judgments | Every Act Sections 3-4 | Can write |
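
When the re-rollout numbers from the table above are turned into a resume bullet, it helps to be explicit about whether a gain is stated in percentage points or as a relative factor. The following is a minimal sketch of that arithmetic using only the three figures cited from CUADebug Table 4. The function and variable names are illustrative assumptions, not identifiers from the paper.

```python
def point_gain(baseline_pct: float, improved_pct: float) -> float:
    """Absolute gain in percentage points, rounded to two decimals."""
    return round(improved_pct - baseline_pct, 2)


def relative_factor(baseline_pct: float, improved_pct: float) -> float:
    """Improved value as a multiple of the baseline, rounded to two decimals."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return round(improved_pct / baseline_pct, 2)


# Figures as listed in the evidence table (CUADebug Table 4).
direct_continuation = 12.20
continual_rerollout = 25.86
human_rca = 29.21

print(point_gain(direct_continuation, continual_rerollout))       # 13.66
print(relative_factor(direct_continuation, continual_rerollout))  # 2.12
print(point_gain(direct_continuation, human_rca))                 # 17.01
```

Stating the result as "+13.66 points" or "about 2.1x" keeps the claim unambiguous, whereas "improved by 112%" invites the same confusion flagged elsewhere in this audit.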
Conservative Or Downgraded Claims
| Original claim | Issue | Polished handling |
|---|---|---|
| Publications (100+ citation) | Citation count is not verifiable from local files and changes over time. | Removed count; used Selected Publications. |
| Tencent crash rate reduced by 120% | A reduction of more than 100% is not meaningful for a crash rate, and the underlying metric is not defined. | Rewritten as profiler/allocation tracing supporting crash investigation. |
| Character persona configuration time reduced by 300% | Wording is unclear; a reduction cannot exceed 100%, so “reduced by 300%” is ambiguous. | Rewritten as setup effort cut by roughly 3x. |
| Microsoft 75% Office Script accuracy improvement | Strong but not backed by paper/local logs. | Kept with “internal validation reported” wording; add evidence if available. |
| COGENT publication | PDF is anonymized and does not show authorship. | Not added to publication list without explicit confirmation. |
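
The two percentage issues in the table above come down to the same arithmetic: a reduction is bounded at 100%, while a speedup factor is not. The sketch below illustrates the difference. The 60-minute and 20-minute figures are hypothetical values chosen for illustration, not measurements from the resume materials, and the function names are assumptions.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def speedup_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Hypothetical example: a setup task that took 60 minutes now takes 20.
print(round(percent_reduction(60, 20), 1))  # 66.7
print(speedup_factor(60, 20))               # 3.0

# Even eliminating the cost entirely is only a 100% reduction.
print(percent_reduction(10, 0))             # 100.0
```

Under these assumed numbers, a "roughly 3x" claim corresponds to about a 67% reduction, which is why the polished resume uses the multiplier wording instead of a percentage above 100.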
Evidence To Add Later
- Internal Microsoft evaluation table or note for the 75% Office Script code-generation improvement.
- Reborn experiment rubric or screenshots/logs for the 2/5 to 4/5 dialogue-coherence claim and sub-second latency.
- Tencent profiler report or bug/crash dashboard for 40+ allocation sites and memory-leak impact.
- Authorship confirmation for COGENT if it should be added as a publication or project.
- Verified Google Scholar citation count if the “100+ citations” header should be restored.
