<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<title>Page 2</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<style type="text/css">
<!--
p {margin: 0; padding: 0;} .ft00{font-size:16px;font-family:DAQRFI+TeXGyreTermesX;color:#4471c4;}
.ft01{font-size:16px;font-family:DAQRFI+TeXGyreTermesX;color:#000000;}
.ft02{font-size:14px;font-family:DWRGXI+LMRoman10;color:#000000;}
.ft03{font-size:16px;font-family:DWRGXI+LMRoman10;color:#000000;}
.ft04{font-size:16px;font-family:JYGKUD+LatinModernMath;color:#000000;}
.ft05{font-size:16px;line-height:16px;font-family:DWRGXI+LMRoman10;color:#000000;}
-->
</style>
</head>
<body bgcolor="#A0A0A0" vlink="blue" link="blue">
<div id="page2-div" style="position:relative;width:918px;height:1188px;">
<img width="918" height="1188" src="output002.png" alt="background image"/>
<p style="position:absolute;top:45px;left:64px;white-space:nowrap" class="ft00"><b>RESEARCH PROJECTS</b></p>
<p style="position:absolute;top:74px;left:64px;white-space:nowrap" class="ft01"><b>Multi-Agent System Acceleration via Speculation and Routing</b></p>
<p style="position:absolute;top:75px;left:732px;white-space:nowrap" class="ft02">Apr 2026 – ongoing</p>
<p style="position:absolute;top:94px;left:58px;white-space:nowrap" class="ft03">• Built a multi-agent system benchmark supporting human-agent interactions and tool use (e.g.,</p>
<p style="position:absolute;top:94px;left:734px;white-space:nowrap" class="ft04">𝜏</p>
<p style="position:absolute;top:94px;left:742px;white-space:nowrap" class="ft03">-bench).</p>
<p style="position:absolute;top:112px;left:58px;white-space:nowrap" class="ft03">• Designing a speculation-and-routing paradigm to reduce latency in multi-agent inference pipelines.</p>
<p style="position:absolute;top:141px;left:64px;white-space:nowrap" class="ft01"><b>Tensor-Rank-Guided Steering and Routing for High-Performance SRM Reasoning</b></p>
<p style="position:absolute;top:142px;left:714px;white-space:nowrap" class="ft02">Jan 2026 – Mar 2026</p>
<p style="position:absolute;top:161px;left:58px;white-space:nowrap" class="ft03">• Proposed RankGuide, a framework that leverages tensor-rank signals from hidden states to accelerate LRM</p>
<p style="position:absolute;top:176px;left:77px;white-space:nowrap" class="ft03">reasoning, which can yield up to</p>
<p style="position:absolute;top:176px;left:316px;white-space:nowrap" class="ft04">1.75×</p>
<p style="position:absolute;top:176px;left:363px;white-space:nowrap" class="ft03">and</p>
<p style="position:absolute;top:176px;left:395px;white-space:nowrap" class="ft04">1.36×</p>
<p style="position:absolute;top:176px;left:442px;white-space:nowrap" class="ft03">latency benefit compared to LRM and SoTA collaborative</p>
<p style="position:absolute;top:191px;left:77px;white-space:nowrap" class="ft03">inference frameworks, while maintaining or improving accuracy.</p>
<p style="position:absolute;top:207px;left:58px;white-space:nowrap" class="ft03">• Designed a tensor-rank scoring metric on step-level hidden states to detect low-quality reasoning steps and selec-</p>
<p style="position:absolute;top:222px;left:77px;white-space:nowrap" class="ft03">tively route them to larger models, improving the accuracy–latency trade-off.</p>
<p style="position:absolute;top:239px;left:58px;white-space:nowrap" class="ft03">• Developed a rank-based calibration pipeline that filters low-rank samples to construct high-quality steering vectors</p>
<p style="position:absolute;top:254px;left:77px;white-space:nowrap" class="ft03">for inference-time hidden-state intervention, encouraging concise and stable reasoning.</p>
<p style="position:absolute;top:283px;left:64px;white-space:nowrap" class="ft01"><b>Structural Pruning for Efficient LLM Inference via Low-rank Decomposition</b></p>
<p style="position:absolute;top:284px;left:721px;white-space:nowrap" class="ft02">Aug 2024 – May 2025</p>
<p style="position:absolute;top:303px;left:58px;white-space:nowrap" class="ft03">• Developed FLAT-LLM, a training-free, fine-grained compression method that leverages the low-rank structure of</p>
<p style="position:absolute;top:318px;left:77px;white-space:nowrap" class="ft03">the activation space to transform and compress the model weights.</p>
<p style="position:absolute;top:335px;left:58px;white-space:nowrap" class="ft03">• Introduced a novel training-free rank selection algorithm that allocates ranks using a greedy redistribution strategy</p>
<p style="position:absolute;top:350px;left:77px;white-space:nowrap" class="ft03">and can be integrated with existing low-rank LLM compression pipelines.</p>
<p style="position:absolute;top:368px;left:58px;white-space:nowrap" class="ft03">• Achieved strong performance on LLaMA-2, LLaMA-3, and Mistral models with minimal calibration overhead (within</p>
<p style="position:absolute;top:385px;left:77px;white-space:nowrap" class="ft03">minutes), validated across language modeling and downstream tasks.</p>
<p style="position:absolute;top:414px;left:64px;white-space:nowrap" class="ft01"><b>Binary-Quantized Ensemble LLM for Fast and Robust Language Model Inference</b></p>
<p style="position:absolute;top:415px;left:727px;white-space:nowrap" class="ft02">Apr 2021 – Jun 2023</p>
<p style="position:absolute;top:434px;left:58px;white-space:nowrap" class="ft05">• Developed BEBERT, a novel quantization-ensemble strategy enabling efficient and accurate 1-bit BERT inference.<br/>• Leveraged an efficient knowledge distillation strategy for high training efficiency.<br/>• Achieved</p>
<p style="position:absolute;top:467px;left:144px;white-space:nowrap" class="ft04">13×</p>
<p style="position:absolute;top:467px;left:177px;white-space:nowrap" class="ft03">model size reduction and</p>
<p style="position:absolute;top:467px;left:354px;white-space:nowrap" class="ft04">15×</p>
<p style="position:absolute;top:467px;left:387px;white-space:nowrap" class="ft03">compute savings over standard BERT with minimal accuracy loss.</p>
<p style="position:absolute;top:485px;left:58px;white-space:nowrap" class="ft03">• Proposed an early-exit inference variant, further cutting compute by</p>
<p style="position:absolute;top:485px;left:532px;white-space:nowrap" class="ft04">20% ∼ 40%</p>
<p style="position:absolute;top:485px;left:615px;white-space:nowrap" class="ft03">on the GLUE benchmark.</p>
</div>
</body>
</html>