<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<title>Page 2</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<style type="text/css">
<!--
p {margin: 0; padding: 0;} .ft00{font-size:16px;font-family:DAQRFI+TeXGyreTermesX;color:#4471c4;}
.ft01{font-size:16px;font-family:DAQRFI+TeXGyreTermesX;color:#000000;}
.ft02{font-size:14px;font-family:DWRGXI+LMRoman10;color:#000000;}
.ft03{font-size:16px;font-family:DWRGXI+LMRoman10;color:#000000;}
.ft04{font-size:16px;font-family:JYGKUD+LatinModernMath;color:#000000;}
.ft05{font-size:16px;line-height:16px;font-family:DWRGXI+LMRoman10;color:#000000;}
-->
</style>
</head>
<body bgcolor="#A0A0A0" vlink="blue" link="blue">
<div id="page2-div" style="position:relative;width:918px;height:1188px;">
<img width="918" height="1188" src="output002.png" alt="background image"/>
<p style="position:absolute;top:45px;left:64px;white-space:nowrap" class="ft00"><b>RESEARCH PROJECTS</b></p>
<p style="position:absolute;top:74px;left:64px;white-space:nowrap" class="ft01"><b>Multi-Agent System Acceleration via Speculation and Routing</b></p>
<p style="position:absolute;top:75px;left:732px;white-space:nowrap" class="ft02">Apr 2026 – ongoing</p>
<p style="position:absolute;top:94px;left:58px;white-space:nowrap" class="ft03">• Built a multi-agent system benchmark supporting human-agent interactions and tool use (e.g.,</p>
<p style="position:absolute;top:94px;left:734px;white-space:nowrap" class="ft04">𝜏</p>
<p style="position:absolute;top:94px;left:742px;white-space:nowrap" class="ft03">-bench).</p>
<p style="position:absolute;top:112px;left:58px;white-space:nowrap" class="ft03">• Designing a speculation-and-routing paradigm to reduce latency in multi-agent inference pipelines.</p>
<p style="position:absolute;top:141px;left:64px;white-space:nowrap" class="ft01"><b>Tensor-Rank-Guided Steering and Routing for High-Performance SRM Reasoning</b></p>
<p style="position:absolute;top:142px;left:714px;white-space:nowrap" class="ft02">Jan 2026 – Mar 2026</p>
<p style="position:absolute;top:161px;left:58px;white-space:nowrap" class="ft03">• Proposed RankGuide, a framework that leverages tensor-rank signals from hidden states to accelerate LRM</p>
<p style="position:absolute;top:176px;left:77px;white-space:nowrap" class="ft03">reasoning, which can yield up to</p>
<p style="position:absolute;top:176px;left:316px;white-space:nowrap" class="ft04">1.75×</p>
<p style="position:absolute;top:176px;left:363px;white-space:nowrap" class="ft03">and</p>
<p style="position:absolute;top:176px;left:395px;white-space:nowrap" class="ft04">1.36×</p>
<p style="position:absolute;top:176px;left:442px;white-space:nowrap" class="ft03">latency benefit compared to LRM and SoTA collaborative</p>
<p style="position:absolute;top:191px;left:77px;white-space:nowrap" class="ft03">inference frameworks, while maintaining or improving accuracy.</p>
<p style="position:absolute;top:207px;left:58px;white-space:nowrap" class="ft03">• Designed a tensor-rank scoring metric on step-level hidden states to detect low-quality reasoning steps and selec-</p>
<p style="position:absolute;top:222px;left:77px;white-space:nowrap" class="ft03">tively route them to larger models, improving the accuracy–latency trade-off.</p>
<p style="position:absolute;top:239px;left:58px;white-space:nowrap" class="ft03">• Developed a rank-based calibration pipeline that filters low-rank samples to construct high-quality steering vectors</p>
<p style="position:absolute;top:254px;left:77px;white-space:nowrap" class="ft03">for inference-time hidden-state intervention, encouraging concise and stable reasoning.</p>
<p style="position:absolute;top:283px;left:64px;white-space:nowrap" class="ft01"><b>Structural Pruning for Efficient LLM Inference via Low-rank Decomposition</b></p>
<p style="position:absolute;top:284px;left:721px;white-space:nowrap" class="ft02">Aug 2024 – May 2025</p>
<p style="position:absolute;top:303px;left:58px;white-space:nowrap" class="ft03">• Developed FLAT-LLM, a training-free, fine-grained compression method that leverages the low-rank structure of</p>
<p style="position:absolute;top:318px;left:77px;white-space:nowrap" class="ft03">the activation space to transform and compress the model weights.</p>
<p style="position:absolute;top:335px;left:58px;white-space:nowrap" class="ft03">• Introduced a novel training-free rank selection algorithm that allocates ranks using a greedy redistribution strategy</p>
<p style="position:absolute;top:350px;left:77px;white-space:nowrap" class="ft03">and can be integrated with existing low-rank LLM compression pipelines.</p>
<p style="position:absolute;top:368px;left:58px;white-space:nowrap" class="ft03">• Achieved strong performance on LLaMA-2, LLaMA-3, and Mistral models with minimal calibration overhead (within</p>
<p style="position:absolute;top:385px;left:77px;white-space:nowrap" class="ft03">minutes), validated across language modeling and downstream tasks.</p>
<p style="position:absolute;top:414px;left:64px;white-space:nowrap" class="ft01"><b>Binary-Quantized Ensemble LLM for Fast and Robust Language Model Inference</b></p>
<p style="position:absolute;top:415px;left:727px;white-space:nowrap" class="ft02">Apr 2021 – Jun 2023</p>
<p style="position:absolute;top:434px;left:58px;white-space:nowrap" class="ft05">• Developed BEBERT, a novel quantization-ensemble strategy enabling efficient and accurate 1-bit BERT inference.<br/>• Leveraged an efficient knowledge distillation strategy for high training efficiency.<br/>• Achieved</p>
<p style="position:absolute;top:467px;left:144px;white-space:nowrap" class="ft04">13×</p>
<p style="position:absolute;top:467px;left:177px;white-space:nowrap" class="ft03">model size reduction and</p>
<p style="position:absolute;top:467px;left:354px;white-space:nowrap" class="ft04">15×</p>
<p style="position:absolute;top:467px;left:387px;white-space:nowrap" class="ft03">compute savings over standard BERT with minimal accuracy loss.</p>
<p style="position:absolute;top:485px;left:58px;white-space:nowrap" class="ft03">• Proposed an early-exit inference variant, further cutting compute by</p>
<p style="position:absolute;top:485px;left:532px;white-space:nowrap" class="ft04">20% ∼ 40%</p>
<p style="position:absolute;top:485px;left:615px;white-space:nowrap" class="ft03">on the GLUE benchmark.</p>
</div>
</body>
</html>