<!DOCTYPE html><html>
<head>
<title>Jiayi Tian – CV</title>
<style type="text/css">
<!--
.xflip {
    -moz-transform: scaleX(-1);
    -webkit-transform: scaleX(-1);
    -o-transform: scaleX(-1);
    transform: scaleX(-1);
    filter: fliph;
}
.yflip {
    -moz-transform: scaleY(-1);
    -webkit-transform: scaleY(-1);
    -o-transform: scaleY(-1);
    transform: scaleY(-1);
    filter: flipv;
}
.xyflip {
    -moz-transform: scaleX(-1) scaleY(-1);
    -webkit-transform: scaleX(-1) scaleY(-1);
    -o-transform: scaleX(-1) scaleY(-1);
    transform: scaleX(-1) scaleY(-1);
    filter: fliph + flipv;
}
-->
</style>
</head>
<body>
<a name=1></a><b>Jiayi&#160;Tian</b><br/>
<a href="tel:+18052450298">+1&#160;(805)&#160;245&#160;0298</a>&#160;|&#160;<a href="mailto:jiayi_tian@ucsb.edu">jiayi_tian@ucsb.edu</a>&#160;|&#160;<a href="https://www.linkedin.com/in/jiayi-tian-32b9652a5/">linkedin.com/in/jiayi-tian-32b9652a5/</a>&#160;|&#160;<a href="https://github.com/ttttttris">github.com/ttttttris</a><br/>
<b>Focus&#160;on&#160;efficient&#160;LLM&#160;training&#160;&amp;&#160;inference&#160;and&#160;efficient&#160;CoT&#160;reasoning.</b><br/>
<b>EDUCATION</b><br/>
<b>University&#160;of&#160;California,&#160;Santa&#160;Barbara</b>,&#160;<i>Ph.D.&#160;in&#160;Computer&#160;Engineering</i>&#160;|&#160;CA,&#160;USA&#160;<b>3.9/4.0</b><br/>
Fall&#160;2025&#160;–&#160;present<br/>
<b>University&#160;of&#160;California,&#160;Santa&#160;Barbara</b>,&#160;<i>M.S.&#160;in&#160;Computer&#160;Engineering</i>&#160;|&#160;CA,&#160;USA&#160;<b>3.9/4.0</b><br/>
Fall&#160;2023&#160;–&#160;Fall&#160;2025<br/>
<b>Nanjing&#160;University</b>,&#160;<i>B.Eng.&#160;in&#160;VLSI&#160;Design&#160;&amp;&#160;System&#160;Integration</i>&#160;|&#160;China&#160;<b>4.5/5.0</b><br/>
Fall&#160;2019&#160;–&#160;Fall&#160;2023<br/>
<b>INDUSTRIAL&#160;EXPERIENCE</b><br/>
<b>Intel&#160;Corporation,&#160;</b><i>Research&#160;Intern&#160;</i>|&#160;Hillsboro,&#160;OR<br/>
June&#160;2025&#160;–&#160;Sep.&#160;2025<br/>
•&#160;Proposed&#160;and&#160;implemented&#160;SkipKV,&#160;a&#160;training-free&#160;KV-cache&#160;compression&#160;framework&#160;featuring&#160;sentence-level&#160;selective&#160;eviction&#160;and&#160;dynamic&#160;generation&#160;control&#160;for&#160;efficient&#160;CoT&#160;reasoning.<br/>
•&#160;Designed&#160;a&#160;semantic-similarity-based&#160;scoring&#160;metric&#160;to&#160;identify&#160;and&#160;evict&#160;redundant&#160;sentence&#160;spans&#160;while&#160;maintaining&#160;reasoning&#160;coherence.<br/>
•&#160;Introduced&#160;a&#160;dynamic&#160;steering&#160;mechanism&#160;to&#160;adapt&#160;hidden&#160;activations&#160;during&#160;inference,&#160;promoting&#160;concise&#160;and&#160;stable&#160;outputs.<br/>
•&#160;Demonstrated&#160;strong&#160;results&#160;with&#160;large&#160;reasoning&#160;models&#160;(LRMs)&#160;on&#160;long-reasoning&#160;benchmarks&#160;(e.g.,&#160;AIME24,&#160;LiveCodeBench):&#160;up&#160;to&#160;26.7%&#160;higher&#160;accuracy&#160;than&#160;state-of-the-art&#160;methods&#160;at&#160;equal&#160;compression,&#160;with&#160;1.6×&#160;shorter&#160;generation&#160;length&#160;and&#160;1.7×&#160;higher&#160;throughput.<br/>
<b>Intel&#160;Corporation,&#160;</b><i>Research&#160;Intern&#160;</i>|&#160;Hillsboro,&#160;OR<br/>
June&#160;2024&#160;–&#160;Sep.&#160;2024<br/>
•&#160;Proposed&#160;and&#160;implemented&#160;a&#160;tensor-compressed&#160;Transformer&#160;training&#160;accelerator&#160;on&#160;FPGA,&#160;optimizing&#160;compute&#160;ordering,&#160;dataflow,&#160;and&#160;memory&#160;allocation&#160;for&#160;LLMs.<br/>
•&#160;Designed&#160;a&#160;bidirectional&#160;tensor&#160;contraction&#160;scheme&#160;that&#160;substantially&#160;reduces&#160;intermediate&#160;memory&#160;and&#160;compute&#160;cost&#160;in&#160;long-sequence&#160;training&#160;and&#160;inference.<br/>
•&#160;Built&#160;an&#160;HLS-based&#160;training&#160;engine&#160;achieving&#160;up&#160;to&#160;48×&#160;higher&#160;memory&#160;efficiency&#160;and&#160;3.6×&#160;higher&#160;energy&#160;efficiency&#160;than&#160;an&#160;NVIDIA&#160;RTX&#160;3090&#160;GPU.<br/>
•&#160;Resulting&#160;paper&#160;accepted&#160;to&#160;IEEE&#160;TCAD.<br/>
<b>AMD-Xilinx&#160;Technology,&#160;</b><i>Co-Op/Intern&#160;</i>|&#160;Beijing,&#160;China<br/>
June&#160;2023&#160;–&#160;Sep.&#160;2023<br/>
•&#160;Developed&#160;a&#160;C++/HLS&#160;Transformer&#160;training&#160;framework&#160;with&#160;custom&#160;tensorized&#160;linear&#160;layers&#160;and&#160;nonlinear&#160;operations&#160;for&#160;LLM&#160;acceleration,&#160;achieving&#160;30×–52×&#160;model-size&#160;reduction&#160;in&#160;end-to-end&#160;Transformer&#160;training.<br/>
<b>SKILLS&#160;&amp;&#160;RESEARCH&#160;INTERESTS</b><br/>
<b>Languages&#160;&amp;&#160;Tools:</b>&#160;Python,&#160;PyTorch,&#160;Hugging&#160;Face,&#160;vLLM,&#160;C/C++,&#160;High-Level&#160;Synthesis&#160;(HLS),&#160;Vivado/Vitis/XRT<br/>
<b>ML&#160;&amp;&#160;NLP:</b>&#160;Efficient&#160;large&#160;language&#160;model&#160;(LLM)&#160;training/inference&#160;and&#160;efficient&#160;large&#160;reasoning&#160;models&#160;(LRMs),&#160;including&#160;model&#160;compression,&#160;KV-cache&#160;compression,&#160;pruning,&#160;low-rank&#160;decomposition,&#160;early&#160;exit,&#160;knowledge&#160;distillation,&#160;and&#160;quantization<br/>
<b>PUBLICATIONS&#160;&amp;&#160;PREPRINTS</b><br/>
<b>SkipKV:&#160;Selective&#160;Skipping&#160;of&#160;KV&#160;Generation&#160;and&#160;Storage&#160;for&#160;Efficient&#160;Inference&#160;with&#160;Large&#160;Reasoning&#160;Models</b><br/><b>Jiayi&#160;Tian</b>,&#160;Seyedarmin&#160;Azizi,&#160;Yequan&#160;Zhao,&#160;Erfan&#160;Baghaei&#160;Potraghloo,&#160;Sean&#160;McPherson,&#160;Sharath&#160;Nittur&#160;Sridhar,&#160;Zhengyang&#160;Wang,&#160;Zheng&#160;Zhang,&#160;Massoud&#160;Pedram,&#160;Souvik&#160;Kundu,&#160;under&#160;review&#160;at&#160;MLSys,&#160;2025.<br/>
<b>Activation-Informed&#160;Pareto-Guided&#160;Low-Rank&#160;Compression&#160;for&#160;Efficient&#160;LLM/VLM</b><br/>Ryan&#160;Solgi,&#160;Parsa&#160;Madinei,&#160;<b>Jiayi&#160;Tian</b>,&#160;Rupak&#160;Swaminathan,&#160;Jing&#160;Liu,&#160;Nathan&#160;Susanj,&#160;Zheng&#160;Zhang,&#160;under&#160;review&#160;at&#160;ARR,&#160;Oct.&#160;2025.&#160;<a href="https://arxiv.org/pdf/2510.05544">arXiv&#160;preprint</a>.<br/>
<b>FLAT-LLM:&#160;Fine-grained&#160;Low-rank&#160;Activation&#160;Space&#160;Transformation&#160;for&#160;Large&#160;Language&#160;Model&#160;Compression</b><br/><b>Jiayi&#160;Tian</b>,&#160;Ryan&#160;Solgi,&#160;Jinming&#160;Lu,&#160;Yifan&#160;Yang,&#160;Hai&#160;Li,&#160;Zheng&#160;Zhang,&#160;under&#160;review&#160;at&#160;ARR,&#160;Oct.&#160;2025.&#160;<a href="https://arxiv.org/pdf/2505.23966">arXiv&#160;preprint</a>.<br/>
<b>FETTA:&#160;Flexible&#160;and&#160;Efficient&#160;Hardware&#160;Accelerator&#160;for&#160;Tensorized&#160;Neural&#160;Network&#160;Training</b><br/>Jinming&#160;Lu,&#160;<b>Jiayi&#160;Tian</b>,&#160;Hai&#160;Li,&#160;Ian&#160;Young,&#160;Zheng&#160;Zhang,&#160;under&#160;review&#160;at&#160;IEEE&#160;Transactions&#160;on&#160;Computer-Aided&#160;Design&#160;of&#160;Integrated&#160;Circuits&#160;and&#160;Systems&#160;(TCAD).&#160;<a href="https://arxiv.org/pdf/2504.06474">arXiv&#160;preprint</a>.<br/>
<b>Ultra&#160;Memory-Efficient&#160;On-FPGA&#160;Training&#160;of&#160;Transformers&#160;via&#160;Tensor-Compressed&#160;Optimization</b><br/><b>Jiayi&#160;Tian</b>,&#160;Jinming&#160;Lu,&#160;Hai&#160;Li,&#160;Xiangwei&#160;Wang,&#160;Cong&#160;(Callie)&#160;Hao,&#160;Ian&#160;Young,&#160;Zheng&#160;Zhang,&#160;<a href="https://ieeexplore.ieee.org/document/11121368">IEEE&#160;Transactions&#160;on&#160;Computer-Aided&#160;Design&#160;of&#160;Integrated&#160;Circuits&#160;and&#160;Systems&#160;(TCAD),&#160;2025</a>.<br/>
<b>BEBERT:&#160;Efficient&#160;and&#160;Robust&#160;Binary&#160;Ensemble&#160;BERT</b><br/><b>Jiayi&#160;Tian</b>,&#160;Chao&#160;Fang,&#160;Haonan&#160;Wang,&#160;Zhongfeng&#160;Wang,&#160;<a href="https://ieeexplore.ieee.org/document/10096223">IEEE&#160;International&#160;Conference&#160;on&#160;Acoustics,&#160;Speech&#160;and&#160;Signal&#160;Processing&#160;(ICASSP),&#160;2023</a>.<br/>
<hr/>
<a name=2></a><b>RESEARCH&#160;PROJECTS</b><br/>
<b>Structural&#160;Pruning&#160;for&#160;Efficient&#160;LLM&#160;Inference&#160;via&#160;Low-Rank&#160;Decomposition</b><br/>
Aug.&#160;2024&#160;–&#160;May&#160;2025<br/>
•&#160;Developed&#160;FLAT-LLM,&#160;a&#160;training-free,&#160;fine-grained&#160;compression&#160;method&#160;that&#160;leverages&#160;the&#160;low-rank&#160;structure&#160;of&#160;the&#160;activation&#160;space&#160;to&#160;transform&#160;and&#160;compress&#160;the&#160;model&#160;weights.<br/>
•&#160;Introduced&#160;a&#160;novel&#160;training-free&#160;rank-selection&#160;algorithm&#160;that&#160;allocates&#160;ranks&#160;with&#160;a&#160;greedy&#160;redistribution&#160;strategy&#160;and&#160;can&#160;be&#160;integrated&#160;into&#160;existing&#160;low-rank&#160;LLM&#160;compression&#160;pipelines.<br/>
•&#160;Achieved&#160;strong&#160;performance&#160;on&#160;LLaMA-2,&#160;LLaMA-3,&#160;and&#160;Mistral&#160;models&#160;with&#160;minimal&#160;calibration&#160;overhead&#160;(within&#160;minutes),&#160;validated&#160;on&#160;language&#160;modeling&#160;and&#160;downstream&#160;tasks.<br/>
<b>Training&#160;Accelerator&#160;Design&#160;for&#160;Tensor-Compressed&#160;Transformer&#160;Models</b><br/>
Sep.&#160;2023&#160;–&#160;May&#160;2024<br/>
•&#160;Designed&#160;a&#160;tensor-compressed&#160;training&#160;framework&#160;for&#160;Transformer&#160;models,&#160;significantly&#160;reducing&#160;model&#160;size&#160;and&#160;memory&#160;footprint.<br/>
•&#160;Developed&#160;a&#160;fixed&#160;bidirectional&#160;contraction&#160;path&#160;and&#160;an&#160;adaptive&#160;path-search&#160;algorithm&#160;to&#160;improve&#160;memory&#160;and&#160;compute&#160;efficiency&#160;in&#160;long-sequence&#160;LLM&#160;training&#160;and&#160;inference.<br/>
<b>Binary-Quantized&#160;Ensemble&#160;LLM&#160;for&#160;Fast&#160;and&#160;Robust&#160;Language&#160;Model&#160;Inference</b><br/>
Apr.&#160;2021&#160;–&#160;June&#160;2023<br/>
•&#160;Developed&#160;BEBERT,&#160;a&#160;novel&#160;quantization-ensemble&#160;strategy&#160;enabling&#160;efficient&#160;and&#160;accurate&#160;1-bit&#160;BERT&#160;inference.<br/>
•&#160;Leveraged&#160;an&#160;efficient&#160;knowledge-distillation&#160;strategy&#160;to&#160;achieve&#160;high&#160;training&#160;efficiency.<br/>
•&#160;Achieved&#160;13×&#160;model-size&#160;reduction&#160;and&#160;15×&#160;compute&#160;savings&#160;over&#160;standard&#160;BERT&#160;with&#160;minimal&#160;accuracy&#160;loss.<br/>
•&#160;Proposed&#160;an&#160;early-exit&#160;inference&#160;variant&#160;that&#160;further&#160;cuts&#160;compute&#160;by&#160;20%–40%&#160;on&#160;the&#160;GLUE&#160;benchmark.<br/>
<hr/>
</body>
</html>