Something can be improved in dtw() #30

@julyanghar

Hi! I revisited the code and think the dtw function can be refined. When two or all three of option_diag, option_up and option_left are equal, the dynamic programming problem has several optimal solutions, and the alignment that ends up being selected can be wrong. Take this as an example:

Base_to_Blending:
[0],"[1, 2, 3]",[4],[5],[6],[7],"[8, 9]",[10],[11],"[12, 13]",[14],[15],"[16, 17, 18, 19]",[20],[21],"[22, 23]",[24],[25],[26],"[27, 28, 29, 30, 31]",[32],[33],[34],[35],"[36, 37]",[38],[39],[40],[41],[42],[43],[44],[45],"[46, 47]",[48],[49],[50],[51],[52],[53],[54],[55],[56],[57],[58],[59],[60],[61],"[62, 63]",[64],[65],[66],[67],"[68, 69]",[70],"[71, 72]",[73],[74],[75],[76],[77],"[78, 79, 80]",[81],[82],[83],[84],"[85, 86]",[87],"[88, 89]",[90],[91],[92],"[93, 94]",[95],"[96, 97]",[98],[99],[100],[101],[102],[103],[104],[105],[106],[107],[108],[109],[110],[111],[112],[113],[114],[115],[116],[117],[118],[119],"[120, 121]",[122],[123],[124],[125],"[126, 127]",[128],"[129, 130]",[131],[132],[133],"[134, 135]",[136],[137],[138],[139],"[140, 141]",[142],[143],[144],"[145, 146]",[147],[148],[149],[150],[151],[152],"[153, 154]",[155],[156],[157],"[158, 159]",[160],[161],[162],[163],[164],[165],[166],[167],"[168, 169]",[170],[171],[172],[173],[174],[175],[176],[177],[178],[179],"[180, 181]",[182],[183],[184],[185],[186],"[187, 188]",[189],"[190, 191]",[192],[193],[194],[195],[196],[197],[198],[199],"[200, 201]",[202],[203],[204],[205],[206],[207],[208],[209],[210],[211],[211],[212],[213],"[214, 215]",[216],[217],[218]


Base_Input_Tokens:
HT,C,'s,ĠVive,ĠPro,Ġheadset,Ġis,Ġavailable,Ġto,Ġpre,-order,Ġfor,Ġ$,799,ĊĊ,We,'ve,Ġseen,Ġplenty,Ġof,ĠBeats,-focused,ĠK,IR,Fs,Ġin,Ġour,Ġtime,",",Ġsome,Ġbetter,Ġthan,Ġothers,.,ĠFew,",",Ġhowever,",",Ġplay,Ġquite,Ġso,Ġdirectly,Ġon,Ġthe,Ġname,Ġas,ĠOrig,Audio,'s,ĠBe,ets,.,ĠFor,Ġ$,25,",",Ġadopt,ers,Ġget,Ġa,Ġset,Ġof,Ġheadphones,Ġthat,Ġbear,Ġlittle,Ġdirect,Ġresemblance,Ġto,ĠDr,.,ĠDre,'s,Ġaudio,Ġgear,Ġof,Ġchoice,",",Ġbut,Ġare,Ġno,Ġdoubt,Ġbound,Ġto,Ġimpress,Ġfriends,Ġ--,Ġat,Ġleast,",",Ġup,Ġuntil,Ġthey,Ġsee,Ġa,Ġroot,Ġvegetable,Ġlogo,Ġinstead,Ġof,Ġa,Ġlower,-case,ĠB,.,ĠThankfully,",",Ġthere,'s,Ġmore,Ġto,Ġit,Ġthan,Ġjust,Ġamusing,Ġand,Ġconfusing,Ġpeers,.,ĠEvery,Ġpurchase,Ġwill,Ġlead,Ġto,Ġa,Ġdonation,Ġof,Ġcanned,Ġbe,ets,Ġ(,what,Ġelse,?),Ġto,Ġthe,ĠSecond,ĠHarvest,ĠFood,ĠBank,Ġof,ĠOrange,ĠCounty,.,ĠFor,Ġus,",",Ġthat,'s,Ġreason,Ġenough,Ġto,Ġhope,Ġthat,ĠBeats,Ġdoesn,'t,Ġput,Ġthe,Ġk,ib,osh,Ġon,ĠOrig,Audio,'s,Ġeffort,.,ĠBesides,",",Ġwe,Ġcould,Ġuse,Ġsome,Ġaccom,pan,iment,Ġfor,Ġour,ĠBeet,Box,.,<|eot_id|>


Blending_Input_Tokens:
▁HT,C,',s,▁V,ive,▁Pro,▁head,set,▁is,▁available,▁to,▁pre,-,order,▁for,▁$,7,9,9,<0x0A>,<0x0A>,We,',ve,▁seen,▁plenty,▁of,▁Be,ats,-,f,oc,used,▁K,IR,F,s,▁in,▁our,▁time,",",▁some,▁better,▁than,▁others,.,▁F,ew,",",▁however,",",▁play,▁quite,▁so,▁directly,▁on,▁the,▁name,▁as,▁Orig,Audio,',s,▁Be,ets,.,▁For,▁$,2,5,",",▁ad,op,ters,▁get,▁a,▁set,▁of,▁head,ph,ones,▁that,▁bear,▁little,▁direct,▁res,embl,ance,▁to,▁Dr,.,▁Dre,',s,▁audio,▁g,ear,▁of,▁choice,",",▁but,▁are,▁no,▁doubt,▁bound,▁to,▁impress,▁friends,▁--,▁at,▁least,",",▁up,▁until,▁they,▁see,▁a,▁root,▁veget,able,▁logo,▁instead,▁of,▁a,▁lower,-,case,▁B,.,▁Thank,fully,",",▁there,',s,▁more,▁to,▁it,▁than,▁just,▁am,using,▁and,▁confusing,▁pe,ers,.,▁Every,▁purchase,▁will,▁lead,▁to,▁a,▁don,ation,▁of,▁can,ned,▁be,ets,▁(,what,▁else,?),▁to,▁the,▁Second,▁Har,vest,▁Food,▁Bank,▁of,▁Orange,▁County,.,▁For,▁us,",",▁that,',s,▁reason,▁enough,▁to,▁hope,▁that,▁Be,ats,▁doesn,',t,▁put,▁the,▁k,ib,osh,▁on,▁Orig,Audio,',s,▁effort,.,▁Besides,",",▁we,▁could,▁use,▁some,▁accompan,iment,▁for,▁our,▁Be,et,Box,.,</s>

TinyLlama_MiniThinky_one example.csv

Clearly, base token 1 (indexing from zero), 'C', should align only with blending token 1, 'C'. However, the dtw result aligns base token 'C' with three blending tokens ("'", 's', '▁V'), which is incorrect. The underlying cause is the tie between equal options described above, but I have not figured out how to fix it.
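To make the tie concrete, here is a minimal sketch on the first five tokens of the example. It is not the repository's dtw(); it is a textbook DTW written for illustration, and `dtw_path`, `to_mapping`, `exact_cost`, `overlap_cost`, and the `prefer` parameter are hypothetical names. It also assumes the `Ġ` / `▁` prefixes have already been stripped and that the cost is 0 for identical tokens and 1 otherwise, which may differ from the cost the project actually uses.

```python
from collections import defaultdict


def dtw_path(base, blend, cost, prefer=("diag", "up", "left")):
    """Textbook DTW; ties during backtracking are resolved in `prefer` order."""
    n, m = len(base), len(blend)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost(base[i - 1], blend[j - 1]) + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )

    # Backtrack from the last cell to the first one.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        options = {
            "diag": acc[i - 1][j - 1],  # plays the role of option_diag
            "up": acc[i - 1][j],        # plays the role of option_up
            "left": acc[i][j - 1],      # plays the role of option_left
        }
        best = min(options.values())
        # When several options equal `best`, the order in `prefer` decides.
        move = next(name for name in prefer if options[name] == best)
        if move == "diag":
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    return acc[n][m], path[::-1]


def to_mapping(path):
    """Group blending indices by base index, like Base_to_Blending."""
    mapping = defaultdict(list)
    for i, j in path:
        mapping[i].append(j)
    return [mapping[i] for i in sorted(mapping)]


def exact_cost(a, b):
    """Assumed cost: 0 for identical tokens, 1 otherwise."""
    return 0.0 if a == b else 1.0


def overlap_cost(a, b):
    """Hypothetical finer cost: partial credit when one token contains the other."""
    if a == b:
        return 0.0
    return 0.5 if (a in b or b in a) else 1.0


# First five tokens of the example, prefixes stripped (an assumption).
base = ["HT", "C", "'s", "Vive", "Pro"]
blend = ["HT", "C", "'", "s", "V", "ive", "Pro"]

cost_a, path_a = dtw_path(base, blend, exact_cost, ("diag", "up", "left"))
cost_b, path_b = dtw_path(base, blend, exact_cost, ("left", "up", "diag"))
cost_c, path_c = dtw_path(base, blend, overlap_cost)

print(cost_a, to_mapping(path_a))  # 4.0 [[0], [1, 2, 3], [4], [5], [6]]
print(cost_b, to_mapping(path_b))  # 4.0 [[0], [1], [2], [3], [4, 5, 6]]
print(cost_c, to_mapping(path_c))  # 2.0 [[0], [1], [2, 3], [4, 5], [6]]
```

Under the 0/1 cost, both tie-breaking orders reach the same total cost of 4 but return different alignments, and the diagonal-first order reproduces the `[0],[1, 2, 3],[4],[5],[6]` prefix reported above. With the substring-aware cost, every backtracking step in this small case has a unique minimum, so the result no longer depends on the tie order. Whether such a cost behaves well on full sequences is untested here; it is only one possible direction.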
