[CLS] similar context vector on Evaluation #125

@samish-dev

While training my model on Arabic-language data, I logged some of the values that the model processes and produces. The following is a sample of the log from training:

top_vec:  tensor([[[-0.2439,  0.2242,  1.3744,  ...,  1.2180, -1.4410, -1.3635],
         [-0.2523,  0.1137,  1.3378,  ...,  1.2184, -0.1754, -1.2815],
         [-0.4105,  0.0702,  1.4091,  ...,  1.2221, -1.5671, -1.3778],
         ...,
         [ 0.0288, -0.6760,  1.5258,  ...,  1.3763, -1.4011, -1.3328],
         [-0.0218, -0.3249,  1.1765,  ...,  1.4232, -1.2773, -1.1683],
         [ 0.0678,  0.2823,  1.2759,  ...,  1.2741,  0.0080, -1.0290]]],
       device='cuda:0', grad_fn=<NativeLayerNormBackward>) torch.Size([1, 432, 768])

clss:  tensor([[  0,  31,  73,  90, 104, 142, 169, 187, 199, 213, 236, 273, 297, 315,
         337, 351, 364, 382, 415]], device='cuda:0') torch.Size([1, 19])

sents_vec:  tensor([[[-0.2439,  0.2242,  1.3744,  ...,  1.2180, -1.4410, -1.3635],
         [-0.2009, -0.0098,  0.3056,  ...,  1.2681, -1.3180, -1.2614],
         [-0.2254, -0.0302,  0.2825,  ...,  1.3459, -0.9250, -1.1691],
         ...,
         [-0.2042, -0.1110,  1.3395,  ...,  1.2766, -1.2633, -1.1890],
         [-0.1571, -0.6477,  1.2429,  ...,  0.6955, -0.8612, -1.1577],
         [-0.2982, -0.9736,  1.2249,  ...,  1.3346, -1.3179, -1.0534]]],
       device='cuda:0', grad_fn=<MulBackward0>) torch.Size([1, 19, 768])

sent_scores:  tensor([[0.2587, 0.1031, 0.2036, 0.0026, 0.2685, 0.0003, 0.0006, 0.0015, 0.0039,
         0.0027, 0.0164, 0.0015, 0.0077, 0.0006, 0.0005, 0.0009, 0.0770, 0.0069,
         0.0009]], device='cuda:0', grad_fn=<SqueezeBackward1>) torch.Size([1, 19])
         
[2022-02-25 00:37:03,025 INFO] Step 2155/50000; xent: 0.39; lr: 0.0000500;  12 docs/s;    280 sec
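
For readers following the shapes in this log: sents_vec looks like the rows of top_vec picked out at the [CLS] positions listed in clss (its first row matches top_vec at index 0), and the grad_fn=<MulBackward0> suggests a multiplication by a mask afterwards. The snippet below is only a sketch of that relationship under those assumptions. The function name gather_sentence_vectors and the mask_cls argument are hypothetical and are not taken from the log.

```python
import torch

def gather_sentence_vectors(top_vec, clss, mask_cls):
    """Pick the encoder output at each [CLS] position and zero out padded slots.

    top_vec:  (batch, seq_len, hidden) encoder output
    clss:     (batch, n_sents) integer positions of the [CLS] tokens
    mask_cls: (batch, n_sents) bool, True where a sentence slot is real (assumed)
    """
    batch_idx = torch.arange(top_vec.size(0), device=top_vec.device).unsqueeze(1)
    sents_vec = top_vec[batch_idx, clss]  # (batch, n_sents, hidden)
    return sents_vec * mask_cls.unsqueeze(-1).float()

# Toy tensors with the same layout as the training log above.
torch.manual_seed(0)
top_vec = torch.randn(1, 432, 768)
clss = torch.tensor([[0, 31, 73]])
mask_cls = torch.ones(1, 3, dtype=torch.bool)

sents_vec = gather_sentence_vectors(top_vec, clss, mask_cls)
print(sents_vec.shape)
```

If this reading is right, sents_vec can only differ between sentences when top_vec differs between positions.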

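Everything seemed to be going well until I ran train.py in test mode. There, every [CLS] token produced exactly the same vector, and every sentence received the same score (0.0567). In the printed output, all visible rows of top_vec are identical as well, not only those at the [CLS] positions: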

top_vec:  tensor([[[ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         ...,
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151]]],
       device='cuda:0') torch.Size([1, 512, 768])
clss:  tensor([[  0,  38,  51,  79, 130, 150, 171, 213, 258, 271, 304, 326, 345, 362,
         378, 395, 413, 449, 471, 492]], device='cuda:0') torch.Size([1, 20])
sents_vec:  tensor([[[ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         ...,
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151]]],
       device='cuda:0') torch.Size([1, 20, 768])
sent_scores:  tensor([[0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567,
         0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567,
         0.0567, 0.0567]], device='cuda:0') torch.Size([1, 20])
         
top_vec:  tensor([[[ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         ...,
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151]]],
       device='cuda:0') torch.Size([1, 512, 768])
clss:  tensor([[  0,  43,  92, 127, 151, 172, 191, 226, 242, 256, 269, 290, 312, 330,
         365, 410, 433, 461, 482, 508]], device='cuda:0') torch.Size([1, 20])
sents_vec:  tensor([[[ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         ...,
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151],
         [ 0.0841,  0.3211, -0.1155,  ...,  0.5341, -0.0099, -0.0151]]],
       device='cuda:0') torch.Size([1, 20, 768])
sent_scores:  tensor([[0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567,
         0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567,
         0.0567, 0.0567]], device='cuda:0') torch.Size([1, 20])
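
The collapse in these test-mode logs can also be checked programmatically instead of by eye. The sketch below is a generic diagnostic, not part of train.py: the helper rows_are_identical is a hypothetical name, and the tensors are small stand-ins for top_vec. Running the same check on the model inputs and on intermediate layer outputs could help show where the rows first become identical.

```python
import torch

def rows_are_identical(t, atol=1e-6):
    """Return True if every vector along the second-to-last dim equals the first one."""
    first = t[..., :1, :]
    return bool(torch.allclose(t, first.expand_as(t), atol=atol))

torch.manual_seed(0)

# Stand-in for the test-mode output: the same vector repeated at every position.
collapsed = torch.tensor([0.0841, 0.3211, -0.1155]).repeat(1, 5, 1)  # (1, 5, 3)

# Stand-in for the training-mode output: rows differ by position.
healthy = torch.randn(1, 5, 3)

print(rows_are_identical(collapsed))
print(rows_are_identical(healthy))
```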

Can anyone help explain why this is happening?
