Hi,
The paper describes four pooling functions: 1. Mean, 2. Identity, 3. Transformer, and 4. LSTM.
I am confused about the difference between mean and identity. I understand that mean pooling simply averages the [CLS] embeddings of all the chunks, which results in a single 768-dimensional vector. How, then, would the identity function work? Does it concatenate all the [CLS] vectors? If so, wouldn't that produce a very long vector of size number of chunks × 768?
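To make the shape question concrete, here is a minimal NumPy sketch of the two readings above. It assumes a hidden size of 768 (BERT-base) and uses random vectors in place of real per-chunk [CLS] embeddings. `mean_pool` and `concat_pool` are hypothetical helper names. `concat_pool` implements only my guess that "identity" means concatenation; it is not necessarily what the paper does.

```python
import numpy as np

HIDDEN = 768  # assumed BERT-base hidden size

def mean_pool(cls_embeddings: np.ndarray) -> np.ndarray:
    """Average the per-chunk [CLS] vectors: (num_chunks, 768) -> (768,)."""
    return cls_embeddings.mean(axis=0)

def concat_pool(cls_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical 'concatenate everything' reading: (num_chunks, 768) -> (num_chunks * 768,)."""
    return cls_embeddings.reshape(-1)

# Random stand-ins for the [CLS] embeddings of a document split into 5 chunks.
rng = np.random.default_rng(0)
cls = rng.standard_normal((5, HIDDEN))

print(mean_pool(cls).shape)    # (768,)
print(concat_pool(cls).shape)  # (3840,)
```

With concatenation, the output length grows with the number of chunks. A fixed-size classifier on top would therefore need a fixed chunk count, for example by padding or truncating documents. That is part of why I doubt this reading.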
Any help in understanding this concept would be appreciated!
Thanks!