Hi, I recently read a paper on applying a recent CLIP-style model to spatial transcriptomics: OmiCLIP, a visual–omics foundation model that bridges histopathology with spatial transcriptomics by aligning tissue-image embeddings with transcriptome embeddings. Under this pretraining scheme, I'd like to know whether the core difference between CLIP and OmiCLIP lies only in the pretrained weights, or elsewhere as well. In addition, OmiCLIP's pretraining data are relatively limited compared with general CLIP models, and OmiCLIP does not cover enough rare diseases. So I'd also like to know whether a general CLIP model can be fine-tuned into a model similar to OmiCLIP.
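To make the question concrete, here is a minimal NumPy sketch of the symmetric contrastive (InfoNCE) objective that CLIP-style pretraining optimizes; as far as I understand, fine-tuning a general CLIP toward an OmiCLIP-like model would keep this same objective and change the paired data (tissue images and text-encoded transcriptomes). All names and shapes below are hypothetical, purely for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere, as CLIP does before scoring."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Matched image/text pairs (the diagonal of the similarity matrix)
    are pulled together; mismatched pairs are pushed apart.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature     # (N, N) cosine similarities, scaled
    n = logits.shape[0]
    diag = np.arange(n)                    # i-th image matches i-th text

    def xent(lg):
        # cross-entropy of the diagonal targets, computed stably
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[diag, diag].mean()

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy batch: 4 "spots", 8-dim embeddings (hypothetical sizes)
rng = np.random.default_rng(0)
img_emb = rng.normal(size=(4, 8))
txt_aligned = img_emb + 0.01 * rng.normal(size=(4, 8))  # nearly matched pairs
txt_random = rng.normal(size=(4, 8))                    # unrelated pairs

loss_aligned = clip_contrastive_loss(img_emb, txt_aligned)
loss_random = clip_contrastive_loss(img_emb, txt_random)
print(loss_aligned < loss_random)  # aligned pairs should score a lower loss
```

In this sketch the architecture of the two encoders never appears: only their output embeddings enter the loss, which is part of why I suspect the CLIP/OmiCLIP difference may be mostly in the weights and the training pairs rather than in the objective itself.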