Hi! I recently read with great interest your paper "scGPT: toward building a foundation model for single-cell multi-omics using generative AI", published in Nature Methods, and I have some questions regarding scGPT's condition tokens.
As shown in Fig. 1, condition tokens can represent modality, batch, perturbation conditions, and so on. However, in the "Representation for batch and modality" section, you use additional sets of tokens to represent different batches and modalities, and these are not fed into the transformer blocks. So I am confused about how condition tokens are encoded; they seem to carry more than just batch or modality information. Browsing through your code, I could not locate anything related to condition tokens. In `model.py`, the input embedding appears to include only embeddings for gene tokens and expression values; is that correct? In `generation_model.py`, on the other hand, the input embedding does include embeddings for gene tokens, expression values, and perturbation tokens. Also, wasn't scGPT pretrained on a dataset of human cells under normal conditions? I am therefore curious how exactly you defined condition tokens during pretraining.
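To make my question concrete, here is how I currently understand the input embedding; this is my own minimal PyTorch sketch, and the class and parameter names (`InputEmbedding`, `n_bins`, `n_conditions`, etc.) are mine, not from your repository:

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """My sketch of an scGPT-style input embedding: gene-token and
    binned-expression embeddings are summed, with an optional condition
    (e.g. batch/modality/perturbation) embedding added on top.
    Vocabulary sizes here are placeholders."""

    def __init__(self, n_genes=1000, n_bins=51, n_conditions=10, d_model=64):
        super().__init__()
        self.gene_emb = nn.Embedding(n_genes, d_model)
        self.value_emb = nn.Embedding(n_bins, d_model)
        self.cond_emb = nn.Embedding(n_conditions, d_model)

    def forward(self, gene_ids, value_bins, cond_ids=None):
        # element-wise sum of the per-gene embeddings
        h = self.gene_emb(gene_ids) + self.value_emb(value_bins)
        if cond_ids is not None:
            # condition token added per position, as I read Fig. 1
            h = h + self.cond_emb(cond_ids)
        return h

emb = InputEmbedding()
genes = torch.randint(0, 1000, (2, 16))       # 2 cells, 16 genes each
values = torch.randint(0, 51, (2, 16))        # binned expression values
conds = torch.zeros(2, 16, dtype=torch.long)  # e.g. one batch condition
out = emb(genes, values, conds)
print(out.shape)  # torch.Size([2, 16, 64])
```

Is this roughly the scheme used in `generation_model.py` (with `cond_ids` playing the role of perturbation tokens), and was the condition term simply absent, or fixed to a single value, during pretraining?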
Thank you very much for your time and consideration. I would appreciate it if you could kindly point out any misunderstandings on my part and provide your insights on the matter. Looking forward to your response.
Thank you very much for your time and consideration. I would appreciate it if you could point out any misunderstandings on my part and share your insights. Looking forward to your response.