Hi! I recently read with great interest your paper "scGPT: toward building a foundation model for single-cell multi-omics using generative AI", published in Nature Methods, and I have some questions regarding scGPT's condition tokens.
As shown in Fig. 1, condition tokens can represent modality, batch, perturbation conditions, and so on. However, in the "Representation for batch and modality" section, you use additional sets of tokens to represent different batches and modalities, and these are not fed into the transformer blocks. So I am confused about how condition tokens are encoded; they seem to carry more than just batch or modality information. Browsing through your code, I could not locate anything related to condition tokens. In `model.py`, the input embedding appears to include only embeddings for gene tokens and expression values; is that correct? In `generation_model.py`, on the other hand, the input embedding does include embeddings for gene tokens, expression values, and perturbation tokens. Also, wasn't scGPT pretrained on a dataset of human cells under normal conditions? I am therefore curious how exactly you defined condition tokens during pretraining.
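To make my question concrete, here is how I currently understand the input embedding; this is my own minimal PyTorch sketch, and the class and parameter names (`InputEmbedding`, `n_bins`, `n_conditions`, etc.) are mine, not from your repository:

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """My sketch of an scGPT-style input embedding: gene-token and
    binned-expression embeddings are summed, with an optional condition
    (e.g. batch/modality/perturbation) embedding added on top.
    Vocabulary sizes here are placeholders."""

    def __init__(self, n_genes=1000, n_bins=51, n_conditions=10, d_model=64):
        super().__init__()
        self.gene_emb = nn.Embedding(n_genes, d_model)
        self.value_emb = nn.Embedding(n_bins, d_model)
        self.cond_emb = nn.Embedding(n_conditions, d_model)

    def forward(self, gene_ids, value_bins, cond_ids=None):
        # element-wise sum of the per-gene embeddings
        h = self.gene_emb(gene_ids) + self.value_emb(value_bins)
        if cond_ids is not None:
            # condition token added per position, as I read Fig. 1
            h = h + self.cond_emb(cond_ids)
        return h

emb = InputEmbedding()
genes = torch.randint(0, 1000, (2, 16))       # 2 cells, 16 genes each
values = torch.randint(0, 51, (2, 16))        # binned expression values
conds = torch.zeros(2, 16, dtype=torch.long)  # e.g. one batch condition
out = emb(genes, values, conds)
print(out.shape)  # torch.Size([2, 16, 64])
```

Is this roughly the scheme used in `generation_model.py` (with `cond_ids` playing the role of perturbation tokens), and was the condition term simply absent, or fixed to a single value, during pretraining?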
Thank you very much for your time and consideration. I would appreciate it if you could kindly point out any misunderstandings on my part and provide your insights on the matter. Looking forward to your response.
Thank you very much for your time and consideration. I would appreciate it if you could point out any misunderstandings on my part and share your insights. Looking forward to your response.