COVID-19 PBMC Ncl-Cambridge-UCL #170
Replies: 16 comments
-
Hi Sara, the AIRR table for all the patients and samples (equivalent to the preprocessed tsv file) is here:
https://github.com/clatworthylab/COVID_analysis/blob/main/scbcr_airr.tsv.bz2
If you know which patient_id you are after, I can also just send that directly to you, as I have it on the server as individual files for each sample.
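If you download the full table and want per-sample files yourself, a minimal standard-library sketch could look like the following. It assumes you have the raw `.tsv.bz2` file (not the GitHub HTML page) and that samples are identified by a `sample_id` column; that column name and both helper functions are hypothetical, so check the table header first.

```python
import bz2
import csv
from collections import defaultdict


def split_airr_by_sample(path, sample_column="sample_id"):
    """Group rows of a bz2-compressed AIRR TSV by the value in `sample_column`.

    ASSUMPTION: "sample_id" is a guess at the column that identifies each
    sample; adjust it to whatever the real header uses.
    """
    per_sample = defaultdict(list)
    # Text mode ("rt") decompresses on the fly, so the file is streamed line by line.
    with bz2.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            per_sample[row[sample_column]].append(row)
    return dict(per_sample)


def write_sample_tsv(rows, out_path):
    """Write one sample's rows back out as an uncompressed TSV with the same columns."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
```

Calling `split_airr_by_sample("scbcr_airr.tsv.bz2")` and then `write_sample_tsv` on each group gives one file per sample, which matches the per-sample preprocessing discussed further down.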
-
Thank you Kelvin!
I just want a patient with "severe" symptoms, the one whose network is titled "severe".
I would appreciate it if you could provide me with the files.
Thanks,
Sara
-
Hi Sara, there are 13 samples belonging to severe patients, and I've placed their preprocessed BCR files in this tarball: bcr_severe_preprocessed.tar.gz
<https://github.com/zktuong/dandelion/files/9203725/bcr_severe_preprocessed.tar.gz>
Each file belongs to a sample as per the h5ad file on the COVID-19 Cell Atlas portal <https://www.covid19cellatlas.org/>.
The sample ids are:
MH9143325
MH9143320
MH9143274
MH8919327
newcastle49
MH9179822
MH9179826
AP8
AP1
AP5
BGCV01_CV0144
BGCV03_CV0176
BGCV06_CV0178
You will be able to find the cells from these samples if you select them from the sample_id column in the h5ad object <https://www.covid19cellatlas.org/haniffa21/>.
-
Thank you so much Kelvin.
I have three questions:
1- Has this file already passed the QC step (the filter_contigs() function)?
2- Does it matter whether preprocessing is run on each sample individually or on all samples at the same time?
3- How are you combining all of the "severe" samples together to generate your network? I mean, how do we justify this biologically, and why don't we generate a network for each severe sample?
Thanks again!
Sara
-
1- Nope, it hasn't been through filter_contigs; you can proceed with this however you see fit.
2- Preprocessing has to be run on each sample individually.
3- You can generate a network per sample as well. There's no right or wrong here, as it's just for visualisation.
-
Hi Kelvin,
By chance, do you have any idea why my kernel dies when I run ddl.tl.generate_network() on this bigger BCR_preprocessed_data, so that I cannot finish running the command?
Any comments?
Thanks,
Sara
-
Most likely it's running out of memory. You might have to use a more powerful machine.
-
Also, the COVID dataset is one of the largest; it took me a couple of hours to run it on a machine with 32 CPUs and 80 GB of RAM.
-
I see. Thank you so much.
It looks like the size of the vdj data decreases after filter_contigs, and now I can get the network of the filtered data.
One other question: for the network generation, which is based on a minimum spanning tree of the nodes in a cluster, which field is used to measure the Levenshtein distance?
Is it the Levenshtein distance of "junction_aa"?
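As a rough illustration of the minimum-spanning-tree idea described in this question (not dandelion's actual implementation), here is Kruskal's algorithm over the cells of a single clone, assuming the pairwise distances between their sequences have already been computed. The cell names and distances are made up purely for illustration.

```python
def minimum_spanning_tree(nodes, distances):
    """Kruskal's algorithm with a union-find structure.

    nodes: list of node names (e.g. the cells in one clone).
    distances: dict mapping (a, b) pairs to a non-negative distance.
    Returns the MST as a list of (a, b, distance) edges, in the order they were added.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Path halving keeps the union-find trees shallow.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    # Visit edges from shortest to longest; ties are broken by node names for determinism.
    for (a, b), d in sorted(distances.items(), key=lambda item: (item[1], item[0])):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components, so keep it
            parent[root_a] = root_b
            tree.append((a, b, d))
        if len(tree) == len(nodes) - 1:
            break  # a spanning tree on n nodes has exactly n - 1 edges
    return tree


# Hypothetical clone of four cells with made-up pairwise distances.
cells = ["cell1", "cell2", "cell3", "cell4"]
dist = {
    ("cell1", "cell2"): 1,
    ("cell1", "cell3"): 3,
    ("cell1", "cell4"): 4,
    ("cell2", "cell3"): 2,
    ("cell2", "cell4"): 5,
    ("cell3", "cell4"): 1,
}
mst = minimum_spanning_tree(cells, dist)
```

Here the tree keeps the two distance-1 edges plus the distance-2 edge linking them, i.e. the cheapest way to connect all four cells.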
-
It uses sequence_alignment_aa, but you can swap to junction_aa if you prefer.
Kelvin
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
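For anyone unsure what swapping the column changes, here is a plain-Python illustration of Levenshtein distance computed on either field. It is not dandelion's own code, and the cell IDs and sequences are made up; it only shows that `junction_aa` compares the junction alone (CDR3 plus its conserved flanking residues), whereas `sequence_alignment_aa` compares the whole aligned amino-acid sequence, so two cells can be identical on one and differ on the other.

```python
def levenshtein(a, b):
    """Edit distance: the minimum number of single-character insertions,
    deletions or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def pairwise_distances(records, key="sequence_alignment_aa"):
    """Levenshtein distance between every pair of records on the chosen column.

    `records` is a dict of {cell_id: {column: value}} (a hypothetical structure).
    """
    ids = sorted(records)
    return {
        (x, y): levenshtein(records[x][key], records[y][key])
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
    }


# Made-up example: the full alignments differ by one residue, the junctions are identical.
records = {
    "cellA": {"sequence_alignment_aa": "EVQLVESGGGLVQ", "junction_aa": "CARDYW"},
    "cellB": {"sequence_alignment_aa": "EVQLVESGGGLVK", "junction_aa": "CARDYW"},
}
full = pairwise_distances(records)                        # {("cellA", "cellB"): 1}
junction = pairwise_distances(records, key="junction_aa")  # {("cellA", "cellB"): 0}
```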
-
And one more question:
Something is confusing me: what is the difference between the expanded and non-expanded networks in your paper?
Did you use filter_contigs for both of them?
If you used the filter_contigs function, why did running your code take a couple of hours? For me, with filter_contigs it took only a few minutes to generate the network.
I want to make sure you ran filter_contigs in all your runs.
-
Expanded vs. non-expanded is just whether the network includes only clones with more than one cell, or all cells.
Both work off the same dataset.
I have used filter_contigs, and there have been code optimisations since the first version a year ago, so it should hopefully be faster now.
Kelvin
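A minimal sketch of that distinction, assuming you already have a clone ID for every cell. The barcodes and clone IDs are made up, and `min_size` is a parameter of this hypothetical helper, not necessarily a dandelion argument.

```python
from collections import Counter


def cells_in_clones_of_size(cell_to_clone, min_size=2):
    """Return the cells whose clone contains at least `min_size` cells.

    With the default min_size=2 only expanded clones (more than one cell) are
    kept; min_size=1 keeps every cell. Both views come from the same input.
    """
    clone_sizes = Counter(cell_to_clone.values())
    return {
        cell: clone
        for cell, clone in cell_to_clone.items()
        if clone_sizes[clone] >= min_size
    }


# Hypothetical cell -> clone assignments.
assignments = {
    "AAAC": "clone1",
    "AAAG": "clone1",
    "AACT": "clone2",
    "ACGT": "clone3",
    "AGGT": "clone3",
}
expanded = cells_in_clones_of_size(assignments)         # cells of clone1 and clone3 only
everything = cells_in_clones_of_size(assignments, 1)    # all five cells
```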
-
Kelvin,
Thank you so much.
I could generate networks fairly close to the ones you generated, but they do not look exactly the same.
Is there any reason why the visualized networks are not exactly the same?
If you expect exactly the same shapes, which parameters do you think could cause these differences?
For example, can I ask what settings you used for filter_contigs?
Thanks,
Sara
-
And Kelvin, would you please send me the "healthy_control_BCR_preprocessed"
file?
Thank you so much.
Best,
Sara
-
That's due to the random seed/initialisation for the spring layout. Unfortunately, there are no parameters I can provide you, as I did not enforce a fixed seed when generating the layouts. Every time you run it, it will look slightly different but overall similar, just like how UMAP/tSNE will look slightly different when you run it on a different machine or on a different day. The appearance can be random, but the underlying graph should be the same, so don't worry about it.
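To illustrate the point about seeds, here is a toy force-directed (spring) layout in plain Python. It is not the layout code dandelion uses, and `spring_layout` here is a hypothetical helper: the random initial positions come from a seeded generator, so the same seed reproduces the same picture, while a different seed moves the nodes around even though the edges (the graph itself) are unchanged. If you build layouts with networkx yourself, its `spring_layout` function likewise accepts a `seed` argument.

```python
import math
import random


def spring_layout(nodes, edges, seed=None, iterations=50):
    """Tiny Fruchterman-Reingold-style layout (illustrative only).

    nodes: list of node names; edges: list of (u, v) pairs.
    Fixing `seed` makes the result reproducible; None gives a new picture each run.
    """
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}  # random start positions
    k = math.sqrt(1.0 / len(nodes))  # ideal edge length within a unit square
    temperature = 0.1                # maximum step size, cooled every iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Connected nodes attract.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node along its net force, capped by the current temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
        temperature *= 0.95
    return {n: tuple(p) for n, p in pos.items()}


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
run1 = spring_layout(nodes, edges, seed=42)
run2 = spring_layout(nodes, edges, seed=42)  # same seed: identical coordinates
run3 = spring_layout(nodes, edges, seed=7)   # different seed: different coordinates, same graph
```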
Here you go:
-
It looks like most of the queries are solved now, so I will close this issue and convert it to a discussion.
-
Description of the question
Hi Kelvin,
I am going to regenerate the BCR network for one of the patient cases in your paper "Single-cell multi-omics analysis of the immune response in COVID-19", for example "COVID-19 PBMC Ncl-Cambridge-UCL/BCR_severe.h5sd".
But I am not sure where the preprocessed dandelion file (the result of running the singularity container) is.
Would you please guide me to the input files needed to regenerate the BCR network for this sample? I want to make sure I am using the pipeline correctly. Thank you.
Regards,
Sara