Dear Yuan,
Thank you for your interest in our work. Please refer to the methods section of
https://arxiv.org/abs/2003.06122, which has been expanded to the following during
revision:
Raw expression values were normalized and log transformed. We retained the cell clustering
based on the original studies when available. For each dataset where per-cell annotation
was not available, we re-processed the data from the raw or normalized quantification matrix
(whichever was deposited alongside the original publication). The standard scanpy (version
1.4.3) clustering procedure was followed. When batch information was available, Harmony was
used to correct batch effects in the PC space, and the corrected PCs were used for
computing nearest neighbour graphs. To re-annotate the cells, multiple clusterings of
different resolutions were generated among which the one best matching the published
clustering was picked and manual annotation was undertaken using marker genes described in
the original publication. Full details can be found in analysis notebooks available at
github.com/Teichlab/covid19_MS1/.
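As a rough illustration of the resolution-selection step described above, the sketch below scores several candidate clusterings against a published clustering and keeps the closest one. The methods text does not say how "best matching" was measured; the adjusted Rand index used here is an assumption, and the function names (adjusted_rand_index, pick_best_resolution) and the toy labels are hypothetical. In practice the candidate labels would come from running the clustering step at different resolution values, and the notebooks in the repository above remain the authoritative reference.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same cells")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    # Count cell pairs that share a cluster in both labelings, in A, and in B.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every cell in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def pick_best_resolution(candidates, published):
    """Return (resolution, score) of the candidate closest to `published`.

    `candidates` maps a resolution value to a list of per-cell cluster labels.
    The agreement metric (ARI) is an assumption, not the authors' stated choice.
    """
    scores = {
        res: adjusted_rand_index(labels, published)
        for res, labels in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy example: six cells, a published annotation, three candidate resolutions.
published = ["T", "T", "B", "B", "NK", "NK"]
candidates = {
    0.2: [0, 0, 0, 0, 0, 0],  # under-clustered
    0.5: [0, 0, 1, 1, 2, 2],  # same partition as the published labels
    1.0: [0, 1, 2, 3, 4, 5],  # over-clustered
}
best_res, best_score = pick_best_resolution(candidates, published)
print(best_res, best_score)
```

Cluster names do not need to agree between the two labelings, since the index only compares which cells are grouped together. The selected clustering would then still need manual annotation with the marker genes from the original publication.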
If you deposit data with us as a raw quantification matrix, we can provide the code used
to process your dataset as an analysis notebook for your reference.
Best,
Ni
From: Cv19caingest <cv19caingest-bounces(a)sanger.ac.uk> on behalf of Yuan He
<yuanhe777tt(a)hotmail.com>
Date: Friday, 10 April 2020 at 14:43
To: "cv19caingest(a)sanger.ac.uk" <cv19caingest(a)sanger.ac.uk>
Cc: "jpopp4(a)jhu.edu" <jpopp4(a)jhu.edu>
Subject: [Cv19caingest] Question about data processing. Thank you! [EXT]
Dear COVID-19 cell atlas,
This is Yuan He, a graduate student from Dr. Alexis Battle’s lab at Johns Hopkins
University. Thank you for providing the great platform for researchers to study
COVID-19!
We have a question about how the single-cell datasets in the *.h5ad files were processed.
The website says that “Expression values can be in raw counts (which we will re-process)
and/or normalised values (which we will serve directly).” However, it’s not clear how the
processing was done for each dataset. It would be great if the website could state, for
each dataset, whether the processing was done by your pipeline or by the authors.
Also, it would be super helpful if your processing pipeline could be made available!
Thank you very much!
Best,
Yuan