COSMIC-cell-lines-webcrawler

为了使用oncopredict来预测肿瘤药物，首先我需要详细了解COSMIC数据库中的所有细胞系信息。为了实现这一目标，我采用了Python编程语言，并结合requests、lxml、pandas和BeautifulSoup4等库来进行数据爬取和处理。

在开始爬取之前，我首先分析了COSMIC数据库的网页结构，找到了包含细胞系信息的链接。随后，我使用requests库发送HTTP/GET请求，获取这些链接的网页内容。

使用lxml库，我能够从网页内容中提取所需的细胞系详情数据。将数据存储在pandas的数据框架中，使其便于进行后续处理和分析。

由于细胞系数量较大，我编写了批量爬取的代码，自动处理所有细胞系的链接。这样一来，我能够高效地获取大量数据，为后续预测工作打下了坚实的基础。

经过数据爬取和处理，最终我成功获得了1020个细胞系的详细数据（CSV），包括与疾病相关的信息，这些数据将成为使用oncopredict进行肿瘤药物预测的重要依据。

To utilize oncopredict for tumor drug prediction, I needed to acquire detailed information about all cell lines in the COSMIC database.

I employed Python, along with the requests, lxml, pandas, and BeautifulSoup4 libraries, to scrape the required links for all cell lines.

The links were crawled in bulk, using a batch processing approach, to efficiently gather the data.

In the end, I successfully obtained detailed data on 1020 cell lines, including their associated diseases and other relevant information. These data serve as a crucial foundation for conducting tumor drug prediction with oncopredict.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
COSMIC细胞系爬虫		COSMIC细胞系爬虫
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

COSMIC细胞系爬虫

COSMIC细胞系爬虫

README.md

README.md

Repository files navigation

COSMIC-cell-lines-webcrawler

About

Releases

Packages

Languages

pheonixchen/COSMIC-cell-lines-webcrawler

Folders and files

Latest commit

History

COSMIC细胞系爬虫

COSMIC细胞系爬虫

README.md

README.md

Repository files navigation

COSMIC-cell-lines-webcrawler

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages