This project is a Python workflow that automates the retrieval and visualization of genomic data from NCBI Datasets. Given NCBI assembly accession numbers, the pipeline downloads the GFF3 annotation, the FASTA sequences, and the assembly metadata, builds a local BLAST protein database, and runs a self-BLAST of the assembly's protein.faa against that database. It then extracts key genomic metrics: coding regions and coding region density, top gene identity scores, gene length distributions, and replicate mappings.
The processed data are rendered as publication-ready Circos plots, generated dynamically for any species dataset, that summarize genomic relationships and reveal structural and functional genome patterns.
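
The "top gene identity score" step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the self-BLAST was written in BLAST+ tabular format (`-outfmt 6`, whose first three columns are query ID, subject ID, and percent identity), and it assumes "top" means the highest percent identity among hits to a different protein. The function name `top_nonself_identity` and the sample rows are hypothetical.

```python
import csv


def top_nonself_identity(tabular_lines):
    """Return {query_id: (subject_id, percent_identity)} for the best non-self hit.

    Assumes BLAST+ tabular output (-outfmt 6): qseqid, sseqid, pident, ...
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        # Skip blank lines and comment lines (present in -outfmt 7 style files).
        if not row or row[0].startswith("#"):
            continue
        query, subject, pident = row[0], row[1], float(row[2])
        # In a self-BLAST every protein matches itself; ignore those hits.
        if query == subject:
            continue
        if query not in best or pident > best[query][1]:
            best[query] = (subject, pident)
    return best


# Hypothetical sample rows with the 12 standard -outfmt 6 columns.
sample_rows = [
    ["P1", "P1", "100.0", "200", "0", "0", "1", "200", "1", "200", "1e-120", "400"],
    ["P1", "P2", "87.5", "200", "25", "0", "1", "200", "1", "200", "1e-90", "310"],
    ["P1", "P4", "42.0", "150", "87", "2", "10", "160", "5", "155", "1e-20", "90"],
    ["P2", "P2", "100.0", "200", "0", "0", "1", "200", "1", "200", "1e-120", "400"],
    ["P2", "P1", "87.5", "200", "25", "0", "1", "200", "1", "200", "1e-90", "310"],
    ["P3", "P3", "100.0", "180", "0", "0", "1", "180", "1", "180", "1e-110", "370"],
]
sample_lines = ["\t".join(row) for row in sample_rows]

top_hits = top_nonself_identity(sample_lines)
```

Proteins whose only hit is themselves (`P3` in the sample) are left out of the result, which is one possible convention; a real pipeline might instead record them with an identity of zero.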

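Gene lengths and coding region density can be derived directly from the GFF3 annotation. The sketch below is one possible approach under stated assumptions: it reads the nine tab-separated GFF3 columns, takes gene length as `end - start + 1` (GFF3 coordinates are 1-based and inclusive), and defines density as CDS bases per fixed-size window. The window size, the function name `gff3_metrics`, and the sample records are all hypothetical.

```python
from collections import defaultdict


def gff3_metrics(lines, window=1000):
    """Return (gene_lengths, cds_density) parsed from GFF3 lines.

    gene_lengths: {gene_id: length in bases}
    cds_density:  {(seqid, window_index): CDS bases / window size}
    Overlapping CDS features are counted once each, so a value can exceed 1.0,
    and the final (shorter) window of a sequence is not length-corrected.
    """
    gene_lengths = {}
    cds_bases = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, ftype = cols[0], cols[2]
        start, end = int(cols[3]), int(cols[4])
        if ftype == "gene":
            attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
            gene_id = attrs.get("ID", f"{seqid}:{start}-{end}")
            gene_lengths[gene_id] = end - start + 1
        elif ftype == "CDS":
            # Split the feature across every window it overlaps.
            pos = start
            while pos <= end:
                idx = (pos - 1) // window
                chunk_end = min(end, (idx + 1) * window)
                cds_bases[(seqid, idx)] += chunk_end - pos + 1
                pos = chunk_end + 1
    density = {key: bases / window for key, bases in cds_bases.items()}
    return gene_lengths, density


# Hypothetical sample annotation: two genes, the second spanning a window edge.
sample_gff3 = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t101\t400\t.\t+\t.\tID=gene1;Name=a",
    "chr1\tsrc\tCDS\t101\t400\t.\t+\t0\tID=cds1;Parent=gene1",
    "chr1\tsrc\tgene\t901\t1200\t.\t-\t.\tID=gene2",
    "chr1\tsrc\tCDS\t901\t1200\t.\t-\t0\tID=cds2;Parent=gene2",
]

gene_lengths, cds_density = gff3_metrics(sample_gff3, window=1000)
```

Per-window values like these map naturally onto a Circos histogram or heatmap track, with one data line per `(seqid, window)` pair.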
