Analysis Surrounding HRP2 and HRP3 deletions

Contents

  • Getting the larger organization sub unit
  • Summary
  • Genomic locations of rRNA loci/subunits
  • All by all percent identity between the different rRNA loci
  • Full unit
  • By subunit
    • 5s
    • 5.8s
    • 18s
      • With branch lengths
    • 28s
      • With branch lengths

Info on 3D7 rRNA

Getting the larger organization sub unit

From: Gardner, Malcolm J., Neil Hall, Eula Fung, Owen White, Matthew Berriman, Richard W. Hyman, Jane M. Carlton, et al. 2002. “Genome Sequence of the Human Malaria Parasite Plasmodium Falciparum.” Nature 419 (6906): 498–511.

Unlike many other eukaryotes, the malaria parasite genome does not contain long tandemly repeated arrays of ribosomal RNA (rRNA) genes. Instead, Plasmodium parasites contain several single 18S-5.8S-28S rRNA units distributed on different chromosomes. The sequence encoded by a rRNA gene in one unit differs from the sequence of the corresponding rRNA in the other units. Furthermore, the expression of each rRNA unit is developmentally regulated, resulting in the expression of a different set of rRNAs at different stages of the parasite life cycle. It is likely that by changing the properties of its ribosomes the parasite is able to alter the rate of translation, either globally or of specific messenger RNAs (mRNAs), thereby changing the rate of cell growth or altering patterns of cell development. The two types of rRNA genes previously described in P. falciparum are the S-type, expressed primarily in the mosquito vector, and the A-type, expressed primarily in the human host. Seven loci encoding rRNAs were identified in the genome sequence (Fig. 1). Two copies of the S-type rRNA genes are located on chromosomes 11 and 13, and two copies of the A-type genes are located on chromosomes 5 and 7. In addition, chromosome 1 contains a third, rRNA unit that encodes 18S and 5.8S rRNAs that are almost identical to the S-type genes on chromosomes 11 and 13, but has a significantly divergent 28S rRNA gene (65% identity to the A-type and 75% identity to the S-type). The expression profiles of these genes are unknown. Chromosome 8 also contains two unusual rRNA gene units that contain 5.8S and 28S rRNA genes but do not encode 18S rRNAs; it is not known whether these genes are functional. The 5S rRNA is encoded by three identical tandemly arrayed genes on chromosome 14.

Summary

  • two types of rRNA genes in P. falciparum encoded in 7 loci
    • S-type
      • expressed primarily while in the mosquito vector
      • Full 18S-5.8S-28S rRNA units
        • on chromosome 11 and 13
    • A-type
      • expressed primarily while in the human host
      • Full 18S-5.8S-28S rRNA units
        • on chromosome 5 and 7
    • Mix-type
      • On chromosome 1
        • 18S and 5.8S rRNAs almost identical to the S-type, but a divergent 28S rRNA with 65% identity to the A-type and 75% identity to the S-type
    • Partial
      • two unusual rRNA gene units on chromosome 8 that contain 5.8S and 28S rRNAs but no 18S rRNA
        • unknown if these are functional
    • 5S rRNA
      • three identical tandemly arrayed units on chromosome 14
Code
elucidator gffDescriptionsCount --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff | egrep "ribosomal" | egrep "RNA" | tail -4 | cut -f1 > rRNA_descriptionsUpdated.txt
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description rRNA_descriptionsUpdated.txt --features gene --out  updatedrRNA_locs
elucidator getFastaWithBed --bed updatedrRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out updatedrRNA_locs.fasta --overWrite 

bedtools merge -d 2000 -i updatedrRNA_locs.bed | elucidator bed3ToBed6 --bed STDIN  | elucidator filterBedRecordsByLength --minLen 2000 --bed STDIN --overWrite | elucidator reorientBedToIntersectingGeneInGff --bed STDIN --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --out merged_updatedrRNA_locs.bed --overWrite

# against reichenowi 
elucidator extractRefSeqsFromGenomes --bed merged_updatedrRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/plasmodiumRefGenomes/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/plasmodiumRefGenomes/info/gff/ --primaryGenome Pf3D7 --outputDir pr_extract_rRNA_regions --numThreads 15 --overWriteDirs --selectedGenomes PRG01,Pf3D7  

# references 
elucidator extractRefSeqsFromGenomes --bed merged_updatedrRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/plasmodiumRefGenomes/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/plasmodiumRefGenomes/info/gff/ --primaryGenome Pf3D7 --outputDir primaryReference_extract_rRNA_regions --numThreads 15 --overWriteDirs  

elucidator extractRefSeqsFromGenomes --bed merged_updatedrRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir extract_rRNA_regions --numThreads 15 --overWriteDirs
cd extract_rRNA_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | egrep -v _00_ | elucidator bed3ToBed6 --bed STDIN | elucidator reorientBedToIntersectingGeneInGff --bed STDIN --gff /tank/data/genomes/combinedGenomes/allPf/info/gff/allPf.gff --overWrite --out allrRNA_locs.bed
elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite

muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk

# all by all compare for percent identity  
elucidator compareAllByAll --fasta allrRNA_locs.fasta --out allByAll.tab.txt --numThreads 15 --overWrite --verbose --alnInfoDir alnCache  

Genomic locations of rRNA loci/subunits

Code
regions = readr::read_tsv("allGenomicPfrRNA.bed", col_names = F)
DT::datatable(regions,
          extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('csv')
  ))

All by all percent identity between the different rRNA loci

Code
allByAll = readr::read_tsv("extract_rRNA_regions/allByAll.tab.txt")

allByAll_3d7 = allByAll %>% 
  filter(grepl("Pf3D7", ReadId))%>% 
  filter(grepl("Pf3D7", OtherReadId)) %>% 
  mutate(OtherReadId = factor(OtherReadId, levels = naturalsort::naturalsort(unique(c(.$ReadId, .$OtherReadId))))) %>% 
  mutate(ReadId = factor(ReadId, levels = naturalsort::naturalsort(unique(c(.$ReadId, .$OtherReadId))))) 

ggplot(allByAll_3d7) + 
  geom_tile(aes(x = ReadId, y = OtherReadId, fill = perId), color = "black") + 
  geom_text(aes(x = ReadId, y = OtherReadId, label = round(perId *100, 1))) + 
  sofonias_theme_xRotate

Full unit

Code
library(ggtree)
rRna_tree = ape::read.tree("extract_rRNA_regions/aln_allrRNA_locs.nwk")

rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) %>% 
  mutate(chromLabel = ifelse("PfGA01_08-1184158-1188520" == seq, "08-2", chromLabel))

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)

print(ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120)))

Code
pdf("rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120)))
# print(ggtree(rRna_tree, layout = 'circular', aes(color = group)) +
#   geom_tiplab() + 
#   scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
#                      guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120)))
dev.off()
quartz_off_screen 
                2 

By subunit

5s

Code
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description 5s_rRNA_descriptionsUpdated.txt --features gene --out  5s_rRNA_locs.bed

elucidator getFastaWithBed --bed 5s_rRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out 5s_rRNA_locs.fasta --overWrite 

elucidator extractRefSeqsFromGenomes --bed 5s_rRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir 5s_rRNA_locs_regions --numThreads 15 --overWriteDirs

cd 5s_rRNA_locs_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | egrep -v _00_ | elucidator bed3ToBed6 --bed STDIN --overWrite --out allrRNA_locs.bed

elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite
muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk
Code
library(ggtree)
rRna_tree = ape::read.tree("5s_rRNA_locs_regions/aln_allrRNA_locs.nwk")

rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) 

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)
rRNA_tree_plot_5s = ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_5s)

Code
pdf("5s_rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_5s)
dev.off()
quartz_off_screen 
                2 

5.8s

Code
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description 5.8s_rRNA_descriptionsUpdated.txt --features gene --out  5.8s_rRNA_locs.bed

elucidator getFastaWithBed --bed 5.8s_rRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out 5.8s_rRNA_locs.fasta --overWrite 

elucidator extractRefSeqsFromGenomes --bed 5.8s_rRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir 5.8s_rRNA_locs_regions --numThreads 15 --overWriteDirs

cd 5.8s_rRNA_locs_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | egrep -v _00_ | elucidator bed3ToBed6 --bed STDIN --overWrite --out allrRNA_locs.bed

elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite
muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk
Code
library(ggtree)
rRna_tree = ape::read.tree("5.8s_rRNA_locs_regions/aln_allrRNA_locs.nwk")

rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) 

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)

rRNA_tree_plot_5.8s = ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_5.8s)

Code
pdf("5.8s_rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_5.8s)
dev.off()
quartz_off_screen 
                2 

18s

Code
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description 18s_rRNA_descriptionsUpdated.txt --features gene --out  18s_rRNA_locs.bed

elucidator getFastaWithBed --bed 18s_rRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out 18s_rRNA_locs.fasta --overWrite 

elucidator extractRefSeqsFromGenomes --bed 18s_rRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir 18s_rRNA_locs_regions --numThreads 15 --overWriteDirs

cd 18s_rRNA_locs_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | egrep -v _00_ | elucidator bed3ToBed6 --bed STDIN --overWrite --out allrRNA_locs.bed

elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite
muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk
Code
library(ggtree)
rRna_tree = ape::read.tree("18s_rRNA_locs_regions/aln_allrRNA_locs.nwk")

rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) 

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)

rRNA_tree_plot_18s = ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120)) 
print(rRNA_tree_plot_18s )

Code
pdf("18s_rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_18s+  theme(plot.margin=margin(120, 120, 120, 120)))
dev.off()
quartz_off_screen 
                2 

With branch lengths

Code
rRNA_tree_plot_18s_brLen = ggtree(rRna_tree, layout = 'circular', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_18s_brLen)

Code
pdf("18s_rRNA_tree_brLen.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_18s_brLen)
dev.off()
quartz_off_screen 
                2 

28s

Code
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description 28s_rRNA_descriptionsUpdated.txt --features gene --out  28s_rRNA_locs.bed

elucidator getFastaWithBed --bed 28s_rRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out 28s_rRNA_locs.fasta --overWrite 

elucidator extractRefSeqsFromGenomes --bed 28s_rRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir 28s_rRNA_locs_regions --numThreads 15 --overWriteDirs

cd 28s_rRNA_locs_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | egrep -v _00_ | elucidator bed3ToBed6 --bed STDIN --overWrite --out allrRNA_locs.bed

elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite
muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk
Code
library(ggtree)
rRna_tree = ape::read.tree("28s_rRNA_locs_regions/aln_allrRNA_locs.nwk")

rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) %>% 
  mutate(chromLabel = ifelse("PfGA01_08-1184151-1188744" == seq, "08-2", chromLabel))

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)

rRNA_tree_plot_28s = ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_28s)

Code
pdf("28s_rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_28s)
dev.off()
quartz_off_screen 
                2 

With branch lengths

Code
rRNA_tree_plot_28s_brLen = ggtree(rRna_tree, layout = 'circular', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_28s_brLen)

Code
pdf("28s_rRNA_tree_brLen.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_28s_brLen)
dev.off()
quartz_off_screen 
                2 
Source Code
---
title: Info on 3D7 rRNA
---


```{r setup, echo=FALSE, message=FALSE}
source("../../common.R")
```

## Getting the larger organization sub unit 

From: Gardner, Malcolm J., Neil Hall, Eula Fung, Owen White, Matthew Berriman, Richard W. Hyman, Jane M. Carlton, et al. 2002. *“Genome Sequence of the Human Malaria Parasite Plasmodium Falciparum.”* Nature 419 (6906): 498–511. 

*Unlike many other eukaryotes, the malaria parasite genome does not contain long tandemly repeated arrays of ribosomal RNA (rRNA) genes. Instead, Plasmodium parasites contain several single 18S-5.8S-28S rRNA units distributed on different chromosomes. The sequence encoded by a rRNA gene in one unit differs from the sequence of the corresponding rRNA in the other units. Furthermore, the expression of each rRNA unit is developmentally regulated, resulting in the expression of a different set of rRNAs at different stages of the parasite life cycle. It is likely that by changing the properties of its ribosomes the parasite is able to alter the rate of translation, either globally or of specific messenger RNAs (mRNAs), thereby changing the rate of cell growth or altering patterns of cell development. The two types of rRNA genes previously described in P. falciparum are the S-type, expressed primarily in the mosquito vector, and the A-type, expressed primarily in the human host. Seven loci encoding rRNAs were identified in the genome sequence (Fig. 1). Two copies of the S-type rRNA genes are located on chromosomes 11 and 13, and two copies of the A-type genes are located on chromosomes 5 and 7. In addition, chromosome 1 contains a third, rRNA unit that encodes 18S and 5.8S rRNAs that are almost identical to the S-type genes on chromosomes 11 and 13, but has a significantly divergent 28S rRNA gene (65% identity to the A-type and 75% identity to the S-type). The expression profiles of these genes are unknown. Chromosome 8 also contains two unusual rRNA gene units that contain 5.8S and 28S rRNA genes but do not encode 18S rRNAs; it is not known whether these genes are functional. The 5S rRNA is encoded by three identical tandemly arrayed genes on chromosome 14.*

## Summary  

*  two types of rRNA genes in P. falciparum encoded in 7 loci  
    *  S-type  
        *  expressed primarily while in the mosquito vector
        *  Full 18S-5.8S-28S rRNA units  
            *  on chromosome 11 and 13  
    *  A-type  
        *  expressed primarily while in the human host  
        *  Full 18S-5.8S-28S rRNA units  
            *  on chromosome 5 and 7  
    *  Mix-type  
        *  On chromosome 1  
            *  18S and 5.8S rRNAs almost identical to the S-type, but a divergent 28S rRNA with 65% identity to the A-type and 75% identity to the S-type  
    *  Partial  
        *  two unusual rRNA gene units on chromosome 8 that contain 5.8S and 28S rRNAs but no 18S rRNA  
            *  unknown if these are functional  
    *  5S rRNA  
        *  three identical tandemly arrayed units on chromosome 14  
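
As a quick consistency check on the summary above, the sketch below tallies the loci by type and confirms that they add up to the seven loci reported by Gardner et al. The function names (`loci_table`, `count_by_type`, `total_loci`) are illustrative assumptions and are not part of the original analysis; the 5S array on chromosome 14 is left out because the quoted text counts it separately.

```shell
# rRNA loci as listed in the summary: type <TAB> chromosome.
# (The tandem 5S array on chromosome 14 is not one of the seven loci.)
loci_table() {
  printf '%s\t%s\n' \
    S-type 11 \
    S-type 13 \
    A-type 5 \
    A-type 7 \
    Mix-type 1 \
    Partial 8 \
    Partial 8
}

# Number of loci per type, sorted by type name.
count_by_type() {
  loci_table | awk -F'\t' '{ n[$1]++ } END { for (t in n) print t "\t" n[t] }' | LC_ALL=C sort
}

# Total number of loci across all types.
total_loci() {
  loci_table | wc -l | tr -d ' '
}

count_by_type
echo "total: $(total_loci)"
```

The total comes to 7: two S-type, two A-type, one mixed unit on chromosome 1, and two partial units on chromosome 8.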





```{bash, eval = F}
elucidator gffDescriptionsCount --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff | egrep "ribosomal" | egrep "RNA" | tail -4 | cut -f1 > rRNA_descriptionsUpdated.txt
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description rRNA_descriptionsUpdated.txt --features gene --out  updatedrRNA_locs
elucidator getFastaWithBed --bed updatedrRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out updatedrRNA_locs.fasta --overWrite 

bedtools merge -d 2000 -i updatedrRNA_locs.bed | elucidator bed3ToBed6 --bed STDIN  | elucidator filterBedRecordsByLength --minLen 2000 --bed STDIN --overWrite | elucidator reorientBedToIntersectingGeneInGff --bed STDIN --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --out merged_updatedrRNA_locs.bed --overWrite

# against reichenowi 
elucidator extractRefSeqsFromGenomes --bed merged_updatedrRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/plasmodiumRefGenomes/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/plasmodiumRefGenomes/info/gff/ --primaryGenome Pf3D7 --outputDir pr_extract_rRNA_regions --numThreads 15 --overWriteDirs --selectedGenomes PRG01,Pf3D7  

# references 
elucidator extractRefSeqsFromGenomes --bed merged_updatedrRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/plasmodiumRefGenomes/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/plasmodiumRefGenomes/info/gff/ --primaryGenome Pf3D7 --outputDir primaryReference_extract_rRNA_regions --numThreads 15 --overWriteDirs  

elucidator extractRefSeqsFromGenomes --bed merged_updatedrRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir extract_rRNA_regions --numThreads 15 --overWriteDirs
cd extract_rRNA_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | egrep -v _00_ | elucidator bed3ToBed6 --bed STDIN | elucidator reorientBedToIntersectingGeneInGff --bed STDIN --gff /tank/data/genomes/combinedGenomes/allPf/info/gff/allPf.gff --overWrite --out allrRNA_locs.bed
elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite

muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk

# all by all compare for percent identity  
elucidator compareAllByAll --fasta allrRNA_locs.fasta --out allByAll.tab.txt --numThreads 15 --overWrite --verbose --alnInfoDir alnCache  

```
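
The `bedtools merge -d 2000` step followed by the `--minLen 2000` filter is what collapses neighbouring rRNA genes into one record per locus and drops short stand-alone records. The sketch below mimics that logic with plain `awk` on made-up coordinates. It is only an illustration: `toy_bed`, `merge_with_gap` and `filter_min_len` are hypothetical helpers, and it assumes that a gap exactly equal to the threshold is merged and that the length filter is inclusive, which may not match the real tools in every edge case.

```shell
# Toy BED3 records, sorted by chromosome then start (coordinates are made up).
toy_bed() {
  printf '%s\t%s\t%s\n' \
    chrA 100 1900 \
    chrA 2500 6000 \
    chrA 9000 9150 \
    chrB 50 400 \
    chrB 2400 2700
}

# Merge consecutive records on the same chromosome whose gap is <= max_gap ($1).
merge_with_gap() {
  awk -v d="$1" 'BEGIN { OFS = "\t" }
    NR == 1                    { c = $1; s = $2 + 0; e = $3 + 0; next }
    $1 == c && $2 - e <= d + 0 { if ($3 + 0 > e) e = $3 + 0; next }
                               { print c, s, e; c = $1; s = $2 + 0; e = $3 + 0 }
    END                        { if (NR > 0) print c, s, e }'
}

# Keep only records at least min_len ($1) bases long (BED length = end - start).
filter_min_len() {
  awk -v m="$1" '$3 - $2 >= m + 0'
}

toy_bed | merge_with_gap 2000 | filter_min_len 2000
```

With this toy input the two nearby `chrA` records and the two `chrB` records are each merged into a single locus, while the isolated 150-base `chrA` record is removed by the length filter.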

# Genomic locations of rRNA loci/subunits  

```{r}
regions = readr::read_tsv("allGenomicPfrRNA.bed", col_names = F)
DT::datatable(regions,
          extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('csv')
  ))
```



# All by all percent identity between the different rRNA loci  

```{r}
allByAll = readr::read_tsv("extract_rRNA_regions/allByAll.tab.txt")

allByAll_3d7 = allByAll %>% 
  filter(grepl("Pf3D7", ReadId))%>% 
  filter(grepl("Pf3D7", OtherReadId)) %>% 
  mutate(OtherReadId = factor(OtherReadId, levels = naturalsort::naturalsort(unique(c(.$ReadId, .$OtherReadId))))) %>% 
  mutate(ReadId = factor(ReadId, levels = naturalsort::naturalsort(unique(c(.$ReadId, .$OtherReadId))))) 

ggplot(allByAll_3d7) + 
  geom_tile(aes(x = ReadId, y = OtherReadId, fill = perId), color = "black") + 
  geom_text(aes(x = ReadId, y = OtherReadId, label = round(perId *100, 1))) + 
  sofonias_theme_xRotate

```

# Full unit  

```{r}
#| fig-column: screen  
#| fig-height: 15
#| fig-width: 15 
library(ggtree)
rRna_tree = ape::read.tree("extract_rRNA_regions/aln_allrRNA_locs.nwk")

rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) %>% 
  mutate(chromLabel = ifelse("PfGA01_08-1184158-1188520" == seq, "08-2", chromLabel))

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)

print(ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120)))
pdf("rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120)))
# print(ggtree(rRna_tree, layout = 'circular', aes(color = group)) +
#   geom_tiplab() + 
#   scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
#                      guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120)))
dev.off()


```
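
The chain of `gsub()` calls in the chunk above derives a genome name and a chromosome number from each tip label of the form `genome_chrom[_v3]-start-end`. As a rough illustration of what those substitutions do, here is the same parsing written with shell parameter expansion. `parse_tip_label` is a hypothetical helper, the `Pf3D7_11_v3` coordinates in the usage line are invented, and unlike `gsub("_v3", "", ...)` this version only strips a trailing `_v3`.

```shell
# Split a tip label such as "PfGA01_08-1184158-1188520" into genome and
# chromosome number, mirroring the gsub() steps used for the tree labels.
parse_tip_label() {
  label="$1"
  chrom="${label%%-*}"          # drop "-start-end"            e.g. Pf3D7_11_v3
  chrom_num="${chrom%_v3}"      # drop a trailing "_v3"        e.g. Pf3D7_11
  chrom_num="${chrom_num##*_}"  # keep text after the last "_" e.g. 11
  genome="${label%%_*}"         # keep text before first "_"   e.g. Pf3D7
  printf '%s\t%s\n' "$genome" "$chrom_num"
}

parse_tip_label "PfGA01_08-1184158-1188520"
parse_tip_label "Pf3D7_11_v3-1000-2000"   # hypothetical coordinates
```

The first call yields `PfGA01` and `08`, the second `Pf3D7` and `11`; the R code then numbers repeated chromosomes within a genome to build `chromLabel`.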



# By subunit 

## 5s  

```{bash, eval = F}
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description 5s_rRNA_descriptionsUpdated.txt --features gene --out  5s_rRNA_locs.bed

elucidator getFastaWithBed --bed 5s_rRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out 5s_rRNA_locs.fasta --overWrite 

elucidator extractRefSeqsFromGenomes --bed 5s_rRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir 5s_rRNA_locs_regions --numThreads 15 --overWriteDirs

cd 5s_rRNA_locs_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | grep -E -v _00_ | elucidator bed3ToBed6 --bed STDIN --overWrite --out allrRNA_locs.bed

elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite
muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk

```


```{r}
#| fig-column: screen  
#| fig-height: 15
#| fig-width: 15  
library(ggtree)
rRna_tree = ape::read.tree("5s_rRNA_locs_regions/aln_allrRNA_locs.nwk")

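# Derive plotting labels from tip names of the form <genome>_<chrom>[_v3]-<start>-<end>,
# e.g. "PfGA01_08-1184158-1188520"
rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  # drop the "-start-end" coordinates, the "_v3" suffix and the genome prefix, leaving the chromosome number
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  # genome name is everything before the first underscore
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  # number the copies per chromosome within each genome; the copy index is only added when there is more than one
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) 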

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)
rRNA_tree_plot_5s = ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_5s)
pdf("5s_rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_5s)
dev.off()


```


## 5.8s  

```{bash, eval = F}
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description 5.8s_rRNA_descriptionsUpdated.txt --features gene --out  5.8s_rRNA_locs.bed

elucidator getFastaWithBed --bed 5.8s_rRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out 5.8s_rRNA_locs.fasta --overWrite 

elucidator extractRefSeqsFromGenomes --bed 5.8s_rRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir 5.8s_rRNA_locs_regions --numThreads 15 --overWriteDirs

cd 5.8s_rRNA_locs_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | grep -E -v _00_ | elucidator bed3ToBed6 --bed STDIN --overWrite --out allrRNA_locs.bed

elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite
muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk

```


```{r}
#| fig-column: screen    
#| fig-height: 15
#| fig-width: 15  
library(ggtree)
rRna_tree = ape::read.tree("5.8s_rRNA_locs_regions/aln_allrRNA_locs.nwk")

rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) 

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)

rRNA_tree_plot_5.8s = ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_5.8s)
pdf("5.8s_rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_5.8s)
dev.off()


```


## 18s  

```{bash, eval = F}
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description 18s_rRNA_descriptionsUpdated.txt --features gene --out  18s_rRNA_locs.bed

elucidator getFastaWithBed --bed 18s_rRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out 18s_rRNA_locs.fasta --overWrite 

elucidator extractRefSeqsFromGenomes --bed 18s_rRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir 18s_rRNA_locs_regions --numThreads 15 --overWriteDirs

cd 18s_rRNA_locs_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | grep -E -v _00_ | elucidator bed3ToBed6 --bed STDIN --overWrite --out allrRNA_locs.bed

elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite
muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk

```


```{r}
#| fig-column: screen    
#| fig-height: 15
#| fig-width: 15  
library(ggtree)
rRna_tree = ape::read.tree("18s_rRNA_locs_regions/aln_allrRNA_locs.nwk")

rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) 

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)

rRNA_tree_plot_18s = ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120)) 
print(rRNA_tree_plot_18s)
pdf("18s_rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_18s)
dev.off()

```

### With branch lengths  

```{r}
#| fig-column: screen    
#| fig-height: 15
#| fig-width: 15  
rRNA_tree_plot_18s_brLen = ggtree(rRna_tree, layout = 'circular', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_18s_brLen)
pdf("18s_rRNA_tree_brLen.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_18s_brLen)
dev.off()


```


## 28s  

```{bash, eval = F}
elucidator gffToBedByDescription --gff /tank/data/genomes/plasmodium/genomes/pf/info/gff/Pf3D7.gff --description 28s_rRNA_descriptionsUpdated.txt --features gene --out  28s_rRNA_locs.bed

elucidator getFastaWithBed --bed 28s_rRNA_locs.bed --twoBit /tank/data/genomes/plasmodium/genomes/pf/genomes/Pf3D7.2bit --out 28s_rRNA_locs.fasta --overWrite 

elucidator extractRefSeqsFromGenomes --bed 28s_rRNA_locs.bed --genomeDir /tank/data/genomes/plasmodium/genomes/pf/genomes/ --gffDir /tank/data/genomes/plasmodium/genomes/pf/info/gff/ --primaryGenome Pf3D7 --outputDir 28s_rRNA_locs_regions --numThreads 15 --overWriteDirs

cd 28s_rRNA_locs_regions

cat */beds/*.bed | elucidator bedCoordSort --bed STDIN | cut -f1-6 | bedtools merge | grep -E -v _00_ | elucidator bed3ToBed6 --bed STDIN --overWrite --out allrRNA_locs.bed

elucidator getFastaWithBed --bed allrRNA_locs.bed --twoBit /tank/data/genomes/combinedGenomes/allPfRaw/genomes/allPfRaw.2bit  --out allrRNA_locs.fasta --overWrite
muscle -in allrRNA_locs.fasta -out aln_allrRNA_locs.fasta
FastTree -nt aln_allrRNA_locs.fasta > aln_allrRNA_locs.nwk

```


```{r}
#| fig-column: screen    
#| fig-height: 15
#| fig-width: 15  
library(ggtree)
rRna_tree = ape::read.tree("28s_rRNA_locs_regions/aln_allrRNA_locs.nwk")

rRna_tree_labelDf = tibble(seq = rRna_tree$tip.label) %>% 
  mutate(chromNum = gsub("-.*", "", seq)) %>% 
  mutate(chromNum = gsub("_v3", "", chromNum)) %>% 
  mutate(chromNum = gsub(".*_", "", chromNum)) %>% 
  mutate(genome = gsub("_.*", "", seq)) %>% 
  separate(seq, into = c("chrom", "start", "end"), sep = "-", convert = T, remove = F) %>% 
  arrange(genome, chrom, start) %>% 
  group_by(genome, chromNum) %>% 
  mutate(chromID = row_number(), 
         total = n()) %>% 
  mutate(chromLabel = ifelse(total > 1, paste0(chromNum, "-", chromID), chromNum ) ) %>% 
  mutate(chromLabel = ifelse("PfGA01_08-1184151-1188744" == seq, "08-2", chromLabel))

chroms = list()
for(currentChrom in unique(rRna_tree_labelDf$chromLabel)){
  rRna_tree_labelDf_chrom = rRna_tree_labelDf %>% 
    filter(chromLabel == currentChrom)
  chroms[[currentChrom]] = rRna_tree_labelDf_chrom$seq
}
rRna_tree = groupOTU(rRna_tree, chroms)

rRNA_tree_plot_28s = ggtree(rRna_tree, layout = 'circular', branch.length='none', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_28s)
pdf("28s_rRNA_tree.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_28s)
dev.off()

```

### With branch lengths  

```{r}
#| fig-column: screen    
#| fig-height: 15
#| fig-width: 15  
rRNA_tree_plot_28s_brLen = ggtree(rRna_tree, layout = 'circular', aes(color = group)) +
  geom_tiplab() + 
  scale_color_manual("chromosomeGroup", values = c("black", "#0AB45A", "#FA7850", "#AA0A3C", "#FA78FA", "#005AC8", "#8214A0", "#14D2DC"), 
                     guide = guide_legend(override.aes=list(shape=1, size =1)))  + hexpand(0.5) +  theme(plot.margin=margin(120, 120, 120, 120))
print(rRNA_tree_plot_28s_brLen)
pdf("28s_rRNA_tree_brLen.pdf", width = 11, height = 11, useDingbats = F)
print(rRNA_tree_plot_28s_brLen)
dev.off()

```