TEsmall

TEsmall is a tool for the simultaneous processing and analysis of a variety of small RNAs in a single integrated workflow. These include microRNAs (miRNA), transfer RNAs (tRNA), small nucleolar RNAs (snoRNA), small nuclear RNAs (snRNA), Y-RNAs, Piwi-interacting RNAs (piRNA), and short interfering RNAs (siRNA). The package sorts small RNAs into their respective classes and estimates read abundance in each group. In particular, we use an expectation-maximization (EM) strategy for statistical assignment of reads to small RNAs from repetitive genomic regions (such as siRNAs and piRNAs) at locus-specific resolution. TEsmall output is formatted for easy incorporation into downstream differential analysis packages like DESeq2.

If you encounter any issues or have any questions about TEsmall, please check out our GitHub page.

Download instructions

You can download the software package from GitHub, with detailed instructions on installation using Miniconda. Additional packages required by the TEsmall workflow include cutadapt, bowtie, bedtools, samtools, pybedtools, and scipy.

Prebuilt databases for hg19, hg38, dm6, mm10, mm39, and GRCz11 can be downloaded automatically by the software. Other builds can be downloaded from here into your database folder (see our GitHub page for more details).

Tool Description

TEsmall takes raw FASTQ files from next-generation sequencing platforms as input, together with genomic annotation sets retrieved from an online server. Adapters are first trimmed from the FASTQ reads with cutadapt, and rRNA-derived reads are then filtered out by aligning to rRNA sequences with bowtie. The filtered reads are aligned to the genome with bowtie, allowing no mismatches and up to 100 alignments per read (reads with more alignments are discarded). An upper limit of 100 genomic loci allows classification of the multimapping reads common in sRNA data, in particular structural RNAs like tRNAs and transposable element targeting siRNAs, while removing or reducing homopolymer and low-complexity reads from downstream analysis.
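The 100-alignment cutoff can be illustrated with a minimal Python sketch. This is not TEsmall's own code: the function name, the (read_id, locus) tuple layout, and the example reads are assumptions made for illustration. In practice an aligner can apply such a limit itself (bowtie's -m option suppresses reads with more than a given number of reportable alignments).

```python
from collections import defaultdict

MAX_ALIGNMENTS = 100  # upper limit described in the text


def filter_multimappers(alignments, max_alignments=MAX_ALIGNMENTS):
    """Group alignments by read and drop reads with too many genomic hits.

    `alignments` is an iterable of (read_id, locus) tuples; this layout is
    a hypothetical stand-in for real alignment records.
    """
    by_read = defaultdict(list)
    for read_id, locus in alignments:
        by_read[read_id].append(locus)
    # Keep a read only if it has at most `max_alignments` alignments.
    return {
        read_id: loci
        for read_id, loci in by_read.items()
        if len(loci) <= max_alignments
    }


# Hypothetical data: read1 maps once, read2 maps to 101 loci and is discarded.
example = [("read1", "chr1:100")] + [("read2", f"chr2:{i}") for i in range(101)]
kept = filter_multimappers(example)
print(sorted(kept))
```

A read with exactly 100 alignments is retained under this sketch, matching the "up to 100 alignments" wording above.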

Following alignment to the genome, each alignment is annotated via a sequential decision tree: alignments are assigned to annotation categories in priority order, and each assigned alignment is removed from the pool so that it cannot be claimed by a lower-priority category. The default order of annotation is: structural RNAs, miRNAs and hairpins, exons, sense transposons, antisense transposons, introns, and finally annotated piRNA clusters. This priority order can be changed by the user to suit the application.
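The sequential assignment described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the TEsmall implementation: the category labels, the dictionary input format, and the "unannotated" fallback label are assumptions.

```python
# Default priority from the text; the label spellings are assumptions.
DEFAULT_PRIORITY = [
    "structural_RNA", "miRNA", "hairpin", "exon",
    "sense_TE", "antisense_TE", "intron", "piRNA_cluster",
]


def annotate(alignment_hits, priority=DEFAULT_PRIORITY):
    """Assign each alignment to the first category, in priority order, it overlaps.

    `alignment_hits` maps an alignment id to the set of categories whose
    annotations it overlaps (a hypothetical, precomputed input).
    """
    remaining = dict(alignment_hits)
    assigned = {}
    for category in priority:
        matched = [aid for aid, cats in remaining.items() if category in cats]
        for aid in matched:
            assigned[aid] = category
            del remaining[aid]  # remove from the pool before the next category
    for aid in remaining:
        assigned[aid] = "unannotated"  # assumed fallback label
    return assigned


hits = {
    "aln1": {"exon", "miRNA"},           # miRNA outranks exon by default
    "aln2": {"intron", "antisense_TE"},  # antisense TE outranks intron
    "aln3": set(),                       # overlaps nothing
}
result = annotate(hits)
print(result)
```

Passing a different priority list, for example one that places "exon" first, would assign aln1 to exons instead, which is the effect of re-ordering the annotation classes.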

An HTML output file is then created using Python-based Bokeh tools to visualize the abundance distributions, length distributions, and mapping logs of all small RNAs in the dataset. Alongside this HTML output, TEsmall compiles multiple flat text output files, including a counts file structured to be directly compatible with DESeq2 for differential analysis. The abundances in these counts files are 1/n normalized at the end of the annotation process, where n is the number of alignments per read, so that multimappers are not double-counted.
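As a worked illustration of 1/n normalization, the sketch below gives each alignment of a read a weight of 1/n, so every read contributes a total of 1 regardless of how many loci it maps to. The function name, input layout, and feature names are hypothetical and are not taken from TEsmall.

```python
from collections import Counter, defaultdict


def fractional_counts(annotated_alignments):
    """Sum 1/n weights per feature, where n is the read's alignment count.

    `annotated_alignments` is an iterable of (read_id, feature) pairs, one
    pair per alignment (an assumed input format).
    """
    pairs = list(annotated_alignments)
    n_per_read = Counter(read_id for read_id, _ in pairs)
    counts = defaultdict(float)
    for read_id, feature in pairs:
        counts[feature] += 1.0 / n_per_read[read_id]
    return dict(counts)


# read1 aligns once; read2 aligns to four loci (three in TE_A, one in TE_B).
alignments = [
    ("read1", "miR-21"),
    ("read2", "TE_A"), ("read2", "TE_A"), ("read2", "TE_A"),
    ("read2", "TE_B"),
]
counts = fractional_counts(alignments)
print(counts)
```

Here the two reads yield a total count of 2 across all features, rather than the 5 that would result from counting each alignment as a full read.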

Custom Annotations

For custom genome builds, you need the following files:

Sequences

FASTA file of genomic sequences (genome.fa), with a corresponding index generated by samtools faidx
FASTA file of rDNA sequences (rDNA.fa), with a corresponding index generated by samtools faidx
FASTA file of tDNA sequences (tDNA.fa), with a corresponding index generated by samtools faidx
Bowtie indices (bowtie-build) for the genomic sequence
Bowtie indices (bowtie-build) for the rDNA sequences
Bowtie indices (bowtie-build) for the tDNA sequences
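The sequence files above could be indexed with commands along the following lines. This Python sketch only assembles the command lines (for example, to pass to subprocess.run) and does not execute them. The choice of the FASTA stem as the bowtie index basename is an assumption; check the TEsmall GitHub page for the naming the software expects.

```python
def index_commands(fasta_names=("genome.fa", "rDNA.fa", "tDNA.fa")):
    """Return the samtools faidx and bowtie-build commands for each FASTA file."""
    commands = []
    for fasta in fasta_names:
        prefix = fasta.rsplit(".", 1)[0]  # assumed index basename, e.g. "genome"
        commands.append(["samtools", "faidx", fasta])      # writes <fasta>.fai
        commands.append(["bowtie-build", fasta, prefix])   # writes <prefix>.*.ebwt
    return commands


for cmd in index_commands():
    print(" ".join(cmd))
```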

BED files for annotation

Gene exons (exon.bed)
Gene introns (intron.bed)
Transposable elements (TE.bed)
Mature microRNA sequence (miRNA.bed)
MicroRNA hairpin (hairpin.bed)
PIWI-interacting RNA clusters (piRNA_cluster.bed)
Other small non-coding RNA, such as tRNA, snRNA (structural_RNA.bed)

Empty BED files must be provided even if there are no applicable annotations for a particular category.
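One way to satisfy this requirement is to create empty placeholders for any BED files that are missing. The following is a minimal sketch, assuming all seven BED files live in a single directory; the function name and the directory argument are hypothetical, while the file names are those listed above.

```python
from pathlib import Path

# BED file names as listed in this section.
REQUIRED_BED_FILES = [
    "exon.bed", "intron.bed", "TE.bed", "miRNA.bed",
    "hairpin.bed", "piRNA_cluster.bed", "structural_RNA.bed",
]


def ensure_bed_files(annotation_dir):
    """Create an empty BED file for every required name that is missing.

    Existing files are left untouched. Returns the names that were created.
    """
    directory = Path(annotation_dir)
    directory.mkdir(parents=True, exist_ok=True)
    created = []
    for name in REQUIRED_BED_FILES:
        path = directory / name
        if not path.exists():
            path.touch()  # zero-byte placeholder
            created.append(name)
    return created
```

Calling ensure_bed_files on a directory that already holds, say, exon.bed and TE.bed would create only the five remaining files.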

If you are encountering difficulty generating the files for your genome build, you can contact us, and we will help you to the best of our abilities.

Citation

O’Neill K., Liao W.W., Patel A. and Gale-Hammell M. (2018) TEsmall Identifies Small RNAs Associated With Targeted Inhibitor Resistance in Melanoma. Front. Genet. 9: 461. PubMed ID: 30349559
