TEtranscripts

TEtranscripts is a software package that uses both unambiguously (uniquely) and ambiguously (multi-) mapped reads to perform differential expression analyses from high-throughput RNA sequencing experiments. Most gene expression analysis packages are not optimized for the complexities of quantifying highly repetitive regions of the genome, especially transposable elements (TEs), from short sequencing reads. Although transposable elements make up between 20% and 80% of many eukaryotic genomes and contribute significantly to the cellular transcriptome, the difficulty of quantifying their abundance from high-throughput sequencing experiments has led to their being largely ignored in most studies.

TEtranscripts improves the recovery of TE transcripts from RNA-Seq experiments and can also be used for general gene/TE abundance counting in genomics assays.

If you encounter any issues or have any questions about TEtranscripts, please check out our GitHub page.

Download instructions

You can download the software package from PyPI and GitHub. The transposable element GTF files required by TEtranscripts (see tool description below) and example data files (BAM) are available at this location.

Tool Description

The two tools, TEtranscripts and TEcount, quantify both gene and transposable element (TE) transcript abundances from RNA-Seq experiments, using both uniquely and ambiguously mapped short reads. They process short-read alignments (BAM files) and proportionally assign read counts to the corresponding gene or TE based on the user-provided annotation files (GTF files). In addition, TEtranscripts combines multiple libraries and performs differential analysis using DESeq2.
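To illustrate what assigning read counts proportionally can mean for ambiguously mapped reads, here is a minimal sketch in Python. It is not the TEtranscripts implementation: it assumes the simplest possible scheme, in which each read contributes a total count of one, split evenly across every annotated feature its alignments overlap. The function name `fractional_counts`, the input layout, and the toy read and feature names are all hypothetical.

```python
from collections import defaultdict


def fractional_counts(read_hits):
    """Split each read's single count evenly across the features it hits.

    read_hits maps a read name to the list of features (genes or TEs)
    overlapped by that read's alignments. This is an illustrative
    simplification, not the algorithm used by TEtranscripts.
    """
    counts = defaultdict(float)
    for features in read_hits.values():
        if not features:
            continue  # the read overlaps no annotated feature
        weight = 1.0 / len(features)  # even share per overlapped feature
        for feature in features:
            counts[feature] += weight
    return dict(counts)


# Hypothetical toy input: one uniquely mapped read, two multi-mapped reads,
# and one read that overlaps nothing in the annotation.
read_hits = {
    "read1": ["GeneA"],
    "read2": ["L1Md_Gf", "L1Md_T"],
    "read3": ["L1Md_Gf", "L1Md_T"],
    "read4": [],
}
counts = fractional_counts(read_hits)
```

In this toy input the uniquely mapped read gives its full count to GeneA, while each multi-mapped read gives half a count to each of the two TEs it aligns to. Discarding multi-mapped reads instead would leave both TEs with no counts at all.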

GTF files for gene annotation can be obtained from UCSC RefSeq, Ensembl, iGenomes or other annotation databases. GTF files for TE annotations are customized versions of the annotation from UCSC RepeatMasker or other TE databases. They contain two custom attributes, class_id and family_id, corresponding to the class (e.g. LINE) and family (e.g. L1) of the transposable element. A unique ID (e.g. L1Md_Gf_dup1) is also assigned to each TE annotation in the transcript_id attribute. Pre-generated TE GTF files are available for a number of organisms, and can be downloaded here. If the organism or genome build you are interested in is not available, please contact us and provide a curated annotation of the transposable elements (e.g. genomic location and TE name/type). We will do our best to help you generate a suitable TE GTF file.
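The attribute layout described above can be checked with a short Python sketch that reads the ninth (attribute) column of one GTF record. The example record is hypothetical: its coordinates and source label are made up, and the gene_id attribute is included only because standard GTF expects one. The helper name `parse_gtf_attributes` is likewise an assumption for illustration.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_gtf_attributes(line):
    """Return the attribute column of one GTF record as a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF record must have 9 tab-separated columns")
    return dict(ATTR_RE.findall(fields[8]))


# Hypothetical TE record; coordinates and source label are illustrative only.
example = "\t".join([
    "chr1", "rmsk", "exon", "3000001", "3000097", ".", "-", ".",
    'gene_id "L1Md_Gf"; transcript_id "L1Md_Gf_dup1"; '
    'family_id "L1"; class_id "LINE";',
])
attrs = parse_gtf_attributes(example)
```

Here `attrs["class_id"]` and `attrs["family_id"]` hold the TE class and family, and `attrs["transcript_id"]` holds the unique ID of that TE annotation.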

Citation

Jin Y, Tam OH, Paniagua E, Hammell M. (2015) TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics. 31(22):3593-9. Pubmed ID: 26206304
