Tutorials & Documentation

RNAseek — User Guide

Everything you need — from first upload to publication-ready figures
 Getting Started

No Account Required — Just Open and Go

RNAseek removes every barrier between you and your analysis. There are no usernames, no passwords, and no registration forms. The moment you open the platform in your browser, a secure, private session is created for you automatically.

Behind the scenes, RNAseek assigns your browser a cryptographically random session ID (a UUID) stored as a secure, HttpOnly cookie. This ID links your uploads, pipeline runs, and results to your browser session — and only your browser session. No one else can access your data.
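For readers curious how an anonymous session of this kind can work, here is a minimal sketch in Python. It is not RNAseek's actual server code: the cookie name `rnaseek_session`, the `Path=/` setting, and the use of `Max-Age` to mirror the 14-day window are all assumptions made for illustration.

```python
import uuid
from http.cookies import SimpleCookie

SESSION_LIFETIME_SECONDS = 14 * 24 * 60 * 60  # the 14-day window, in seconds


def new_session_cookie(cookie_name: str = "rnaseek_session") -> tuple[str, str]:
    """Return (session_id, Set-Cookie header value) for a fresh session.

    The cookie name is hypothetical; the guide does not document the real one.
    """
    session_id = str(uuid.uuid4())  # version-4 UUID built from OS randomness
    cookie = SimpleCookie()
    cookie[cookie_name] = session_id
    morsel = cookie[cookie_name]
    morsel["secure"] = True    # browser sends it over HTTPS only
    morsel["httponly"] = True  # page JavaScript cannot read it
    morsel["max-age"] = SESSION_LIFETIME_SECONDS  # assumed expiry mechanism
    morsel["path"] = "/"
    return session_id, morsel.OutputString()
```

Because the ID lives only in the browser's cookie jar, a different browser or a private window has no cookie to send and therefore gets a new session.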

Your 14-Day Session Window

Your session and all data associated with it remain active for 14 days from the moment the session is created. During that window you can:

  • Upload new datasets and launch analyses at any time.
  • Return to the platform and pick up exactly where you left off.
  • Download any results, reports, or visualizations you have generated.
After 14 days the session expires and an automated janitor permanently deletes all uploaded files and results. There is no way to recover data after expiration — download everything you need before your session ends.
Tip: Bookmark the RNAseek URL in the same browser you used for your first visit. Your session cookie is browser-specific — opening RNAseek in a different browser or in private/incognito mode will start a brand-new session.
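If you want to note your own deadline, the arithmetic is simple. The sketch below assumes the window is exactly 14 × 24 hours from the creation timestamp; the guide says "14 days" but does not state how the janitor rounds, so treat the result as approximate and download early.

```python
from datetime import datetime, timedelta, timezone

SESSION_WINDOW = timedelta(days=14)  # from the guide; exact rounding is an assumption


def session_expiry(created_at: datetime) -> datetime:
    """Return the moment a session created at `created_at` would expire."""
    return created_at + SESSION_WINDOW


def time_remaining(created_at: datetime, now: datetime) -> timedelta:
    """Return the time left in the session, never less than zero."""
    return max(session_expiry(created_at) - now, timedelta(0))


# Example: a session opened on 1 January expires on 15 January (UTC).
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
deadline = session_expiry(created)
```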

What You Need Before You Begin

Item | Required? | Details
Raw sequencing reads | Required | .fq.gz or .fastq.gz files. At least two experimental groups (e.g., 3 Control + 3 Treated).
Reference genome | Required | Select from 11 pre-indexed genomes or upload your own FASTA + GTF/GFF.
Condition mapping | Required | A table or CSV assigning every sample to a biological group (e.g., Control vs. Treated).
Batch IDs | Optional | Include a Batch column to trigger automatic ComBat-seq batch-effect correction.
Timepoints | Optional | Include a Timepoint column to switch DESeq2 to the Likelihood Ratio Test (time-series).
Already have aligned BAM/CRAM files or a pre-computed count matrix? You can skip the alignment step entirely — see Alternative Entry Points below.

Supported Analysis Types (Assay Tracks)

RNAseek supports four core assay types. Choose the one that matches your experiment during the Setup Wizard:

Assay | Best For | Aligner | Key Tools
Standard RNA-Seq (Poly-A) | Gene expression profiling | HISAT2 (splice-aware) | featureCounts, DESeq2
Small RNA / miRNA | Regulatory RNA quantification | Bowtie (miRBase) | samtools idxstats, DESeq2
ChIP-seq | Histone / TF binding sites | BWA MEM | MACS2 peak calling, featureCounts
DNA Methylation | Bisulfite sequencing | Bismark | methylKit differential methylation
Microbial / Bacterial Transcriptomics: Upload an unannotated bacterial FASTA and RNAseek will automatically dispatch it to a local BASys2 engine that generates complete structural, operon, and metabolome annotations in roughly 10 seconds — no manual annotation required.

Quick-Start Walkthrough

Upload → Configure → Stage 1 → Stage 2 → Core Hub → Modules
  1. Navigate to RNAseek in your browser — your session starts automatically.
  2. Create a new submission from the Active Workspace page.
  3. Upload your compressed FASTQ files (the uploader handles large files seamlessly).
  4. Map your conditions using the interactive table or a CSV upload.
  5. Select a reference genome from the dropdown (or upload a custom genome).
  6. Launch the core pipeline and watch progress in real time via live progress bars.
  7. Explore results in the Core Hub — interactive plots, downloadable tables, and 12 advanced modules.

 Uploading Data

The Chunked Uploader — Built for Large Files

Genomics files are large. A single paired-end RNA-seq experiment can easily produce tens of gigabytes of compressed FASTQ data. RNAseek’s uploader is specifically engineered for this reality.

5 MB binary chunks | HTTPS encrypted | Auto-retry on failure | No file-count limit

Your browser automatically splits each file into 5 MB binary chunks before transmission. Each chunk is sent individually over HTTPS and reassembled on the server. If a network interruption occurs mid-upload, only the affected chunk is retransmitted, so you do not lose the entire file. This makes large uploads (up to the 10 GB per-file limit noted in the Quick Reference) practical even over slower connections.
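The chunk-and-retry idea can be sketched in a few lines of Python. This is an illustration only, since the real uploader runs in your browser. The `send_chunk` callback, the retry limit of 3, and reading "5 MB" as 5 × 1024 × 1024 bytes are assumptions, not documented RNAseek behavior.

```python
from typing import Callable, Iterator

CHUNK_SIZE = 5 * 1024 * 1024  # "5 MB" read as 5 MiB (an assumption)


def iter_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[tuple[int, bytes]]:
    """Yield (index, bytes) pairs covering the file in order."""
    with open(path, "rb") as handle:
        index = 0
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield index, chunk
            index += 1


def upload_file(
    path: str,
    send_chunk: Callable[[int, bytes], None],  # hypothetical transport callback
    max_attempts: int = 3,                     # hypothetical retry limit
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Send every chunk, retrying only the chunk that failed. Returns chunks sent."""
    sent = 0
    for index, chunk in iter_chunks(path, chunk_size):
        for attempt in range(1, max_attempts + 1):
            try:
                send_chunk(index, chunk)
                break  # this chunk arrived; move on to the next one
            except OSError:  # includes ConnectionError and timeouts
                if attempt == max_attempts:
                    raise
        sent += 1
    return sent
```

The point of the design is that a failure costs at most one chunk of retransmission rather than the whole file.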

Accepted File Formats

Input Type | Accepted Formats | Notes
Raw reads | .fq.gz, .fastq.gz | Must be gzip-compressed.
Aligned reads | .bam, .cram | Skips alignment; proceeds to quantification.
Count matrix | .csv, .tsv | Rows = genes, columns = samples. Non-negative integers only.
Metadata | .csv | Condition mapping, batch IDs, timepoints.
Custom genome | .fa / .fasta + .gtf / .gff | Triggers on-demand HISAT2 index build. Not available for Small RNA track.
Uncompressed .fastq or .fq files are not accepted. Compress them first: gzip sample_R1.fastq
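Before a long upload, it can help to check your filenames against the accepted formats. The helper below is a hypothetical pre-flight check based only on the extensions listed above; the category names and the case-insensitive matching are assumptions, and RNAseek's own validation may differ.

```python
# Extensions taken from the Accepted File Formats table; category names are illustrative.
ACCEPTED = {
    "raw_reads": (".fq.gz", ".fastq.gz"),
    "aligned_reads": (".bam", ".cram"),
    "table": (".csv", ".tsv"),  # count matrix or metadata
    "genome_sequence": (".fa", ".fasta"),
    "genome_annotation": (".gtf", ".gff"),
}


def classify_upload(filename: str) -> str:
    """Return the input category for a filename, or raise ValueError."""
    lower = filename.lower()  # case-insensitive matching is an assumption
    for kind, suffixes in ACCEPTED.items():
        if lower.endswith(suffixes):
            return kind
    if lower.endswith((".fastq", ".fq")):
        raise ValueError(f"{filename}: uncompressed FASTQ, run gzip on it first")
    raise ValueError(f"{filename}: unsupported file type")
```

For example, `classify_upload("SampleA_R1.fq.gz")` returns `"raw_reads"`, while `classify_upload("sample_R1.fastq")` raises an error telling you to compress the file.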

Paired-End Read Detection

If your experiment uses paired-end sequencing, name your files with the standard _R1 / _R2 convention:

SampleA_R1.fq.gz   SampleA_R2.fq.gz
SampleB_R1.fq.gz   SampleB_R2.fq.gz

RNAseek auto-detects paired reads from filenames. You can also manually toggle between Single-End and Paired-End mode in the Setup Wizard if your naming convention differs.
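To see how filename-based pairing can work, consider this minimal sketch. The regular expression is an assumption modeled on the _R1 / _R2 convention shown above; RNAseek's actual detection rules are not documented here and may accept other patterns.

```python
import re
from collections import defaultdict

# Matches names such as SampleA_R1.fq.gz or SampleA_R2.fastq.gz (assumed pattern).
READ_TAG = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.(?:fq|fastq)\.gz$")


def pair_reads(filenames: list[str]) -> dict[str, dict[str, str]]:
    """Group FASTQ names by sample: {sample: {"R1": name, "R2": name}}."""
    samples: dict[str, dict[str, str]] = defaultdict(dict)
    for name in filenames:
        match = READ_TAG.match(name)
        if match:
            samples[match["sample"]]["R" + match["mate"]] = name
    return dict(samples)


def is_paired_end(filenames: list[str]) -> bool:
    """True only when every detected sample has both an R1 and an R2 file."""
    pairs = pair_reads(filenames)
    return bool(pairs) and all(set(mates) == {"R1", "R2"} for mates in pairs.values())
```

A sample with only an R1 file makes `is_paired_end` return `False`, which is the situation where you would switch modes manually in the Setup Wizard.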

Mapping Experimental Conditions (Metadata)

After your files finish uploading, the Setup Wizard asks you to define the experimental design. You have two options:

Option A — Interactive Table

Best for small experiments. The wizard pre-populates a table with your uploaded filenames — just pick a condition from the dropdown for each sample.

Option B — CSV Upload

Best for large experiments. Prepare a .csv with Filename and Condition columns (required), plus optional Batch and Timepoint.

Example metadata CSV:

Filename | Condition | Batch (optional) | Timepoint (optional)
WT_rep1_R1.fq.gz | Control | Batch_1 | Day 0
WT_rep2_R1.fq.gz | Control | Batch_1 | Day 0
WT_rep3_R1.fq.gz | Control | Batch_2 | Day 0
KO_rep1_R1.fq.gz | Treated | Batch_2 | Day 7
KO_rep2_R1.fq.gz | Treated | Batch_1 | Day 7
KO_rep3_R1.fq.gz | Treated | Batch_1 | Day 7

Batch – Providing batch IDs automatically triggers ComBat-seq batch correction during normalization, removing technical sequencing noise between batches.

Timepoint – Providing timepoints switches the DESeq2 significance test from the default Wald test to the Likelihood Ratio Test (LRT), which is more appropriate for time-series experimental designs.
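You can sanity-check a metadata CSV before uploading it. The sketch below mirrors the rules described in this section (required Filename and Condition columns, at least two groups, optional Batch and Timepoint columns). The function name and the shape of the returned summary are hypothetical, and the platform's own validation may be stricter.

```python
import csv
import io

REQUIRED_COLUMNS = ("Filename", "Condition")


def read_design(csv_text: str) -> dict:
    """Validate metadata CSV text and summarize what it would trigger."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = list(reader)
    conditions = {row["Condition"] for row in rows}
    if len(conditions) < 2:
        raise ValueError("at least two experimental groups are needed")
    return {
        "samples": len(rows),
        "conditions": sorted(conditions),
        "batch_correction": "Batch" in columns,  # would trigger ComBat-seq
        "test": "LRT" if "Timepoint" in columns else "Wald",
    }
```

Run on a comma-separated version of the example above, this reports six samples, the conditions Control and Treated, batch correction enabled, and the LRT.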

Alternative Entry Points

Not starting from raw reads? RNAseek supports two shortcut entry points:

Pre-Aligned Reads (BAM/CRAM)

Upload aligned BAM or CRAM files. RNAseek skips QC & alignment and proceeds directly to gene quantification (featureCounts) → Stage 2 normalization & DEG testing.

Count Matrix (CSV/TSV)

Upload a gene-level count matrix. Bypasses Stage 1 entirely and jumps straight to filtering, normalization, DESeq2, and visualization. Requires: rows = genes, columns = samples, non-negative integers.
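If you are preparing a count matrix by hand, a quick check like the following can catch problems before upload. It assumes the first column holds gene identifiers and the first row holds sample names, which matches the "rows = genes, columns = samples" requirement but is otherwise an assumption about the exact layout RNAseek expects.

```python
import csv
import io


def read_count_matrix(text: str, delimiter: str = ",") -> tuple[list[str], dict[str, list[int]]]:
    """Parse a count matrix; raise ValueError on anything but non-negative integers."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    samples = header[1:]  # first header cell labels the gene column
    counts: dict[str, list[int]] = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"line {line_no}: expected {len(samples)} values")
        for value in values:
            # isdigit() rejects minus signs, decimals, and empty cells
            if not (value.isascii() and value.isdigit()):
                raise ValueError(f"line {line_no}: {value!r} is not a non-negative integer")
        counts[gene] = [int(value) for value in values]
    return samples, counts
```

Pass `delimiter="\t"` for .tsv files. Normalized or fractional values (for example TPM) fail this check, which is intended: DESeq2 works on raw integer counts.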

Selecting a Reference Genome

Choose the reference genome that matches your organism. RNAseek ships with 11 pre-indexed genomes:

Organism | Assembly | Source
Homo sapiens | GRCh38 (hg38) | Ensembl / UCSC
Mus musculus | GRCm39 (mm39) | Ensembl / UCSC
Mus musculus | GRCm38 (mm10) | Ensembl / UCSC
Rattus norvegicus | mRatBN7.2 (rn7) | Ensembl
Danio rerio | GRCz11 (danRer11) | Ensembl
Gallus gallus | GRCg6a (galGal6) | Ensembl
Sus scrofa | Sscrofa11.1 (susScr11) | Ensembl
Drosophila melanogaster | BDGP6 (dm6) | Ensembl
C. elegans | WBcel235 | Ensembl
S. cerevisiae (Yeast) | sacCer3 (R64-1-1) | Ensembl
A. thaliana | TAIR10 | Ensembl
Custom Genomes: If your organism is not listed, upload a FASTA (.fa / .fasta) and annotation (.gtf / .gff). RNAseek builds the required aligner index on demand (HISAT2 for the Standard RNA-Seq track) before alignment begins. Available for the Standard RNA-Seq and ChIP-seq tracks. The Small RNA track requires species-specific miRBase indices and does not support custom genomes.
Bacterial Genomes: For unannotated microbial FASTA files, RNAseek automatically invokes its local BASys2 engine to produce full structural and metabolic annotations — no GTF upload needed.

Launching the Pipeline

Once your files are uploaded, metadata is mapped, and a genome is selected:

  1. Review your configuration in the Setup Wizard summary panel.
  2. Click Launch Pipeline.
  3. You will be redirected to the Processing page with real-time progress bars tracking every step.

 Viewing Results

Real-Time Progress Tracking

After launching the pipeline, the Processing page connects to the server via a live WebSocket connection. You will see a step-by-step progress bar updating in real time:

Quality Control → Alignment → Quantification → Normalization → Visualizations
If the WebSocket connection drops, the page automatically falls back to HTTP polling — you never lose sight of your pipeline’s progress. You can also close the tab and come back later — your results will be waiting in the Core Hub.
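The fallback behavior can be pictured with a small Python sketch. It is purely illustrative: the real logic runs in your browser, and the `stream_updates` and `poll_status` callables, the update dictionaries with `step` and `done` keys, and the 2-second polling interval are all hypothetical.

```python
import time
from typing import Callable, Iterable


def follow_progress(
    stream_updates: Callable[[], Iterable[dict]],  # stands in for the WebSocket feed
    poll_status: Callable[[], dict],               # stands in for an HTTP status request
    on_update: Callable[[dict], None],
    poll_interval: float = 2.0,                    # hypothetical interval
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Consume live updates; if the stream drops, keep going by polling."""
    try:
        for update in stream_updates():
            on_update(update)
            if update.get("done"):
                return
    except ConnectionError:
        pass  # the live connection dropped; fall through to polling
    while True:
        update = poll_status()
        on_update(update)
        if update.get("done"):
            return
        sleep(poll_interval)
```

Either path delivers the same kind of update to `on_update`, which is why a dropped connection does not interrupt what you see on the Processing page.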

The Core Hub — Your Results Dashboard

When the pipeline finishes, you are taken to the Core Hub, a three-tab dashboard:

Overview

Summary statistics, QC report, downloadable data files.

Modules

12 advanced analytical micro-pipelines, unlocked instantly.

Single-Cell

Deconvolution gateway and spatial analysis spokes.

Stage 1 Results — Alignment & QC

The Overview tab provides the foundational outputs of your analysis:

File | Format | Description
Compressed Alignments | .cram | Deeply compressed files showing where every read mapped to the genome.
Raw Count Matrix | .csv | Genes × Samples matrix of raw read counts.
QC Report | .html | Interactive MultiQC report: Phred scores, GC content, adapter metrics, trimming stats.

Stage 2 Results — Normalization, DEG Testing & Visualizations

This is where RNAseek transforms your raw data into biological insight — fully automatically:

  • Low-count filtering removes genes with fewer than 10 total reads across all samples.
  • Batch correction (if Batch IDs provided) applies ComBat-seq to remove technical noise.
  • DESeq2 adjusts for library size and tests for differential gene expression, reporting FDR-adjusted p-values.
  • Outlier detection uses PCA-based Mahalanobis distance to flag suspect samples.
  • Gene annotation queries the MyGene.info API to append human-readable gene descriptions and disease associations to every gene.
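Two of the steps above are easy to illustrate in plain Python. The first function applies the stated low-count rule (drop genes with fewer than 10 total reads). The second shows the Benjamini-Hochberg procedure, which is DESeq2's default method for FDR adjustment; whether RNAseek keeps that default is an assumption. Neither function is RNAseek's actual code, which runs these steps in R.

```python
MIN_TOTAL_READS = 10  # threshold stated in the guide


def filter_low_counts(counts: dict[str, list[int]], min_total: int = MIN_TOTAL_READS) -> dict[str, list[int]]:
    """Keep genes whose reads summed over all samples reach `min_total`."""
    return {gene: values for gene, values in counts.items() if sum(values) >= min_total}


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted
```

For instance, a gene with counts `[3, 2, 1, 1, 1, 1]` (total 9) is removed, while `[3, 2, 2, 1, 1, 1]` (total 10) is kept.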

Downloadable files:

File | Format | Description
Normalized Count Matrix | .csv | Library-size-normalized (and batch-corrected, if applicable) expression values. Ready for external tools or ML.
Differential Expression Table | .csv | Log2 Fold Change, p-value, adjusted p-value (FDR), gene descriptions, and disease associations.
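Once downloaded, the differential expression table is straightforward to filter in Python. The sketch below assumes DESeq2-style column headers (`gene`, `log2FoldChange`, `padj`) and common cutoffs (adjusted p < 0.05, |log2FC| ≥ 1). Both are assumptions: check the header row of your own file and pass the actual column names if they differ.

```python
import csv
import io


def significant_genes(
    csv_text: str,
    padj_cutoff: float = 0.05,       # illustrative cutoff
    lfc_cutoff: float = 1.0,         # illustrative cutoff
    gene_col: str = "gene",          # assumed header name
    lfc_col: str = "log2FoldChange", # assumed header name
    padj_col: str = "padj",          # assumed header name
) -> tuple[list[str], list[str]]:
    """Return (upregulated, downregulated) gene lists from DE table text."""
    up: list[str] = []
    down: list[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            padj = float(row[padj_col])
            lfc = float(row[lfc_col])
        except ValueError:
            continue  # e.g. "NA" for genes that were not tested
        if not (padj < padj_cutoff and abs(lfc) >= lfc_cutoff):
            continue  # also skips NaN values
        (up if lfc > 0 else down).append(row[gene_col])
    return up, down
```

To use it on a downloaded file, read the file's text first, for example `significant_genes(open("de_table.csv").read())`, where the filename is a placeholder.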

Interactive Plotly Visualizations:

Every plot is rendered directly in your browser — zoom, pan, hover for gene names, and export as PNG or SVG.

PCA Plot (2D/3D)

Sample clustering via Principal Component Analysis. Variance explained on each axis.

UMAP Plot

Non-linear dimensionality reduction revealing structure PCA might miss.

Volcano Plot

Log2FC vs. significance. Red = upregulated, blue = downregulated. Hover for gene names.

MA Plot

Mean expression vs. Log2FC for every gene, highlighting significant DEGs.

Heatmap

Top 50 DEGs with z-score normalization and color-coded group annotations.
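Z-score normalization rescales each gene's row so that the heatmap colors compare samples within a gene rather than across genes. A minimal sketch follows; it uses the sample standard deviation (as R's `scale` does), which is an assumption about how RNAseek computes the scores.

```python
from statistics import mean, stdev


def zscore_rows(matrix: dict[str, list[float]]) -> dict[str, list[float]]:
    """Center each gene's values on 0 and scale them to unit standard deviation."""
    scaled: dict[str, list[float]] = {}
    for gene, values in matrix.items():
        center = mean(values)
        spread = stdev(values)  # sample standard deviation (n - 1), needs >= 2 values
        if spread == 0:
            scaled[gene] = [0.0] * len(values)  # flat gene: no variation to show
        else:
            scaled[gene] = [(value - center) / spread for value in values]
    return scaled
```

A gene with values 1, 2, 3 becomes -1, 0, 1: the lowest sample is one standard deviation below the gene's mean and the highest is one above.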

Tip: Click and drag on any Plotly plot to zoom into a region. Double-click to reset. Use the camera icon in the toolbar to save the plot as a high-resolution image.

Advanced Modules (Tier 2)

After the core pipeline completes, the Modules tab unlocks 12 specialized analytical micro-pipelines. These modules reuse your existing results — no re-uploading required. The tab uses a master-detail layout: browse the list on the left, configure and view results on the right.

Key modules:

 WGCNA — Weighted Gene Co-expression Network Analysis

Identify clusters (modules) of co-expressed genes and correlate them with clinical traits. Upload a traits CSV or build one interactively. Outputs include module-trait correlation heatmaps, hub gene lists, and Enrichr pathway enrichment results.

 Pathway & Gene Set Enrichment

Map your differentially expressed genes onto biological pathways and curated gene sets. RNAseek integrates multiple databases:

PathBank (dynamic diagrams) | MSigDB Hallmark | C2: KEGG, Reactome | C5: GO BP/MF/CC | BASys2 Microbial Pathways

All available modules:

Module | What It Does
WGCNA | Co-expression network analysis correlating gene modules to clinical traits.
Pathway Enrichment | GSEA/ORA with PathBank, KEGG, Reactome, GO, and BASys2 microbial pathways.
Alternative Splicing | Detects skipped exons and predicts protein domain changes (IsoformSwitchAnalyzeR).
RNA Editing / SNPs | Identifies A-to-I editing events and high-confidence variants (REDItools2).
Time Series | Models gene expression dynamics over time (ImpulseDE2).
Causal Networks | Infers gene regulatory networks from expression data (GRNBoost2).
Literature NLP | Mines published literature for known gene interactions (INDRA Bio).
Survival Analysis | Correlates gene expression with clinical survival outcomes (lifelines).
TCGA Comparison | Compares your data against public TCGA cancer cohorts.
Biomarker Discovery | Cross-references DEGs with the MarkerDB clinical biomarker database.
MOFA | Multi-omics factor analysis for integrating multiple data layers.
DIABLO | Supervised multi-omics integration with discriminant analysis (mixOmics).

Downloading Your Data

Every downloadable file in the Core Hub has a clearly marked download button. You can download:

  • Individual result files (click the download icon next to any file).
  • The complete differential expression table with gene annotations.
  • Raw and normalized count matrices for use in R, Python, or Excel.
  • The interactive MultiQC HTML report for sharing with collaborators.
Remember: Your session expires after 14 days. Download all files you wish to keep before the session window closes. Once expired, data is permanently and irrecoverably deleted by the automated cleanup process.

Single-Cell & Spatial Analysis (Advanced)

The Single-Cell tab provides access to predictive deconvolution and spatial analysis tools:

Deconvolution Gateway

Select a tissue-specific single-cell reference atlas and run computational deconvolution to estimate cell-type fractions. Toggle between a quick summary and high-resolution .h5ad pseudo-cell matrix generation.

Trajectory Inference

Trace developmental or disease trajectories through predicted cell states using pseudotime analysis (scanpy / PAGA).

Spatial Mapping

Project predicted cell types onto a tissue image (generic template or your own H&E slide) to visualize where cells physically reside (Tangram).

Spatial Autocorrelation

Search for specific genes and visualize their spatial expression patterns as heatmaps overlaid on the tissue image (Moran’s I / Squidpy).
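Moran's I summarizes whether nearby spots have similar expression: values near +1 indicate spatial clustering, values near 0 indicate no spatial pattern, and negative values indicate a checkerboard-like arrangement. The pure-Python sketch below implements the standard formula on a user-supplied neighbor weighting. It is for intuition only; in practice Squidpy computes this for you, and the dictionary-based weights format here is an illustrative choice, not Squidpy's interface.

```python
def morans_i(values: list[float], weights: dict[tuple[int, int], float]) -> float:
    """Global Moran's I.

    values  : expression of one gene at each spot
    weights : {(i, j): w_ij} spatial weights between spot indices (i != j)
    """
    n = len(values)
    mean_value = sum(values) / n
    deviations = [value - mean_value for value in values]
    total_weight = sum(weights.values())
    # Weighted sum of products of deviations for neighboring spots.
    cross = sum(w * deviations[i] * deviations[j] for (i, j), w in weights.items())
    squared = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / squared)


# Four spots in a row, each linked to its immediate neighbors (symmetric weights).
chain = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1}
clustered = morans_i([0, 0, 1, 1], chain)    # 1/3: like values sit together
alternating = morans_i([0, 1, 0, 1], chain)  # -1.0: neighbors always differ
```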


 Quick Reference
Do I need an account?
No. Sessions are anonymous and start automatically when you open the platform.
How long does my data persist?
14 days from session creation. After that, all data is permanently deleted.
What file formats are accepted?
.fq.gz / .fastq.gz (reads), .bam / .cram (alignments), .csv / .tsv (count matrices).
Is there a file size limit?
The server supports uploads up to 10 GB per file. Files are chunked at 5 MB for reliability.
Can I use a custom genome?
Yes. Upload FASTA + GTF/GFF and an index will be built automatically (Standard RNA-Seq and ChIP-seq tracks).
What organisms are supported?
11 pre-indexed genomes covering 10 organisms (Human, Mouse [GRCm39 and GRCm38], Rat, Zebrafish, Drosophila, C. elegans, Yeast, Arabidopsis, Chicken, Pig), plus custom uploads and on-demand bacterial annotation via BASys2.
What statistics are used?
DESeq2 for differential expression (Wald test by default; LRT for time-series). ComBat-seq for batch correction.
Are the plots interactive?
Yes. All visualizations use Plotly — zoom, pan, hover for gene names, and export as PNG/SVG.
Can I come back later?
Yes. Return in the same browser within 14 days to access all your results.
What happens after 14 days?
All session data is permanently deleted by an automated cleanup process. Download results before expiration.