RNAseek – Tutorials & User Guide

Getting Started

No Account Required — Just Open and Go

RNAseek removes every barrier between you and your analysis. There are no usernames, no passwords, and no registration forms. The moment you open the platform in your browser, a secure, private session is created for you automatically.

Behind the scenes, RNAseek assigns your browser a cryptographically random session ID (a UUID) stored as a secure, HttpOnly cookie. This ID links your uploads, pipeline runs, and results to your browser session — and only your browser session. No one else can access your data.
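A minimal Python sketch of issuing such a session ID, using only the standard library. The cookie name `rnaseek_session`, the `SameSite=Lax` policy, and tying `Max-Age` to the 14-day session window are assumptions for illustration, not RNAseek's actual server code.

```python
import uuid
from http.cookies import SimpleCookie

# Assumption: the cookie lifetime mirrors the 14-day session window.
SESSION_TTL_SECONDS = 14 * 24 * 60 * 60


def new_session_cookie(name: str = "rnaseek_session") -> tuple[str, str]:
    """Return a fresh session ID and a matching Set-Cookie header line.

    The cookie name is hypothetical.
    """
    # uuid4() draws its random bits from os.urandom, a cryptographic source.
    session_id = str(uuid.uuid4())
    cookie = SimpleCookie()
    cookie[name] = session_id
    morsel = cookie[name]
    morsel["httponly"] = True       # not readable from page JavaScript
    morsel["secure"] = True         # only sent over HTTPS
    morsel["samesite"] = "Lax"      # hypothetical cross-site policy
    morsel["max-age"] = SESSION_TTL_SECONDS
    morsel["path"] = "/"
    return session_id, "Set-Cookie: " + morsel.OutputString()


if __name__ == "__main__":
    sid, header = new_session_cookie()
    print(sid)
    print(header)
```

Because a version-4 UUID carries 122 random bits, it is impractical to guess another user's session key, and the `HttpOnly` flag keeps scripts running in the page from reading it.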

Your 14-Day Session Window

Your session and all of its associated data remain active for 14 days from the moment the session is created. During that window you can:

Upload new datasets and launch analyses at any time.
Return to the platform and pick up exactly where you left off.
Download any results, reports, or visualizations you have generated.

Tip: Bookmark the RNAseek URL in the same browser you used for your first visit. Your session cookie is browser-specific — opening RNAseek in a different browser or in private/incognito mode will start a brand-new session.

What You Need Before You Begin

Item	Required?	Details
Raw sequencing reads	Yes	`.fq.gz` or `.fastq.gz` files. At least two experimental groups (e.g., 3 Control + 3 Treated).
Reference genome	Yes	Select from 11 pre-indexed genomes or upload your own FASTA + GTF/GFF.
Condition mapping	Yes	A table or CSV assigning every sample to a biological group (e.g., Control vs. Treated).
Batch IDs	Optional	Include a Batch column to trigger automatic ComBat-seq batch-effect correction.
Timepoints	Optional	Include a Timepoint column to switch DESeq2 to the Likelihood Ratio Test (time-series).

Already have aligned BAM/CRAM files or a pre-computed count matrix? You can skip the alignment step entirely — see Alternative Entry Points below.

Supported Analysis Types (Assay Tracks)

RNAseek supports four core assay types. Choose the one that matches your experiment during the Setup Wizard:

Assay	Best For	Aligner	Key Tools
Standard RNA-Seq (Poly-A)	Gene expression profiling	HISAT2 (splice-aware)	featureCounts, DESeq2
Small RNA / miRNA	Regulatory RNA quantification	Bowtie (miRBase)	samtools idxstats, DESeq2
ChIP-seq	Histone / TF binding sites	BWA MEM	MACS2 peak calling, featureCounts
DNA Methylation	Bisulfite sequencing	Bismark	methylKit differential methylation

Microbial / Bacterial Transcriptomics: Upload an unannotated bacterial FASTA and RNAseek will automatically dispatch it to a local BASys2 engine that generates complete structural, operon, and metabolome annotations in roughly 10 seconds — no manual annotation required.

Quick-Start Walkthrough

Upload → Configure → Stage 1 → Stage 2 → Core Hub → Modules

Navigate to RNAseek in your browser — your session starts automatically.
Create a new submission from the Active Workspace page.
Upload your gzip-compressed FASTQ files (large files are split into 5 MB chunks, and any chunk that fails is retried automatically).
Map your conditions using the interactive table or a CSV upload.
Select a reference genome from the dropdown (or upload a custom genome).
Launch the core pipeline and follow each step on the live progress bars.
Explore results in the Core Hub — interactive plots, downloadable tables, and 12 advanced modules.

Uploading Data

The Chunked Uploader — Built for Large Files

Genomics files are large. A single paired-end RNA-seq experiment can easily produce tens of gigabytes of compressed FASTQ data. RNAseek’s uploader is specifically engineered for this reality.

5 MB binary chunks · HTTPS encrypted · Auto-retry on failure · No file-count limit

Your browser automatically splits each file into 5 MB binary chunks before transmission. Each chunk is sent individually over HTTPS and reassembled on the server. If a network interruption occurs mid-upload, only the affected chunk needs to be retransmitted, so you do not lose the entire file. Files of up to 10 GB each upload reliably, even over slower connections.
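The chunk-and-retry logic can be sketched in Python as follows. RNAseek does this in the browser, so this illustrates the idea rather than reproducing its uploader; `send_chunk` is a hypothetical transport callback that raises `ConnectionError` when a chunk fails, and the limit of 3 attempts per chunk is an assumption.

```python
import io
from typing import BinaryIO, Callable

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, the chunk size described above


def upload_in_chunks(
    stream: BinaryIO,
    send_chunk: Callable[[int, bytes], None],
    chunk_size: int = CHUNK_SIZE,
    max_retries: int = 3,
) -> int:
    """Read `stream` one chunk at a time and send each chunk, retrying failures.

    `send_chunk(index, data)` is a hypothetical callback that raises
    ConnectionError on a network failure. Returns the total send attempts.
    """
    attempts = 0
    index = 0
    while True:
        chunk = stream.read(chunk_size)  # only one chunk is held in memory
        if not chunk:
            return attempts
        for attempt in range(1, max_retries + 1):
            attempts += 1
            try:
                send_chunk(index, chunk)
                break  # chunk delivered; continue with the next one
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up only after this chunk exhausts its retries
        index += 1


if __name__ == "__main__":
    received: dict[int, bytes] = {}
    upload_in_chunks(io.BytesIO(b"x" * 12), received.__setitem__, chunk_size=5)
    # The server side reassembles the chunks in index order.
    print(b"".join(received[i] for i in sorted(received)))
```

Because each chunk carries its index, the receiving side can reassemble the file in order even if a chunk had to be resent, and a failure costs one chunk's retransmission rather than the whole file.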

Accepted File Formats

Input Type	Accepted Formats	Notes
Raw reads	`.fq.gz`, `.fastq.gz`	Must be gzip-compressed.
Aligned reads	`.bam`, `.cram`	Skips alignment; proceeds to quantification.
Count matrix	`.csv`, `.tsv`	Rows = genes, columns = samples. Non-negative integers only.
Metadata	`.csv`	Condition mapping, batch IDs, timepoints.
Custom genome	`.fa` / `.fasta` + `.gtf` / `.gff`	Triggers on-demand HISAT2 index build. Not available for the Small RNA track.

Paired-End Read Detection

If your experiment uses paired-end sequencing, name your files with the standard _R1 / _R2 convention:

SampleA_R1.fq.gz SampleA_R2.fq.gz
SampleB_R1.fq.gz SampleB_R2.fq.gz

RNAseek auto-detects paired reads from filenames. You can also manually toggle between Single-End and Paired-End mode in the Setup Wizard if your naming convention differs.
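Pairing by filename can be done with a regular expression. The sketch below is an assumption about how such detection might work, not RNAseek's implementation; it recognizes only the `_R1` / `_R2` suffix shown above, so Illumina-style names such as `S1_L001_R1_001.fastq.gz` fall through as unpaired and would need the manual Single-End / Paired-End toggle.

```python
import re
from collections import defaultdict

# Matches names such as SampleA_R1.fq.gz or SampleA_R2.fastq.gz.
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.(?:fq|fastq)\.gz$")


def detect_pairs(filenames: list[str]) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Group files into (R1, R2) pairs by sample name.

    Returns the pairs plus a sorted list of files that could not be paired.
    """
    mates: dict[str, dict[str, str]] = defaultdict(dict)
    unpaired: list[str] = []
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            unpaired.append(name)  # no _R1/_R2 suffix: treat as single-end
            continue
        mates[match["sample"]][match["mate"]] = name
    pairs: dict[str, tuple[str, str]] = {}
    for sample, found in mates.items():
        if set(found) == {"1", "2"}:
            pairs[sample] = (found["1"], found["2"])
        else:
            unpaired.extend(found.values())  # an R1 without its R2, or vice versa
    return pairs, sorted(unpaired)
```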

Mapping Experimental Conditions (Metadata)

After your files finish uploading, the Setup Wizard asks you to define the experimental design. You have two options:

Option A — Interactive Table

Best for small experiments. The wizard pre-populates a table with your uploaded filenames — just pick a condition from the dropdown for each sample.

Option B — CSV Upload

Best for large experiments. Prepare a .csv with Filename and Condition columns (required), plus optional Batch and Timepoint columns.

Example metadata CSV (the Batch and Timepoint columns are optional):

Filename	Condition	Batch	Timepoint
`WT_rep1_R1.fq.gz`	Control	Batch_1	Day 0
`WT_rep2_R1.fq.gz`	Control	Batch_1	Day 0
`WT_rep3_R1.fq.gz`	Control	Batch_2	Day 0
`KO_rep1_R1.fq.gz`	Treated	Batch_2	Day 7
`KO_rep2_R1.fq.gz`	Treated	Batch_1	Day 7
`KO_rep3_R1.fq.gz`	Treated	Batch_1	Day 7

Batch – Providing batch IDs automatically triggers ComBat-seq batch correction during normalization, removing technical variation between sequencing batches.

Timepoint – Providing timepoints switches DESeq2 from the Wald test to the Likelihood Ratio Test (LRT), which is better suited to time-series experimental designs.
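A Python sketch of checking a metadata CSV before uploading it. The rules mirror this guide (required Filename and Condition columns, at least two groups, a filled Batch column enabling ComBat-seq, a filled Timepoint column switching DESeq2 to the LRT), but the function `read_design` and its return format are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("Filename", "Condition")


def read_design(csv_text: str) -> dict:
    """Validate a metadata CSV and report the analysis options it implies.

    The validation rules are assumptions based on this guide, not RNAseek's code.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = list(reader)
    if any(not (row["Condition"] or "").strip() for row in rows):
        raise ValueError("every sample needs a Condition")
    conditions = sorted({row["Condition"].strip() for row in rows})
    if len(conditions) < 2:
        raise ValueError("at least two experimental groups are required")
    # Optional columns only count if at least one row fills them in.
    has_batch = "Batch" in columns and any(row["Batch"] for row in rows)
    has_time = "Timepoint" in columns and any(row["Timepoint"] for row in rows)
    return {
        "samples": len(rows),
        "conditions": conditions,
        "batch_correction": "ComBat-seq" if has_batch else None,
        "deseq2_test": "LRT" if has_time else "Wald",
    }
```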

Alternative Entry Points

Not starting from raw reads? RNAseek supports two shortcut entry points:

Pre-Aligned Reads (BAM/CRAM)

Upload aligned BAM or CRAM files. RNAseek skips QC & alignment and proceeds directly to gene quantification (featureCounts) → Stage 2 normalization & DEG testing.

Count Matrix (CSV/TSV)

Upload a gene-level count matrix. Bypasses Stage 1 entirely and jumps straight to filtering, normalization, DESeq2, and visualization. Requires: rows = genes, columns = samples, non-negative integers.
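A minimal Python check of the count-matrix requirements (genes as rows, samples as columns, non-negative integers). It assumes the first column holds gene IDs and the header row holds sample names; `read_count_matrix` is a hypothetical helper, not part of RNAseek. Pass `delimiter="\t"` for `.tsv` files.

```python
import csv
import io


def read_count_matrix(text: str, delimiter: str = ",") -> tuple[list[str], dict[str, list[int]]]:
    """Parse a genes x samples count matrix and enforce non-negative integers.

    Assumes column 1 holds gene IDs and the header row holds sample names.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    counts: dict[str, list[int]] = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if gene in counts:
            raise ValueError(f"line {line_no}: duplicate gene ID {gene!r}")
        if len(values) != len(samples):
            raise ValueError(f"line {line_no}: expected {len(samples)} values, got {len(values)}")
        parsed = []
        for value in values:
            value = value.strip()
            # isdecimal() rejects "-3", "2.5", "NA" and empty cells.
            if not value.isdecimal():
                raise ValueError(f"line {line_no}: {value!r} is not a non-negative integer")
            parsed.append(int(value))
        counts[gene] = parsed
    return samples, counts
```

Fractional estimated counts (for example from transcript-level quantifiers) fail this check and would need rounding before upload.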

Selecting a Reference Genome

Choose the reference genome that matches your organism. RNAseek ships with 11 pre-indexed genomes:

Organism	Assembly	Source
Homo sapiens	GRCh38 (hg38)	Ensembl / UCSC
Mus musculus	GRCm39 (mm39)	Ensembl / UCSC
Mus musculus	GRCm38 (mm10)	Ensembl / UCSC
Rattus norvegicus	mRatBN7.2 (rn7)	Ensembl
Danio rerio	GRCz11 (danRer11)	Ensembl
Gallus gallus	GRCg6a (galGal6)	Ensembl
Sus scrofa	Sscrofa11.1 (susScr11)	Ensembl
Drosophila melanogaster	BDGP6 (dm6)	Ensembl
C. elegans	WBcel235	Ensembl
S. cerevisiae (Yeast)	sacCer3 (R64-1-1)	Ensembl
A. thaliana	TAIR10	Ensembl

Custom Genomes: If your organism is not listed, upload a FASTA (.fa / .fasta) and annotation (.gtf / .gff). RNAseek builds a HISAT2 index on-demand before alignment begins. Available for the Standard RNA-Seq and ChIP-seq tracks. The Small RNA track requires species-specific miRBase indices and does not support custom genomes.
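For reference, a typical splice-aware HISAT2 index build from a FASTA and GTF uses the helper scripts shipped with HISAT2. The Python sketch below only assembles those commands; whether RNAseek passes the same options (for example `--ss` / `--exon`, or the thread count) is an assumption, and `hisat2_index_commands` / `run_steps` are hypothetical helpers. The helper scripts expect GTF, so a GFF file would first need converting (for example with gffread).

```python
import subprocess
from typing import Optional


def hisat2_index_commands(
    fasta: str, gtf: str, prefix: str, threads: int = 4
) -> list[tuple[list[str], Optional[str]]]:
    """Return (command, stdout_file) steps for a splice-aware HISAT2 index."""
    ss, exon = f"{prefix}.ss", f"{prefix}.exon"
    return [
        # Known splice sites and exons are extracted from the annotation first.
        (["hisat2_extract_splice_sites.py", gtf], ss),
        (["hisat2_extract_exons.py", gtf], exon),
        # The index itself is then built from the genome plus that annotation.
        (["hisat2-build", "-p", str(threads), "--ss", ss, "--exon", exon, fasta, prefix], None),
    ]


def run_steps(steps: list[tuple[list[str], Optional[str]]]) -> None:
    """Run each step in order, redirecting stdout to a file where required."""
    for cmd, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(cmd, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(cmd, check=True, stdout=out)
```

Including splice sites and exons helps spliced alignment but needs considerably more memory than a plain `hisat2-build genome.fa prefix`, one reason an on-demand build takes time before alignment can start.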

Bacterial Genomes: For unannotated microbial FASTA files, RNAseek automatically invokes its local BASys2 engine to produce full structural and metabolic annotations — no GTF upload needed.

Launching the Pipeline

Once your files are uploaded, metadata is mapped, and a genome is selected:

Review your configuration in the Setup Wizard summary panel.
Click Launch Pipeline.
You will be redirected to the Processing page with real-time progress bars tracking every step.

Viewing Results

Real-Time Progress Tracking

After launching the pipeline, the Processing page connects to the server via a live WebSocket connection. You will see a step-by-step progress bar updating in real time:

Quality Control → Alignment → Quantification → Normalization → Visualizations

If the WebSocket connection drops, the page automatically falls back to HTTP polling — you never lose sight of your pipeline’s progress. You can also close the tab and come back later — your results will be waiting in the Core Hub.

The Core Hub — Your Results Dashboard

When the pipeline finishes, you are taken to the Core Hub, a three-tab dashboard:

Overview

Summary statistics, QC report, downloadable data files.

Modules

12 advanced analytical micro-pipelines, unlocked instantly.

Single-Cell

Deconvolution gateway and spatial analysis spokes.

Stage 1 Results — Alignment & QC

The Overview tab provides the foundational outputs of your analysis:

File	Format	Description
Compressed Alignments	`.cram`	Reference-based compressed alignment files recording where each read mapped to the genome.
Raw Count Matrix	`.csv`	Genes × Samples matrix of raw read counts.
QC Report	`.html`	Interactive MultiQC report — Phred scores, GC content, adapter metrics, trimming stats.

Stage 2 Results — Normalization, DEG Testing & Visualizations

This is where RNAseek transforms your raw data into biological insight — fully automatically:

Low-count filtering removes genes with fewer than 10 total reads across all samples.
Batch correction (if Batch IDs provided) applies ComBat-seq to remove technical noise.
DESeq2 normalization adjusts for library size and performs differential gene expression testing with FDR-corrected p-values.
Outlier detection uses PCA-based Mahalanobis distance to flag suspect samples.
Gene annotation queries the MyGene.info API to append human-readable gene descriptions and disease associations to every gene.
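The first and third steps can be sketched in plain Python. The 10-read threshold comes from this guide; the size factors use the median-of-ratios method that DESeq2's `estimateSizeFactors` applies by default. This illustrates the arithmetic only; it is not RNAseek's code, which runs DESeq2 itself.

```python
import math
from statistics import median


def filter_low_counts(counts: dict[str, list[int]], min_total: int = 10) -> dict[str, list[int]]:
    """Drop genes with fewer than `min_total` reads summed across all samples."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors, one per sample (DESeq2's default method)."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue  # geometric mean would be 0, so the gene is skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Raises statistics.StatisticsError if every gene contains a zero count.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: dict[str, list[int]], factors: list[float]) -> dict[str, list[float]]:
    """Divide each sample's counts by its size factor."""
    return {gene: [c / s for c, s in zip(row, factors)] for gene, row in counts.items()}
```

Taking the median across genes makes each factor robust to the minority of genes that are genuinely differentially expressed, which is why it is preferred over simply dividing by total reads.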

Downloadable files:

File	Format	Description
Normalized Count Matrix	`.csv`	Library-size-normalized (and batch-corrected, if applicable) expression values. Ready for external tools or ML.
Differential Expression Table	`.csv`	Log2 Fold Change, p-value, adjusted p-value (FDR), gene descriptions, and disease associations.
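The adjusted p-value column is conventionally the Benjamini-Hochberg FDR, which is DESeq2's default (`pAdjustMethod = "BH"` in `results()`). A standard-library Python sketch of the procedure follows; DESeq2 also applies independent filtering, which can set some adjusted values to NA, so this sketch will not reproduce its output exactly.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```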

Interactive Plotly Visualizations:

Every plot is rendered directly in your browser — zoom, pan, hover for gene names, and export as PNG or SVG.

PCA Plot (2D/3D)

Sample clustering via Principal Component Analysis. Variance explained on each axis.

UMAP Plot

Non-linear dimensionality reduction revealing structure PCA might miss.

Volcano Plot

Log2FC vs. significance. Red = upregulated, blue = downregulated. Hover for gene names.

MA Plot

Mean expression vs. Log2FC for every gene, highlighting significant DEGs.

Heatmap

Top 50 DEGs with z-score normalization and color-coded group annotations.
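The heatmap's row scaling can be sketched as follows. Ranking the top 50 by adjusted p-value and the optional log2(x + 1) transform before scaling are assumptions; the guide only states that the top 50 DEGs are z-score normalized. Each gene is centred on its own mean and divided by its own standard deviation, so colours compare samples within a gene rather than genes with each other.

```python
import math
from statistics import mean, stdev


def top_genes_by_padj(padj: dict[str, float], n: int = 50) -> list[str]:
    """Return the n genes with the smallest adjusted p-values."""
    return sorted(padj, key=padj.get)[:n]


def row_zscores(matrix: dict[str, list[float]], log_transform: bool = False) -> dict[str, list[float]]:
    """Z-score each gene across samples (needs at least two samples)."""
    scaled = {}
    for gene, values in matrix.items():
        if log_transform:
            values = [math.log2(v + 1) for v in values]  # assumption: log2(x + 1)
        mu, sd = mean(values), stdev(values)
        # A gene with identical values in every sample has no spread to scale.
        scaled[gene] = [0.0] * len(values) if sd == 0 else [(v - mu) / sd for v in values]
    return scaled
```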

Tip: Click and drag on any Plotly plot to zoom into a region. Double-click to reset. Use the camera icon in the toolbar to save the plot as a high-resolution image.

Advanced Modules (Tier 2)

After the core pipeline completes, the Modules tab unlocks 12 specialized analytical micro-pipelines. These modules reuse your existing results — no re-uploading required. The tab uses a master-detail layout: browse the list on the left, configure and view results on the right.

Key modules:

WGCNA — Weighted Gene Co-expression Network Analysis

Identify clusters (modules) of co-expressed genes and correlate them with clinical traits. Upload a traits CSV or build one interactively. Outputs include module-trait correlation heatmaps, hub gene lists, and Enrichr pathway enrichment results.

Pathway & Gene Set Enrichment

Map your differentially expressed genes onto biological pathways and curated gene sets. RNAseek integrates multiple databases:

PathBank (dynamic diagrams) · MSigDB Hallmark · C2: KEGG, Reactome · C5: GO BP/MF/CC · BASys2 Microbial Pathways
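Over-representation analysis (ORA) asks whether a pathway contains more of your DEGs than chance predicts, usually with a one-sided hypergeometric test. A Python sketch of that test follows; `ora_pvalue` is a hypothetical helper, not RNAseek's implementation. GSEA, the rank-based alternative, uses the full ranked gene list instead and is not shown.

```python
from math import comb


def ora_pvalue(overlap: int, universe: int, set_size: int, n_degs: int) -> float:
    """One-sided hypergeometric P(X >= overlap) for over-representation.

    universe: genes tested; set_size: universe genes in the pathway;
    n_degs: significant genes; overlap: DEGs that fall in the pathway.
    """
    upper = min(set_size, n_degs)
    # Count every way of drawing `overlap` or more pathway genes among the DEGs.
    hits = sum(
        comb(set_size, i) * comb(universe - set_size, n_degs - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(universe, n_degs)
```

The result depends on the universe: using all genes that passed filtering, rather than the whole genome, avoids inflating significance for pathways full of unexpressed genes.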

All available modules:

Module	What It Does
WGCNA	Co-expression network analysis correlating gene modules to clinical traits.
Pathway Enrichment	GSEA/ORA with PathBank, KEGG, Reactome, GO, and BASys2 microbial pathways.
Alternative Splicing	Detects skipped exons and predicts protein domain changes (IsoformSwitchAnalyzeR).
RNA Editing / SNPs	Identifies A-to-I editing events and high-confidence variants (REDItools2).
Time Series	Models gene expression dynamics over time (ImpulseDE2).
Causal Networks	Infers gene regulatory networks from expression data (GRNBoost2).
Literature NLP	Mines published literature for known gene interactions (INDRA Bio).
Survival Analysis	Correlates gene expression with clinical survival outcomes (lifelines).
TCGA Comparison	Compares your data against public TCGA cancer cohorts.
Biomarker Discovery	Cross-references DEGs with the MarkerDB clinical biomarker database.
MOFA	Multi-omics factor analysis for integrating multiple data layers.
DIABLO	Supervised multi-omics integration with discriminant analysis (mixOmics).

Downloading Your Data

Every downloadable file in the Core Hub has a clearly marked download button. You can download:

Individual result files (click the download icon next to any file).
The complete differential expression table with gene annotations.
Raw and normalized count matrices for use in R, Python, or Excel.
The interactive MultiQC HTML report for sharing with collaborators.
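If you work with the downloaded differential expression table in Python, a sketch like the following splits significant genes into up- and downregulated sets, the same split the volcano plot colours red and blue. The column names (`gene`, `log2FoldChange`, `padj`) and the cutoffs (adjusted p < 0.05, |log2FC| ≥ 1) are assumptions; check the headers in your file and adjust them.

```python
import csv
import io


def split_degs(csv_text: str, padj_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> tuple[list[str], list[str]]:
    """Return (upregulated, downregulated) gene IDs from a DE table.

    Column names 'gene', 'log2FoldChange' and 'padj' are assumptions.
    """
    up: list[str] = []
    down: list[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        padj = (row.get("padj") or "").strip()
        if padj in ("", "NA"):
            continue  # genes without an adjusted p-value are not called
        lfc = float(row["log2FoldChange"])
        if float(padj) < padj_cutoff and abs(lfc) >= lfc_cutoff:
            (up if lfc > 0 else down).append(row["gene"])
    return up, down
```

To use it on a downloaded file, pass the file's contents, for example `split_degs(open(path, newline="").read())`.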

Single-Cell & Spatial Analysis (Advanced)

The Single-Cell tab provides access to predictive deconvolution and spatial analysis tools:

Deconvolution Gateway

Select a tissue-specific single-cell reference atlas and run computational deconvolution to estimate cell-type fractions. Toggle between a quick summary and high-resolution .h5ad pseudo-cell matrix generation.

Trajectory Inference

Trace developmental or disease trajectories through predicted cell states using pseudotime analysis (scanpy / PAGA).

Spatial Mapping

Project predicted cell types onto a tissue image (generic template or your own H&E slide) to visualize where cells physically reside (Tangram).

Spatial Autocorrelation

Search for specific genes and visualize their spatial expression patterns as heatmaps overlaid on the tissue image (Moran’s I / Squidpy).
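Moran's I summarizes whether a gene's expression is spatially clustered: I = (N / W) · Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where w_ij weights neighbouring spots and W is the sum of all weights. Values near +1 mean similar levels sit next to each other, values near 0 suggest no spatial pattern, and negative values mean neighbours tend to differ. Squidpy computes this over a spatial neighbour graph (`squidpy.gr.spatial_autocorr`, with permutation-based p-values); the sketch below uses a hand-written binary neighbour matrix instead and is not RNAseek's code.

```python
def morans_i(values: list[float], weights: list[list[float]]) -> float:
    """Global Moran's I for one gene measured at n spatial spots.

    weights[i][j] > 0 when spots i and j are neighbours (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    spread = sum(d * d for d in z)
    return (n / w_total) * (cross / spread)
```

With four spots in a row, a steadily rising gradient gives a positive I, while an alternating high/low pattern gives I = −1.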

Quick Reference

Do I need an account?

No. Sessions are anonymous and start automatically when you open the platform.

How long does my data persist?

14 days from session creation. After that, all data is permanently deleted.

What file formats are accepted?

.fq.gz / .fastq.gz (reads), .bam / .cram (alignments), .csv / .tsv (count matrices).

Is there a file size limit?

The server supports uploads up to 10 GB per file. Files are chunked at 5 MB for reliability.

Can I use a custom genome?

Yes. Upload FASTA + GTF/GFF and an index will be built automatically (Standard RNA-Seq and ChIP-seq tracks).

What organisms are supported?

11 pre-indexed genomes covering 10 organisms (Human, Mouse [GRCm39 and GRCm38], Rat, Zebrafish, Chicken, Pig, Drosophila, C. elegans, Yeast, Arabidopsis) plus custom uploads & on-demand bacterial annotation via BASys2.

What statistics are used?

DESeq2 for differential expression (Wald test by default; LRT for time-series). ComBat-seq for batch correction.

Are the plots interactive?

Yes. All visualizations use Plotly — zoom, pan, hover for gene names, and export as PNG/SVG.

Can I come back later?

Yes. Return in the same browser within 14 days to access all your results.

What happens after 14 days?

All session data is permanently deleted by an automated cleanup process. Download results before expiration.
