


"Building a Bioinformatics Pipeline"?
Big idea, small steps. Expand when ready.
🧬🧪💻📊
Databases
Try this (optional):
Search for any human gene (for example, FLT1).
Just notice what type of data is shown first: DNA, RNA, or protein.
Try this (optional):
Search GenBank for FLT1 and notice:
-how many organisms appear in the results
-whether the record is DNA or mRNA
Try this (optional):
Notice that the same gene appears here in EMBL and GenBank.
Why do you think multiple countries maintain the same data?
Goal: recognize where biological data comes from, not to analyze or master it yet.
Data Acquisition and Quality Control
Try this (optional):
FastQC is a quality control tool that provides a visual summary of high-throughput sequencing data.
FastQC uses a color system (🔴🟡🟢) to summarize quality checks. Without looking anything up yet, what do you think each color is telling you about the data?
Try this (optional):
MultiQC combines results from many tools (like FastQC) into a single summary report.
Visit the MultiQC site and notice:
-how many different tools are supported
-how results from multiple samples are shown together
Without worrying about details yet, what do you think is the benefit of viewing all quality results in one place instead of one file at a time?
Try this (optional):
Phred scores describe how confident we are in each DNA base call.
Visit the Galaxy Quality Control tutorial and look at the per-base sequence quality plot.
Without memorizing numbers, notice:
-"Higher scores appear in the green range"
-"Lower scores shift toward yellow or red"
What might a low Phred score suggest about that region of the sequence?
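If you would like to see the numbers behind the colors, here is a minimal sketch. It assumes the common Phred+33 encoding for FASTQ quality strings and the standard relationship between a Phred score Q and the error probability, P = 10^(-Q/10). The function names and the example quality string are made up for illustration.

```python
def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_quality_string(quality, offset=33):
    """Turn a FASTQ quality line into a list of Phred scores.

    Assumes Phred+33 encoding (offset 33), which is an assumption here;
    older data sets may use a different offset.
    """
    return [ord(ch) - offset for ch in quality]


# Hypothetical quality string for a 4-base read.
scores = decode_quality_string("II5#")
for q in scores:
    print(f"Q{q}: about {phred_to_error_probability(q):.4f} chance the base is wrong")
```

A score of 40 means roughly 1 wrong call in 10,000, while a score of 2 means the base is more likely wrong than right.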
Read Alignment / Mapping
Try this (optional):
Bowtie2 aligns short DNA sequencing reads to a reference genome.
Open the Bowtie2 tutorial page and notice:
-What kinds of input files are used (FASTQ, BAM)?
-What organism's genome is selected by default?
-What do "mapped" vs "unmapped" mean in plain language?
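For the "mapped" vs "unmapped" question, a small sketch may help. Aligners typically write results in SAM format, where the second column is a numeric flag and the bit 0x4 means "this read is unmapped". The example records below are invented, not real Bowtie2 output.

```python
UNMAPPED_FLAG = 0x4  # SAM flag bit meaning "segment unmapped"


def count_mapped(sam_lines):
    """Count mapped and unmapped reads in an iterable of SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & UNMAPPED_FLAG:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Hypothetical SAM records: two reads found a place on chr1, one did not.
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t16\tchr1\t250\t30\t8M\t*\t0\t0\tTTGCAACG\tIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
]
print(count_mapped(example_sam))
```

In plain language: a mapped read found a matching place in the reference genome, and an unmapped read did not.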
Try this (optional):
HISAT2 aligns RNA-seq reads while accounting for splicing (introns).
Scroll through the HISAT2 tutorial and notice:
-Why RNA-seq alignment is different from DNA-seq
-Where "spliced" alignments are mentioned
-What kinds of results are reported (counts, alleles, alignments)
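One way to spot a spliced alignment is in the CIGAR string of a SAM record, where the SAM format uses the letter N for a skipped region of the reference (for RNA-seq, usually an intron). The sketch below is a simplified illustration with invented CIGAR strings; it is not taken from the HISAT2 tutorial.

```python
import re

# One CIGAR operation is a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def intron_lengths(cigar):
    """Lengths of skipped reference regions (operation N) in an alignment."""
    return [length for length, op in parse_cigar(cigar) if op == "N"]


def is_spliced(cigar):
    """True when the alignment jumps over at least one skipped region."""
    return bool(intron_lengths(cigar))


print(is_spliced("100M"))             # read lies within one exon
print(intron_lengths("50M1200N50M"))  # read spans an exon-exon junction
```

A DNA-seq aligner would rarely need to place half of a read 1,200 bases away from the other half, which is one reason RNA-seq alignment is treated differently.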
Try this (optional):
STAR is a high-performance aligner designed for large RNA-seq datasets.
STAR requires building a genome index before alignment.
Why do you think this step might speed up read mapping later?
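To get a feel for why an index helps, here is a toy sketch. It uses a simple k-mer dictionary, which is an assumption made for illustration and not how STAR works internally (STAR builds a suffix-array-based index). The idea is the same, though: do the expensive scan of the genome once, then look reads up quickly.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Record every position where each k-letter word occurs in the genome."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def find_exact(read, genome, index, k):
    """Find exact matches of a read, checking only positions the index suggests."""
    hits = []
    for pos in index.get(read[:k], []):
        if genome[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# Tiny made-up "genome" and read.
genome = "ACGTACGTTAGCACGTTAGG"
index = build_kmer_index(genome, 4)          # built once
print(find_exact("ACGTTAG", genome, index, 4))  # reused for every read
```

Without the index, every read would have to be compared against every position in the genome.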
Post-Alignment Processing
Try this (optional):
Picard is a collection of tools used after alignment to add metadata and summarize alignment quality metrics.
Visit the Picard tool list and click AddOrReplaceReadGroups. Notice:
-What information is added that is not part of the DNA sequence
-Why the same reads might fail downstream analysis without this metadata
-How post-alignment steps focus on organization and validation, not discovery
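As a rough picture of what "adding a read group" means, the sketch below builds the two pieces of text involved in a SAM file: an @RG header line and an RG:Z tag on each read. This is not Picard's code, only an illustration of the SAM format; the sample, library, and flowcell names are hypothetical.

```python
def read_group_header(rg_id, sample, library, platform, platform_unit):
    """Build a SAM @RG header line from common read group fields."""
    fields = [
        ("ID", rg_id),          # read group identifier
        ("SM", sample),         # sample name
        ("LB", library),        # library name
        ("PL", platform),       # sequencing platform
        ("PU", platform_unit),  # platform unit, e.g. flowcell and lane
    ]
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in fields)


def tag_read(sam_line, rg_id):
    """Attach a read to its read group by appending an RG:Z tag."""
    return sam_line.rstrip("\n") + f"\tRG:Z:{rg_id}"


# Hypothetical metadata values.
header = read_group_header("rg1", "patient01", "lib1", "ILLUMINA", "flowcell1.lane2")
read = tag_read("read1\t0\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII", "rg1")
print(header)
print(read)
```

Notice that nothing about the DNA sequence changed; only the labels describing where the read came from were added.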
Try this (optional):
Sambamba provides high-performance tools for viewing, filtering, and processing BAM/SAM files after read alignment.
Visit the Sambamba overview page and notice:
-That Sambamba works with aligned reads (BAM files), not raw sequences
-The emphasis on speed and efficiency compared to other tools
-Examples of actions like filtering, sorting, and marking duplicates
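The three actions named above (filtering, sorting, marking duplicates) can be sketched on a handful of made-up aligned reads. This is a simplified illustration and an assumption about the general idea, not Sambamba's actual algorithm: real tools consider more than the start position and strand when deciding what counts as a duplicate.

```python
def filter_sort_mark(reads, min_mapq=20):
    """Filter by mapping quality, sort by coordinate, and flag duplicates.

    Each read is a dict with keys: name, chrom, pos, strand, mapq.
    A read is flagged as a duplicate when an earlier read has the same
    chromosome, position, and strand (a deliberately simple rule).
    """
    kept = [r for r in reads if r["mapq"] >= min_mapq]
    ordered = sorted(kept, key=lambda r: (r["chrom"], r["pos"]))
    seen = set()
    result = []
    for r in ordered:
        key = (r["chrom"], r["pos"], r["strand"])
        result.append({**r, "duplicate": key in seen})
        seen.add(key)
    return result


# Hypothetical aligned reads.
reads = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "mapq": 42},
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "mapq": 40},
    {"name": "r3", "chrom": "chr1", "pos": 50, "strand": "-", "mapq": 30},
    {"name": "r4", "chrom": "chr1", "pos": 100, "strand": "-", "mapq": 35},
    {"name": "r5", "chrom": "chr1", "pos": 75, "strand": "+", "mapq": 3},
]
for r in filter_sort_mark(reads):
    print(r["name"], r["pos"], "duplicate" if r["duplicate"] else "")
```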
Try this (optional):
SAMtools provides essential utilities for working with the SAM and BAM data formats, and it is one of the most commonly used tools after alignment.
Explore a SAMtools tutorial and notice:
-The difference between SAM and BAM files
-Why indexing a BAM file is important
-How summary statistics help assess alignment quality
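Why does indexing matter? The toy sketch below shows the basic idea with a sorted list of read start positions and a binary search. A real BAM index is more elaborate than this, so treat it as an analogy under that assumption: once reads are sorted, a tool can jump straight to a region instead of reading the whole file.

```python
from bisect import bisect_left


def reads_starting_in(sorted_positions, start, end):
    """Return read start positions inside [start, end) without a full scan.

    Requires the positions to be sorted, which is why sorting usually
    comes before indexing.
    """
    lo = bisect_left(sorted_positions, start)
    hi = bisect_left(sorted_positions, end)
    return sorted_positions[lo:hi]


# Hypothetical start positions of reads on one chromosome, already sorted.
positions = [5, 120, 130, 480, 900, 905]
print(reads_starting_in(positions, 100, 500))
```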
Variant Calling or Quantification
Try this (optional):
GATK is a toolkit for identifying genetic variants (such as SNPs and small insertions/deletions) from aligned sequencing data.
Visit the GATK homepage and notice:
-That GATK starts after alignment (it uses BAM files)
-The difference between variant calling and read mapping
-How this step answers the question:
"How does this sample differ from the reference genome?"
Try this (optional):
HTSeq is a Python-based tool for counting how many sequencing reads overlap genes or genomic features, commonly used in RNA-seq analysis.
Explore the HTSeq tour and notice:
-That the output is counts, not sequences
-That HTSeq requires a gene annotation file (GTF/GFF)
-How counting reads helps answer:
"Which genes are more or less active?"
Try this (optional):
Tools for identifying enriched genomic regions and visualizing signal patterns in sequencing experiments such as ChIP-seq or ATAC-seq.
Visit the MACS documentation and notice:
-That the goal is to find regions (peaks), not variants
-How signal strength across the genome is analyzed
-Why this type of analysis focuses on patterns, not individual bases
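A very rough picture of "finding regions" is shown below: given read coverage at each position, report stretches where the signal stays at or above a cutoff. MACS relies on a statistical model of background signal rather than a fixed cutoff, so this is only a sketch under that simplifying assumption, with invented coverage values.

```python
def call_peaks(coverage, threshold):
    """Return half-open (start, end) intervals where coverage >= threshold."""
    peaks = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= threshold and start is None:
            start = pos                 # a new enriched region begins
        elif depth < threshold and start is not None:
            peaks.append((start, pos))  # the region just ended
            start = None
    if start is not None:
        peaks.append((start, len(coverage)))
    return peaks


# Hypothetical per-position read coverage along a short stretch of genome.
coverage = [1, 2, 8, 9, 7, 2, 1, 6, 6, 1]
print(call_peaks(coverage, 5))
```

No single base is interesting here; what matters is the shape of the signal across neighboring positions.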
Statistical Analysis and Biological Interpretation
Try this (optional):
DESeq2 is a statistical method for identifying genes that are expressed differently between conditions.
Visit the DESeq2 workflow and notice:
-That the input is a count matrix, not raw reads
-How samples are grouped and compared
-How results are visualized using plots
"How do statistical models turn raw counts into biological insight?"
Try this (optional):
edgeR is a statistical framework used to compare gene expression across experimental conditions.
Visit the RNA-seq lab and notice:
-How experimental groups and contrasts are defined
-That normalization and dispersion are handled before testing
-How statistical results relate to biological questions
"How do we decide whether observed differences are real or just noise?"
Try this (optional):
Bioconductor is an open-source ecosystem for analyzing and interpreting genomic data in R.
Explore the RNA-seq lessons and notice:
-How statistical results are linked to biological meaning
-That interpretation often involves pathways, gene sets, and patterns
-How multiple tools work together as part of an analysis ecosystem
"Why does biological interpretation require more than a single statistical test?"

