NeuroLINCS Project

The NeuroLINCS Project is part of the NIH Common Fund’s Library of Integrated Network-based Cellular Signatures (LINCS) program, which aims to characterize how a variety of human cells, tissues, and entire organisms respond to perturbations by drugs and other molecular factors. As part of the LINCS program, the NeuroLINCS study concentrates on human brain cells, which are far less understood than other cells in the body. Our initial focus is to produce diseased motor neurons by utilizing high-quality induced pluripotent stem cell (iPSC) lines from Amyotrophic Lateral Sclerosis (ALS) and Spinal Muscular Atrophy (SMA) patients, in addition to unaffected healthy controls. Using state-of-the-art OMICS methods (genomics, epigenomics, transcriptomics, and proteomics), we intend to create a wealth of patient-specific cellular data, both in the context of each patient’s baseline genetic perturbations and in the presence of other genetic and environmental perturbagens (e.g. endoplasmic reticulum stress). The primary data will be used to build cell signatures that convey the key features that distinguish the state of a cell and determine its behavior. Ultimately, the analysis of these datasets will lead to the identification of a network of unique signatures relevant to each of these motor neuron diseases.

Getting You Started

Here are some resources and videos that will help you get started using Galaxy and the NeuroLINCS pipelines. You can use these workflows/pipelines with your own data or rerun them on the NeuroLINCS data found on dbGaP and LINCSproject. Links to the data are provided below.

Galaxy Resources

Introduction To Galaxy Learn what Galaxy is and how you can use it
Learning Resources A collection of videos to help you learn how
Get Data: Upload File How to upload your Data
Upload Data From SRA How to upload data from SRA
NeuroLINCS Website NeuroLINCS website contains information on the project, including technologies, data, and tools developed and used by the team
NeuroLINCS Data Summary Data page on NeuroLINCS website showing summary of experiments and links to data
NeuroLINCS Raw Data The NCBI database of Genotypes and Phenotypes (dbGaP) study that hosts the NeuroLINCS raw data files for ATAC-Seq, RNA-Seq and whole genome sequences
NeuroLINCS Raw Protein Data Chorus Project site that hosts the raw data files for the SWATH proteomic assay. Note: you need to sign in to Chorus to access the files
NeuroLINCS Processed Data LINCSproject.org datasets for NeuroLINCS
  • Experiment 1: ATAC-Seq, RNA-Seq and proteomics were carried out on samples obtained from induced pluripotent stem cell (iPSC) lines. These lines were derived from ALS, SMA and Control (unaffected) individuals (three of each).
    • ATAC-Seq
    • RNA-Seq
    • Proteomics - NOTE: We do not have a proteomic workflow in Galaxy, but you can still access the data and analyze it with your own tools.
  • Experiment 2: ATAC-Seq, RNA-Seq and proteomics were carried out on samples obtained from motor neuron lines generated from subject induced pluripotent stem cell (iPSC) lines.
    • ATAC-Seq
    • RNA-Seq
    • Proteomics - NOTE: We do not have a proteomic workflow in Galaxy, but you can still access the data and analyze it with your own tools.

Analyzing data using the pipelines/workflows

RNA-Seq Workflow

  1. Use RNA-Seq Step 1 ‘Secondary Analysis’ workflow below to generate the count matrix (level 3 data) for all samples using raw fastq files.
  2. If technical or growth replicates are present, use the R code to generate the differentially expressed genes (level 4 data). If not, use the RNA-Seq Step 2 ‘Statistical Analysis of Gene Expression’ workflow below to generate the differentially expressed gene list.

ATAC-Seq Workflow

The ATAC pipeline on Galaxy generates BAM files from Bowtie2 alignment and narrowPeak files from MACS2 peak calling.

  1. Download the BAM and narrowPeak files to your computer or server. BAM files are very large, so if possible, use an FTP client to transfer them to their destination.
  2. Create a sample sheet for use with DiffBind using this example as a template. For each sample, provide the path to the BAM file in the “bamReads” column of the sample sheet and the path to the narrowPeak file in the “Peaks” column.
  3. Fill out the “PeakCaller” section with “macs” for all samples.
  4. The “SampleID”, “Tissue”, “Factor”, “Condition”, “Treatment”, and “Replicate” columns should be filled out with appropriate values for each sample, although only the “SampleID” column is required; the other columns provide information that makes comparisons in DiffBind more convenient.
  5. Once the sample sheet has been filled out, you can start using the DiffBind ATAC R vignette here to analyze your data.
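
The sample sheet described in steps 2 to 4 can also be written with a short script. The following is a minimal Python sketch, not part of the NeuroLINCS pipeline: the column names are the ones listed above, while the sample names, the "Tissue", "Factor" and "Treatment" values, and the /data/atac directory are hypothetical placeholders that you would replace with your own.

```python
import csv
import io

# Column names taken from the steps above.
COLUMNS = ["SampleID", "Tissue", "Factor", "Condition", "Treatment",
           "Replicate", "bamReads", "Peaks", "PeakCaller"]


def build_sample_sheet(samples, data_dir):
    """Return one row per (sample_id, condition, replicate) tuple."""
    rows = []
    for sample_id, condition, replicate in samples:
        rows.append({
            "SampleID": sample_id,
            "Tissue": "iMN",        # hypothetical value
            "Factor": "ATAC",       # hypothetical value
            "Condition": condition,
            "Treatment": "none",    # hypothetical value
            "Replicate": replicate,
            # Paths assume files are named after the sample ID (assumption).
            "bamReads": f"{data_dir}/{sample_id}.bam",
            "Peaks": f"{data_dir}/{sample_id}.narrowPeak",
            "PeakCaller": "macs",   # step 3: "macs" for all samples
        })
    return rows


def write_sample_sheet(rows, handle):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


samples = [("ALS_1", "ALS", 1), ("CTR_1", "Control", 1)]
rows = build_sample_sheet(samples, "/data/atac")
buffer = io.StringIO()
write_sample_sheet(rows, buffer)
sheet_text = buffer.getvalue()
print(sheet_text)
```

To save the sheet to disk, pass an open file handle (for example one opened with newline="") to write_sample_sheet instead of the in-memory buffer.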

Workflows

The workflows described below are used for primary analysis of NeuroLINCS cell line data.

Web-based Pipeline For Differential Gene Expression Analysis (RNA-Seq)

NeuroLINCS Transcriptomics Center, UC Irvine

The workflows describe a standard analysis of bulk RNA-seq data. For a schematic of the pipeline, click here.

For more information on NeuroLINCS, click here.

Step 1. Secondary Analysis

Galaxy Workflow | imported: fraenkel_ATAC_batch_experimental_paired (for in house usage)

Step 1: Input dataset collection

input
select at runtime

Step 2: Input dataset

encode blacklist regions
select at runtime

Step 3: Trimmomatic

Single-end or paired-end reads?
Paired-end (as collection)
Select FASTQ dataset collection with R1/R2 pair
Output dataset 'output' from step 1
Perform initial ILLUMINACLIP step?
False
Trimmomatic Operations
 Trimmomatic Operation 1
 Select Trimmomatic operation to perform
 Cut bases off the start of a read, if below a threshold quality (LEADING)
 Minimum quality required to keep a base
 15
 Trimmomatic Operation 2
 Select Trimmomatic operation to perform
 Cut bases off the end of a read, if below a threshold quality (TRAILING)
 Minimum quality required to keep a base
 15
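
To make the two Trimmomatic operations above concrete, here is a minimal Python sketch of what LEADING and TRAILING trimming with a threshold of 15 do to a single read. It is an illustration only, not Trimmomatic's implementation; the function name and the example read are hypothetical, and qualities are assumed to be already decoded Phred scores.

```python
def trim_leading_trailing(sequence, qualities, leading=15, trailing=15):
    """Trim low-quality bases from both ends of one read.

    `qualities` holds one Phred score per base of `sequence`.
    """
    start = 0
    end = len(sequence)
    # LEADING: drop bases from the start while their quality is below the threshold.
    while start < end and qualities[start] < leading:
        start += 1
    # TRAILING: drop bases from the end while their quality is below the threshold.
    while end > start and qualities[end - 1] < trailing:
        end -= 1
    return sequence[start:end], qualities[start:end]


seq, quals = trim_leading_trailing("ACGTACGT", [2, 10, 30, 32, 35, 31, 14, 8])
print(seq, quals)  # GTAC [30, 32, 35, 31]
```

A read whose bases are all below the threshold comes back empty in this sketch.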

Step 4: FastQC

Short read data from your current history
Output dataset 'fastq_out_paired' from step 3
Contaminant list
select at runtime
Submodule and Limit specifying file
select at runtime

Step 5: Bowtie2

Is this single or paired library
Paired-end Dataset Collection
FASTQ Paired Dataset
Output dataset 'fastq_out_paired' from step 3
Write unaligned reads (in fastq format) to separate file(s)
False
Write aligned reads (in fastq format) to separate file(s)
False
Do you want to set paired-end options?
No
Will you select a reference genome from your history or use a built-in index?
Use a built-in genome index
Select reference genome
hg19
Set read groups information?
Do not set
Select analysis mode
1: Default setting only
Do you want to use presets?
No, just use defaults
Save the bowtie2 mapping statistics to the history
True

Step 6: BAM filter

Select BAM dataset
Output dataset 'output' from step 5
Remove reads that are smaller than
Not available.
Remove reads that are larger than
Not available.
Keep only mapped reads
True
Keep only unmapped reads
False
Keep only properly paired reads
True
Discard properly paired reads
False
Remove reads that match the mask
Empty.
Remove reads that have the same sequence
-1
Remove reads that start at the same position
False
Remove reads with that many mismatches
Not available.
Remove secondary alignment reads
True
Remove reads that do not pass the quality control
False
Remove reads that are marked as PCR duplicates
False
Remove reads that are in any of the regions
select at runtime
Remove reads that are NOT any of the regions
select at runtime
Strand information from BED file is ignored
False
Exclude reads NOT mapped to a reference
Empty.
Exclude reads mapped to a particular reference
chrM
Filter by maximum mismatch ratio
Not available.
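
The filter settings above amount to a per-read decision based on the SAM flag and the reference name. The sketch below expresses that decision in plain Python under stated assumptions: the flag bit values come from the SAM format specification, while the function name and the example reads are hypothetical and the Galaxy tool itself is not invoked.

```python
# SAM flag bits, as defined in the SAM format specification.
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100


def keep_read(flag, reference_name, excluded_references=("chrM",)):
    """Mirror the settings above for one alignment record."""
    if flag & FLAG_UNMAPPED:            # Keep only mapped reads: True
        return False
    if not flag & FLAG_PROPER_PAIR:     # Keep only properly paired reads: True
        return False
    if flag & FLAG_SECONDARY:           # Remove secondary alignment reads: True
        return False
    if reference_name in excluded_references:  # Exclude reads mapped to chrM
        return False
    return True


# (read name, SAM flag, reference name); all values are made up for illustration.
reads = [
    ("r1", 99, "chr1"),   # paired, proper pair, first in pair
    ("r2", 99, "chrM"),   # same flags, but mitochondrial
    ("r3", 77, "*"),      # unmapped
    ("r4", 355, "chr2"),  # 99 + 256: secondary alignment
    ("r5", 65, "chr3"),   # paired but not a proper pair
]
kept = [name for name, flag, ref in reads if keep_read(flag, ref)]
print(kept)  # ['r1']
```

For single-end data, the proper-pair test would have to be dropped, as in the single-end workflow where "Keep only properly paired reads" is set to False.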

Step 7: Sort

BAM File
Output dataset 'outfile' from step 6
Sort by
Chromosomal coordinates

Step 8: MarkDuplicates

Select SAM/BAM dataset or dataset collection
Output dataset 'output1' from step 7
Comments
If true do not write duplicates to the output file instead of writing them with appropriate flags set
True
Assume the input file is already sorted
True
The scoring strategy for choosing the non-duplicate among candidates
SUM_OF_BASE_QUALITIES
Regular expression that can be used in unusual situations to parse non-standard read names in the incoming SAM/BAM dataset
[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*.
The maximum offset between two duplicate clusters in order to consider them optical duplicates
100
Barcode Tag
Empty.
Select validation stringency
Lenient

Step 9: bamCoverage

BAM/CRAM file
Output dataset 'outFile' from step 8
Bin size in bases
50
Scaling/Normalization method
Normalize to reads per kilobase per million (RPKM)
Coverage file format
bigwig
Region of the genome to limit the operation to
Empty.
Show advanced options
no
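
For orientation, the RPKM normalization selected above divides the read count of each bin by the number of mapped reads (in millions) and by the bin length (in kilobases). The following Python sketch shows that arithmetic for a single 50-base bin; the function name and the example numbers are hypothetical, and the sketch does not reproduce how bamCoverage counts reads.

```python
def rpkm_per_bin(reads_in_bin, total_mapped_reads, bin_size_bp=50):
    """Reads per kilobase per million mapped reads for one bin."""
    mapped_millions = total_mapped_reads / 1_000_000
    bin_length_kb = bin_size_bp / 1_000
    return reads_in_bin / (mapped_millions * bin_length_kb)


# 10 reads in a 50-base bin, 20 million mapped reads in the library.
value = rpkm_per_bin(10, 20_000_000, bin_size_bp=50)
print(round(value, 3))
```

Because both the library size and the bin length enter the denominator, tracks from samples sequenced to different depths become comparable on the same scale.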

Step 10: MACS2 callpeak

Are you pooling Treatment Files?
No
ChIP-Seq Treatment File
select at runtime
Do you have a Control File?
No
Format of Input Files
BAM
Effective genome size
H. sapiens (2.7e9)
Build Model
Build the shifting model
Set lower mfold bound
5
Set upper mfold bound
50
Band width for picking regions to compute fragment size
300
Peak detection based on
q-value
Minimum FDR (q-value) cutoff for peak detection
0.05
Additional Outputs
Peaks as tabular file (compatible with MultiQC)
Advanced Options:
 When set, scale the small sample up to the bigger sample
 False
 Use fixed background lambda as local lambda for every peak region
 False
 Save signal per million reads for fragment pileup profiles
 False
 When set, use a custom scaling ratio of ChIP/control (e.g. calculated using NCIS) for linear scaling
 1.0
 The small nearby region in basepairs to calculate dynamic lambda
 1000
 The large nearby region in basepairs to calculate dynamic lambda
 10000
 Composite broad regions
 No broad regions
 Use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region
 False
 How many duplicate tags at the exact same location are allowed?
 1

Step 11: multiBigwigSummary

Sample order matters
No
Bigwig files
Output dataset 'outFileName' from step 9
Choose computation mode
Bins
Bin size in bp
10000
Distance between bins
0
Region of the genome to limit the operation to
Empty.
Save raw counts (scores) to file
True
Show advanced options
no

Step 12: Intersect intervals

File A to intersect with B
Output dataset 'output_narrowpeaks' from step 10
Combined or separate output files
One output file per 'input B' file
File(s) B to intersect with A
select at runtime
Calculation based on strandedness?
Overlaps on either strand
What should be written to the output file?
Write the original entry in A for each overlap (-wa)
Treat split/spliced BAM or BED12 entries as distinct BED intervals when computing coverage.
False
Minimum overlap required as a fraction of the BAM alignment
Empty.
Require that the fraction of overlap be reciprocal for A and B
False
Report only those alignments that **do not** overlap with file(s) B
True
Write the original A entry _once_ if _any_ overlaps found in B.
False
For each entry in A, report the number of overlaps with B.
False
Print the header from the A file prior to results
False
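
With "Report only those alignments that do not overlap with file(s) B" set to True, this step keeps only the peaks that do not touch any region in file B, which in this workflow is presumably the ENCODE blacklist input from step 2. A minimal Python sketch of that logic follows; it assumes half-open BED coordinates, uses made-up peaks and a made-up blacklist, and is a simple quadratic scan rather than the algorithm used by the Galaxy tool.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def remove_blacklisted(peaks, blacklist):
    """Keep peaks that overlap no blacklisted region on the same chromosome."""
    kept = []
    for chrom, start, end in peaks:
        hit = any(
            chrom == b_chrom and overlaps(start, end, b_start, b_end)
            for b_chrom, b_start, b_end in blacklist
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept


# Hypothetical peaks and blacklist, as (chromosome, start, end).
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
blacklist = [("chr1", 150, 300)]
clean_peaks = remove_blacklisted(peaks, blacklist)
print(clean_peaks)  # [('chr1', 500, 600), ('chr2', 100, 200)]
```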

Step 13: plotPCA

Matrix file from the multiBamSummary or multiBigwigSummary tools
Output dataset 'outFile' from step 11
Image file format
pdf
Title of the plot
Empty.
Save the matrix of PCA and eigenvalues underlying the plot.
False
Show advanced options
no

Step 14: plotCorrelation

Matrix file from the multiBamSummary tool
Output dataset 'outFile' from step 11
Correlation method
Spearman
Plotting type
Heatmap
Minimum value for the heatmap intensities
Empty.
Maximum value for the heatmap intensities
Empty.
Color map to use for the heatmap
RdYlBu
Title of the plot
Empty.
Plot the correlation value
True
Plot height
9.5
Plot width
11.0
Skip zeroes
False
Image file format
pdf
Remove regions with very large counts
True
Save the matrix of values underlying the heatmap
False
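
The Spearman method chosen above is a Pearson correlation computed on ranks instead of raw values, which makes it less sensitive to a few bins with extreme signal. The sketch below illustrates the calculation for two samples in plain Python, with ties receiving their average rank; the function names and the example vectors are hypothetical, and this is not the plotCorrelation implementation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Per-bin scores for two hypothetical samples.
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [10.0, 20.0, 40.0, 80.0]
rho = spearman(sample_a, sample_b)
print(rho)
```

The two example samples differ in scale but rank their bins identically, so the coefficient is at its maximum.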

Step 15: BED-to-bigBed

Convert
Output dataset 'output' from step 12
Converter settings to use
Full parameter list
Items to bundle in r-tree
256
Data points bundled at lowest level
512
Do not use compression
False

Step 16: computeMatrix

Select regions
 Select regions 1
 Regions to plot
 Output dataset 'output' from step 12
Sample order matters
Yes
Score files
 Score files 1
 Score file
 Output dataset 'outFileName' from step 9
computeMatrix has two main output options
reference-point
The reference point for the plotting
center of region
Discard any values after the region end
False
Distance upstream of the start site of the regions defined in the region file
1000
Distance downstream of the end site of the given regions
1000
Show advanced output settings
no
Show advanced options
yes
Length, in bases, of non-overlapping bins used for averaging the score over the regions length
50
Sort regions
maintain the same ordering as the input files
Method used for sorting
mean
Define the type of statistic that should be displayed.
mean
Convert missing values to 0?
False
Skip zeros
False
Minimum threshold
Not available.
Maximum threshold
Not available.
Scaling factor
Not available.
Labels for the samples (each bigwig)
Empty.
Use a metagene model
False
transcript designator
transcript
exon designator
exon
transcriptID key designator
transcript_id
Blacklisted regions in BED/GTF format
select at runtime
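
In reference-point mode with the settings above, each region contributes a window of 1000 bases on either side of its center, split into non-overlapping 50-base bins. The Python sketch below works out those bins for one region as an illustration; the function name and the example coordinates are hypothetical, strand and chromosome ends are ignored, and the averaging of scores within each bin is not shown.

```python
def reference_point_bins(region_start, region_end, upstream=1000,
                         downstream=1000, bin_size=50):
    """Return (start, end) bins covering the window around the region center."""
    center = (region_start + region_end) // 2
    window_start = center - upstream
    window_end = center + downstream
    return [(s, s + bin_size) for s in range(window_start, window_end, bin_size)]


# A hypothetical 400-base peak.
bins = reference_point_bins(10_000, 10_400)
print(len(bins), bins[0], bins[-1])  # 40 (9200, 9250) (11150, 11200)
```

Every region therefore yields the same number of columns in the matrix (40 with these settings), which is what allows plotHeatmap to stack regions of different lengths.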

Step 17: plotHeatmap

Matrix file from the computeMatrix tool
Output dataset 'outFileName' from step 16
Show advanced output settings
no
Show advanced options
no

Step 2. Statistical Analysis of gene expression

This step uses DESeq2 standard workflow to test differential expression across two groups, e.g. control vs. ALS.
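
As background for this step, DESeq2 normalizes raw counts with per-sample size factors estimated by the median-of-ratios method before it tests for differential expression. The following Python sketch illustrates only that normalization idea on a toy count matrix; the function name and the counts are hypothetical, and dispersion estimation and the statistical test are not reproduced here.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` is a list of genes, each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts:
        if any(c == 0 for c in gene_counts):
            continue
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The size factor of a sample is the median ratio over the retained genes.
    return [math.exp(median(r)) for r in log_ratios]


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [30, 60], [0, 5]]
factors = size_factors(toy_counts)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in toy_counts]
print(factors)
```

After division by the size factors, the first three toy genes have equal normalized counts in both samples, so only genuine expression differences would remain to be tested.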

Galaxy Workflow | imported: fraenkel_ATAC_batch_experimental (for in house usage)

Step 1: Input dataset collection

Input FASTQs
select at runtime

Step 2: Input dataset

Naked DNA File
select at runtime

Step 3: Input dataset

encode blacklist regions
select at runtime

Step 4: Trimmomatic

Single-end or paired-end reads?
Single-end
Input FASTQ file
Output dataset 'output' from step 1
Perform initial ILLUMINACLIP step?
False
Trimmomatic Operations
 Trimmomatic Operation 1
 Select Trimmomatic operation to perform
 Cut bases off the start of a read, if below a threshold quality (LEADING)
 Minimum quality required to keep a base
 15
 Trimmomatic Operation 2
 Select Trimmomatic operation to perform
 Cut bases off the end of a read, if below a threshold quality (TRAILING)
Minimum quality required to keep a base
 15

Step 5: Bowtie2

Is this single or paired library
Single-end
FASTA/Q file
Output dataset 'fastq_out' from step 4
Write unaligned reads (in fastq format) to separate file(s)
False
Write aligned reads (in fastq format) to separate file(s)
False
Will you select a reference genome from your history or use a built-in index?
Use a built-in genome index
Select reference genome
hg19
Set read groups information?
Do not set
Select analysis mode
1: Default setting only
Do you want to use presets?
No, just use defaults
Save the bowtie2 mapping statistics to the history
True

Step 6: BAM filter

Select BAM dataset
Output dataset 'output' from step 5
Remove reads that are smaller than
Not available.
Remove reads that are larger than
Not available.
Keep only mapped reads
True
Keep only unmapped reads
False
Keep only properly paired reads
False
Discard properly paired reads
False
Remove reads that match the mask
Empty.
Remove reads that have the same sequence
-1
Remove reads that start at the same position
False
Remove reads with that many mismatches
Not available.
Remove secondary alignment reads
True
Remove reads that do not pass the quality control
False
Remove reads that are marked as PCR duplicates
False
Remove reads that are in any of the regions
select at runtime
Remove reads that are NOT any of the regions
select at runtime
Strand information from BED file is ignored
False
Exclude reads NOT mapped to a reference
Empty.
Exclude reads mapped to a particular reference
chrM
Filter by maximum mismatch ratio
Not available.

Step 7: Sort

BAM File
Output dataset 'outfile' from step 6
Sort by
Chromosomal coordinates

Step 8: MarkDuplicates

Select SAM/BAM dataset or dataset collection
Output dataset 'output1' from step 7
Comments
If true do not write duplicates to the output file instead of writing them with appropriate flags set
True
Assume the input file is already sorted
True
The scoring strategy for choosing the non-duplicate among candidates
SUM_OF_BASE_QUALITIES
Regular expression that can be used in unusual situations to parse non-standard read names in the incoming SAM/BAM dataset
[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*.
The maximum offset between two duplicate clusters in order to consider them optical duplicates
100
Barcode Tag
Empty.
Select validation stringency
Lenient

Step 9: bamCoverage

BAM/CRAM file
Output dataset 'outFile' from step 8
Bin size in bases
50
Scaling/Normalization method
Normalize to reads per kilobase per million (RPKM)
Coverage file format
bigwig
Region of the genome to limit the operation to
Empty.
Show advanced options
no

Step 10: MACS2 callpeak

Are you pooling Treatment Files?
No
ChIP-Seq Treatment File
select at runtime
Do you have a Control File?
No
Format of Input Files
Single-end BAM
Effective genome size
H. sapiens (2.7e9)
Build Model
Do not build the shifting model (--nomodel)
Set extension size
200
Set shift size
-100
Peak detection based on
q-value
Minimum FDR (q-value) cutoff for peak detection
0.01
Additional Outputs
Peaks as tabular file (compatible with MultiQC); Peak summits; Scores in bedGraph files (--bdg); Summary page (html); Plot in PDF (only available if a model is created and if BAMPE is not used)
Advanced Options:
 When set, scale the small sample up to the bigger sample
 False
 Use fixed background lambda as local lambda for every peak region
 False
 Save signal per million reads for fragment pileup profiles
 False
 When set, use a custom scaling ratio of ChIP/control (e.g. calculated using NCIS) for linear scaling
 1.0
 The small nearby region in basepairs to calculate dynamic lambda
 1000
 The large nearby region in basepairs to calculate dynamic lambda
 10000
 Composite broad regions
 No broad regions
 Use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region
 True
 How many duplicate tags at the exact same location are allowed?
 1

Step 11: multiBigwigSummary

Sample order matters
No
Bigwig files
Output dataset 'outFileName' from step 9
Choose computation mode
Bins
Bin size in bp
10000
Distance between bins
0
Region of the genome to limit the operation to
Empty.
Save raw counts (scores) to file
True
Show advanced options
no

Step 12: Intersect intervals

File A to intersect with B
Output dataset 'output_narrowpeaks' from step 10
Combined or separate output files
One output file per 'input B' file
File(s) B to intersect with A
select at runtime
Calculation based on strandedness?
Overlaps on either strand
What should be written to the output file?
Write the original entry in A for each overlap (-wa)
Treat split/spliced BAM or BED12 entries as distinct BED intervals when computing coverage.
False
Minimum overlap required as a fraction of the BAM alignment
Empty.
Require that the fraction of overlap be reciprocal for A and B
False
Report only those alignments that **do not** overlap with file(s) B
True
Write the original A entry _once_ if _any_ overlaps found in B.
False
For each entry in A, report the number of overlaps with B.
False
Print the header from the A file prior to results
False

Step 13: plotPCA

Matrix file from the multiBamSummary or multiBigwigSummary tools
Output dataset 'outFile' from step 11
Image file format
pdf
Title of the plot
Empty.
Save the matrix of PCA and eigenvalues underlying the plot.
False
Show advanced options
no

Step 14: plotCorrelation

Matrix file from the multiBamSummary tool
Output dataset 'outFile' from step 11
Correlation method
Spearman
Plotting type
Heatmap
Minimum value for the heatmap intensities
Empty.
Maximum value for the heatmap intensities
Empty.
Color map to use for the heatmap
RdYlBu
Title of the plot
Empty.
Plot the correlation value
True
Plot height
9.5
Plot width
11.0
Skip zeros
False
Image file format
pdf
Remove regions with very large counts
True
Save the matrix of values underlying the heatmap
False

Step 15: BED-to-bigBed

Convert
Output dataset 'output' from step 12
Converter settings to use
Full parameter list
Items to bundle in r-tree
256
Data points bundled at lowest level
512
Do not use compression
False

Step 16: computeMatrix

Select regions
 Select regions 1
 Regions to plot
 Output dataset 'output' from step 12
Sample order matters
Yes
Score files
 Score files 1
 Score file
 Output dataset 'outFileName' from step 9
computeMatrix has two main output options
reference-point
The reference point for the plotting
center of region
Discard any values after the region end
False
Distance upstream of the start site of the regions defined in the region file
1000
Distance downstream of the end site of the given regions
1000
Show advanced output settings
no
Show advanced options
yes
Length, in bases, of non-overlapping bins used for averaging the score over the regions length
50
Sort regions
maintain the same ordering as the input files
Method used for sorting
mean
Define the type of statistic that should be displayed.
mean
Convert missing values to 0?
False
Skip zeros
False
Minimum threshold
Not available.
Maximum threshold
Not available.
Scaling factor
Not available.
Labels for the samples (each bigwig)
Empty.
Use a metagene model
False
transcript designator
transcript
exon designator
exon
transcriptID key designator
transcript_id
Blacklisted regions in BED/GTF format
select at runtime

Step 17: plotHeatmap

Matrix file from the computeMatrix tool
Output dataset 'outFileName' from step 16
Show advanced output settings
no
Show advanced options
no


For more information regarding DESeq2, please visit this page.

Web-based Pipeline For Assay for Transposase-Accessible Chromatin followed by sequencing (ATAC-Seq)

NeuroLINCS Epigenomics Center, MIT

Assay overview

The ATAC-seq experiment provides genome-wide profiles of chromatin accessibility. Briefly, the ATAC-seq method works as follows: loaded transposase inserts sequencing primers into open chromatin sites across the genome, and reads are then sequenced. The ends of the reads mark open chromatin sites. The ATAC-seq pipeline is used for statistical signal processing of short-read sequencing data and quality control, producing alignments and measures of enrichment. In its current form, it is a prototype and will likely undergo substantial change within the next year.

ATAC Pipeline

Our ATAC pipeline takes in BAM files containing aligned reads and outputs peaks and peak annotations.

The first step in the pipeline is to remove all reads mapped to mitochondrial DNA from the BAM file. Since we observe 30-60% mitochondrial contamination in NeuroLINCS samples, removing mitochondrial reads eliminates considerable noise from downstream analysis. Afterward, peak calling is performed on the BAM file using MACS2 with the following parameters: --format BAM --gsize hs --qvalue .05. We have prepared a background BAM file for MACS2 peak calling by extracting naked genomic DNA from iMNs, performing ATAC on the genomic DNA, and sequencing the resulting library. Peak annotation is performed using the script “map_peaks_to_known_genes.py” from the ChipSeqUtil package; we map genes to peaks within a window of +/- 10 kb. BigWig files for the reads and bigBed files for the peaks are generated for visualization of the data on a genome browser.
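
If you want to run the peak-calling step outside Galaxy, the parameters quoted above can be assembled into a MACS2 command line. The Python sketch below only builds and prints the command; it is an illustration rather than our production script. The file names, the sample name, and the output directory are hypothetical, and passing the naked-DNA BAM through --control is an assumption about how the background file would be supplied.

```python
import shlex


def macs2_callpeak_command(treatment_bam, name, control_bam=None,
                           outdir="macs2_out"):
    """Build a `macs2 callpeak` argument list with the parameters quoted above."""
    command = [
        "macs2", "callpeak",
        "--treatment", treatment_bam,
        "--format", "BAM",
        "--gsize", "hs",
        "--qvalue", "0.05",
        "--name", name,
        "--outdir", outdir,
    ]
    if control_bam is not None:
        # Optional background (for example naked genomic DNA) as control.
        command += ["--control", control_bam]
    return command


# Hypothetical file names; the treatment BAM is assumed to be free of chrM reads.
command = macs2_callpeak_command("sample.noMT.bam", "sample",
                                 control_bam="naked_dna.bam")
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where MACS2 is installed.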

Galaxy Workflow | imported: fraenkel_ATAC_batch_experimental_paired (for in house usage)

Step 1: Input dataset collection

input
select at runtime

Step 2: Input dataset

encode blacklist regions
select at runtime

Step 3: Trimmomatic

Single-end or paired-end reads?
Paired-end (as collection)
Select FASTQ dataset collection with R1/R2 pair
Output dataset 'output' from step 1
Perform initial ILLUMINACLIP step?
False
Trimmomatic Operations
 Trimmomatic Operation 1
 Select Trimmomatic operation to perform
 Cut bases off the start of a read, if below a threshold quality (LEADING)
 Minimum quality required to keep a base
 15
 Trimmomatic Operation 2
 Select Trimmomatic operation to perform
 Cut bases off the end of a read, if below a threshold quality (TRAILING)
 Minimum quality required to keep a base
 15

Step 4: FastQC

Short read data from your current history
Output dataset 'fastq_out_paired' from step 3
Contaminant list
select at runtime
Submodule and Limit specifying file
select at runtime

Step 5: Bowtie2

Is this single or paired library
Paired-end Dataset Collection
FASTQ Paired Dataset
Output dataset 'fastq_out_paired' from step 3
Write unaligned reads (in fastq format) to separate file(s)
False
Write aligned reads (in fastq format) to separate file(s)
False
Do you want to set paired-end options?
No
Will you select a reference genome from your history or use a built-in index?
Use a built-in genome index
Select reference genome
hg19
Set read groups information?
Do not set
Select analysis mode
1: Default setting only
Do you want to use presets?
No, just use defaults
Save the bowtie2 mapping statistics to the history
True

Step 6: BAM filter

Select BAM dataset
Output dataset 'output' from step 5
Remove reads that are smaller than
Not available.
Remove reads that are larger than
Not available.
Keep only mapped reads
True
Keep only unmapped reads
False
Keep only properly paired reads
True
Discard properly paired reads
False
Remove reads that match the mask
Empty.
Remove reads that have the same sequence
-1
Remove reads that start at the same position
False
Remove reads with that many mismatches
Not available.
Remove secondary alignment reads
True
Remove reads that do not pass the quality control
False
Remove reads that are marked as PCR duplicates
False
Remove reads that are in any of the regions
select at runtime
Remove reads that are NOT any of the regions
select at runtime
Strand information from BED file is ignored
False
Exclude reads NOT mapped to a reference
Empty.
Exclude reads mapped to a particular reference
chrM
Filter by maximum mismatch ratio
Not available.

Step 7: Sort

BAM File
Output dataset 'outfile' from step 6
Sort by
Chromosomal coordinates

Step 8: MarkDuplicates

Select SAM/BAM dataset or dataset collection
Output dataset 'output1' from step 7
Comments
If true do not write duplicates to the output file instead of writing them with appropriate flags set
True
Assume the input file is already sorted
True
The scoring strategy for choosing the non-duplicate among candidates
SUM_OF_BASE_QUALITIES
Regular expression that can be used in unusual situations to parse non-standard read names in the incoming SAM/BAM dataset
[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*.
The maximum offset between two duplicate clusters in order to consider them optical duplicates
100
Barcode Tag
Empty.
Select validation stringency
Lenient

Step 9: bamCoverage

BAM/CRAM file
Output dataset 'outFile' from step 8
Bin size in bases
50
Scaling/Normalization method
Normalize to reads per kilobase per million (RPKM)
Coverage file format
bigwig
Region of the genome to limit the operation to
Empty.
Show advanced options
no

Step 10: MACS2 callpeak

Are you pooling Treatment Files?
No
ChIP-Seq Treatment File
select at runtime
Do you have a Control File?
No
Format of Input Files
BAM
Effective genome size
H. sapiens (2.7e9)
Build Model
Build the shifting model
Set lower mfold bound
5
Set upper mfold bound
50
Band width for picking regions to compute fragment size
300
Peak detection based on
q-value
Minimum FDR (q-value) cutoff for peak detection
0.05
Additional Outputs
Peaks as tabular file (compatible with MultiQC)
Advanced Options:
 When set, scale the small sample up to the bigger sample
 False
 Use fixed background lambda as local lambda for every peak region
 False
 Save signal per million reads for fragment pileup profiles
 False
 When set, use a custom scaling ratio of ChIP/control (e.g. calculated using NCIS) for linear scaling
 1.0
 The small nearby region in basepairs to calculate dynamic lambda
 1000
 The large nearby region in basepairs to calculate dynamic lambda
 10000
 Composite broad regions
 No broad regions
 Use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region
 False
 How many duplicate tags at the exact same location are allowed?
 1

Step 11: multiBigwigSummary

Sample order matters
No
Bigwig files
Output dataset 'outFileName' from step 9
Choose computation mode
Bins
Bin size in bp
10000
Distance between bins
0
Region of the genome to limit the operation to
Empty.
Save raw counts (scores) to file
True
Show advanced options
no

Step 12: Intersect intervals

File A to intersect with B
Output dataset 'output_narrowpeaks' from step 10
Combined or separate output files
One output file per 'input B' file
File(s) B to intersect with A
select at runtime
Calculation based on strandedness?
Overlaps on either strand
What should be written to the output file?
Write the original entry in A for each overlap (-wa)
Treat split/spliced BAM or BED12 entries as distinct BED intervals when computing coverage.
False
Minimum overlap required as a fraction of the BAM alignment
Empty.
Require that the fraction of overlap be reciprocal for A and B
False
Report only those alignments that **do not** overlap with file(s) B
True
Write the original A entry _once_ if _any_ overlaps found in B.
False
For each entry in A, report the number of overlaps with B.
False
Print the header from the A file prior to results
False

Step 13: plotPCA

Matrix file from the multiBamSummary or multiBigwigSummary tools
Output dataset 'outFile' from step 11
Image file format
pdf
Title of the plot
Empty.
Save the matrix of PCA and eigenvalues underlying the plot.
False
Show advanced options
no

Step 14: plotCorrelation

Matrix file from the multiBamSummary tool
Output dataset 'outFile' from step 11
Correlation method
Spearman
Plotting type
Heatmap
Minimum value for the heatmap intensities
Empty.
Maximum value for the heatmap intensities
Empty.
Color map to use for the heatmap
RdYlBu
Title of the plot
Empty.
Plot the correlation value
True
Plot height
9.5
Plot width
11.0
Skip zeroes
False
Image file format
pdf
Remove regions with very large counts
True
Save the matrix of values underlying the heatmap
False

Step 15: BED-to-bigBed

Convert
Output dataset 'output' from step 12
Converter settings to use
Full parameter list
Items to bundle in r-tree
256
Data points bundled at lowest level
512
Do not use compression
False

Step 16: computeMatrix

Select regions
 Select regions 1
 Regions to plot
 Output dataset 'output' from step 12
Sample order matters
Yes
Score files
 Score files 1
 Score file
 Output dataset 'outFileName' from step 9
computeMatrix has two main output options
reference-point
The reference point for the plotting
center of region
Discard any values after the region end
False
Distance upstream of the start site of the regions defined in the region file
1000
Distance downstream of the end site of the given regions
1000
Show advanced output settings
no
Show advanced options
yes
Length, in bases, of non-overlapping bins used for averaging the score over the regions length
50
Sort regions
maintain the same ordering as the input files
Method used for sorting
mean
Define the type of statistic that should be displayed.
mean
Convert missing values to 0?
False
Skip zeros
False
Minimum threshold
Not available.
Maximum threshold
Not available.
Scaling factor
Not available.
Labels for the samples (each bigwig)
Empty.
Use a metagene model
False
transcript designator
transcript
exon designator
exon
transcriptID key designator
transcript_id
Blacklisted regions in BED/GTF format
select at runtime

Step 17: plotHeatmap

Matrix file from the computeMatrix tool
Output dataset 'outFileName' from step 16
Show advanced output settings
no
Show advanced options
no