Commit c25c65b

Merge pull request #1389 from deeptools/filtering
MNase filtering
2 parents 735aca8 + dd6be5d commit c25c65b

13 files changed

Lines changed: 891 additions & 678 deletions

deeptools4.0.0_changes.md

Lines changed: 0 additions & 42 deletions
This file was deleted.

docs/content/changelog.rst

Lines changed: 22 additions & 93 deletions
@@ -1,7 +1,8 @@
 Changes in deepTools4.0
 =======================
 
-Changes:
+Plotting
+--------
 
 * Plots in general:
   - Removed Plotly for all graphics.
@@ -25,97 +26,25 @@ Changes:
   - Scree plot is showing lines for individual and accumulated variation.
   - Points are by default rainbow colored circles.
 
-Changes in deepTools2.0
-========================
+Core
+----
 
-.. contents::
-    :local:
-
-Major changes
--------------
-
-.. note:: The major changes encompass features for **increased efficiency**, **new sequencing data types**, and **additional plots**, particularly for QC.
-
-Moreover, deepTools modules can now be used by other python programs.
-The :ref:`api` is part of the new documentation.
-
-Accommodating additional data types
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-* correlation and comparisons can now be calculated for **bigWig files** (in addition to BAM files) using ``multiBigwigSummary`` and ``bigwigCompare``
-
-* **RNA-seq:** split-reads are now natively supported
+* bamCoverage
+  - --no_collapse flag to not merge bins with equal coverage values together.
 
-* **MNase-seq:** using the new option ``--MNase`` in ``bamCoverage``, one can now compute read coverage only taking the 2 central base pairs of each mapped fragment into account.
-
-Structural updates
-^^^^^^^^^^^^^^^^^^^
-
-* All modules have comprehensive and automatic tests that evaluate proper functioning after any modification of the code.
-* Virtualization for stability: we now provide a ``docker`` image and enable the easy deployment of deepTools via the Galaxy ``toolshed``.
-* Our documentation is now version-aware thanks to readthedocs and ``sphinx``.
-* The API is public and documented.
-
-Renamed tools
-^^^^^^^^^^^^^
-
-* **heatmapper** to :doc:`tools/plotHeatmap`
-* **profiler** to :doc:`tools/plotProfile`
-* **bamCorrelate** to :doc:`tools/multiBamSummary`
-* **bigwigCorrelate** to :doc:`tools/multiBigwigSummary`
-* **bamFingerprint** to :doc:`tools/plotFingerprint`
-
-
-Increased efficiency
-^^^^^^^^^^^^^^^^^^^^
-
-* We dramatically improved the **speed** of bigwig related tools (:doc:`tools/multiBigwigSummary` and ``computeMatrix``) by using the new `pyBigWig module <https://github.com/dpryan79/pyBigWig>`_.
-
-* It is now possible to generate one composite heatmap and/or meta-gene image based on **multiple bigwig files** in one go (see :doc:`tools/computeMatrix`, :doc:`tools/plotHeatmap`, and :doc:`tools/plotProfile` for examples)
-
-* ``computeMatrix`` now also accepts multiple input BED files. Each is treated as a group within a sample and is plotted independently.
-
-* We added **additional filtering options for handling BAM files**, decreasing the need for prior filtering using tools other than deepTools: The ``--samFlagInclude`` and ``--samFlagExclude`` parameters can, for example, be used to only include (or exclude) forward reads in an analysis.
-
-* We separated the generation of read count tables from the calculation of pairwise correlations that was previously handled by ``bamCorrelate``. Now, read counts are calculated first using ``multiBamSummary`` or ``multiBigWigCoverage`` and the resulting output file can be used for calculating and plotting pairwise correlations using ``plotCorrelation`` or for doing a principal component analysis using ``plotPCA``.
-
-New features and tools
-^^^^^^^^^^^^^^^^^^^^^^
-
-* Correlation analyses are no longer limited to BAM files -- bigwig files are possible, too! (see :doc:`tools/multiBigwigSummary`)
-
-* Correlation coefficients can now be computed even if the data contains NaNs.
-
-* Added **new quality control** tools:
-  - use :doc:`tools/plotCoverage` to plot the coverage over base pairs
-  - use :doc:`tools/plotPCA` for principal component analysis
-  - :doc:`tools/bamPEFragmentSize` can be used to calculate the average fragment size for paired-end read data
-
-* Added the possibility for **hierarchical clustering**, besides *k*-means to ``plotProfile`` and ``plotHeatmap``
-
-* ``plotProfile`` has many more options to make compelling summary plots
-
-
-Minor changes
--------------
-
-Changed parameters names and settings
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-* ``computeMatrix`` can now read files with DOS newline characters.
-* ``--missingDataAsZero`` was renamed to ``--skipNonCoveredRegions`` for clarity in ``bamCoverage`` and ``bamCompare``.
-* Read extension was made optional and we removed the need to specify a default fragment length for most of the tools: ``--fragmentLength`` was thus replaced by the new optional parameter ``--extendReads``.
-* Added option ``--skipChromosomes`` to ``multiBigwigSummary``, which can be used to, for example, skip all 'random' chromosomes.
-* Added the option for adding titles to QC plots.
-
-Bug fixes
-^^^^^^^^^
-
-* Resolved an error introduced by ``numpy version 1.10`` in ``computeMatrix``.
-* Improved plotting features for ``plotProfile`` when using as plot type: 'overlapped_lines' and 'heatmap'
-* Fixed problem with BED intervals in ``multiBigwigSummary`` and ``multiBamSummary`` that returned wrongly labeled raw counts.
-* ``multiBigwigSummary`` now also considers chromosomes as identical when the names between samples differ by 'chr' prefix, e.g. chr1 vs. 1.
-* Fixed problem with wrongly labeled proper read pairs in a BAM file. We now have additional checks to determine if a read pair is a proper pair: the reads must face each other and are not allowed to be farther apart than 4x the mean fragment length.
-* For ``bamCoverage`` and ``bamCompare``, the behavior of ``scaleFactor`` was updated such that now, if given in combination with the normalization options (``--normalizeTo1x`` or ``--normalizeUsingRPKM``), the given scaling factor will be multiplied with the factor computed by the respective normalization method.
-
-
+* computeMatrix
+  - --sortRegions 'no' option no longer exists
+  - Sorting ascend / descend no longer has subsorting by position.
+  - --quiet / -q option no longer exists.
+  - bed files in computeMatrix no longer support '#' to define groups.
+  - 'chromosome matching' i.e. chr1 <-> 1, chrMT <-> MT is no longer performed.
+
+* normalization
+  - Exactscaling is no longer an option, it's always performed.
+
+* alignmentSieve
+  - options label, smartLabels, genomeChunkLength are removed.
+  - ignoreDuplicates is removed, and (if wanted) should be set by the SamFlagExclude setting.
+
+Testing
+-------
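
The changelog entry about `ignoreDuplicates` being replaced by the SAM-flag exclusion setting can be illustrated with a small sketch. It assumes the exclusion mask drops a read when any masked bit is set (the same convention as `samtools view -F`); the helper name `is_excluded` is hypothetical and not part of deepTools. The duplicate bit itself (0x400, decimal 1024) comes from the SAM specification.

```python
# SAM flag bit for "PCR or optical duplicate" as defined in the SAM specification.
DUPLICATE_FLAG = 0x400  # decimal 1024


def is_excluded(read_flag: int, sam_flag_exclude: int) -> bool:
    """Hypothetical helper: True if the read carries any bit of the exclusion mask."""
    return (read_flag & sam_flag_exclude) != 0


# paired (1) + proper pair (2) + first in pair (64) = 67, not a duplicate
print(is_excluded(67, DUPLICATE_FLAG))
# the same read with the duplicate bit set
print(is_excluded(67 | DUPLICATE_FLAG, DUPLICATE_FLAG))
```

Under that assumption, passing a mask that contains 1024 would stand in for the removed `ignoreDuplicates` option, provided duplicates were marked in the BAM file beforehand.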

pydeeptools/deeptools/bamCompare2.py

Lines changed: 19 additions & 29 deletions
@@ -135,6 +135,11 @@ def getOptionalArgs():
                           'This is determined BEFORE any applicable pseudocount '
                           'is added.',
                           action='store_true')
+    optional.add_argument('--no_collapse',
+                          help='By default adjacent bins that have the same value are collapsed. This reduces the size of the output file (drastically).'
+                          'If you like to opt out of this behavior, you can set this flag.',
+                          default=True,
+                          action='store_false')
 
     return parser
 

@@ -252,6 +257,15 @@ def main(args=None):
         args.samFlagExclude = 0
     if not args.region:
         args.region = 'None'
+    if not args.extendReads:
+        args.extendReads = False
+        args.extendReadsLen = 0
+    elif isinstance(args.extendReads, bool):
+        args.extendReadsLen = 0
+        args.extendReads = True
+    elif isinstance(args.extendReads, int):
+        args.extendReadsLen = args.extendReads
+        args.extendReads = True
     if not args.blackListFileName:
         args.blackListFileName = 'None'
     else:
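
The `extendReads` handling added in this hunk (and repeated in bamCoverage2.py and multiBamSummary2.py) splits one argument into a boolean switch plus a length. A minimal standalone sketch of that branching is below. The function name `normalize_extend_reads` is hypothetical, and the final `TypeError` is an assumption added for illustration: the diff itself has no fallback branch, so other types would leave `extendReadsLen` unset.

```python
def normalize_extend_reads(value):
    """Hypothetical helper mirroring the branching in the diff.

    Returns a (extend_reads, extend_reads_len) tuple.
    """
    if not value:
        # None, False and 0 all mean "do not extend".
        return False, 0
    if isinstance(value, bool):
        # Must be tested before int: bool is a subclass of int in Python.
        return True, 0
    if isinstance(value, int):
        # An explicit length was supplied.
        return True, value
    # Assumption: the original code has no branch for other types.
    raise TypeError(f"unsupported extendReads value: {value!r}")


print(normalize_extend_reads(None))
print(normalize_extend_reads(True))
print(normalize_extend_reads(150))
```

The order of the `isinstance` checks matters: if the `int` test came first, `True` would be treated as a length of 1.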
@@ -272,8 +286,10 @@ def main(args=None):
         args.scaleFactorsMethod, # scaling method
         args.operation,
         args.pseudocount,
+        args.extendReads,
+        args.extendReadsLen,
+        args.centerReads,
         args.blackListFileName,
-        args.ignoreDuplicates,
         args.minMappingQuality,
         args.samFlagInclude,
         args.samFlagExclude,
@@ -283,32 +299,6 @@ def main(args=None):
         args.ignoreForNormalization,
         args.binSize, # bin size
         args.region, # regions
-        True # verbose
+        args.verbose, # verbose
+        args.no_collapse, # collapse the ofile or not.
     )
-
-    # #if args.normalizeUsing == "RPGC":
-    # # sys.exit("RPGC normalization (--normalizeUsing RPGC) is not supported with bamCompare!")
-    # #if args.normalizeUsing == 'None':
-    # args.normalizeUsing = None # For the sake of sanity
-    # if args.scaleFactorsMethod != 'None' and args.normalizeUsing:
-    # sys.exit("`--normalizeUsing {}` is only valid if you also use `--scaleFactorsMethod None`! To prevent erroneous output, I will quit now.\n".format(args.normalizeUsing))
-
-    # # Get mapping statistics
-    # bam1, mapped1, unmapped1, stats1 = bamHandler.openBam(args.bamfile1, returnStats=True, nThreads=args.numberOfProcessors)
-    # bam1.close()
-    # bam2, mapped2, unmapped2, stats2 = bamHandler.openBam(args.bamfile2, returnStats=True, nThreads=args.numberOfProcessors)
-    # bam2.close()
-
-    # scale_factors = get_scale_factors(args, [stats1, stats2], [mapped1, mapped2])
-    # if scale_factors is None:
-    # # check whether one of the depth norm methods are selected
-    # if args.normalizeUsing is not None:
-    # args.scaleFactor = 1.0
-    # # if a normalization is required then compute the scale factors
-    # args.bam = args.bamfile1
-    # scale_factor_bam1 = get_scale_factor(args, stats1)
-    # args.bam = args.bamfile2
-    # scale_factor_bam2 = get_scale_factor(args, stats2)
-    # scale_factors = [scale_factor_bam1, scale_factor_bam2]
-    # else:
-    # scale_factors = [1, 1]

pydeeptools/deeptools/bamCoverage2.py

Lines changed: 20 additions & 4 deletions
@@ -101,6 +101,12 @@ def get_optional_args():
                           choices=['forward', 'reverse'],
                           default=None)
 
+    optional.add_argument('--no_collapse',
+                          help='By default adjacent bins that have the same value are collapsed. This reduces the size of the output file (drastically).'
+                          'If you like to opt out of this behavior, you can set this flag.',
+                          default=True,
+                          action='store_false')
+
     return parser
 
 def scaleFactor(string):
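
A note on the `--no_collapse` argument added here and in bamCompare2.py: it combines `default=True` with `action='store_false'`, so the stored value is `True` when the flag is absent and `False` when it is given. In other words, `args.no_collapse` effectively carries "collapse: yes/no" despite its name. The sketch below reproduces only that argparse behaviour; `build_parser` is a hypothetical stand-in, not the project's parser.

```python
import argparse


def build_parser():
    """Hypothetical minimal parser with the same flag definition as the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--no_collapse', default=True, action='store_false')
    return parser


# Flag absent: the default is kept, so collapsing stays enabled.
print(build_parser().parse_args([]).no_collapse)
# Flag present: store_false flips the value, so collapsing is disabled.
print(build_parser().parse_args(['--no_collapse']).no_collapse)
```

Separately, the two adjacent string literals in the help text are concatenated without a separating space, so the rendered help would read `(drastically).If you like`.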
@@ -136,9 +142,18 @@ def main(args=None):
     if not args.normalizeUsing:
         args.normalizeUsing = 'None'
     if not args.Offset:
-        args.Offset = [1, -1]
+        args.Offset = [0, 0]
+    elif len(args.Offset) == 1:
+        args.Offset = [args.Offset[0], 0]
     if not args.extendReads:
-        args.extendReads = 0
+        args.extendReads = False
+        args.extendReadsLen = 0
+    elif isinstance(args.extendReads, bool):
+        args.extendReadsLen = 0
+        args.extendReads = True
+    elif isinstance(args.extendReads, int):
+        args.extendReadsLen = args.extendReads
+        args.extendReads = True
     if not args.filterRNAstrand:
         args.filterRNAstrand = 'None'
     if not args.blackListFileName:
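
The `Offset` change in this hunk pads the argument to a fixed two-element list: the default moves from `[1, -1]` to `[0, 0]`, and a single value is completed with a trailing `0`. A small sketch of that padding, with the hypothetical name `normalize_offset`, is shown below. How the downstream function interprets these values is not visible in the diff, so nothing is assumed about it here.

```python
def normalize_offset(offset):
    """Hypothetical helper mirroring the Offset branching in the diff."""
    if not offset:
        # Nothing supplied (None or an empty list): use the new default.
        return [0, 0]
    if len(offset) == 1:
        # A single value is padded with 0 as the second element.
        return [offset[0], 0]
    # Two or more values are passed through unchanged.
    return list(offset)


print(normalize_offset(None))
print(normalize_offset([5]))
print(normalize_offset([1, -1]))
```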
@@ -170,6 +185,7 @@ def main(args=None):
         args.MNase,
         args.Offset,
         args.extendReads,
+        args.extendReadsLen,
         args.centerReads,
         args.filterRNAstrand,
         args.blackListFileName,
@@ -178,7 +194,6 @@ def main(args=None):
         args.smoothLength,
         args.binSize, # bin size
         # Filtering options
-        args.ignoreDuplicates,
         args.minMappingQuality,
         args.samFlagInclude,
         args.samFlagExclude,
@@ -187,5 +202,6 @@ def main(args=None):
         # running options
         args.numberOfProcessors, # threads
         args.region, # regions
-        args.verbose # verbose
+        args.verbose, # verbose
+        args.no_collapse,
     )
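
The help text describes collapsing as merging adjacent bins that have the same value. The following sketch shows one way such a merge could work on bedGraph-like `(start, end, value)` tuples. It is an illustration only, not the project's implementation (which is not part of this diff), and it assumes the bins are sorted, contiguous, and on a single chromosome. The name `collapse_bins` is hypothetical.

```python
from itertools import groupby


def collapse_bins(bins):
    """Merge runs of adjacent bins that share the same value.

    bins: list of (start, end, value) tuples, assumed sorted and contiguous.
    """
    collapsed = []
    # groupby only groups consecutive items, which is exactly what is needed:
    # equal values separated by a different value stay in separate intervals.
    for value, run in groupby(bins, key=lambda b: b[2]):
        run = list(run)
        collapsed.append((run[0][0], run[-1][1], value))
    return collapsed


bins = [(0, 10, 1.0), (10, 20, 1.0), (20, 30, 2.0), (30, 40, 1.0)]
print(collapse_bins(bins))
```

With `--no_collapse` set, every bin would be written out individually instead, which gives a larger file with one record per bin.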

pydeeptools/deeptools/multiBamSummary2.py

Lines changed: 10 additions & 1 deletion
@@ -224,6 +224,15 @@ def process_args(args=None):
         args.outRawCounts = "None"
     if not args.scalingFactors:
         args.scalingFactors = "None"
+    if not args.extendReads:
+        args.extendReads = False
+        args.extendReadsLen = 0
+    elif isinstance(args.extendReads, bool):
+        args.extendReadsLen = 0
+        args.extendReads = True
+    elif isinstance(args.extendReads, int):
+        args.extendReadsLen = args.extendReads
+        args.extendReads = True
     # defaults for the filtering options
     if not args.samFlagInclude:
         args.samFlagInclude = 0
@@ -248,7 +257,6 @@ def main(args=None):
     """
     args = process_args(args)
     print(f"args = {args}")
-    print("running r_mbams")
     r_mbams(
         args.command,
         args.bamfiles,
@@ -264,6 +272,7 @@ def main(args=None):
         args.blackListFileName,
         args.verbose,
         args.extendReads,
+        args.extendReadsLen,
         args.centerReads,
         args.samFlagInclude,
         args.samFlagExclude,

pydeeptools/deeptools/parserCommon.py

Lines changed: 8 additions & 8 deletions
@@ -257,14 +257,14 @@ def normalization_options():
                        default=None,
                        required=False)
 
-    group.add_argument('--exactScaling',
-                       help='Instead of computing scaling factors based on a sampling of the reads, '
-                       'process all of the reads to determine the exact number that will be used in '
-                       'the output. This requires significantly more time to compute, but will '
-                       'produce more accurate scaling factors in cases where alignments that are '
-                       'being filtered are rare and lumped together. In other words, this is only '
-                       'needed when region-based sampling is expected to produce incorrect results.',
-                       action='store_true')
+    # group.add_argument('--exactScaling',
+    #                    help='Instead of computing scaling factors based on a sampling of the reads, '
+    #                    'process all of the reads to determine the exact number that will be used in '
+    #                    'the output. This requires significantly more time to compute, but will '
+    #                    'produce more accurate scaling factors in cases where alignments that are '
+    #                    'being filtered are rare and lumped together. In other words, this is only '
+    #                    'needed when region-based sampling is expected to produce incorrect results.',
+    #                    action='store_true')
 
     group.add_argument('--ignoreForNormalization', '-ignore',
                        help='A list of space-delimited chromosome names '
