Make an amplicon sequence variant (ASV) abundance matrix
This function generates an ASV abundance matrix from raw reads processed in previous steps (read preparation and primer removal), using the DADA2 core denoising algorithm to infer ASVs.
Source: R/make_asv_abund_matrix.R (make_asv_abund_matrix.Rd)
Arguments
- analysis_setup
An object containing directory paths and data tables, produced by the prepare_reads function.
- overwrite_existing
Logical, indicating whether to overwrite existing results. Default is FALSE.
Details
The function processes the data for each unique barcode separately: it infers ASVs, merges paired reads, and builds an ASV abundance matrix for that barcode.
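To make the per-barcode idea concrete, here is a minimal base-R sketch of splitting counts by barcode and building one samples-by-ASV count matrix for each. It does not call DADA2 or demulticoder: the sequences, counts, and the `reads` table are invented for illustration, and the samples-as-rows, ASVs-as-columns layout is an assumption borrowed from the usual DADA2 sequence table convention rather than something this page states.

```r
# Toy ASV counts (hypothetical data, not produced by demulticoder).
reads <- data.frame(
  sample_name = c("S1", "S1", "S2", "S2", "S1", "S2"),
  primer_name = c("its", "its", "its", "its", "rps10", "rps10"),
  asv         = c("ACGT", "TTGA", "ACGT", "GGCC", "CATG", "CATG"),
  count       = c(10, 5, 7, 3, 20, 18),
  stringsAsFactors = FALSE
)

# Handle each barcode separately, mirroring the description above.
by_barcode <- split(reads, reads$primer_name)

# For each barcode, cross-tabulate counts into a samples x ASVs matrix.
abund_matrices <- lapply(by_barcode, function(d) {
  m <- tapply(d$count, list(d$sample_name, d$asv), sum)
  m[is.na(m)] <- 0  # an ASV absent from a sample gets a count of 0
  m
})

abund_matrices$its
```

Each element of `abund_matrices` is named after its barcode, so the matrices stay separate and can be inspected or analysed one barcode at a time.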
Examples
# The primary wrapper function for DADA2 ASV inference steps
analysis_setup <- prepare_reads(
  data_directory = system.file("extdata", package = "demulticoder"),
  output_directory = tempdir(),
  tempdir_path = tempdir(),
  tempdir_id = "demulticoder_run_temp",
  overwrite_existing = TRUE
)
#> Rows: 2 Columns: 22
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl (3): already_trimmed, multithread, verbose
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 2 Columns: 22
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl (3): already_trimmed, multithread, verbose
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 4 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): sample_name, primer_name, organism
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Creating output directory: /tmp/Rtmp23cMn7/demulticoder_run_temp/prefiltered_sequences
cut_trim(
  analysis_setup,
  cutadapt_path = "/usr/bin/cutadapt",
  overwrite_existing = TRUE
)
#> Running Cutadapt 3.5 for its sequence data
#> Running Cutadapt 3.5 for rps10 sequence data
make_asv_abund_matrix(
  analysis_setup,
  overwrite_existing = TRUE
)
#> 711900 total bases in 2698 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#> selfConsist step 2
#> selfConsist step 3
#> Convergence after 3 rounds.
#> Error rate plot for the Forward read of primer pair its
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1481 reads in 660 unique sequences.
#> Sample 2 - 1217 reads in 614 unique sequences.
#> 725393 total bases in 2698 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#> selfConsist step 2
#> selfConsist step 3
#> Convergence after 3 rounds.
#> Error rate plot for the Reverse read of primer pair its
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1481 reads in 1021 unique sequences.
#> Sample 2 - 1217 reads in 815 unique sequences.
#> 1316 paired-reads (in 21 unique pairings) successfully merged out of 1418 (in 32 pairings) input.
#> Duplicate sequences in merged output.
#> 1065 paired-reads (in 25 unique pairings) successfully merged out of 1110 (in 28 pairings) input.
#> Duplicate sequences detected and merged.
#> Identified 0 bimeras out of 38 input sequences.
#> 824778 total bases in 2935 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#> selfConsist step 2
#> Convergence after 2 rounds.
#> Error rate plot for the Forward read of primer pair rps10
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1429 reads in 933 unique sequences.
#> Sample 2 - 1506 reads in 1018 unique sequences.
#> 821851 total bases in 2935 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#> selfConsist step 2
#> selfConsist step 3
#> Convergence after 3 rounds.
#> Error rate plot for the Reverse read of primer pair rps10
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1429 reads in 1044 unique sequences.
#> Sample 2 - 1506 reads in 1284 unique sequences.
#> 1420 paired-reads (in 2 unique pairings) successfully merged out of 1422 (in 4 pairings) input.
#> 1503 paired-reads (in 5 unique pairings) successfully merged out of 1504 (in 6 pairings) input.
#> Identified 0 bimeras out of 5 input sequences.
#> $its
#> [1] "/tmp/Rtmp23cMn7/demulticoder_run_temp/asvabund_matrixDADA2_its.RData"
#>
#> $rps10
#> [1] "/tmp/Rtmp23cMn7/demulticoder_run_temp/asvabund_matrixDADA2_rps10.RData"
#>
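The value printed above is a named list with one .RData path per barcode. The following sketch shows one cautious way to inspect such files. Because this page does not show which object names the real files contain, the example loads each file into its own environment and lists what it finds. The toy matrices and the `paths` list are stand-ins created here so the block runs on its own; they are not the package's actual output.

```r
# Stand-in files (hypothetical contents) mimicking one .RData per barcode.
toy_its <- matrix(c(10, 7, 5, 0), nrow = 2,
                  dimnames = list(c("S1", "S2"), c("ACGT", "TTGA")))
toy_rps10 <- matrix(c(20, 18), nrow = 2,
                    dimnames = list(c("S1", "S2"), "CATG"))
paths <- list(
  its   = tempfile(fileext = ".RData"),
  rps10 = tempfile(fileext = ".RData")
)
save(toy_its, file = paths$its)
save(toy_rps10, file = paths$rps10)

# Load each file into a fresh environment so that nothing in the
# current workspace is overwritten, then return its objects as a list.
loaded <- lapply(paths, function(p) {
  env <- new.env()
  load(p, envir = env)
  as.list(env)
})

# Show which object names each barcode's file contains.
lapply(loaded, names)
```

With real results, `paths` would presumably be the list returned by `make_asv_abund_matrix()`, and the stored object names may differ from the toy ones used here.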