Filter ASV abundance matrix and convert to taxmap and phyloseq objects
Source: R/convert_asv_matrix.R
Usage
convert_asv_matrix_to_objs(
  analysis_setup,
  min_read_depth = 0,
  minimum_bootstrap = 0,
  save_outputs = FALSE,
  overwrite_existing = FALSE
)
Arguments
- analysis_setup
An object containing directory paths and data tables, produced by the prepare_reads function.
- min_read_depth
ASV filter parameter. If the mean read depth of an ASV across all samples is less than this threshold, the ASV will be filtered out.
- minimum_bootstrap
Threshold for the bootstrap support value of taxonomic assignments. Taxonomic assignments with support below this threshold will be set to N/A.
- save_outputs
Logical, indicating whether to save the taxmap object. Default is FALSE.
- overwrite_existing
Logical, indicating whether to overwrite existing results. Default is FALSE.
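The two filter arguments can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy count matrix, the taxonomy table, its column names (asv_id, genus, bootstrap), and the 0-100 bootstrap scale are all assumptions made for illustration.

```r
# Toy ASV table: one row per ASV, one column per sample (hypothetical data).
asv_counts <- matrix(
  c(120, 80,
      3,  1,
     40, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("asv_1", "asv_2", "asv_3"), c("S1", "S2"))
)

# Hypothetical per-ASV taxonomy with a bootstrap support value
# (assumed here to be on a 0-100 scale).
taxonomy <- data.frame(
  asv_id    = c("asv_1", "asv_2", "asv_3"),
  genus     = c("Phytophthora", "Pythium", "Fusarium"),
  bootstrap = c(98, 72, 45),
  stringsAsFactors = FALSE
)

# Illustrative thresholds; the function's defaults are both 0.
min_read_depth    <- 10
minimum_bootstrap <- 50

# Keep ASVs whose mean read depth across all samples meets the threshold.
keep <- rowMeans(asv_counts) >= min_read_depth
filtered_counts <- asv_counts[keep, , drop = FALSE]
filtered_tax <- taxonomy[taxonomy$asv_id %in% rownames(filtered_counts), ]

# Set assignments with bootstrap support below the threshold to NA.
low_support <- filtered_tax$bootstrap < minimum_bootstrap
filtered_tax$genus[low_support] <- NA_character_

print(filtered_counts)
print(filtered_tax)
```

In this sketch, asv_2 is dropped because its mean depth (2 reads) is below 10, and the genus of asv_3 is set to NA because its support (45) is below 50. With both thresholds left at 0, nothing would be filtered.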
Examples
# Convert final matrix to taxmap and phyloseq objects for downstream analysis steps
analysis_setup <- prepare_reads(
  data_directory = system.file("extdata", package = "demulticoder"),
  output_directory = tempdir(),
  tempdir_path = tempdir(),
  tempdir_id = "demulticoder_run_temp",
  overwrite_existing = TRUE
)
#> Rows: 2 Columns: 22
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl (3): already_trimmed, multithread, verbose
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 2 Columns: 22
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl (3): already_trimmed, multithread, verbose
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 4 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): sample_name, primer_name, organism
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Creating output directory: /tmp/Rtmp23cMn7/demulticoder_run_temp/prefiltered_sequences
cut_trim(
  analysis_setup,
  cutadapt_path = "/usr/bin/cutadapt",
  overwrite_existing = TRUE
)
#> Running Cutadapt 3.5 for its sequence data
#> Running Cutadapt 3.5 for rps10 sequence data
make_asv_abund_matrix(
  analysis_setup,
  overwrite_existing = TRUE
)
#> 711900 total bases in 2698 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#> selfConsist step 2
#> selfConsist step 3
#> Convergence after 3 rounds.
#> Error rate plot for the Forward read of primer pair its
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1481 reads in 660 unique sequences.
#> Sample 2 - 1217 reads in 614 unique sequences.
#> 725393 total bases in 2698 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#> selfConsist step 2
#> selfConsist step 3
#> Convergence after 3 rounds.
#> Error rate plot for the Reverse read of primer pair its
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1481 reads in 1021 unique sequences.
#> Sample 2 - 1217 reads in 815 unique sequences.
#> 1316 paired-reads (in 21 unique pairings) successfully merged out of 1418 (in 32 pairings) input.
#> Duplicate sequences in merged output.
#> 1065 paired-reads (in 25 unique pairings) successfully merged out of 1110 (in 28 pairings) input.
#> Duplicate sequences detected and merged.
#> Identified 0 bimeras out of 38 input sequences.
#> 824778 total bases in 2935 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#> selfConsist step 2
#> Convergence after 2 rounds.
#> Error rate plot for the Forward read of primer pair rps10
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1429 reads in 933 unique sequences.
#> Sample 2 - 1506 reads in 1018 unique sequences.
#> 821851 total bases in 2935 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#> selfConsist step 2
#> selfConsist step 3
#> Convergence after 3 rounds.
#> Error rate plot for the Reverse read of primer pair rps10
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1429 reads in 1044 unique sequences.
#> Sample 2 - 1506 reads in 1284 unique sequences.
#> 1420 paired-reads (in 2 unique pairings) successfully merged out of 1422 (in 4 pairings) input.
#> 1503 paired-reads (in 5 unique pairings) successfully merged out of 1504 (in 6 pairings) input.
#> Identified 0 bimeras out of 5 input sequences.
#> $its
#> [1] "/tmp/Rtmp23cMn7/demulticoder_run_temp/asvabund_matrixDADA2_its.RData"
#>
#> $rps10
#> [1] "/tmp/Rtmp23cMn7/demulticoder_run_temp/asvabund_matrixDADA2_rps10.RData"
#>
assign_tax(
  analysis_setup,
  asv_abund_matrix,
  retrieve_files = FALSE,
  overwrite_existing = TRUE
)
#> Duplicate sequences detected and merged.
#> samplename_barcode input filtered denoisedF denoisedR merged nonchim
#> 1 S1_its 2564 1481 1427 1433 1316 1316
#> 2 S2_its 1996 1217 1145 1124 1065 1065
#> samplename_barcode input filtered denoisedF denoisedR merged nonchim
#> 1 S1_rps10 1830 1429 1429 1422 1420 1420
#> 2 S2_rps10 2090 1506 1505 1505 1503 1503
objs <- convert_asv_matrix_to_objs(
  analysis_setup
)
#> Rows: 38 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): asv_id, sequence, dada2_tax
#> dbl (2): S1_its, S2_its
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> For its dataset
#> ASV matrix found, but save_outputs is FALSE. Rerun previous part of the pipeline.
#> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#> Rows: 5 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): asv_id, sequence, dada2_tax
#> dbl (2): S1_rps10, S2_rps10
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> For rps10 dataset
#> ASV matrix found, but save_outputs is FALSE. Rerun previous part of the pipeline.
#> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~