
Assign taxonomy functions

Usage

assign_tax(
  analysis_setup,
  asv_abund_matrix,
  tryRC = FALSE,
  verbose = FALSE,
  multithread = FALSE,
  retrieve_files = FALSE,
  overwrite_existing = FALSE,
  db_rps10 = "oomycetedb.fasta",
  db_its = "fungidb.fasta",
  db_16S = "bacteriadb.fasta",
  db_other1 = "otherdb1.fasta",
  db_other2 = "otherdb2.fasta"
)

Arguments

analysis_setup

An object containing directory paths and data tables, produced by the prepare_reads function

asv_abund_matrix

The ASV abundance matrix produced by the make_asv_abund_matrix function.

tryRC

Logical, indicating whether to try reverse-complementing sequences during taxonomic assignment

verbose

Logical, indicating whether to display verbose output

multithread

Logical, indicating whether to use multithreading

retrieve_files

Logical, indicating whether to copy files from the temporary directory to the output directory

overwrite_existing

Logical, indicating whether to remove or overwrite existing files and directories from previous runs. Default is FALSE.

db_rps10

The reference database for the rps10 locus

db_its

The reference database for the ITS locus

db_16S

The reference database for the 16S locus

db_other1

The reference database for an additional, user-defined locus (entries are assumed to be formatted like SILVA DB entries); see the example after this list for supplying custom databases

db_other2

The reference database for a second additional, user-defined locus (entries are assumed to be formatted like SILVA DB entries)
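
Custom reference databases can be supplied by overriding the defaults above. The following sketch is illustrative only and is not run: the file name passed to db_other1 is a hypothetical placeholder, and it is assumed here that the FASTA files are placed where the workflow can locate them (for example, alongside the input reads in the data directory).

assign_tax(
  analysis_setup,
  asv_abund_matrix,
  db_rps10 = "oomycetedb.fasta",              # rps10 reference database (default name)
  db_its = "fungidb.fasta",                   # ITS reference database (default name)
  db_other1 = "my_custom_silva_style.fasta",  # hypothetical database for an additional locus
  multithread = TRUE,
  retrieve_files = TRUE,
  overwrite_existing = TRUE
)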

Value

Taxonomic assignments of each unique ASV sequence

Details

At this step, DADA2's assignTaxonomy function is used to assign taxonomy to the inferred ASVs.
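
For orientation, the underlying call is roughly equivalent to the sketch below. This is not the package's exact internal code; the use of dada2::getSequences() on the abundance matrix and the omission of other assignTaxonomy() options are assumptions, but tryRC, multithread, and verbose correspond to the arguments documented above.

# Illustrative sketch of the per-locus DADA2 call (assumes asv_abund_matrix is
# a DADA2-style sequence table and the FASTA path points to the chosen database)
taxa <- dada2::assignTaxonomy(
  seqs = dada2::getSequences(asv_abund_matrix),
  refFasta = "oomycetedb.fasta",
  tryRC = FALSE,
  multithread = FALSE,
  verbose = FALSE
)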

Examples

# Assign taxonomies to ASVs on a per-barcode basis
analysis_setup <- prepare_reads(
  data_directory = system.file("extdata", package = "demulticoder"),
  output_directory = tempdir(),
  tempdir_path = tempdir(),
  tempdir_id = "demulticoder_run_temp",
  overwrite_existing = TRUE
)
#> Rows: 2 Columns: 23
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl  (4): already_trimmed, count_all_samples, multithread, verbose
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 2 Columns: 23
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl  (4): already_trimmed, count_all_samples, multithread, verbose
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 4 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): sample_name, primer_name, organism
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Creating output directory: /tmp/RtmpRgOGJ7/demulticoder_run_temp/prefiltered_sequences

cut_trim(
  analysis_setup,
  cutadapt_path = "/usr/bin/cutadapt",
  overwrite_existing = TRUE
)
#> Running Cutadapt 3.5 for its sequence data 
#> Read in 2564 paired-sequences, output 1479 (57.7%) filtered paired-sequences.
#> Read in 1996 paired-sequences, output 1215 (60.9%) filtered paired-sequences.
#> Running Cutadapt 3.5 for rps10 sequence data 
#> Read in 1830 paired-sequences, output 1429 (78.1%) filtered paired-sequences.
#> Read in 2090 paired-sequences, output 1506 (72.1%) filtered paired-sequences.

make_asv_abund_matrix(
  analysis_setup,
  overwrite_existing = TRUE
)
#> 710804 total bases in 2694 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#>    selfConsist step 2
#>    selfConsist step 3
#> Convergence after  3  rounds.
#> Error rate plot for the Forward read of primer pair its 
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1479 reads in 660 unique sequences.
#> Sample 2 - 1215 reads in 613 unique sequences.
#> 724230 total bases in 2694 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#>    selfConsist step 2
#>    selfConsist step 3
#> Convergence after  3  rounds.
#> Error rate plot for the Reverse read of primer pair its 
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1479 reads in 1019 unique sequences.
#> Sample 2 - 1215 reads in 814 unique sequences.
#> 1315 paired-reads (in 21 unique pairings) successfully merged out of 1416 (in 32 pairings) input.
#> Duplicate sequences in merged output.
#> 1063 paired-reads (in 25 unique pairings) successfully merged out of 1108 (in 28 pairings) input.

#> Duplicate sequences detected and merged.
#> Identified 0 bimeras out of 38 input sequences.
#> 824778 total bases in 2935 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#>    selfConsist step 2
#> Convergence after  2  rounds.
#> Error rate plot for the Forward read of primer pair rps10 
#> Warning: log-10 transformation introduced infinite values.
#> Sample 1 - 1429 reads in 933 unique sequences.
#> Sample 2 - 1506 reads in 1018 unique sequences.
#> 821851 total bases in 2935 reads from 2 samples will be used for learning the error rates.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#>    selfConsist step 2
#>    selfConsist step 3
#> Convergence after  3  rounds.
#> Error rate plot for the Reverse read of primer pair rps10 
#> Warning: log-10 transformation introduced infinite values.

#> Sample 1 - 1429 reads in 1044 unique sequences.
#> Sample 2 - 1506 reads in 1284 unique sequences.
#> 1420 paired-reads (in 2 unique pairings) successfully merged out of 1422 (in 4 pairings) input.
#> 1503 paired-reads (in 5 unique pairings) successfully merged out of 1504 (in 6 pairings) input.

#> Identified 0 bimeras out of 5 input sequences.

#> $its
#> [1] "/tmp/RtmpRgOGJ7/demulticoder_run_temp/asvabund_matrixDADA2_its.RData"
#> 
#> $rps10
#> [1] "/tmp/RtmpRgOGJ7/demulticoder_run_temp/asvabund_matrixDADA2_rps10.RData"
#> 
assign_tax(
  analysis_setup,
  asv_abund_matrix,
  retrieve_files = FALSE,
  overwrite_existing = TRUE
)
#> Duplicate sequences detected and merged.
#>   samplename_barcode input filtered denoisedF denoisedR merged nonchim
#> 1          S1_R1_its  2564     1479      1425      1431   1315    1315
#> 2          S2_R1_its  1996     1215      1143      1122   1063    1063
#>   samplename_barcode input filtered denoisedF denoisedR merged nonchim
#> 1        S1_R1_rps10  1830     1429      1429      1422   1420    1420
#> 2        S2_R1_rps10  2090     1506      1505      1505   1503    1503
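
The saved ASV abundance matrices listed earlier can be reloaded for inspection. A minimal sketch, assuming each .RData file stores the matrix as a single object; the path is the one printed above and will differ between runs and machines.

# Reload a saved ASV abundance matrix (path taken from the output above)
loaded <- load("/tmp/RtmpRgOGJ7/demulticoder_run_temp/asvabund_matrixDADA2_rps10.RData")
abund_rps10 <- get(loaded[1])  # assumes the file contains a single object
dim(abund_rps10)               # samples x unique ASV sequences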