
Prepare reads for primer trimming using Cutadapt

Usage

prepare_reads(
  data_directory = "data",
  output_directory = "output",
  tempdir_path = NULL,
  tempdir_id = "demulticoder_run",
  multithread = FALSE,
  overwrite_existing = FALSE
)

Arguments

data_directory

Path to the directory containing the raw FASTQ files (forward and reverse reads), metadata.csv, and primerinfo_params.csv. Default is "data".
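Before calling prepare_reads(), it can help to confirm that the data directory holds the files listed above. The following is a minimal base R sketch, not part of demulticoder: check_inputs() is a hypothetical helper, and the FASTQ file-name pattern is an assumption about how the reads are named.

```r
# Hypothetical helper (not part of demulticoder): report which expected
# input files are missing from a data directory and count FASTQ files.
check_inputs <- function(data_directory = "data") {
  required <- c("metadata.csv", "primerinfo_params.csv")
  missing_files <- required[!file.exists(file.path(data_directory, required))]
  # Assumed naming convention: .fastq or .fq, optionally gzipped
  fastq_files <- list.files(data_directory, pattern = "\\.(fastq|fq)(\\.gz)?$")
  list(missing = missing_files, n_fastq = length(fastq_files))
}

# Demonstration on a throwaway directory that only contains metadata.csv
demo_dir <- file.path(tempdir(), "check_inputs_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("sample_name,primer_name,organism", file.path(demo_dir, "metadata.csv"))
result <- check_inputs(demo_dir)
print(result)
```

In this demonstration, result$missing names primerinfo_params.csv and result$n_fastq is 0, so the directory would not yet be ready to pass as data_directory.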

output_directory

User-specified directory for outputs. Default is "output".

tempdir_path

Path to a temporary directory. If NULL, the path returned by tempdir() is used.

tempdir_id

ID for temporary directories. Default is "demulticoder_run". Any helpful ID can be provided, such as a date or a run name.
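The example output on this page shows files being written under a directory named after tempdir_id inside the temporary directory. The sketch below illustrates how tempdir_path and tempdir_id might combine into that location. resolve_run_dir() is a hypothetical function and the resolution logic is an assumption inferred from that output, not the package's actual internals; the "/scratch/user" path and the dated ID are illustrative values only.

```r
# Sketch of an assumed path resolution; the real internals may differ.
resolve_run_dir <- function(tempdir_path = NULL, tempdir_id = "demulticoder_run") {
  # Fall back to the session temporary directory when no path is given
  base_dir <- if (is.null(tempdir_path)) tempdir() else tempdir_path
  file.path(base_dir, tempdir_id)
}

# Default: a "demulticoder_run" folder inside the session temporary directory
default_dir <- resolve_run_dir()

# Custom location with a dated ID (illustrative values)
dated_dir <- resolve_run_dir(tempdir_path = "/scratch/user", tempdir_id = "run_2024-05-01")
print(dated_dir)
```

Using a distinct tempdir_id per run, such as a date, keeps intermediate files from separate runs in separate folders.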

multithread

Logical, indicating whether to use multithreading for certain operations. Default is FALSE.

overwrite_existing

Logical, indicating whether to remove or overwrite existing files and directories from previous runs. Default is FALSE.

Value

A list containing data tables, including metadata, primer sequences to search for based on orientation, paths for trimming reads, and user-defined parameters for all subsequent steps.

Examples

# Pre-filter raw reads and parse the metadata and primer information files
# to prepare for primer trimming and filtering
analysis_setup <- prepare_reads(
  data_directory = system.file("extdata", package = "demulticoder"),
  output_directory = tempdir(),
  tempdir_path = tempdir(),
  tempdir_id = "demulticoder_run_temp",
  overwrite_existing = TRUE
)
#> Rows: 2 Columns: 22
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl  (3): already_trimmed, multithread, verbose
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 2 Columns: 22
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl  (3): already_trimmed, multithread, verbose
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 4 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): sample_name, primer_name, organism
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Creating output directory: /tmp/Rtmp23cMn7/demulticoder_run_temp/prefiltered_sequences