
Prepare reads for primer trimming using Cutadapt

Usage

prepare_reads(
  data_directory = "data",
  output_directory = "output",
  tempdir_path = NULL,
  tempdir_id = "demulticoder_run",
  overwrite_existing = FALSE
)

Arguments

data_directory

Path to the directory containing the raw FASTQ files (forward and reverse reads), metadata.csv, and primerinfo_params.csv. Default is "data".
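Before calling prepare_reads(), it can help to confirm that the expected inputs are actually in the data directory. The following is a minimal sketch in base R, not part of demulticoder: the helper name check_input_files() is hypothetical, and the FASTQ file-name pattern is an assumption about how the read files are named.

```r
# Hypothetical helper (not part of demulticoder): report which of the
# expected inputs are present in a data directory.
check_input_files <- function(data_directory = "data") {
  files <- list.files(data_directory)
  list(
    has_metadata   = "metadata.csv" %in% files,
    has_primerinfo = "primerinfo_params.csv" %in% files,
    # Assumption: FASTQ files end in .fastq or .fq, optionally gzipped.
    fastq_files    = grep("\\.(fastq|fq)(\\.gz)?$", files, value = TRUE)
  )
}

# Demonstration on a throwaway directory holding empty placeholder files.
demo_dir <- file.path(tempdir(), "demo_data")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("metadata.csv", "primerinfo_params.csv",
                                  "S1_R1.fastq.gz", "S1_R2.fastq.gz")))

input_check <- check_input_files(demo_dir)
str(input_check)
```

The check only looks at file names; it does not validate the contents of either CSV file.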

output_directory

User-specified directory for outputs. Default is "output".

tempdir_path

Path to a temporary directory. If NULL (the default), the path returned by tempdir() is used.

tempdir_id

ID for temporary directories. Default is "demulticoder_run". Any helpful ID can be used, such as a date or a specific name for the run.
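The interaction between tempdir_path and tempdir_id can be sketched in base R as follows. The helper resolve_run_tempdir() is hypothetical and is not the package's implementation. It assumes the run directory is tempdir_id nested under tempdir_path, which is what the "Creating output directory" message in the example output suggests.

```r
# Hypothetical helper illustrating the documented defaults:
# a NULL tempdir_path falls back to the session's tempdir().
resolve_run_tempdir <- function(tempdir_path = NULL,
                                tempdir_id = "demulticoder_run") {
  base_dir <- if (is.null(tempdir_path)) tempdir() else tempdir_path
  # Assumption: the run-specific directory is tempdir_id under base_dir.
  file.path(base_dir, tempdir_id)
}

# Default: session temporary directory plus the default ID.
default_dir <- resolve_run_tempdir()

# A date-stamped ID keeps separate runs apart.
dated_dir <- resolve_run_tempdir(
  tempdir_id = paste0("run_", format(Sys.Date(), "%Y%m%d"))
)

print(default_dir)
print(dated_dir)
```

Using a distinct tempdir_id per run is one way to keep intermediate files from different analyses from colliding in the same session.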

overwrite_existing

Logical, indicating whether to remove or overwrite existing files and directories from previous runs. Default is FALSE.

multithread

Logical, indicating whether to use multithreading for certain operations. Default is FALSE.

Value

A list containing data tables, including metadata, primer sequences to search for based on orientation, paths for trimming reads, and user-defined parameters for all subsequent steps.

Examples

# Pre-filter raw reads and parse metadata and primer information to prepare
# for primer trimming and filtering
analysis_setup <- prepare_reads(
  data_directory = system.file("extdata", package = "demulticoder"),
  output_directory = tempdir(),
  tempdir_path = tempdir(),
  tempdir_id = "demulticoder_run_temp",
  overwrite_existing = TRUE
)
#> Rows: 2 Columns: 23
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl  (4): already_trimmed, count_all_samples, multithread, verbose
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 2 Columns: 23
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): primer_name, forward, reverse
#> dbl (16): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl  (4): already_trimmed, count_all_samples, multithread, verbose
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 4 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): sample_name, primer_name, organism
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Creating output directory: /tmp/RtmpRgOGJ7/demulticoder_run_temp/prefiltered_sequences