Although advanced tools like git and RMarkdown greatly enhance reproducibility, they are only useful if files and folders are organized in such a way that another person (or you 3 weeks/months/years later) can make sense of them. This can be surprisingly difficult and there is no universal solution that will work best for all people or projects. However, there are some tips that will help you arrive at a solution that fits your work style and the needs of your project.
File/Folder naming
Good file/folder naming is perhaps the easiest way to improve the understandably of the digital aspects of your project. Ideally, a file’s name and the names of the folders it is in should be enough information to make a good guess at its contents. Note that this information need not all be in the file’s name, which would lead to unreadably long file names; it should be distributed throughout the names of the folders it is in as well as the file’s name. Below are a couple tips on file naming:
- Name files with a purpose. Never use arbitrary names (e.g.
Document1.Rmd
, asdafsa.Rmd
). If the file is unimportant or temporary, name it something like temp.Rmd
, scratch.Rmd
, or random_gibberish.Rmd
so that you don’t need to open it to know that it is unimportant.
- Only use letters, numbers, underscores, and dashes in file names. Use underscores instead of spaces; file names with spaces are irritating to manipulate on the command line and are not well handled by UNIX-based operating systems.
- One or more dashes make good delimiters if a file’s name has multiple distinct components (e.g.
chapter_1--what_i_did_today.Rmd
).
- Avoid mixing upper and lower case letters. Most operating systems are case-sensitive, so
file.Rmd
and File.Rmd
would be considered different files yet they might be easily confused by people. If you use uppercase, use it for the whole file name (e.g. README.txt
).
- Use YYYY-MM-DD for dates If a file/folder pertains to a specific date, prefix the date in the YYYY-MM-DD format (e.g. “2016-03-23–wheat_audpc.Rmd”). Using YYYY-MM-DD instead of something like YYYY-DD-MM means that files are sorted by date correctly by default. Also, YYYY-MM-DD is the ISO standard format for dates.
- Order your analyses If multiple files/folders have a logical order, prefix them with numbers, padding with leading zeros when needed (e.g.
002--DNA_extraction.Rmd
). The leading zeros make them ordered correctly by default.
- Use the correct extension All files you create are, in essence, text files, but they don’t all have to be labeled
.txt
.
- If a text file has a table of comma-separated values, use
.csv
instead of .txt
; if it is tab-separated, used .tsv
.
- If a text file contains markdown, use
.md
.
Folder structure
How to best construct a logical hierarchy of folders is less well defined than naming conventions since it will vary widely with project type and personal preference. Spend some time thinking about a system that will work for you and stick to it, at least within a single project.
- Use sub-folders. Don’t hesitate to use sub-folders to group similar files. A deep hierarchy with simple file names is better than a shallow hierarchy with long names.
- Keep things minimal. Avoid large numbers of unrelated files/folders in a single folder. It should be easy to scan over file names to get a sense of what is in a folder.
- Use README files. Add a “README.md” file to folders explaining their purpose and contents, especially if there are many files or files with un-intuitive names.
- Use a project-specific folder. Avoid using default, operating-system-specific directories like “My Documents” since these often get cluttered by program-generated files and have properties that are not supported by other operating systems. Instead, make a single directory that contains all of your files, not including programs and operating system components. This makes it easy to backup your files and transfer them to other computers.
Working habits
Good organization takes time and work to create and maintain, but developing good habits can make the task less daunting and can make you more efficient in the long term.
- Be consistent. Maintain a consistent organization style on all of the computers and projects you work on.
- Keep temporary files seperate. Have a temporary folder for files that you might not keep or have not decided where to put yet. However, this must be kept clean to be useful.
- Use shortcuts. Use shortcuts/links to access commonly used files/folders, especially if your project calls for a deep folder hierarchy.
- Back up early, Back up often. Make routine backups stored in three different ways (e.g. An external hard drive stored at home, an online backup service, and code and text on GitHub).