Advances in computing have made moving files between platforms much easier than it was just a few years ago. With the adoption of the Unicode standard for character encoding and better cross-compatibility between operating systems, it is now extremely rare to have trouble opening a file received as an email attachment, downloaded from a website, or accessed on a cloud storage platform. Problems can still occur, however.
While the vast majority of desktop computers still use Microsoft Windows or the Macintosh OS, there remain many other operating systems (and file systems) that can interact with files at different points. Cell phones, tape drives, networking equipment, televisions, and even digital cameras support file systems today.
Most file systems today, and the operating systems that incorporate them, support much longer file names than the personal computers that ran Microsoft DOS and early versions of Windows. Those computers used the 8.3 filename convention, which allowed up to eight characters to the left of a period and up to three characters to the right; the three-character extension told the computer which application to use to open the file. However, it is still possible to run into problems related to filename length.
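To make the 8.3 limit concrete, here is a minimal sketch of a check for whether a name would fit that pattern. The function name `fits_8_3` is hypothetical, and the allowed character set is a simplifying assumption (letters, digits, underscores, and hyphens only); DOS itself accepted a few additional symbols.

```python
import re

# Up to 8 characters for the base name, then an optional period
# followed by up to 3 characters for the extension.
# Assumption: only letters, digits, underscores, and hyphens are accepted.
_EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9]{1,3})?")


def fits_8_3(filename: str) -> bool:
    """Return True if the whole name matches the simplified 8.3 pattern."""
    return _EIGHT_DOT_THREE.fullmatch(filename) is not None


print(fits_8_3("REPORT01.DOC"))             # True
print(fits_8_3("annual_report_2017.docx"))  # False
```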
Adopting good file naming conventions can help ensure that files will work with different operating systems and disk formats, such as Windows, Linux, Mac OS X and UNIX. File naming is also an important consideration when transferring files via the Internet, where it may not be evident what computer platform was used when the files were originally created.
File names can be either descriptive or non-descriptive. Descriptive file names are useful for small, well-defined projects with existing identification schemes that link the digital object to the source material. However, inconsistent application of terms or typos will contribute to indexing and sorting errors. Non-descriptive file names are usually system-generated sequential numerical strings, such as a digital ID number, and are often linked to metadata stored elsewhere. Non-descriptive file names are often created for large-scale digitization projects and may employ a digital ID number and numerical sequences to indicate batch or parent-child relationships. The advantage of non-descriptive names is that there is less chance of repeated or non-unique file names within a data structure.
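As an illustration of a non-descriptive scheme, the sketch below builds a name from three numeric sequences. The layout (batch, then item, then page, joined by hyphens) and the field widths are assumptions chosen for the example, not a scheme prescribed here.

```python
def non_descriptive_name(batch: int, item: int, page: int,
                         extension: str = "tif") -> str:
    """Build a sequential name such as 012-00340-001.tif.

    Hypothetical layout: batch number, parent item ID, child page number.
    Zero-padding keeps the names sorting in numerical order.
    """
    return f"{batch:03d}-{item:05d}-{page:03d}.{extension}"


# Pages 1 and 2 of the same parent item share the batch and item prefix.
print(non_descriptive_name(12, 340, 1))  # 012-00340-001.tif
print(non_descriptive_name(12, 340, 2))  # 012-00340-002.tif
```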
Some applications and scripts may not recognize spaces, or will process files differently when their names contain spaces. A best practice is to replace spaces in file names with an underscore (_) or hyphen (-). Appendix B of NARA Bulletin 2015-04 states that spaces aren’t allowed in filenames. Web environments percent-encode spaces, rendering each one as “%20”. For example, “File Name.doc” would appear online in a URL as “File%20Name.doc”. This alteration can cause confusion in identifying the actual file name.
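The “%20” behavior can be reproduced with Python's standard `urllib.parse.quote`. The helper `replace_spaces` is a hypothetical name for one way to apply the replacement described above; the hyphen default is an assumption, and an underscore works equally well.

```python
from urllib.parse import quote


def replace_spaces(filename: str, separator: str = "-") -> str:
    """Replace each run of whitespace with a single separator."""
    return separator.join(filename.split())


# How a web environment would encode the original name in a URL.
print(quote("File Name.doc"))           # File%20Name.doc

# The same name with the space replaced before publishing.
print(replace_spaces("File Name.doc"))  # File-Name.doc
```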
Punctuation, symbols, and special characters (periods, commas, parentheses, ampersands, asterisks, etc.) should be avoided. Some of these symbols are reserved by operating systems for specific tasks, such as identifying folder levels in Microsoft products and Mac operating systems. Periods separate the file name from its format extension, such as .jpg and .doc.
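One way to clean up an existing name is sketched below. The function name `strip_special_characters` and the choice to replace each run of symbols with a hyphen are assumptions for illustration; the last period is kept so the format extension survives.

```python
import re


def strip_special_characters(filename: str) -> str:
    """Keep letters, digits, underscores, and hyphens; keep the extension."""
    # Split on the last period so the file format extension is preserved.
    stem, dot, extension = filename.rpartition(".")
    if not dot:  # no period at all: the whole name is the stem
        stem, extension = filename, ""
    # Replace each run of disallowed characters with one hyphen.
    clean_stem = re.sub(r"[^A-Za-z0-9_-]+", "-", stem).strip("-")
    clean_ext = re.sub(r"[^A-Za-z0-9]+", "", extension)
    return f"{clean_stem}.{clean_ext}" if clean_ext else clean_stem


print(strip_special_characters("Budget (Q1 & Q2), v1.2.xlsx"))
# Budget-Q1-Q2-v1-2.xlsx
```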
The following are best practices for file naming. File names should:
- Be unique and consistently structured;
- Be persistent and not tied to anything that changes over time or location;
- Limit the character length to no more than 25-35 characters;
- Use leading zeros to facilitate sorting in numerical order when following a numeric scheme: “001, 002, …010, 011 … 100, 101, etc.” instead of “1, 2, …10, 11 … 100, 101, etc.”;
- Contain a file format extension;
- Use a period followed by a file extension (for example, .tif, .jpg, .gif, .pdf, .wav, .mpg);
- Use lowercase letters. However, when a name has more than one word, start each word with an uppercase letter, for example, “File_Name_Convention_001.doc”;
- Use numbers and/or letters but not characters such as symbols or spaces that could cause complications across operating platforms;
- Use hyphens or underscores instead of spaces;
- Use international standard date notation (YYYY-MM-DD or YYYYMMDD);
- Avoid blank spaces anywhere within the character string; and
- Not use an overly complex or lengthy naming scheme that is susceptible to human error during manual input, such as “filenameconventionjoesfinalversioneditedfinal.doc”.
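Several of the practices above lend themselves to an automated check. The following is a minimal sketch, not a complete validator: the function names, the problem labels, and the use of 35 characters as the cutoff (the upper end of the 25-35 guideline) are all assumptions made for the example.

```python
import re
from datetime import date

MAX_LENGTH = 35  # assumption: upper end of the 25-35 character guideline


def check_filename(filename: str) -> list[str]:
    """Return a list of problems found; an empty list means none found."""
    problems = []
    stem, dot, extension = filename.rpartition(".")
    if not dot:  # no period at all: the whole name is the stem
        stem, extension = filename, ""
    if len(filename) > MAX_LENGTH:
        problems.append("too long")
    if not stem or not extension:
        problems.append("no extension")
    if " " in filename:
        problems.append("spaces")
    # Spaces are reported separately, so they are excluded from this test.
    if re.search(r"[^A-Za-z0-9_ -]", stem):
        problems.append("symbols")
    return problems


def numbered_name(prefix: str, number: int, extension: str, when: date) -> str:
    """Combine a prefix, a YYYY-MM-DD date, and a zero-padded number."""
    return f"{prefix}-{when.isoformat()}-{number:03d}.{extension}"


print(check_filename("File_Name_Convention_001.doc"))  # []
print(check_filename("File Name (draft).doc"))         # ['spaces', 'symbols']
print(check_filename("filenameconventionjoesfinalversioneditedfinal.doc"))
# ['too long']
print(numbered_name("survey", 7, "tif", date(2017, 8, 31)))
# survey-2017-08-31-007.tif
```

Zero-padding the number and using the YYYY-MM-DD form means a plain alphabetical sort of these names is also a chronological and numerical sort.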
Update on 8/31/17: Changed underscores to hyphens.