Node.js in Practice (2015)

Part 1. Node fundamentals

Chapter 6. File system: Synchronous and asynchronous approaches to files

This chapter covers

·        Understanding the fs module and its components

·        Working with configuration files and file descriptors

·        Using file-locking techniques

·        Recursive file operations

·        Writing a file database

·        Watching files and directories

As we’ve noted in previous chapters, Node’s core modules typically stick to a low-level API. This allows for various (even competing) ideas and implementations of higher-level concepts like web frameworks, file parsers, and command-line tools to exist as third-party modules. The fs (or file system) module is no different.

The fs module allows the developer to interact with the file system by providing

·        POSIX file I/O primitives

·        File streaming

·        Bulk file I/O

·        File watching

The fs module is unique compared with other I/O modules (like net and http) in that it has both asynchronous and synchronous APIs. That means that it provides a mechanism to perform blocking I/O. The reason the file system also has a synchronous API is largely because of the internal workings of Node itself, namely, the module system and the synchronous behavior of require.

The goal of this chapter is to show you a number of techniques, of varying complexity, to use when working with the file system module. We’ll look at

·        Asynchronous and synchronous approaches for loading configuration files

·        Working with the file descriptors

·        Advisory file-locking techniques

·        Recursive file operations

·        Writing a file database

·        Watching for file and directory changes

But before we get to the techniques, let’s first take a high-level view of all you can do with the file system API in order to capture the functionality and provide some insight into what tool may be the best for the job.

6.1. An overview of the fs module

The fs module includes wrappers for common POSIX file operations, as well as bulk, stream, and watching operations. It also has synchronous APIs for many of the operations. Let’s take a high-level walk through the different components.

6.1.1. POSIX file I/O wrappers

At a bird’s-eye view, the majority of methods in the file system API are wrappers around standard POSIX file I/O calls. These methods have similar names. For example, the POSIX readdir call has an fs.readdir counterpart in Node:

var fs = require('fs');

fs.readdir('/path/to/dir', function (err, files) {

  console.log(files); // [ 'fileA', 'fileB', 'fileC', 'dirA', 'etc' ]

});


Table 6.1 shows a list of the supported POSIX file methods in Node, including a description of their functionality.

Table 6.1. Supported POSIX file methods in Node

POSIX method    fs method       Description

rename          fs.rename       Changes the name of a file
truncate        fs.truncate     Truncates or extends a file to a specified length
ftruncate       fs.ftruncate    Same as truncate but takes a file descriptor
chown           fs.chown        Changes file owner and group
fchown          fs.fchown       Same as chown but takes a file descriptor
lchown          fs.lchown       Same as chown but doesn’t follow symbolic links
chmod           fs.chmod        Changes file permissions
fchmod          fs.fchmod       Same as chmod but takes a file descriptor
lchmod          fs.lchmod       Same as chmod but doesn’t follow symbolic links
stat            fs.stat         Gets file status
lstat           fs.lstat        Same as stat but returns information about the link if provided rather than what the link points to
fstat           fs.fstat        Same as stat but takes a file descriptor
link         Makes a hard file link
symlink         fs.symlink      Makes a symbolic link to a file
readlink        fs.readlink     Reads the value of a symbolic link
realpath        fs.realpath     Returns the canonicalized absolute pathname
unlink          fs.unlink       Removes a directory entry
rmdir           fs.rmdir        Removes a directory
mkdir           fs.mkdir        Makes a directory
readdir         fs.readdir      Reads the contents of a directory
close           fs.close        Deletes a file descriptor
open         Opens or creates a file for reading or writing
utimes          fs.utimes       Sets file access and modification times
futimes         fs.futimes      Same as utimes but takes a file descriptor
fsync           fs.fsync        Synchronizes file data with disk
write           fs.write        Writes data to a file
read         Reads data from a file

The POSIX methods provide a low-level API to many common file operations. For example, here we use a number of synchronous POSIX methods to write data to a file and then retrieve that data:

When it comes to reading and writing files, typically you won’t need a level this low, but rather can use a streaming or bulk approach.

6.1.2. Streaming

The fs module provides a streaming API with fs.createReadStream and fs.createWriteStream. fs.createReadStream is a Readable stream, whereas fs.createWriteStream is a Writable. The streaming APIs can connect to other streams with pipe. For example, here’s a simple application that copies a file using streams:

File streaming is beneficial when you want to deal with bits and pieces of data at a time or want to chain data sources together. For a more in-depth look at streams, check out chapter 5.

6.1.3. Bulk file I/O

The file system API also includes a few bulk methods for reading (fs.readFile), writing (fs.writeFile), or appending (fs.appendFile).

The bulk methods are good when you want to load a file into memory or write one out completely in one shot:

6.1.4. File watching

The fs module also provides a couple of mechanisms for watching files ( and fs.watchFile). This is useful when you want to know if a file has changed in some way. uses the underlying operating system’s notifications, making it very efficient. But can be finicky or simply not work on network drives. For those situations, the less-efficient fs.watchFile method, which uses stat polling, can be used.

We’ll look more at file watching later on in this chapter.

6.1.5. Synchronous alternatives

Node’s synchronous file system API sticks out like a sore thumb. With a big Sync tacked onto the end of each synchronous method, it’s hard to miss its purpose. Synchronous methods are available for all the POSIX and bulk API calls. Some examples include readFileSync, statSync, and readdirSync. Sync tells you that this method will block your single-threaded Node process until it has finished. As a general rule, synchronous methods should be used when first setting up your application, and not within a callback:

Of course there are exceptions to the rule, but what’s important is understanding the performance implications of using synchronous methods.


Testing server performance

How do we know synchronous execution within the request handling of a web server is slower? A great way to test this is using ApacheBench. Our earlier example showed a ~2x drop in performance when serving a 10 MB file synchronously on every request rather than cached during application setup. Here’s the command used in this test:

ab -n 1000 -c 100 "http://localhost:3000"


With our quick overview out of the way, we’re now ready to get into some of the techniques you’ll use when working with the file system.

Technique 39 Loading configuration files

Keeping configuration in a separate file can be handy, especially for applications that run in multiple environments (like development, staging, and production). In this technique, you’ll learn the ins and outs of how to load configuration files.


Problem

Your application stores configuration in a separate file, and it depends on having that configuration when it starts up.


Solution

Use a synchronous file system method to pull in the configuration on initial setup of your application.


Discussion

A common use of synchronous APIs is for loading configuration or other data used in the application on startup. Let’s say we have a simple configuration file stored as JSON that looks like the following:


{
  "site title": "My Site",
  "site base url": "",
  "google maps key": "92asdfase8230232138asdfasd",
  "site aliases": [ "", "" ]
}


Let’s first look at how we could do this asynchronously so you can see the difference. For example, say doThisThing depends on information from our configuration file. Asynchronously we could write it this way:

This will work and may be desirable for some setups, but will also have the effect of having everything that depends on the configuration nested in one level. This can get ugly. By using a synchronous version, we can handle things more succinctly:

One of the characteristics of using Sync methods is that whenever an error occurs, it will be thrown:


A note about require

We can require JSON files as modules in Node, so our code could even be shortened further:

var config = require('./config.json');


But there’s one caveat with this approach. Modules are cached globally in Node, so if we have another file that also requires config.json and we modify it, it’s modified everywhere that module is used in our application. Therefore, using readFileSync is recommended when you want to tamper with the objects. If you choose to use require instead, treat the object as frozen (read-only); otherwise you can end up with hard-to-track bugs. You can explicitly freeze an object by using Object.freeze.


This is different from asynchronous methods, which use an error argument as the first parameter of the callback:

In our example of loading a configuration file, we prefer to crash the application since it can’t function without that file, but sometimes you may want to handle synchronous errors.

Technique 40 Using file descriptors

Working with file descriptors can be confusing at first if you haven’t dealt with them. This technique serves as an introduction and shows some examples of how you use them in Node.


Problem

You want to access a file descriptor to do writes or reads.


Solution

Use Node’s fs file descriptor methods.


Discussion

File descriptors (FDs) are integers (indexes) associated with open files within a process, managed by the operating system. As a process opens files, the operating system keeps track of these open files by assigning each a unique integer that it can then use to look up more information about the file.

Although it has file in the name, it covers more than just regular files. File descriptors can point to directories, pipes, network sockets, and regular files, to name a few. Node can get at these low-level bits. Most processes have a standard set of file descriptors, as shown in table 6.2.

Table 6.2. Common file descriptors


File descriptor    Description

0                  Standard input

1                  Standard output

2                  Standard error

In Node, we typically are used to the console.log sugar when we want to write to stdout:

console.log('Logging to stdout')

If we use the stream objects available on the process global, we can accomplish the same thing more explicitly:

process.stdout.write('Logging to stdout')

But there’s another, far less used way to write to stdout using the fs module. The fs module contains a number of methods that take an FD as their first argument. We can write to file descriptor 1 (or stdout) using fs.writeSync:

fs.writeSync(1, 'Logging to stdout')


Synchronous logging

console.log and process.stdout.write are actually synchronous methods under the hood, provided the TTY is a file stream.


A file descriptor is returned from the open and openSync calls as a number:

The file system documentation specifies a variety of other methods that deal with file descriptors.

Typically more interesting uses of file descriptors happen when you’re inheriting from a parent process or spawning a child process where descriptors are shared or passed. We’ll discuss this more when we look at child processes in a later chapter.

Technique 41 Working with file locking

File locking is helpful when cooperating processes need access to a common file where the integrity of the file is maintained and data isn’t lost. In this technique, we’ll explore how to write your own file locking module.


Problem

You want to lock a file to prevent processes from tampering with it.


Solution

Set up a file-locking mechanism using Node’s built-ins.


Discussion

In a single-threaded Node process, file locking is typically something you won’t need to worry about. But you may have situations where other processes are accessing the same file, or a cluster of Node processes are accessing the same file.

In these cases, there’s the possibility that races and data loss may occur. Most operating systems provide mandatory locks (those enforced at a kernel level) and advisory locks (not enforced; these only work if the processes involved subscribe to the same locking scheme). Advisory locks are generally preferred if possible, as mandatory locks are heavy handed and may be difficult to unlock.


File Locking with Third-Party Modules

Node has no built-in support for locking a file directly (either mandatory or advisory). But advisory locking of files can be done using syscalls such as flock, which is available through third-party modules.


Instead of locking a file directly with something like flock, you can use a lockfile. Lockfiles are ordinary files or directories whose existence indicates some other resource is currently in use and not to be tampered with. The creation of a lockfile needs to be atomic (no races) to avoid collisions. Being advisory, all the participating processes would have to play by the same rules agreed on when the lockfile is present. This is illustrated in figure 6.1.

Figure 6.1. Advisory locking using a lockfile between cooperating processes

Let’s say we had a file called config.json that could potentially be updated by any number of processes at any time. To avoid data loss or corruption, a config.lock file could be created by the process making the updates and removed when the process is finished. Each process would agree to check for the existence of the lockfile before making any updates.

Node provides a few ways to perform this out of the box. We’ll look at a couple of options:

·        Creating a lockfile using the exclusive flag

·        Creating a lockfile using mkdir

Let’s look at using the exclusive flag first.

Creating lockfiles using the exclusive flag

The fs module provides an x flag for any methods that involve opening a file (like fs.writeFile, fs.createWriteStream, and This flag tells the operating system the file should be opened in an exclusive mode (O_EXCL). When used, the file will fail to open if it already exists:


Flag combinations when opening files

There are a variety of flag combinations you can pass when opening files; for a list of all of them, consult the documentation.


We want to fail if another process has already created a lockfile. We fail because we don’t want to tamper with the resource behind the lockfile while another process is using it. Thus, having the exclusive flag mechanism turns out to be useful in our case. But instead of writing an empty file, it’s a good idea to throw the PID (process ID) inside of this file so if something bad happens, we’ll know what process had the lock last:

Creating lockfiles with mkdir

Exclusive mode may not work well if the lockfile exists on a network drive, since some systems don’t honor the O_EXCL flag on network drives. To circumvent this, another strategy is creating a lockfile as a directory. mkdir is an atomic operation (no races), has excellent cross-platform support, and works well with network drives. mkdir will fail if a directory exists. In this case, the PID could be stored as a file inside of that directory:

Making a lockfile module

So far we’ve discussed a couple ways to create lockfiles. We also need a mechanism to remove them when we’re done. In addition, to be good lockfile citizens, we should remove any lockfiles created whenever our process exits. A lot of this functionality can be wrapped up in a simple module:

Here’s an example usage:

For a more full-featured implementation using exclusive mode, check out the lockfile third-party module.

Technique 42 Recursive file operations

Ever need to remove a directory and all subdirectories (akin to rm -rf)? Create a directory and any intermediate directories given a path? Search a directory tree for a particular file? Recursive file operations are helpful and hard to get right, especially when done asynchronously. But understanding how to perform them is a good exercise in mastering evented programming with Node. In this technique, we’ll dive into recursive file operations by creating a module for searching a directory tree.


Problem

You want to search for a file within a directory tree.


Solution

Use recursion and combine file system primitives.


Discussion

When a task spans multiple directories, things become more interesting, especially in an asynchronous world. You can mimic the command-line functionality of mkdir with a single call to fs.mkdir, but for fancier things like mkdir -p (helpful for creating intermediate directories), you have to think recursively. This means the solution to our problem will depend on “solutions to smaller instances of the same problem” (“Recursion (computer science)”).

In our example we’ll write a finder module. Our finder module will recursively look for matching files at a given start path (akin to find /start/path -name='file-in-question') and provide the paths to those files in an array.

Let’s say we had the following directory tree:

A search for the pattern /file.*/ from the root would give us the following:

[ 'dir-a/dir-b/dir-c/file-e.png',




  'dir-a/file-b.txt' ]

So how do we build this? To start, the fs module gives us some primitives we’ll need:

·        fs.readdir/fs.readdirSync —List all the files (including directories), given a path.

·        fs.stat/fs.statSync —Give us information about a file at the specified path, including whether the path is a directory.

Our module will expose synchronous (findSync) and asynchronous (find) implementations. findSync will block execution like other Sync methods, will be slightly faster than its asynchronous counterpart, and may fail on excessively large directory trees (since JavaScript doesn’t yet have proper tail calls).


Why are synchronous functions slightly faster?

Synchronous functions aren’t deferred until later, even though the deferral of their asynchronous counterparts is very brief. They happen right away, while you’re already on the CPU, and you’re guaranteed to wait only exactly as long as necessary for the I/O to complete. But synchronous functions block other things from happening during the wait period.


On the other hand, find will be slightly slower, but won’t fail on large trees (since the stack is regularly cleared due to the calls being asynchronous). find won’t block execution.

Let’s take a look at the code for findSync first:

Since everything is synchronous, we can use return at the end to get all our results, as it’ll never reach there until all the recursion has finished. The first error to occur would throw and could be caught, if desired, in a try/catch block. Let’s look at a sample usage:

Let’s switch now and take a look at how to tackle this problem asynchronously with the find implementation:

We can’t just return our results, like in the synchronous version; we need to call back with them when we know we’re finished. To know that we’re finished, we use a counter (asyncOps). We also have to be aware whenever we have callbacks to ensure we have a closure around any variables we expect to have around when any asynchronous call completes (this is why we switched from a standard for loop to a forEach call).

Our counter (asyncOps) increments right before we do an asynchronous operation (like fs.readdir or fs.stat). The counter decrements in the callback for the asynchronous operation. Specifically it decrements after any other asynchronous calls have been made (otherwise we’ll get back to 0 too soon). In a successful scenario, asyncOps will reach 0 when all the recursive asynchronous work has completed, and we can call back with the results (if (asyncOps == 0) cb(null, results)). In a failure scenario, asyncOps will never reach 0, and one of the error handlers would’ve been triggered and have already called back with the error.

Also, in our example, we can’t be sure that fs.stat will be the last call to complete, since we may have an empty directory at the end of our chain, so we check the counter in both spots. We also have a simple error wrapper to ensure we never call back with more than one error. If your asynchronous operation returns one value (as in our example) or one error, it’s important to ensure the callback is never called more than once, as doing so leads to hard-to-track bugs later down the road.


Alternatives to counters

The counter isn’t the only mechanism that can track the completion of a set of asynchronous operations. Depending on the requirements of the application, recursively passing the original callback could work. For an example, look at the third-party mkdirp module.


Now we have an asynchronous version (find) and can handle the result of that operation with the standard Node-style callback signature:

var finder = require('./finder');

finder.find(/file*/, '/path/to/root', function (err, results) {

  if (err) return console.error(err);

  console.log(results);

});
Third-party solutions to parallel operations

Parallel operations can be hard to keep track of, and can easily become bug-prone, so you may want to use a third-party library like async to help. Another alternative is using a promises library like Q.


Technique 43 Writing a file database

Node’s core fs module gives you the tools to build complexity like the recursive operations you saw in the last technique. It also enables you to do other complex tasks, such as creating a file database. In this technique we’ll write a file database in order to see how other pieces of the fs module, including streaming, work together.


Problem

You want a simple and fast data storage structure with some consistency guarantees.


Solution

Use an in-memory database with append-only journaling.


Discussion

We’ll write a simple key/value database module. The database will provide in-memory access to the current state for speed and use an append-only storage format on disk for persistence. Using append-only storage will provide us the following:

·        Efficient disk I/O performance —We’re always writing to the end of the file.

·        Durability —The previous state of the file is never changed in any way.

·        A simple way to create backups —We can just copy the file at any point to get the state of the database at that point.

Each line in the file is a record. The record is simply a JSON object with two properties, a key and a value. A key is a string representing a lookup for the value. The value can be anything JSON-serializable, which includes strings and numbers. Let’s look at some sample records:




{"key":"d","value":"a string"}

If a record is updated, a new version of the record will be found later in the file with the same key:

{"key":"d","value":"an updated string"}

If a record has been removed, it’ll also be found later in the file with a null value:

{"key":"d","value":null}
When the database is loaded, the journal will be streamed in from top to bottom, building the current state of the database in memory. Remember, data isn’t deleted, so it’s possible to store the following data:

{"key":"c","value":"my first value"}

{"key":"c","value":null}

{"key":"c","value":{"my":"object"}}

In this case, at some point we saved "my first value" as the key c. Later on we deleted the key. Then, most recently, we set the key to be {"my":"object"}. The most recent entry will be loaded in memory, as it represents the current state of the database.

We talked about how data will be persisted to the file system. Let’s talk about the API we’ll expose next:

Let’s dive into the code to start putting this together. We’ll write a Database module to store our logic. It’ll inherit from EventEmitter so we can emit events back to the consumer (like when the database has loaded all its data and we can start using it):

We want to stream the data stored and emit a “load” event when that’s completed. Streaming will enable us to handle data as it’s being read in. Streaming also is asynchronous, allowing the host application to do other things while the data is being loaded:

As we read in data from the file, we find all the complete records that exist.


Structuring our writes to structure our reads

What do we do with the data we just pop()ed the last time a readable event is triggered? The last record turns out to always be an empty string ('') because we end each line with a newline (\n) character.


Once we’ve loaded the data and emitted the load event, a client can start interacting with the data. Let’s look at those methods next, starting with the simplest—the get method:

Let’s look at storing updates next:

Now we add some sugar for deleting a key:

There we have a simple database module. Last thing: we need to export the constructor:

module.exports = Database;

There are various improvements that could be made to this module, like flushing writes or retrying on failure. For examples of more full-featured Node-based database modules, check out node-dirty or nstore.

Technique 44 Watching files and directories

Ever need to process a file when a client adds one to a directory (through FTP, for instance) or reload a web server after a file is modified? You can do both by watching for file changes.

Node has two implementations for file watching. We’ll talk about both in this technique in order to understand when to use one or the other. But at the core, they enable the same thing: watching files (and directories).


Problem

You want to watch a file or directory and perform an action when a change is made.


Solution

Use and fs.watchFile.


Discussion

It’s rare to see multiple implementations for the same purpose in Node core. Node’s documentation recommends that you prefer over fs.watchFile if possible, as it’s considered more reliable. But isn’t consistent across operating systems, whereas fs.watchFile is. Why the madness?

The story about

Node’s event loop taps into the operating system in order to juggle asynchronous I/O in its single-threaded environment. This also provides a performance benefit, as the OS can let the process know immediately when some new piece of I/O is ready to be handled. Operating systems have different ways of notifying a process about events (that’s why we have libuv). The culmination of that work for file watching is the method. combines all these different types of event systems into one method with a common API to provide the following:

·        A more reliable implementation in terms of file change events always getting fired

·        A faster implementation, as notifications get passed to Node immediately when they occur

Let’s look at the older method next.

The story about fs.watchFile

There’s another, older implementation of file watching called fs.watchFile. It doesn’t tap into the notification system but instead polls on an interval to see if changes have occurred.

fs.watchFile isn’t as full-fledged in the changes it can detect, nor as fast. But the advantage of using fs.watchFile is that it’s consistent across platforms and it works more reliably on network file systems (like SMB and NFS).

Which one is right for me?

The preferred method is, but since it’s inconsistent across platforms, it’s a good idea to test whether it does what you want (and better to have a test suite).

Let’s write a program to help us play around with file watching and see what each API provides. First, create a file called watcher.js with the following contents:

var fs = require('fs');'./watchdir', console.log);

fs.watchFile('./watchdir', console.log);

Now create a directory called watchdir in the same directory as your watcher.js file:

mkdir watchdir

Then, open a couple terminals. In the first terminal, run

node watcher

and in the second terminal, change to watchdir:

cd watchdir

With your two terminals open (preferably side by side), we’ll make changes in watchdir and see Node pick them up. Let’s create a new file:

touch file.js

We can see the Node output:

All right, so now we have a file created; let’s update its modification time with the same command:

touch file.js

Now when we look at our Node output, we see that only picked up this change:

change file.js

So if using touch to update a file when watching a directory is important to your application, has support for it.


fs.watchFile and directories

Many updates to files while watching a directory won’t be picked up by fs.watchFile. If you want to get this behavior with fs.watchFile, watch the individual file.


Let’s try moving our file:

mv file.js moved.js

In our Node terminal, we see the following output indicating both APIs picked up the change:

The main point here is to test the APIs using the exact use case you want to utilize. Hopefully, this API will become more stable in the future. Read the documentation to get the latest developments. Here are some tips to help navigate:

·        Run your test case, preferring Are events getting triggered as you expect them to be?

·        If you intend to watch a single file, don’t watch the directory it’s in; you may end up with more events being triggered.

·        If comparing file stats between changes is important, fs.watchFile provides that out of the box. Otherwise, you’ll need to manage stats manually using and fs.stat.

·        Just because works on your Mac doesn’t mean it will work exactly the same way on your Linux server. Ensure development and production environments are tested for the desired functionality.

Go forth and watch wisely!

6.2. Summary

In this chapter we talked through a number of techniques using the fs module. We covered asynchronous and synchronous usage while looking at configuration file loading and recursive file handling. We also looked at file descriptors and file locking. Lastly we implemented a file database.

Hopefully this has expanded your understanding of some of the concepts possible with using the fs module. Here are a few takeaways:

·        Synchronous methods can be a nicer, simpler way to do things over their asynchronous counterparts, but beware of the performance issues, especially if you’re writing a server.

·        Advisory file locking is a helpful mechanism for resources shared across multiple processes as long as all processes follow the same contract.

·        Parallel asynchronous operations that require some sort of response after completion need to be tracked. Although it’s helpful to understand how to use counters or recursive techniques, consider using a well-tested third-party module like async.

·        Look at how you’ll use a particular file to determine which course of action to follow. If it’s a large file or can be dealt with in chunks, consider using a streaming approach. If it’s a smaller file or something you can’t use until you have the entire file loaded, consider a bulk method. If you want to change a particular part of a file, you probably want to stick with the POSIX file methods.

In the next chapter we’ll look at the other main form of I/O in Node: networking.