Node.js in Practice (2015)

Part 2. Real-world recipes

Chapter 11. Debugging: Designing for introspection and resolving issues

This chapter covers

·        Handling uncaught exceptions

·        Linting Node applications

·        Using debugging tools

·        Profiling applications and investigating memory leaks

·        Using a REPL to investigate a running process

·        Tracing system calls

Understanding how errors are generated and handled in any given platform is paramount to building stable applications. Good error introspection and tests that are built-in are the best offense for debugging issues later on. In this chapter we focus on how to prepare for and what to do when things go south.

Maybe your process keeps crashing or it’s using more memory than you expected. Perhaps it’s stuck at 100% CPU usage. We’ll look at debugging solutions for these and other problems you may encounter in your Node applications.

In the first part we’ll cover Node application design for error handling and detection. In the second half we’ll look at debugging specific types of problems.

11.1. Designing for introspection

When we design applications, we need to be thinking about how we’ll handle errors. Relevant error logging and intervention takes thought. It also takes a good understanding of where errors can occur in order to trap them. Throughout this book, we’ve covered various forms of errors that can happen in Node applications. Let’s cover all the types here.

11.1.1. Explicit exceptions

Explicit exceptions are those explicitly triggered by the throw keyword. They clearly indicate that something has gone wrong:

function formatName (name) {

  if (!name) throw new Error("name is required");

  ...

}

Explicit exceptions are handled by a try/catch block:

try {

  formatName();

} catch (err) {

  console.log(err.message, err.stack);

}

If you throw your own exceptions, keep these guidelines in mind:

·        throw should be used only in synchronous functions; or in some cases, it makes sense before the asynchronous action has occurred in asynchronous functions (like API misuse).

·        Always throw an Error object or something that inherits from Error. Using simple strings (like throw "Oh no!") won’t generate a stack trace, so you’ll have no information as to where the error occurred.

·        Don’t throw inside Node-style callback functions; nothing exists on the stack to catch it! Instead, deal directly with the error or pass the error off to another function that can properly handle the error.

 

Regaining throw

You can regain the use of throw for asynchronous blocks if the structures support it; some notable ones are domains, promises, or generators.

 

11.1.2. Implicit exceptions

Implicit exceptions are any runtime JavaScript errors not triggered by the throw keyword. Unfortunately, these exceptions can sneak into our code too easily.

One common implicit exception is ReferenceError, which is caused when a reference to a variable or property can’t be found.

Here, we see an innocent misspelling of data causes an exception:

function (err, data) {

  res.write(dat); // ReferenceError: dat is not defined

}

Another common implicit exception is SyntaxError, most famously triggered using JSON.parse on invalid JSON data:

JSON.parse("undefined"); // SyntaxError: Unexpected token u

It’s a good idea to wrap JSON.parse with a try/catch block, especially if you aren’t in control of the input JSON data.

 

Catch implicit exceptions early

A great way to catch implicit exceptions early is to utilize linting tools like JSHint or JSLint. Adding them to your build process helps keep your code in check. We’ll talk more on subject this later in the chapter.

 

11.1.3. The error event

The error event can be emitted from any EventEmitter in Node. If left unhandled, Node will throw the error. These events can be the most difficult to debug if not handled, since many times they’re triggered during asynchronous operations like streaming data where the call stack is minimal:

var EventEmitter = require('events').EventEmitter;

var ee = new EventEmitter();

ee.emit('error', new Error('No handler to catch me'));

This will output the following:

events.js:72

  throw er; // Unhandled 'error' event

 ^

Error: No handler to catch me

  at Object.<anonymous> (/debugging/domain/ee.js:5:18)

  at Module._compile (module.js:456:26)

  at Object.Module._extensions..js (module.js:474:10)

  at Module.load (module.js:356:32)

  at Function.Module._load (module.js:312:12)

  at Function.Module.runMain (module.js:497:10)

  at startup (node.js:119:16)

  at node.js:902:3

Luckily, we know where this error came from; we just wrote the code, after all! But in larger applications, we may have errors triggered at the DNS layer and we have no idea which module utilizing DNS just had a problem.

So, when possible, handle error events:

ee.on('error', function (err) {

  console.error(err.message, err.stack);

});

When writing your own EventEmitters, do yourself and your API consumers a favor and give them context to any errors in your dependencies that you’re propagating upward. Also, use Error objects over plain strings when emitting errors so a stack trace can be found.

11.1.4. The error argument

Errors that occur during an asynchronous operation are provided as the first argument in a callback function. Unlike the previous errors we’ve talked about, these never cause exceptions directly. But they can be the source of many implicit exceptions:

fs.readFile('/myfile.txt', function (err, buf) {

  var data = buf.toString();

  ...

});

Here, we ignore the error returned from readFile, perhaps assuming we’ll always have a buffer of the file data to continue working with. Unfortunately, the day comes when we can’t read the file and we have a ReferenceError because buf is not defined.

It’s more robust to just handle the asynchronous errors. A lot of times this can mean simply passing the error to another function that can gracefully handle the error:

function handleError (err) {

  console.error('Failed:', err.message, err.stack);

}

fs.readFile('/myfile.txt', function (err, buf) {

  if (err) return handleError(err);

  var data = buf.toString();

  ...

});

Handling each of these four types of errors effectively will give you much better data to work with when you’re debugging issues in the future!

Even with our best efforts and tooling, though, we can still miss exceptions and have a crashed server on our hands. Let’s look at designing our applications to handle these situations so we can quickly address and fix uncaught exceptions.

Technique 88 Handling uncaught exceptions

How do you effectively handle Node crashes? One of the first things you discover when working with Node is that it terminates a process whenever an exception is uncaught. It’s important to understand why this behavior exists and how you handle uncaught exceptions in order to build robustness into your programs.

Problem

You have an uncaught exception taking down your process.

Solution

Log the exception, and shut down gracefully.

Discussion

Sometimes exceptions go uncaught. When this happens, Node by default will terminate the process. There’s a good reason for this, which we’ll come back to, but let’s first talk about how we can change this default behavior.

With an uncaughtException handler set on the process object, Node will execute the handler instead of terminating your program:

process.on('uncaughtException', function (err) {

  console.error(err);

});

Yeah! Now your Node application will never crash! Although it’s true that exceptions won’t take down your process anymore, the drawbacks of leaving the Node program running will most likely outweigh the benefits. If you choose to keep the application running, the application could leak resources and possibly become unstable.

How does that happen? Let’s look at an example of an application we intend to run for a long time: a web server. We won’t allow Node to terminate the process by adding an uncaughtException handler that just logs the error. What do you think will happen when we have an uncaught exception while we’re handling a user’s request?

When a request comes in, an exception is thrown and then caught by the uncaught-Exception handler. What happens to the request? It is leaked, as that connection will remain open until the client times out (we also no longer have access to res to give a response back).

In figure 11.1, you can see an illustration of this leak happening. If we had no exception, we’d be fine, but since we had an exception, we leaked a resource.

Figure 11.1. Leaking resources when using uncaughtException

Although this example is simplified to be clear, uncaught exceptions are a reality. Most of the time it will be open handles to sockets or files that aren’t able to be closed properly. Uncaught exceptions are usually buried much deeper in the code, which makes determining what resources are being leaked even harder.

State can also be affected, since an uncaught exception takes you out of your current context and places you in a completely different context (the uncaughtException handler) where you don’t have references to objects in order to clean things up. In our example, we didn’t have access to the res object in order to send a response back to the client.

So what good is the uncaughtException handler? It enables your application to log and restart gracefully. It’s wise to treat an uncaughtException handler as a last chance to say your goodbyes before going down. Write out the error, perhaps send an email or do some other notification, and then gracefully kill the app:

The uncaughtException handler is a last defense. Ideally exceptions should be handled closer to the source so action can be taken to prevent leaks and instability. For that, you can use domains.

Using domains for uncaught exceptions

Whereas an uncaughtException casts a net over the entirety of your application code base to catch errors, domains allow you to control the portions of code that are monitored by the domain and handle exceptions closer to the source (more on this back in chapter 4). Let’s implement the same uncaughtException example from earlier, but using domains instead:

Using domains allowed us to sandbox our server code and still have access to the res object to give the user a response, which is an improvement on the previous example. But even though we’re able to give the user a response and close the connection, it’s still best practice to close out the process.

If you utilize domains, it’s not a bad idea to keep an uncaughtException handler as a catchall for those cases where an error slips by one of your domains, or your domain’s error handler throws an exception where no other domain is there to catch it.

Let’s switch to a helpful way to build introspection into your application and prevent errors before they happen: linting!

Technique 89 Linting Node applications

Lint tools can help catch a multitude of application errors when properly tuned. In our previous example, we misspelled res, which led to an uncaught exception. Although the code was valid JavaScript, we had an undefined variable being accessed. A lint tool would’ve caught this error.

Problem

You want to catch potential coding errors and exceptions.

Solution

Use a lint tool.

Discussion

Let’s talk about setting up an application to use JSHint with Node. JSHint is an actively maintained lint tool that includes a number of customizable options for JavaScript code bases.

First, we assume you already have a package.json file set up (if not: npm init) for your project. Next, let’s add jshint to our development dependencies:

npm install jshint --save-dev

Now let’s configure JSHint so it knows what it’s working with. We just throw a .jshintrc file—which takes a JSON configuration—in our project root. Let’s look at a basic configuration for Node projects:

JSHint has a lot of options (http://jshint.com/docs/options/) that bend rules to match your coding style and intent, but these just shown are some good basic defaults.

To run JSHint, add the following line to the "scripts" block in your package.json file (if you don’t have a "scripts" block, just add one):

You can then run JSHint from your project root this way:

npm run lint

JSHint will give you output that tells you what errors it found, and you can either correct or update the options to better fit your coding style and intent. Similarly to tests, it’s helpful to have your build tools run the lint tools as you push code, since it’s easy to forget to run and can be automated away.

Now that we’ve looked at ways to prevent and effectively handle application errors, let’s switch over to look at tools we can use to debug issues when they occur.

11.2. Debugging issues

We have our tests, our logging, and our linting. How do we actually debug and fix issues when they occur? Thankfully, there are a number of tools for a number of different situations. In this section we’ll take a look at various and likely unrelated problems you can encounter when running your applications, and techniques to solve them. We’ll start with using debuggers and move on to profiling, memory leaks, production debugging, and tracing.

Technique 90 Using Node’s built-in debugger

Whenever you need step-by-step analysis of the state of your application, a debugger can be an invaluable tool, and Node’s built-in debugger is no exception. Node’s built-in tooling allows you to watch variables, pause execution through breakpoints, step in and out of parts of your application, see backtraces, run an interactive context-aware REPL, and more.

Unfortunately, many shy away from the command-line tool, as it may seem intimidating at first. We want to debunk that and show how powerful it can be by walking you through most of the functionality it has to offer.

Problem

You want to run a debugger to set breakpoints, watch variables, and step through your application.

Solution

Use node debug.

Discussion

Let’s take a simple program to debug in order to demonstrate some of the features of the debugger:

var a = 0;

function changeA () {

  a = 50;

}

function addToA (toAdd) {

  a += toAdd;

}

changeA();

addToA(25);

addToA(25);

To run the built-in debugging tool, simply use the debug command:

node debug myprogram

It will start the application with the debugger breaking on the first executable line:

< debugger listening on port 5858

connecting... ok

break in start.js:1

  1 var a = 0;

  2

  3 function changeA () {

debug>

To view all the available commands and debugging variables, you can type help:

debug> help

Commands: run (r), cont (c), next (n), step (s), out (o),

backtrace (bt), setBreakpoint (sb), clearBreakpoint (cb),

watch, unwatch, watchers, repl, restart, kill, list, scripts,

breakOnException, breakpoints, version

To continue from the default starting breakpoint, just type cont, or just c for short. Since we don’t have any other breakpoints, the application will terminate:

debug> cont

program terminated

debug>

But we’re still in the debugger and can restart the application again by using the run command (r for short):

debug> run

< debugger listening on port 5858

connecting... ok

break in start.js:1

  1 var a = 0;

  2

  3 function changeA () {

debug>

And we’re back in business. We can also restart the application with the restart command or manually kill the application with the kill command if we need to.

The application is all about the letter A, so let’s take a peek at how that changes as our application executes by making a watch expression for that. The watch function takes an expression to watch as an argument:

debug> watch('a')

We can view the state of all that we’re watching using the watchers command:

debug> watchers

  0: a = undefined

Currently we’re paused before the assignment to 0 has even been made, so we’re undefined. Let’s step into the next line with next(or n for short):

debug> next

break in start.js:11

Watchers:

  0: a = 0

  9 }

  10

  11 changeA();

  12 addToA(25);

  13 addToA(25);

debug>

Well, that’s convenient: the debugger outputs our watchers for us as we step to the next section. If we were to type watchers again, we’d see similar output:

debug> watchers

  0: a = 0

If we ever want to remove a watch expression, we can use the unwatch command given the same expression we used to create it.

By default, the debugger will print just a couple lines before and after to give a sense of context. But sometimes we want to see more of what’s going on. We can use the list command, giving it the number of lines around the current line where we’re paused:

debug> list(5)

  6

  7 function addToA (toAdd) {

  8   a += toAdd;

  9 }

  10

  11 changeA();

  12 addToA(25);

  13 addToA(25);

  14

  15 });

debug>

We’re currently at line 11, the changeA function. If we were to type next, we’d move to the next line, which is the addToA function, but let’s investigate our changeA function more by stepping into it. To do that we just use the step command (or s for short):

debug> step

break in start.js:4

Watchers:

  0: a = 0

  2

  3 function changeA () {

  4   a = 50;

  5 }

  6

debug>

Now that we’re in this function, we can step out of it at any time using the out command. We’ll automatically step out of it once we reach the end, so we can also use next; let’s try it:

debug> next

break in start.js:5

Watchers:

  0: a = 50

  3 function changeA () {

  4   a = 50;

  5 }

  6

  7 function addToA (toAdd) {

debug>

As you can see, our watchers updated to show that a is now 50. Let’s go to the next line:

debug> next

break in start.js:12

Watchers:

  0: a = 50

  10

  11 changeA();

  12 addToA(25);

  13 addToA(25);

  14

debug>

Now we’re back to the line after our changeA function. Let’s step into this next function again. Remember what command that was?

debug> step

break in start.js:8

Watchers:

  0: a = 50

  6

  7 function addToA (toAdd) {

  8   a += toAdd;

  9 }

  10

debug>

Let’s explore another neat aspect of the debugger: the built-in REPL! We can access it by using the repl command:

debug> repl

Press Ctrl + C to leave debug repl

This is a standard REPL that’s aware of the context that surrounds it when you used the repl command. So we can, for instance, output the value of the toAdd argument:

> toAdd

25

We can also introduce state into the application. Let’s create a global b variable:

> b = 100100

In many ways, this behaves just like the standard Node REPL, so a lot of what you can do there, you can do here.

You can exit the REPL mode at any time with Ctrl-C. Let’s do that now. You’ll know you’ve exited because you’ll get your debug prompt back:

debug>

We were in a REPL for a while, so we likely lost context when we were paused. Let’s use list again to get our bearings:

debug> list()

  3 function changeA () {

  4   a = 50;

  5 }

  6

  7 function addToA (toAdd) {

  8 a += toAdd;

  9 }

  10

  11 changeA();

  12 addToA(25);

  13 addToA(25);

Ah yes, that’s right, we were on line 8. Well, you know what, we really wanted the changeA function to assign a to 100. It’s such a nice number to accompany such a nice letter! But we forgot to do that when we started the debugger. No problem! We can set a breakpoint here to save our spot by using the setBreakpoint function (or sb for short):

debug> setBreakpoint()

  3 function changeA () {

  4   a = 50;

  5 }

  6

  7 function addToA (toAdd) {

 *8   a += toAdd;

  9 }

  10

  11 changeA();

  12 addToA(25);

  13 addToA(25);

debug>

Note that our line 8 now has a star (*) next to it indicating we have a breakpoint set there. Let’s change that function in our code file and save it:

function changeA () {

  a = 100;

}

Back in our debugger, we can restart the app:

debug> restart

program terminated<

debugger listening on port 5858

connecting... ok

Restoring breakpoint debug.js:8

break in start.js:1

  1 var a = 0;

  2

  3 function changeA () {

debug>

Looks like our program was restarted and the breakpoint we set is still intact. Did it get our changes? Let’s see:

debug> list(20)

  1 var a = 0;

  2

  3 function changeA () {

  4   a = 100;

  5 }

  6

  7 function addToA (toAdd) {

  8   a += toAdd;

  9 }

  10

  11 changeA();

  12 addToA(25);

  13 addToA(25);

  14

  15 });

debug>

Another way we can set breakpoints right from our application code is to use the debugger keyword:

function changeA () {

  debugger;

  a = 100;

}

If we restart our application again, we’ll always stop on any debugger lines. We can clear breakpoints as well using clearBreakpoint (or cb for short).

Let’s look at one more topic: uncaught exceptions. Let’s introduce a nasty ReferenceError in our changeA function:

function changeA () {

  a = 100;

  foo = bar;

}

If we restart our application using restart and then cont to skip the initial breakpoint, our application will crash due to an uncaught exception. We can break on these exceptions instead using breakOnException:

debug> breakOnException

debug>

Now, instead of crashing, we’ll break first, allowing us to inspect the state of the application and use the REPL before the program terminates.

 

Helpful multifile debugger commands

This scenario only looked at a single file that included no other modules. The debugger also has a couple of commands that are helpful when you’re within multiple files. Use backtrace (or bt for short) to get a call stack on whatever line you’re currently paused at. You can also usescripts to get a list of loaded files and an indicator of what file you’re currently in.

 

The built-in debugger may feel odd at first if you’re used to a GUI tool for debugging applications, but it’s actually pretty versatile once you get the hang of it! Just throw a quick debugger statement where you’re working and fire it up.

Technique 91 Using Node Inspector

Want to do everything you can with the built-in debugger, but using the Chrome DevTools interface instead? There’s a module for that! It’s called node-inspector. In this technique we’ll look at how to set it up and start debugging.

Problem

You want to debug a Node application using Chrome DevTools.

Solution

Use node-inspector.

Discussion

Node allows remote debugging by exposing a debugging port that third-party modules and tools can hook into (including the built-in debugger). One popular module is node-inspector, which ties in debugging information from Node into the Chrome DevTools interface.

To set up node-inspector, simply install it:

npm install node-inspector -g

Don’t forget the -g flag to install it globally. Once you have it, you can fire it up by running the following command:

node-inspector

Now node-inspector is ready to roll and will tell you where to reach it:

$ node-inspector

Node Inspector v0.7.0-2

  info - socket.io started

Visit http://127.0.0.1:8080/debug?port=5858 to start debugging.

You can then visit that URL in any Blink-enabled browser like Chrome or Opera. But we don’t have any Node program that has an open debugging port to start debugging, so we receive an error message, as shown in figure 11.2.

Figure 11.2. Error screen when no debugging agent is found

Let’s leave that running for now and write a little application to debug:

var http = require('http');

var server = http.createServer();

server.on('request', function (req, res) {

  res.end('Hello World');

});

server.listen(3000);

Now we can run this application exposing the debugging port:

$ node --debug test.js

debugger listening on port 5858

Our application now lets us know that the debugger is listening on port 5858. If we refresh our Node inspector web page, it’ll look more interesting, as shown in figure 11.3.

Figure 11.3. Node inspector connected to the debugger

We can use the inspector much like the built-in debugger to set breakpoints and watch expressions. It also includes a console that’s similar to the REPL to allow you to poke around at the state of your application while it’s paused.

One difference between node-inspector and the built-in debugger is that Node doesn’t automatically break on the first expression. To enable that, you have to use the --debug-brk flag:

node --debug-brk test.js

This tells the debugger to break on the first line until the inspector can step through or continue execution. If we reload the inspector, we can see it’s paused on the first line, as shown in figure 11.4.

Figure 11.4. Using the --debug-brk flag

node-inspector is continually being developed to support more of Chrome DevTools’ functionality.

We’ve looked at two ways to use debugging tools in Node: the command-line debugger and node-inspector. Now, let’s switch to another tool for resolving performance-related issues: the profiler.

Technique 92 Profiling Node applications

Profiling aims to answer this question: Where is my application spending its time? For instance, you may have a long-running web server that gets stuck at 100% CPU usage when you hit a particular route. At first glance, you might view the various functions that touch that route to see if anything stands out, or you could run a profiler and let Node tell you where it’s stuck. In this technique you’ll learn how to use the profiler and interpret the results.

Problem

You want to find out where your application is spending its time.

Solution

Use node --prof.

Discussion

Node taps into the underlying V8 statistical profiler by the use of the --prof command-line flag. It’s important to understand how it works in order to interpret the data.

Every two milliseconds, the profiler looks at the running application and records the function executing at that moment. The function may be a JavaScript function, but it also can come from C++, shared libraries, or V8 garbage collection. The profiler writes these “ticks” to a file namedv8.log, where they’re then processed by a special V8 tick-processor program.

Let’s look at a simple application to see how this works. Here we have an application doing two different things—running a slower computational task every two seconds, and running a quicker I/O task more often:

function makeLoad () {

  for (var i=0;i<100000000000;i++);

}

function logSomething () {

  console.log('something');

}

setInterval(makeLoad, 2000);

setInterval(logSomething, 0);

We can profile this application like so:

node --prof profile-test.js

If we let it run for 10 seconds or so and kill it, we’ll get a v8.log in that same directory. The log isn’t too helpful by itself. Let’s process the log by using the V8 tick-processor tools. Those tools require that you build V8 from source on your machine, but there’s a handy third-party module that allows you to skip that. Just run the following command to install:

npm install tick -g

This will install the appropriate tick processor for your operating system in order to view the data. You can then run the following command in the same directory as your v8.log file to get some more helpful output:

node-tick-processor

You’ll get output that looks similar to the following (abbreviated to show structure):

Statistical profiling result from v8.log,

  (6404 ticks, 1 unaccounted, 0 excluded).

[Unknown]:

  ticks total nonlib name

      1  0.0%

[Shared libraries]:

  ticks total nonlib name

   4100 64.0%   0.0% /usr/lib/system/libsystem_kernel.dylib

   211   3.3%   0.0% /Users/wavded/.nvm/v0.10.24/bin/node

  ...

[JavaScript]:

  ticks total nonlib name

   1997 31.2%  96.4% LazyCompile: *makeLoad profile-test.js:1

   7     0.1%   0.3% LazyCompile: listOnTimeout timers.js:77

   5     0.1%   0.2% RegExp: %[sdj%]

  ...

[C++]:

  ticks total nonlib name

[GC]:

  ticks total nonlib name

      1  0.0%

[Bottom up (heavy) profile]:

  Note: percentage shows a share of a particular caller in

  the total amount of its parent calls.

  Callers occupying less than 2.0% are not shown.

  ticks parent name

  4100   64.0% /usr/lib/system/libsystem_kernel.dylib

  1997   31.2% LazyCompile: *makeLoad profile-test.js:1

  1997  100.0%   LazyCompile: ~wrapper timers.js:251

  1997  100.0%     LazyCompile: listOnTimeout timers.js:77

Let’s look at what each section means:

·        Unknown —For that tick, the profiler couldn’t find a meaningful function attached to the pointer. These are noted in the output but aren’t much help beyond that and can be safely ignored.

·        Shared libraries —These are usually underlying C++/C shared libraries; a lot of the I/O stuff happens here as well.

·        JavaScript —This is typically the most interesting part; it includes your application code as well as Node and V8 internal native JavaScript code.

·        C++ —This is C++ code in V8.

·        GC —This is the V8 garbage collector.

·        Bottom up (heavy) profile —This shows a more detailed stack for the highest hitters found by the profiler.

In our case, we can see that *makeLoad is the hottest JavaScript function, with 1997 ticks accounted for:

[JavaScript]:

  ticks total nonlib name

   1997 31.2%  96.4% LazyCompile: *makeLoad profile-test.js:1

   7     0.1%   0.3% LazyCompile: listOnTimeout timers.js:77

   5     0.1%   0.2% RegExp: %[sdj%]

This makes sense since it has some heavy computation. Another interesting section to note is RegExp: %[sdj%], which is used by util.format, which is used by console.log.

The profiler’s job is to show you what functions are running most often. It doesn’t necessarily mean that the function is slow, but it does mean either a lot happens in the function or it’s called often. The results should serve as clues to help you understand what can be done to improve the performance. In some cases it may be surprising to find out certain functions are running hot; other times it may be expected. Profiling serves as one piece of the puzzle to help solve performance-related issues.

Another potential source of performance-related issues is memory leaks, although, obviously they’re first a memory concern that may have performance ramifications. Let’s look at handling memory leaks next.

Technique 93 Debugging memory leaks

Before the days of Ajax and Node, there wasn’t much effort put into debugging JavaScript memory leaks, since page views were short-lived. But memory leaks can happen, especially in Node programs where a server process is expected to stay up and running for days, weeks, or months. How do you debug a leaking application? We’ll look at a technique that works locally or in production.

Problem

You want to debug a program leaking memory.

Solution

Use heapdump and Chrome DevTools.

Discussion

Let’s write a leaky application to demonstrate how to use a couple of tools to debug a memory leak. Let’s make a leak.js program:

How do we know this application is growing in memory? We could sit and watch top or some other process-monitoring application. We can also test it by logging the memory used. To get an accurate read, let’s force a garbage collection before logging out the memory usage. Let’s add the following code to our leak.js file:

setInterval(function () {

  gc();

  console.log(process.memoryUsage());

}, 10000)

In order to use the gc() function, we need to expose it by running our application with the --expose-gc flag:

node --expose-gc leak.js

Now we can see some output showing clearly that we’re growing in memory usage:

{ rss: 15060992, heapTotal: 6163968, heapUsed: 2285608 }

{ rss: 15331328, heapTotal: 6163968, heapUsed: 2428768 }

{ rss: 15495168, heapTotal: 8261120, heapUsed: 2548496 }

{ rss: 15585280, heapTotal: 8261120, heapUsed: 2637936 }

{ rss: 15757312, heapTotal: 8261120, heapUsed: 2723192 }

{ rss: 15835136, heapTotal: 8261120, heapUsed: 2662456 }

{ rss: 15982592, heapTotal: 8261120, heapUsed: 2670824 }

{ rss: 16089088, heapTotal: 8261120, heapUsed: 2814040 }

{ rss: 16220160, heapTotal: 9293056, heapUsed: 2933696 }

{ rss: 16510976, heapTotal: 10324992, heapUsed: 3085112 }

{ rss: 16605184, heapTotal: 10324992, heapUsed: 3179072 }

{ rss: 16699392, heapTotal: 10324992, heapUsed: 3267192 }

{ rss: 16777216, heapTotal: 10324992, heapUsed: 3293760 }

{ rss: 17022976, heapTotal: 10324992, heapUsed: 3528376 }

{ rss: 17117184, heapTotal: 10324992, heapUsed: 3635264 }

{ rss: 17207296, heapTotal: 10324992, heapUsed: 3728544 }

Although we know we’re growing pretty steadily, we don’t really know “what” is leaking from this output. For that we need to take some heap snapshots and compare them to see what’s changing in our application. We’ll use the third-party heapdump module (https://github.com/bnoordhuis/node-heapdump). The heapdump module allows us to take snapshots either programmatically or by sending a signal to the process (UNIX only). These snapshots can be processed using the Chrome DevTools.

Let’s install the module first:

npm install heapdump --save-dev

Then include it in our leak.js file and instrument it to output a heap snapshot every 10 seconds:

var heapdump = require('heapdump');

var string = '1 string to rule them all';

var leakyArr = [];

var count = 2;

setInterval(function () {

  leakyArr.push(string.replace(/1/g, count++));

}, 0);

setInterval(function () {

  if (heapdump.takeSnapshot()) console.log('wrote snapshot');

}, 10000);

Now, every 10 seconds a file is written to the current working directory of the process that contains the snapshot. A garbage collection is automatically performed whenever a snapshot is taken. Let’s run our application to write a couple snapshots and then terminate it:

$ node leak3.js

wrote snapshot

wrote snapshot

Now we can see what was written:

$ ls

heapdump-29701132.649984.heapsnapshot

heapdump-29711146.938370.heapsnapshot

The files are saved with their respective timestamps. The larger the number, the more recent the snapshot. Now we can load these files into Chrome DevTools. Open Chrome and then the Developer Tools, go to the Profiles tab, and right-click on Profiles to load a snapshot file (see figure 11.5).

Figure 11.5. Loading a heap snapshot into the Chrome DevTools

To compare our two snapshots, let’s load them in the order we took them (see figure 11.6).

Figure 11.6. Loading a second snapshot for comparison

Now that we have them loaded, we can do some investigation. Let’s select the second one and then choose the Comparison option. Chrome will automatically select the previous snapshot to compare to (see figure 11.7).

Figure 11.7. Using the comparison view

Now we can see something immediately interesting in our view—a lot of strings are being created and not being garbage collected (see figure 11.8).

Figure 11.8. Examining memory allocations between the snapshots

So we can see that strings could be a problem here. But what strings are getting created? Here we have to do some investigation. Expanding the (string) tree will show us the largest strings first—typically application source code and some larger strings used in Node core and V8. But when we scroll down, we start to see strings generated in our application, and lots of them. By clicking one, we can see the retaining tree, or its relationship to other objects (see figure 11.9).

Figure 11.9. Drilling down to the types of data being created in memory

In this exercise, we had a hunch we were going to leak strings stored inside the leaky-Arr variable. But this exercise shows the relationship between the code and the tools to inspect memory usage. As a developer, you’ll know your source code, and the clues you get inside DevTools will be specific to your code and modules. The Comparison view can give a great snapshot of what’s changing.

We only talked about one way of creating snapshots. You can also send a SIGUSR2 (on *NIX systems) to a process with heapdump to take a snapshots at will:

kill -USR2 1120

Just remember that it’ll write the snapshot to the CWD of the process and will fail silently if the CWD isn’t writable by the process user.

You can also be programmatically clever, depending on your needs. For example, you could set up heapdump to take a snapshot after a certain memory threshold is breached, or if it grows faster than some expected limit given an interval.

Taking heap snapshots is something you can do in production for a small performance penalty while the snapshot is being written to disk. Let’s turn our attention to another technique you can use in production that has a minimal penalty and lets you poke around at the application state: using a REPL.

Technique 94 Inspecting a running program with a REPL

Attaching a debugger to a production process isn’t a viable option, as we don’t want to be pausing execution or adding a performance tax of running the V8 debugger. So how can we debug live or performance-sensitive issues? We can use a REPL to dive into the process and inspect or change state. In this technique we’ll first look at how a REPL works in Node, and how to set up your own REPL server and client. Then we’ll turn to inspecting a running process.

Problem

You want to interact with a running process to inspect or change its state.

Solution

Set up a REPL in the process and a REPL client to access.

Discussion

The Node REPL is a great way to play around and experiment with JavaScript and Node. The simplest way to play around with a REPL is to run Node without any arguments, as shown in figure 11.10.

Figure 11.10. Sample Node REPL session

But you can create your own REPLs using the built-in repl module. In fact, Node uses the same module when you type node. Let’s make our own REPL:

Executing this program creates a REPL that looks and functions much like node does:

$ node repl-basic.js

> 10 + 20

30

But we don’t need to use the process’s stdio for input and output; we can use a UNIX or a TCP socket! This allows us to connect to a long-running process from the outside. Let’s make a TCP REPL server:

Now if we fire up our REPL server, it’ll be listening on port 1337:

$ node repl-tcp.js

node repl listening on 1337

We can then connect to it with a TCP client like telnet or Netcat. You can do this in a separate terminal window:

$ nc localhost 1337

> 10 + 20

30

> exit

$

That’s cool! But it doesn’t behave like our basic REPL (see figure 11.11) or the node command:

Figure 11.11. Using Netcat against a REPL server

·        The Tab key doesn’t autocomplete available properties and variables.

·        We don’t have any readline support, so the Up Arrow key doesn’t give us any command history.

·        No color or bold output.

The reasons for this are twofold. First, the repl module can’t determine that we’re running a TTY (terminal) session, so it provides a minimal interface avoiding the use of ANSI/VT100 escape codes for color and formatting. These escape codes end up being noise to clients like Netcat. Second, our client isn’t behaving like a TTY. It isn’t sending the proper input codes to get nice perks like autocomplete behavior or history.

In order to change this behavior, we need to modify both the server and client. First, to send proper ANSI/VT100 escape codes for things like color and bold output, we need to add the terminal option to our REPL configuration:

Second, to get the input tab completion and readline, we need to create a REPL client that can send the raw TTY input to the server. We can create that using Node:

Now we can start our REPL server with terminal support:

$ node repl-tcp-terminal.js

node repl listening on 1337

We can connect to the server with our REPL client in another terminal session:

$ node repl-client.js

> 10 + 20

30

> .exit

$

Now our REPL session behaves as if we were running the node or a basic REPL. We can use autocomplete, access our command history, and get colored output. Sweet!

Inspecting a running process

We’ve discussed how to use the repl module to create connection points and use various clients to access it. We did this to get you comfortable setting up REPL instances on your applications so you can use them to inspect a running process. Now, let’s get practical and instrument an existing application with a REPL server, and interact with it using the REPL client we created.

First, let’s create a basic HTTP server:

var http = require('http');

var server = http.createServer();

server.on('request', function (req, res) {

  res.end('Hello World');

});

server.listen(3000);

console.log('server listening on 3000');

This should look familiar. But let’s expose this server to our REPL server by adding the following code:

 

A note about useGlobal

When enabled, whenever you create a new variable (like var a = 1), it will be put in the global context (global.a === 1). But a now will also be accessible in functions run in a later turn in the event loop.

 

We exposed the server by setting a property on r.context. We can expose anything we want to the REPL in order to interact with it. It’s important to note that we also can override anything already existing in the context. This includes all the standard Node global variables like global,process, or Buffer.

Now that we have our server exposed, let’s see how we can inspect and debug our HTTP server. First let’s fire up our HTTP and REPL servers:

$ node repl-app.js

server listening on 3000

repl listening on 1337

Now let’s use our REPL client to tap into the server:

$ node repl-client.js

We can tap into useful bits of information right away. For instance, we can see how long our process has been running, or how its memory usage is:

We also exposed the server object, and we can access that by just typing server:

> server

{ domain: null,

  _events:

  ...

  _connectionKey: '4:0.0.0.0:3000' }

Let’s see how many connections are currently active:

> server.connections

0

Clearly this would be more interesting in a production context, since we are the only ones using the server and we haven’t made a connection yet! Let’s hit http://localhost:3000 in our browser and inspect the connections again and see if they’ve changed:

That works. Let’s instrument something more complex. Can you think of a way to start tallying the number of requests coming in to our server using the REPL?

 

Adding instrumentation

One powerful aspect of a REPL is the ability to add instrumentation to help us understand behavior in our application as it’s happening. This is especially handy for tricky problems where restarting the application loses our precious state and we have no idea how to duplicate the issue except to wait for it to happen again.

 

Since our HTTP server is an EventEmitter, we can add another request handler that will be called on every request to instrument it with the behavior we want using the REPL:

Now we’re tracking incoming requests. Let’s hit Refresh a few times on our browser and see if it worked:

> numReqs

8

Excellent. Since we have access to the request objects, we can inspect any information about requests available to us: IP addresses, headers, paths, and so on. In this example we exposed an HTTP server, but any objects can be put on the context where it makes sense in your application. You may even consider writing a module exposing commonly used methods in the REPL.

Some issues can’t be resolved at an application level and need deeper system introspection. One way to gain deeper understanding is by tracing.

Technique 95 Tracing system calls

Understanding how the underlying system calls work can really help you understand a platform. For example, Python and Node both have functionality to perform DNS lookups, but they go about it differently at a lower level. And if you’re wondering why one is behaving differently than the other, tracing tools will show you that!

At their core, tracing tools monitor underlying system calls (typically C function names, arguments, and return values) that an application or multiple applications are making, and do interesting things with the data (like logging or statistics).

Tracing helps in production. If you have a process stuck at 100% and are unsure why, a tracer can help expose the underlying state at the system level. For example, you may discover in this instance that you exceeded the allowed open files for a process, and all I/O attempts are being rejected, causing the problem. Since tracing tools aren’t performance intrusive like a profiler, they can be valuable assets.

Problem

You want to understand what’s happening in your application at the system level.

Solution

Use tracing tools specific to the operating system to gain introspection.

Discussion

All the techniques we’ve discussed so far have been system-agnostic. This one is OS-specific. There are a lot of different tools, but most are unique to an operating system. For our example, we’ll use the Linux-specific tool called strace. Similar tools exists for Mac OS X/Solaris (dtruss) and Windows (ProcessMonitor: http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx).

A tracing program is essentially a dump of system calls as they happen in a process. If you’re unfamiliar with the underlying OS, prepare to learn! We’ll walk through tracing a simple application to see what’s happening at the OS level when we run it to learn how to read trace logs.

Let’s write an extremely simple program to trace:

console.log('hello world');

This seems innocent enough. To see what’s going on behind the scenes, let’s trace this:

sudo strace -o trace.out node hello

You’ll see the program output “hello world” and exit as expected. But we also got a dump of every system call in trace.out. Let’s examine that file.

Right at the top we can see our first call, which makes sense. We’re executing /usr/bin/node, passing it the arguments node and hello:

execve("/usr/bin/node", ["node", "hello"], [/* 24 vars */]) = 0

If you ever wondered why process.argv[0] is node and process.argv[1] is the path to our Node program, now you can see how the underlying call is being made! The strace output tells us the arguments passed and the return value.

To find more information about what execve is (and any other system call), we can just look at the man pages on the host if available (best option), or if not, look online:

man execve

 

More on man command

Manual pages also include error codes that are helpful to get more details on, for example, what ENOENT or EPERM mean on an operating system. Many of these error codes can be found in the openman page.

 

Let’s examine more of this file. Many of the initial calls are loading the shared libraries libuv needs. Then we get to our application:

getcwd("/home/wavded", 4096) = 13

...

stat("/home/wavded/hello", 0x7fff082fda08) = -1

  ENOENT (No such file or directory)

stat("/home/wavded/hello.js",

  {st_mode=S_IFREG|0664, st_size=27, ...}) = 0

We can see Node grabbing the current working directory and then looking up our file to run. Note that we executed our application without the .js extension, so Node first looks for a program called “hello” and doesn’t find it, and then looks for hello.js and is successful. If we were to run it with the .js extension, you wouldn’t see the first stat call.

Let’s look at the next interesting bit:

open("/home/wavded/hello.js", O_RDONLY) = 9

fstat(9, {st_mode=S_IFREG|0664, st_size=27, ...}) = 0

...

read(9, "console.log('hello world')\n", 27) = 27

close(9) = 0

Here we open the hello.js file in read-only mode and get assigned a file descriptor. File descriptors are just integers assigned by the OS. But to understand the subsequent calls, we should take note that 9 is the number assigned for hello.js until we see a subsequent close call.

After open, we then see an fstat to get the file’s size. Then we read the contents of the file in the read line. The strace output also shows us the contents of the buffer we used to store the file. We then close the file descriptor.

A trace output file won’t show us any application code being run. We just see the system effects of what’s being run. That is, we won’t see V8 parsing or executing our console.log but we’ll see the underlying write out to stdout. Let’s look at that next:

write(1, "hello world\n", 12) = 12

Recall from chapter 6 that every process has three file descriptors automatically assigned for stdin (0), stdout (1), and stderr (2). We can see that this write call uses stdout (1) to write out hello world. We also see that console.log appends a newline for us.

Our program eventually exits on the last line of trace.out:

exit_group(0)

The zero (0) here represents the process exit code. In this case it’s successful. If we were to exit with process.exit(1) or some other status, we’d see that number reflected here.

Tracing a running process

So far we’ve used strace to start and trace a program till it exits. How about tapping into a running process?

Here we can just grab the PID for the process:

ps ax | grep node

The first number in the row is our PID:

32476 ? Ssl 0:08 /usr/bin/node long-running.js

Once we have our PID, we can run strace against it:

sudo strace -p 32476

All the currently running system calls will output to the console.

This can be a great first line of defense when debugging live issues where CPU is pegged. For example, if we’ve exceeded our ulimit for a process, this will typically peg our CPU, since open system calls continually will fail. Running strace on the process would quickly show a bunch of ENFILE errors occurring. And from the openman page, we can see a nice entry for the error:

ENFILE The system limit on the total number of

open files has been reached.

 

Listing open files

In this case, we can use another handy Linux tool called lsof to get a list of open files for a process given a PID to further investigate what we have open right now.

 

We can also get a CPU pegged at 100% and open up strace and see just the following repeating over and over:

futex(0x7ffbe00008c8, FUTEX_WAKE_PRIVATE, 1) = 1

This, for the most part, is just event loop noise, and it’s likely that your application code is stuck in an infinite loop somewhere. Tools like node --prof would help at this point.

About other operating system tools

The actual system calls we looked at will be different on other operating systems. For example, you’ll see epoll calls being made on Linux that you won’t ever see on Mac OS X because libuv uses kqueue for Mac. Although most OSes have POSIX methods like open, the function signatures and error codes can vary. Get to understand the machines you host and develop your Node applications on to make best use of the tracing tools!

 

Homework!

Make a simple HTTP server and trace it. Can you find out where the port is being bound, where connections are being accepted, and where responses are being written back to the client?

 

11.3. Summary

In this chapter we looked at debugging Node applications. First, we focused on error handling and prevention:

·        How do you handle errors that your application generates?

·        How are you notified about crashes? Do you have domains set up or an uncaughtException handler?

·        Are you using a lint tool to help prevent exceptions?

Then, we focused on debugging specific problems. We used various tools available in Node and third-party modules. The big thing to take away is knowing the right tool for the job, so when a problem arises, you can assess it and gain useful information:

·        Do you need to be able to set breakpoints, watch expressions, and step through your code? Use the built-in debug command or node-inspector.

·        Do you need to see where your application is spending its time? Use the Node built-in profiler (node --prof).

·        Is your application using more memory than expected? Take heap snapshots and inspect the results.

·        Do you want to investigate a running process without pausing it or incurring a performance penalty? Set up and use a REPL server.

·        Do you want to see what underlying system calls are being made? Use your operating system’s tracing tools.

In the next chapter we’ll dive into writing web applications with Node!