Reloading node with no downtime

I wrote a blog post about Unix signals and Graceful shutdown in node.js applications five months ago. In this article I will explain how to reload a node.js application with no downtime.

One of the things that I like about nginx is how it handles configuration changes (see Controlling nginx). The master process "reloads the configuration" by creating new worker processes and gracefully shutting down the old ones when it receives the SIGHUP signal.

Node.js comes with a cluster module that allows us to do very powerful things.

For this example I will use one worker but it can be extended to use as many workers as you want.

master.js:

var cluster = require('cluster');

console.log('started master with ' + process.pid);

//fork the first process
cluster.fork();

process.on('SIGHUP', function () {
  console.log('Reloading...');
  var new_worker = cluster.fork();
  new_worker.once('listening', function () {
    //stop all other workers
    for(var id in cluster.workers) {
      if (id === new_worker.id.toString()) continue;
      cluster.workers[id].kill('SIGTERM');
    }
  });
});

The master process starts the first worker and then listens for the SIGHUP signal. When it receives SIGHUP, it forks a new worker and waits for that worker's listening event, which the worker reports to the master over the IPC channel. Once the new worker is listening, the master kills the other workers.

This works out of the box because the cluster module allows several worker processes to listen on the same address.
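With more than one worker, the same idea can be applied one worker at a time. The following is only a rough sketch under some assumptions: reloadWorkers is a hypothetical helper name, and it takes the cluster object as a parameter (clusterLike) so that the logic can be tried against a stub as well as the real cluster module.

```javascript
// Sketch: replace each running worker with a fresh one, one at a time.
// clusterLike is anything with fork() and a workers map, for example
// require('cluster'). reloadWorkers is a hypothetical helper name.
function reloadWorkers(clusterLike, onDone) {
  // Snapshot the ids of the workers that were running before the reload.
  var oldIds = Object.keys(clusterLike.workers);

  function replaceNext() {
    if (oldIds.length === 0) {
      if (onDone) onDone();
      return;
    }
    var oldId = oldIds.shift();
    var fresh = clusterLike.fork();
    // Retire the old worker only once its replacement is listening.
    fresh.once('listening', function () {
      var old = clusterLike.workers[oldId];
      if (old) old.kill('SIGTERM');
      replaceNext();
    });
  }

  replaceNext();
}
```

In master.js this could be called from the SIGHUP handler as reloadWorkers(cluster). The sketch does not handle a new worker that exits before it starts listening; in that case the reload would simply stall.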

server.js:

var cluster = require('cluster');

if (cluster.isMaster) {
  require('./master');
  return;
}

var express = require('express');
var http = require('http');
var app = express();

app.get('/', function (req, res) {
  res.send('ha fsdgfds gfds gfd!');
});

http.createServer(app).listen(8080, function () {
  console.log('http://localhost:8080');
});

This is the entry point for the application. It is a simple express application, except for the first part, which hands control to master.js when the process is the cluster master.

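You can test this by starting the application with node server.js, taking the master pid it prints, and sending it SIGHUP (kill -HUP followed by that pid) while you make requests to http://localhost:8080.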

I've uploaded a more complete example to github.

Posted in node | 0 Comments and 0 Reactions

Graceful shutdown in node.js

According to Wikipedia (Unix signal):

Signals are a limited form of inter-process communication used in Unix, Unix-like, and other POSIX-compliant operating systems. A signal is an asynchronous notification sent to a process or to a specific thread within the same process in order to notify it of an event that occurred.

There are a bunch of generic signals, but I will focus on two:

  • SIGTERM is used to cause a program termination. It is a way to politely ask a program to terminate. The program can either handle this signal, clean up resources and then exit, or it can ignore the signal.
  • SIGKILL is used to cause immediate termination. Unlike SIGTERM, it can't be handled or ignored by the process.

Wherever and however you are deploying your node.js application, it is very likely that the system in charge of running your app uses these two signals:

  • Upstart: When stopping a service, by default it sends SIGTERM and waits 5 seconds; if the process is still running, it sends SIGKILL.
  • supervisord: When stopping a service, by default it sends SIGTERM and waits 10 seconds; if the process is still running, it sends SIGKILL.
  • runit: When stopping a service, by default it sends SIGTERM and waits 10 seconds; if the process is still running, it sends SIGKILL.
  • Heroku dynos shutdown: Heroku sends SIGTERM, waits 10 seconds for the process to exit, and if the process is still running it sends SIGKILL.
  • Docker: If you run your node app in a docker container, the docker stop command sends SIGTERM to the main process inside the container and, after a grace period (10 seconds by default), SIGKILL.
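The stop sequence these tools share (SIGTERM, a grace period, then SIGKILL) can be sketched in node itself. This is only an illustration of the protocol, not how any of the tools above is implemented: stopChild and graceMs are made-up names, and child is assumed to behave like the ChildProcess returned by child_process.spawn.

```javascript
// Sketch of "SIGTERM, wait, then SIGKILL" as a supervisor might apply it.
// child is assumed to behave like a ChildProcess (kill() and once('exit')).
// stopChild and graceMs are hypothetical names.
function stopChild(child, graceMs) {
  // Ask politely first: the child may handle this and clean up.
  child.kill('SIGTERM');

  // If the child is still running after the grace period, force it down.
  var timer = setTimeout(function () {
    child.kill('SIGKILL');
  }, graceMs);

  // The child exited in time, so the SIGKILL is no longer needed.
  child.once('exit', function () {
    clearTimeout(timer);
  });
}
```

For example, calling stopChild on a spawned node process with graceMs set to 10000 would mimic the 10 second default described for supervisord.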

So, let's try this with a simple node application:

var http = require('http');

var server = http.createServer(function (req, res) {
  setTimeout(function () { //simulate a long request
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  }, 4000);
}).listen(9090, function (err) {
  console.log('listening http://localhost:9090/');
  console.log('pid is ' + process.pid);
});

As you can see, responses are delayed by 4 seconds. The node documentation says:

SIGTERM and SIGINT have default handlers on non-Windows platforms that resets the terminal mode before exiting with code 128 + signal number. If one of these signals has a listener installed, its default behaviour will be removed (node will no longer exit).

It is not clear from this what the default behavior is. If I send SIGTERM in the middle of a request, the request fails, as you can see here:

» curl http://localhost:9090 &
» kill 23703
[2] 23832
curl: (52) Empty reply from server

Fortunately, the http server has a close method that stops the server from accepting new connections and calls the callback once all existing connections have ended. This method comes from the net module, so it is handy for any type of TCP server.

Now, if I modify the example to something like this:

var http = require('http');

var server = http.createServer(function (req, res) {
  setTimeout(function () { //simulate a long request
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  }, 4000);
}).listen(9090, function (err) {
  console.log('listening http://localhost:9090/');
  console.log('pid is ' + process.pid);
});

process.on('SIGTERM', function () {
  server.close(function () {
    process.exit(0);
  });
});

And then I use the same commands as above:

» curl http://localhost:9090 &
» kill 23703
Hello World
[1]  + 24730 done       curl http://localhost:9090

You will notice that the program doesn't exit until it has finished processing and serving the last request. More interesting is the fact that after the SIGTERM signal it doesn't accept new connections:

» curl http://localhost:9090 &
[1] 25072

» kill 25070

» curl http://localhost:9090 &
[2] 25097

curl: (7) Failed connect to localhost:9090; Connection refused
[2]  + 25097 exit 7     curl http://localhost:9090

» Hello World
[1]  + 25072 done       curl http://localhost:9090

Some examples in blogs and on Stack Overflow use a timeout on SIGTERM in case server.close takes longer than expected. As mentioned above, this is unnecessary because each of the process managers listed sends SIGKILL if the process takes too long to exit after SIGTERM.
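For reference, the timeout pattern those examples use looks roughly like the sketch below. The names shutdown, timeoutMs and exit are chosen for illustration only; exit would normally be a function that calls process.exit, and it is passed in here so the sketch can be tried without ending the process.

```javascript
// Sketch of the "timeout on SIGTERM" pattern seen in other examples.
// server is assumed to be an http.Server (or any net.Server).
// shutdown, timeoutMs and exit are hypothetical names.
function shutdown(server, timeoutMs, exit) {
  // Give up and exit with an error code if close takes too long.
  var timer = setTimeout(function () {
    exit(1);
  }, timeoutMs);

  // Stop accepting connections; exit cleanly once close has completed.
  server.close(function () {
    clearTimeout(timer);
    exit(0);
  });
}
```

It would be wired up from a SIGTERM handler, passing the server, a timeout, and function (code) { process.exit(code); }. Under a process manager the timer only duplicates what the SIGKILL already does.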


A common case of double callbacks in node.js

In javascript jargon, a double callback is a callback that we expect to be called once but that, for some reason, is called two or more times.

Sometimes it is easy to discover, as in this example:

function doSomething(callback) {
  doAnotherThing(function (err) {
    if (err) callback(err);
    callback(null, result);
  });
}

The obvious error here is that when doAnotherThing fails the callback is called once with the error, and once with result.
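The usual fix is to return when calling back with the error, so the second call is never reached. In this sketch doAnotherThing is a stand-in stub (the real one is not shown here) that fails or succeeds depending on a flag, and the shouldFail parameter exists only to drive it.

```javascript
// Hypothetical stand-in for the real doAnotherThing, so the sketch runs:
// it fails when shouldFail is true and yields 'some result' otherwise.
function doAnotherThing(shouldFail, callback) {
  if (shouldFail) return callback(new Error('something failed'));
  callback(null, 'some result');
}

function doSomething(shouldFail, callback) {
  doAnotherThing(shouldFail, function (err, result) {
    // return ends the function here, so callback runs exactly once.
    if (err) return callback(err);
    callback(null, result);
  });
}
```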

However, there is one special case that is very hard to reproduce and to discover; moreover, it has happened to me several times.

Yesterday, my friend and co-worker Alberto asked me this:

"Why does this test hang on the assertion line?" The line in question was expect(foo).to.be.equal('123');

The test looks like this:

it('test something', function (done) {
  function_under_test(function (err, output) {
    expect(foo).to.be.equal('123');
    done();
  });
});

After some debugging I found out that it hung not only on expect, but whenever any error was thrown inside the callback.

A few calls further up the stack, there was a small function with a bug like this:

function (callback) {
  another_function(function (err, some_data) {
    if (err) return callback(err);
    try {
      callback(null, JSON.parse(some_data));
    } catch(err) {
      callback(new Error(some_data + ' is not a valid JSON'));
    }
  });
}

The intention of the developer with this try block is clear: to catch JSON.parse errors. The problem is that it also catches errors thrown inside callback, and then executes the callback a second time with the wrong error.
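The effect can be reproduced in isolation. This is a minimal sketch rather than the original code: parseThenCall is a made-up name, and the throw inside the callback stands in for a failing expect.

```javascript
// Same shape as the buggy function above, reduced to the try/catch.
// parseThenCall is a hypothetical name used only for this sketch.
function parseThenCall(some_data, callback) {
  try {
    callback(null, JSON.parse(some_data));
  } catch (err) {
    // Also reached when callback itself throws, not only when parsing fails.
    callback(new Error(some_data + ' is not a valid JSON'));
  }
}

var calls = [];
parseThenCall('{"a": 1}', function (err, data) {
  calls.push(err ? 'error: ' + err.message : 'ok');
  // Stands in for a failing assertion such as expect(...).
  if (!err) throw new Error('assertion failed');
});

// calls now has two entries: the valid JSON was reported as invalid.
console.log(calls);
```

The callback runs twice for a single call, and the error thrown by the callback never reaches the caller.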

The solution is trivial: keep only the parsing inside the try and call the callback outside of it, as follows:

function (callback) {
  another_function(function (err, some_data) {
    if (err) return callback(err);
    try {
      var parsed = JSON.parse(some_data);
    } catch(err) {
      return callback(new Error(some_data + ' is not a valid JSON'));
    }
    callback(null, parsed);
  });
}

Introducing these errors is very easy (I've done it several times) and troubleshooting them is very hard, so be careful and do not wrap callback calls in try/catch blocks.

