Elsewhere

Requiring Elsewhere

Elsewhere is available on NPM. To install it, run:

npm install elsewhere

Once you have it installed, you can require() it and interact with it using the graph() method.

var elsewhere = require('elsewhere');

elsewhere.graph('http://premasagar.com').then(function (err, graph) {
    // Print the completed graph as JSON.
    console.log(JSON.stringify(graph));
});

graph()

The elsewhere.graph() method accepts the following parameters:

url This is the URL that you wish to graph. The assumption is that it represents a person (or a company or organisation) and that the page at that URL has rel=me links to other URLs that also represent the person.

options (optional) This object contains properties used to configure the graph. The properties that can be passed in are: strict, logLevel, crawlLimit, domainLimit, stripDeeperLinks, useCache, cacheTimeLimit and cacheItemLimit. These are the same options that can be set in the global configuration (see below).

callback (optional) This is a function to be called once the graph is ready. The callback is passed an error (string or null) and the completed graph as an object literal.

The method returns a promise object (created by the Underscore.deferred module). This is an alternative to using the callback parameter and provides fine-grained flow control.

Some examples of valid calls to the graph() method:

var options = {strict:true},
	callback = function (err, graph) {
		if (err) {
			// err is a string describing what went wrong
			console.log(err);
		} else {
			console.log(JSON.stringify(graph));
		}
	};

elsewhere.graph('http://premasagar.com', options, callback);

elsewhere.graph('http://chrisnewtn.com', callback);

elsewhere.graph('http://glennjones.net').then(callback);

If an error occurs while parsing the starting point URL, the err argument will contain a string describing the error. The graph object will still be returned. Any errors that occur while parsing additional URLs are added to the warnings collection, which is part of the graph object.
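The behaviour described above can be sketched as a callback helper that reports the error but still inspects the graph. This is only a sketch: summarise is a hypothetical helper name, and it assumes that graph.warnings is an array of message strings, which this document does not specify.

```javascript
// Hypothetical helper: turn the (err, graph) pair into lines of text.
// Assumption: graph.warnings is an array of message strings.
function summarise(err, graph) {
  var lines = [];
  if (err) {
    lines.push('error: ' + err);
  }
  // The graph may still carry warnings even when err is set.
  var warnings = (graph && graph.warnings) || [];
  warnings.forEach(function (warning) {
    lines.push('warning: ' + warning);
  });
  return lines;
}

// Usage with Elsewhere (not run here):
// elsewhere.graph('http://premasagar.com', function (err, graph) {
//   summarise(err, graph).forEach(function (line) { console.log(line); });
// });
```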

Global configuration

Instead of using a local options object each time you call the graph() method, you can also set global options by setting properties in the elsewhere.options object.

Global options act as default values, which can then be overridden by options passed when calling the graph() method.

strict (boolean) Whether the crawler allows only reciprocal rel="me" links or not. A reciprocal link is where the page at a URL links to another page in the graph, and a page in the graph links back to the original URL. When true, there will be no false positives, but fewer results. Default: false

logLevel (integer) There are 4 levels of logging in Elsewhere: 4 - log, 3 - info, 2 - warn and 1 - error. The 4 setting gives the most granular logs, which are useful in a debugging scenario. Default: 3

crawlLimit (integer) The number of links that Elsewhere will follow without a successful verification before it abandons the chain. Default: 3

domainLimit (integer) The number of links crawled within a particular domain before the crawling of subsequent links in the domain is abandoned. Default: 3

stripDeeperLinks (boolean) If set to true then Elsewhere will remove links from the graph if they are at a deeper path than other links in the same domain. For example, plus.google.com/{id} is retained, but plus.google.com/{id}/posts is discarded. This is useful, for example, to strip out paginated contacts pages on social networks. Default: true

useCache (boolean) Whether requests should use the cache. Default: true

cacheTimeLimit (integer) The amount of time, in milliseconds, that graphs and pages are kept in the cache before they are discarded. Default: 3600000

cacheItemLimit (integer) The maximum number of items that can be kept in the cache before the oldest items are discarded. Use to limit memory. Default: 1000

httpHeaders (object) An object containing the HTTP headers to use when requesting resources from the internet.

For example:

elsewhere.options.strict = false;

If you are running Elsewhere as a server, then you may set the options directly in lib/options.js.
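The relationship between global and per-call options can be illustrated with a small sketch. resolveOptions is a hypothetical helper and not part of Elsewhere; it only shows the idea that per-call values win over the global defaults, and the globalOptions object is a stand-in for elsewhere.options.

```javascript
// Hypothetical helper, not part of Elsewhere: combines global defaults
// with per-call options, letting per-call values take precedence.
function resolveOptions(globalOptions, localOptions) {
  var resolved = {};
  [globalOptions, localOptions || {}].forEach(function (source) {
    Object.keys(source).forEach(function (key) {
      resolved[key] = source[key];
    });
  });
  return resolved;
}

// Stand-in for elsewhere.options, using defaults listed above.
var globalOptions = { strict: false, crawlLimit: 3, useCache: true };

// Only strict is overridden; the other values fall back to the defaults.
var resolved = resolveOptions(globalOptions, { strict: true });
console.log(resolved);
```

The global object is left untouched, so later calls that pass no options still see the defaults.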

Custom cache

Elsewhere uses an in-memory cache to store the HTML of web pages.

The options object contains a property called cacheTimeLimit that can be used to set the cache refresh time. By default, this is 3600000ms (1 hour). The number of items stored in the cache can be limited using the options property cacheItemLimit. By default, the cache is limited to 1000 items.

You can replace the cache with your own, for example, to store the cached data in a database or file system. To add your own custom cache, all you need to do is provide an object with the following interface:

{
  get: function (url) {
    // add code to get data
    return data;
  },
  has: function (url) {
    // add code to check your data store
    return true; // or false
  },
  fetch: function (url, callback) {
    // add code to return data
    callback(null, data);
  },
  set: function (url, data) {
    // add code to store data
    return object;
  }
}

You must then add this object as the cache property of the options object passed into the graph() method.
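As an illustration, the interface above could be filled in with a cache backed by a plain object. This is a sketch rather than Elsewhere's own cache: createMemoryCache is a hypothetical name, and the choices that get() returns undefined on a miss, that fetch() reports a miss by passing undefined data, and that set() returns the stored data are all assumptions.

```javascript
// Sketch of a custom cache backed by a plain object, keyed by URL.
// Assumptions: get() returns undefined on a miss, fetch() passes
// undefined data (not an error) on a miss, set() returns the stored data.
function createMemoryCache() {
  var store = Object.create(null);
  return {
    get: function (url) {
      return store[url];
    },
    has: function (url) {
      return url in store;
    },
    fetch: function (url, callback) {
      callback(null, store[url]);
    },
    set: function (url, data) {
      store[url] = data;
      return data;
    }
  };
}

var cache = createMemoryCache();
cache.set('http://example.com/', '<html></html>');

// Passing it to Elsewhere (not run here):
// elsewhere.graph('http://premasagar.com', { cache: cache }, callback);
```

A database or file-system cache would keep the same four methods and replace the store object with its own reads and writes.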

Custom logger

Elsewhere uses a simple logging system that writes to Node’s console. You can replace the logger with your own, for example, to store warnings and errors in a database or log file. To add your own custom logger, all you need to do is provide an object with the following interface:

{
  info:  function (message) { /* code to pass on message */ },
  log:   function (message) { /* code to pass on message */ },
  warn:  function (message) { /* code to pass on message */ },
  error: function (message) { /* code to pass on message */ }
}

You must then add this object to the logger property of the options object passed into the graph() method.
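For instance, a logger that collects messages in memory instead of writing to the console might look like the sketch below. createMemoryLogger and the extra entries property are hypothetical, and the sketch assumes Elsewhere ignores properties beyond the four methods listed above.

```javascript
// Sketch of a custom logger that records messages in an array.
// Assumption: extra properties such as entries are ignored by Elsewhere.
function createMemoryLogger() {
  var entries = [];
  // Build one logging method per level.
  function record(level) {
    return function (message) {
      entries.push({ level: level, message: String(message) });
    };
  }
  return {
    entries: entries,
    info: record('info'),
    log: record('log'),
    warn: record('warn'),
    error: record('error')
  };
}

var logger = createMemoryLogger();
logger.warn('example warning');

// Passing it to Elsewhere (not run here):
// elsewhere.graph('http://premasagar.com', { logger: logger }, callback);
```

The collected entries could then be written to a database or log file in one batch.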

It is living and ceasing to live that are imaginary solutions. Existence is elsewhere.
Andre Breton