4 changes: 3 additions & 1 deletion CHANGES.md
@@ -1,13 +1,15 @@
CHANGELOG
=========

* @bdeitte Add client-side telemetry support with the `includeDatadogTelemetry` option (disabled by default and in beta) and the `telemetryFlushInterval` option

## 12.0.0 (2025-12-16)

* @bdeitte event calls now use prefix and suffix
* @bdeitte mock mode no longer creates a socket
* @bdeitte using an IP no longer invokes DNS lookup
* @bdeitte client close no longer fails when errorHandler is defined but socket is null
* @bdeitte tags ending with '\' no longer breaks telegraph
* @bdeitte tags ending with '\\' no longer breaks telegraph

## 11.4.0 (2025-12-7)

24 changes: 23 additions & 1 deletion CLAUDE.md
@@ -14,6 +14,7 @@ hot-shots is a Node.js client library for StatsD, DogStatsD (Datadog), and Teleg
- **lib/statsFunctions.js**: Core metric methods (timing, increment, gauge, etc.)
- **lib/helpers.js**: Tag formatting, sanitization, and utility functions
- **lib/constants.js**: Protocol constants and error codes
- **lib/telemetry.js**: Client-side telemetry for DogStatsD (tracks metrics/bytes sent/dropped)
- **index.js**: Main entry point (exports lib/statsd.js)
- **types.d.ts**: TypeScript type definitions

@@ -40,7 +41,13 @@ npm run coverage # Run tests with coverage report
```

### Linting
The project uses ESLint 5.x with pretest hooks. All code must pass linting before tests run.
The project uses ESLint 8.x with pretest hooks. All code must pass linting before tests run.

Key linting rules to follow:
- Use single quotes for strings (not double quotes or backticks for simple strings)
- Always use curly braces for if/else blocks, even single-line ones
- Ternary operators: put `?` and `:` at the end of lines, not the beginning
- No trailing spaces or mixed indentation
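
As a rough illustration of the rules above, here is a small sketch. `describeTransport` is a hypothetical helper invented for this example (it is not part of hot-shots), and the snippet has not been checked against the project's actual ESLint configuration.

```javascript
'use strict';

// Hypothetical helper, used only to illustrate the style rules.
function describeTransport(protocol, isMock) {
  // Curly braces even for a single-line branch.
  if (isMock) {
    return 'mock';
  }

  // Ternary: `?` and `:` sit at the end of the line, not the beginning.
  const label = protocol === 'uds' ?
    'unix domain socket' :
    protocol;

  // Single quotes for simple strings.
  return 'transport: ' + label;
}
```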

### Running Single Tests
```bash
@@ -77,6 +84,13 @@ npx mocha test/specific-test.js --timeout 5000
- Distribution metrics
- Automatic DD_* environment variable tag injection
- Unix Domain Socket support
- Client-side telemetry (opt-in via `includeDatadogTelemetry`)

### DogStatsD-Only Features Pattern
Features specific to DogStatsD (not Telegraf) should:
1. Check `this.telegraf` and throw/return error if true
2. Be disabled in mock mode where appropriate
3. Child clients should inherit parent behavior (e.g., share telemetry instance)
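
The three points above could look roughly like the following sketch. `FakeClient`, `dogOnlyFeature`, and `parentTelemetry` are hypothetical names made up for illustration; the real client has a different constructor and reports errors through callbacks and error handlers as well.

```javascript
'use strict';

// Hypothetical stand-in for a client; not the real hot-shots constructor.
function FakeClient(options) {
  this.telegraf = Boolean(options.telegraf);
  this.mock = Boolean(options.mock);
  // 3. A child reuses the parent's telemetry object rather than creating its own.
  this.telemetry = options.parentTelemetry || { count: 0 };
  this.sent = [];
}

// Hypothetical DogStatsD-only method following the three rules above.
FakeClient.prototype.dogOnlyFeature = function (name) {
  // 1. Refuse to run against Telegraf.
  if (this.telegraf) {
    throw new Error('dogOnlyFeature is not supported by Telegraf');
  }
  // 2. Do nothing in mock mode.
  if (this.mock) {
    return;
  }
  this.telemetry.count += 1;
  this.sent.push(name);
};

const parent = new FakeClient({});
const child = new FakeClient({ parentTelemetry: parent.telemetry });
parent.dogOnlyFeature('a');
child.dogOnlyFeature('b');
// Both clients updated the same telemetry object.
```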

### Telegraf
- Different tag separator format
@@ -91,6 +105,14 @@ The project uses Mocha with 5-second timeouts. Tests are organized by feature:
- Error handling and edge cases
- Child client functionality
- Buffering and performance tests
- Telemetry tests

### Test Helpers
Tests use helpers from `test/helpers/helpers.js`:
- `createServer(serverType, callback)` - Creates a test server for the given protocol
- `createHotShotsClient(opts, clientType)` - Creates a client ('client', 'child client', etc.)
- `closeAll(server, statsd, allowErrors, done)` - Properly closes server and client in afterEach
- `testTypes()` - Returns all protocol/client combinations for parameterized tests

## Dependencies

44 changes: 44 additions & 0 deletions README.md
@@ -69,6 +69,8 @@ Parameters (specified as one object passed into hot-shots):
* `maxRetryDelayMs`: Maximum delay in milliseconds between retry attempts (caps exponential backoff). Defaults to `1000`.
* `backoffFactor`: Exponential backoff multiplier for retry delays. Defaults to `2`.
* `udpSocketOptions`: Used only when the protocol is `udp`. Specify the options passed into dgram.createSocket(). Defaults to `{ type: 'udp4' }`
* `includeDatadogTelemetry`: Enable client-side telemetry to track metrics about the client itself. This helps diagnose high-throughput metric delivery issues. Telemetry metrics are prefixed with `datadog.dogstatsd.client.` and are not billed as custom metrics. Defaults to `false`. See [Client-Side Telemetry](#client-side-telemetry) for details.
* `telemetryFlushInterval`: When telemetry is enabled, how often (in ms) to send telemetry metrics. Defaults to `10000`.

### StatsD methods
All StatsD methods other than `event`, `close`, and `check` have the same API:
@@ -257,6 +259,8 @@ Some of the functionality mentioned above is specific to DogStatsD or Telegraf.
* histogram method- DogStatsD or Telegraf
* event method- DogStatsD
* check method- DogStatsD
* includeDatadogTelemetry parameter- DogStatsD
* telemetryFlushInterval parameter- DogStatsD

## Errors

@@ -309,6 +313,46 @@ optionalDependency, and how it's used in the codebase, this install
failure will not cause any problems. It only means that you can't use
the uds feature.

## Datadog Telemetry

When `includeDatadogTelemetry` is enabled, the client automatically sends telemetry metrics about itself to help diagnose metric delivery issues in high-throughput scenarios. This feature should match the behavior of official Datadog clients as described in [the docs](https://docs.datadoghq.com/developers/dogstatsd/high_throughput/?tab=go#client-side-telemetry).

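Telemetry is automatically disabled when using `mock: true` or `telegraf: true`. Child clients do not start their own telemetry; they share the parent's telemetry instance, so their metrics are counted in the parent's totals.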

### Telemetry Metrics

The following metrics are sent every `telemetryFlushInterval` milliseconds (default: 10 seconds):

| Metric | Description |
|--------|-------------|
| `datadog.dogstatsd.client.metrics` | Total number of metrics sent |
| `datadog.dogstatsd.client.metrics_by_type` | Metrics broken down by type (gauge, count, set, timing, histogram, distribution) |
| `datadog.dogstatsd.client.events` | Total number of events sent |
| `datadog.dogstatsd.client.service_checks` | Total number of service checks sent |
| `datadog.dogstatsd.client.bytes_sent` | Total bytes successfully sent |
| `datadog.dogstatsd.client.bytes_dropped` | Total bytes dropped |
| `datadog.dogstatsd.client.packets_sent` | Total packets successfully sent |
| `datadog.dogstatsd.client.packets_dropped` | Total packets dropped |

The `metric_dropped_on_receive` metric from the official Datadog clients is intentionally omitted: it tracks drops on an internal receive channel, which does not apply to hot-shots' architecture. `bytes_dropped_queue` is also omitted because it does not fit how hot-shots works.

### Telemetry Tags

All telemetry metrics include these tags:
* `client:nodejs` - Identifies the hot-shots client
* `client_version:<version>` - The hot-shots version
* `client_transport:<protocol>` - The transport protocol (udp, tcp, uds, stream)
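
To make the metric names and tags concrete, here is a minimal sketch of how one telemetry counter might be serialized as a DogStatsD count datagram (`<name>:<value>|c|#<tags>`). The helper names `telemetryTags` and `formatTelemetryLine`, the version `12.0.0`, and the value `512` are assumptions for illustration only; the actual formatting lives in `lib/telemetry.js` and may differ.

```javascript
'use strict';

const TELEMETRY_PREFIX = 'datadog.dogstatsd.client.';

// Build the three tags listed above (hypothetical helper).
function telemetryTags(version, protocol) {
  return [
    'client:nodejs',
    'client_version:' + version,
    'client_transport:' + protocol
  ];
}

// Format one counter as a DogStatsD count datagram (hypothetical helper).
function formatTelemetryLine(name, value, tags) {
  return TELEMETRY_PREFIX + name + ':' + value + '|c|#' + tags.join(',');
}

const tags = telemetryTags('12.0.0', 'udp');
const line = formatTelemetryLine('bytes_sent', 512, tags);
console.log(line);
```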

### Example

```javascript
var StatsD = require('hot-shots');

var client = new StatsD({
  host: 'localhost',
  includeDatadogTelemetry: true,
  telemetryFlushInterval: 10000 // Optional; the default is 10000 ms (10 seconds)
});
```

## Submitting changes

Thanks for considering making any updates to this project! This project is entirely community-driven, and so your changes are important. Here are the steps to take in your fork:
14 changes: 12 additions & 2 deletions lib/statsFunctions.js
@@ -258,6 +258,11 @@ function applyStatsFns (Client) {
throw err;
}

// Track service check in telemetry
if (this.telemetry) {
this.telemetry.recordServiceCheck();
}

const check = ['_sc', this.prefix + name + this.suffix, status], metadata = options || {};

if (metadata.date_happened) {
@@ -304,9 +309,9 @@ function applyStatsFns (Client) {
* @option date_happened {Date} Assign a timestamp to the event. Default is now.
* @option hostname {String} Assign a hostname to the event.
* @option aggregation_key {String} Assign an aggregation key to the event, to group it with some others.
* @option priority {String} Can be normal or low. Default is 'normal'.
* @option priority {String} Can be 'normal' or 'low'. Default is 'normal'.
* @option source_type_name {String} Assign a source type to the event.
* @option alert_type {String} Can be error’, ‘warning’, ‘info or success. Default is 'info'.
* @option alert_type {String} Can be 'error', 'warning', 'info' or 'success'. Default is 'info'.
* @param tags {Array=} The Array of tags to add to metrics. Optional.
* @param callback {Function=} Callback when message is done being delivered. Optional.
*/
@@ -323,6 +328,11 @@ function applyStatsFns (Client) {
throw err;
}

// Track event in telemetry
if (this.telemetry) {
this.telemetry.recordEvent();
}

// Convert to strings
let message;

67 changes: 65 additions & 2 deletions lib/statsd.js
@@ -6,6 +6,7 @@ const process = require('process'),
const constants = require('./constants');
const createTransport = require('./transport');
const debug = util.debuglog('hot-shots');
const Telemetry = require('./telemetry');

const PROTOCOL = constants.PROTOCOL;
const TCP_ERROR_CODES = constants.tcpErrors();
@@ -107,6 +108,29 @@ const Client = function (host, port, prefix, suffix, globalize, cacheDns, mock,
this.closingFlushInterval = options.closingFlushInterval || 50;
this.udpSocketOptions = options.udpSocketOptions || { type: 'udp4' };

// Telemetry options (Datadog-specific, disabled by default)
// Only enable for non-telegraf, non-mock, non-child clients
this.includeDatadogTelemetry = options.includeDatadogTelemetry === true &&
!options.telegraf &&
!options.mock &&
!options.isChild;

// Initialize telemetry if enabled
if (this.includeDatadogTelemetry) {
this.telemetryFlushInterval = options.telemetryFlushInterval || Telemetry.DEFAULT_TELEMETRY_FLUSH_INTERVAL;
this.telemetry = new Telemetry({
protocol: this.protocol,
flushInterval: this.telemetryFlushInterval,
tagPrefix: this.tagPrefix,
tagSeparator: this.tagSeparator
});
} else if (options.isChild && options.telemetry) {
// Child clients share parent's telemetry instance
this.telemetry = options.telemetry;
} else {
this.telemetry = null;
}

// If we're mocking the client, create a buffer to record the outgoing calls.
if (this.mock) {
this.mockBuffer = [];
@@ -144,6 +168,16 @@ const Client = function (host, port, prefix, suffix, globalize, cacheDns, mock,
global.statsd = this;
}

// Start telemetry if enabled (only for parent clients)
if (this.includeDatadogTelemetry && this.telemetry) {
// Set the send function for telemetry to use
// We use sendMessage directly to bypass metric tracking (avoid infinite loop)
this.telemetry.setSendFunction((message, callback) => {
this.sendMessage(message, callback, true); // true = isTelemetry
});
this.telemetry.start();
}

debug('hot-shots client initialized: protocol=%s, host=%s, port=%s, prefix=%s, maxBufferSize=%s, mock=%s',
this.protocol, this.host, this.port, this.prefix, this.maxBufferSize, this.mock);

@@ -243,6 +277,11 @@ Client.prototype.sendAll = function (stat, value, type, sampleRate, tags, callba
* @param callback {Function=} Callback when message is done being delivered. Optional.
*/
Client.prototype.sendStat = function (stat, value, type, sampleRate, tags, callback) {
// Track metric in telemetry (even if sampled out, matching official Datadog behavior)
if (this.telemetry) {
this.telemetry.recordMetric(type);
}

let message = `${this.prefix + stat + this.suffix}:${value}|${type}`;
sampleRate = sampleRate || this.sampleRate;
if (sampleRate && sampleRate < 1) {
@@ -357,8 +396,9 @@ Client.prototype.flushQueue = function (callback) {
*
* @param message {String} The constructed message without tags
* @param callback {Function=} Callback when message is done being delivered. Optional.
* @param isTelemetry {Boolean=} Whether this is a telemetry message (to avoid tracking telemetry). Optional.
*/
Client.prototype.sendMessage = function (message, callback) {
Client.prototype.sendMessage = function (message, callback, isTelemetry) {
// don't waste the time if we aren't sending anything
if (message === '' || this.mock) {
if (callback) {
@@ -367,6 +407,9 @@ Client.prototype.sendMessage = function (message, callback) {
return;
}

const messageBytes = Buffer.byteLength(message);
debug('hot-shots sendMessage: message size in bytes is %d', messageBytes);

const socketWasMissing = !this.socket;
if (socketWasMissing && (this.protocol === PROTOCOL.TCP || this.protocol === PROTOCOL.UDS)) {
debug('hot-shots sendMessage: socket missing, attempting to recreate for protocol=%s', this.protocol);
@@ -381,6 +424,10 @@ Client.prototype.sendMessage = function (message, callback) {
if (socketWasMissing) {
const error = new Error('Socket not created properly. Check previous errors for details.');
debug('hot-shots sendMessage: socket creation failed - %s', error.message);
// Track bytes dropped due to socket error (only for non-telemetry messages)
if (this.telemetry && !isTelemetry) {
this.telemetry.recordBytesDroppedWriter(messageBytes);
}
if (callback) {
return callback(error);
} else if (this.errorHandler) {
@@ -396,13 +443,21 @@ Client.prototype.sendMessage = function (message, callback) {
if (errFormatted) {
errFormatted.code = err.code;
debug('hot-shots sendMessage: error sending - %s (code: %s)', err.message, err.code);
// Track bytes dropped due to writer error (only for non-telemetry messages)
if (this.telemetry && !isTelemetry) {
this.telemetry.recordBytesDroppedWriter(messageBytes);
}
// handle TCP/UDS error that requires socket replacement when we are not
// emitting the `error` event on `this.socket`
if ((this.protocol === PROTOCOL.TCP || this.protocol === PROTOCOL.UDS) && (callback || this.errorHandler)) {
protocolErrorHandler(this, this.protocol, err);
}
} else {
debug('hot-shots sendMessage: successfully sent %d bytes', messageBytes);
// Track bytes sent successfully (only for non-telemetry messages)
if (this.telemetry && !isTelemetry) {
this.telemetry.recordBytesSent(messageBytes);
}
}
if (callback) {
callback(errFormatted, bytes);
@@ -444,6 +499,12 @@ Client.prototype.close = function (callback) {
clearInterval(this.intervalHandle);
}

// Stop telemetry and flush one last time
if (this.includeDatadogTelemetry && this.telemetry) {
this.telemetry.stop();
this.telemetry.flush(); // Final flush before close
}

// flush the queue one last time, if needed
this.flushQueue((err) => {
if (err) {
@@ -554,7 +615,9 @@ const ChildClient = function (parent, options) {
bufferFlushInterval: parent.bufferFlushInterval,
telegraf : parent.telegraf,
protocol : parent.protocol,
closingFlushInterval : parent.closingFlushInterval
closingFlushInterval : parent.closingFlushInterval,
// Child inherits telemetry from parent (for metric tracking)
telemetry : parent.telemetry
});
};
util.inherits(ChildClient, Client);
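
`lib/telemetry.js` itself is not shown in the hunks above. As a rough sketch of the interface that `lib/statsd.js` relies on, the class below reuses the method names called in this diff (`setSendFunction`, `recordMetric`, `recordEvent`, `recordServiceCheck`, `recordBytesSent`, `recordBytesDroppedWriter`, `start`, `stop`, `flush`). The class name `TelemetrySketch`, the counter layout, and the message format are assumptions for illustration, not the real implementation.

```javascript
'use strict';

// Hypothetical, simplified stand-in for lib/telemetry.js.
class TelemetrySketch {
  constructor(options) {
    this.flushInterval = (options && options.flushInterval) ||
      TelemetrySketch.DEFAULT_TELEMETRY_FLUSH_INTERVAL;
    this.sendFn = null;
    this.timer = null;
    this.reset();
  }

  // Zero every counter; called after each flush.
  reset() {
    this.counters = { metrics: 0, events: 0, service_checks: 0, bytes_sent: 0, bytes_dropped: 0 };
  }

  setSendFunction(fn) { this.sendFn = fn; }
  recordMetric() { this.counters.metrics += 1; }
  recordEvent() { this.counters.events += 1; }
  recordServiceCheck() { this.counters.service_checks += 1; }
  recordBytesSent(bytes) { this.counters.bytes_sent += bytes; }
  recordBytesDroppedWriter(bytes) { this.counters.bytes_dropped += bytes; }

  // Send one count message per counter, then start a fresh window.
  flush() {
    if (!this.sendFn) {
      return;
    }
    const snapshot = this.counters;
    this.reset();
    Object.keys(snapshot).forEach((name) => {
      this.sendFn('datadog.dogstatsd.client.' + name + ':' + snapshot[name] + '|c');
    });
  }

  start() {
    this.timer = setInterval(() => this.flush(), this.flushInterval);
    this.timer.unref(); // do not keep the process alive just for telemetry
  }

  stop() {
    if (this.timer) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}
TelemetrySketch.DEFAULT_TELEMETRY_FLUSH_INTERVAL = 10000;

// Usage: the send function collects messages instead of writing to a socket.
const sent = [];
const telemetry = new TelemetrySketch({ flushInterval: 10000 });
telemetry.setSendFunction((message) => {
  sent.push(message);
});
telemetry.recordMetric('c');
telemetry.recordBytesSent(42);
telemetry.flush();
```

Because the send function is injected, the client can route telemetry through `sendMessage` with `isTelemetry` set, so the telemetry messages are not themselves counted.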