osquery ingester encryption#2581

Merged
zackattack01 merged 25 commits into main from zack/osq_ingester_encryption
Mar 12, 2026

@zackattack01 commented Feb 11, 2026

This adds ingestion, storage, and use of the new encryption keys that launcher uses to encrypt all osquery logs and results before publication. The keys are distributed via enrollment (standard and secretless) and through control data as part of the auth tokens subsystem. No control data changes are required, because the log publish client is already registered as a subscriber to the auth tokens subsystem.

The encryption itself uses the cloudflare/circl HPKE implementation, which seems to be the most widely used one at present. Equivalent logic is being added to the internal golang crypto libraries, and we can switch to that once it is available.

Primary changes:

  • adds new storage keys for HPKE public key and PSK
  • adds new expected fields to enrollment and krypto-ec-middleware callback responses
  • persists keys when populated within the enrollment responses
  • adds caching/refresh and key fetching logic to osquery publisher for new keys
  • adds payload encryption for the new format using HPKE

Not in scope for this PR:
A few things are left for subsequent PRs, because these changes were already getting large as the RFD grew:

  • The metadata that k2 needs to recreate the DCK for decryption should be passed as the AAD parameter. That is noted as a TODO here but not implemented.
  • Where we should compress is also noted but not implemented. I'm still working out how to account for compression when batching and didn't want to overcomplicate these changes.
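
To illustrate what passing metadata as AAD means, here is a minimal sketch using the standard library's AES-GCM rather than HPKE (a deliberate substitution so the example needs no third-party module). The `seal` and `open` helpers and the metadata string are hypothetical. The point it shows is general to AEAD constructions: the AAD is authenticated but not encrypted, and decryption fails unless the receiver supplies exactly the same AAD bytes.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a raw key (16, 24, or 32 bytes).
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, fmt.Errorf("creating cipher: %w", err)
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and authenticates aad alongside it.
// The aad is not part of the returned ciphertext; the receiver must supply it.
func seal(key, plaintext, aad []byte) (nonce, ciphertext []byte, err error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, nil, err
	}
	nonce = make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, nil, fmt.Errorf("generating nonce: %w", err)
	}
	return nonce, gcm.Seal(nil, nonce, plaintext, aad), nil
}

// open decrypts ciphertext and fails if aad differs from what was sealed.
func open(key, nonce, ciphertext, aad []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, nonce, ciphertext, aad)
}

func main() {
	key := make([]byte, 16) // all-zero demo key, for illustration only
	aad := []byte("hypothetical-metadata")

	nonce, ct, err := seal(key, []byte("osquery results"), aad)
	if err != nil {
		panic(err)
	}
	_, errSame := open(key, nonce, ct, aad)
	_, errOther := open(key, nonce, ct, []byte("different-metadata"))
	fmt.Println("same AAD ok:", errSame == nil)
	fmt.Println("different AAD rejected:", errOther != nil)
}
```

In HPKE the AAD plays the same role on each seal and open call, so whatever metadata is chosen would have to be reproducible byte for byte on the decrypting side.
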
sequenceDiagram
    participant L as Launcher Agent
    participant CS as K2 Enrollment/Control
    participant AI as Agent-Ingester
    participant K as Kafka
    participant K2 as K2

    Note over CS: Stores HPKE public keys<br/>(X25519) and PSKs<br/>per region

    L->>CS: Enrollment / Config Refresh
    CS-->>L: HPKE Public Key + PSK + psk_id<br/>(region-specific)

    Note over L: For each batch:<br/>Generate ephemeral keypair<br/>HPKE encrypts with PSK

    L->>L: plaintext = serialize(logs_or_results)
    L->>L: encapsulated_key, ciphertext = HPKE.Encrypt(<br/>  public_key, psk, psk_id, plaintext)
    L->>L: Package: {hpke_key_id, psk_id,<br/>  encapsulated_key, ciphertext}

    L->>AI: Send encrypted blob
    Note over AI: Cannot decrypt<br/>(no private key or PSK)

    AI->>K: Forward encrypted blob
    K->>K2: Deliver encrypted blob

    Note over K2: Holds HPKE private key<br/>(X25519) and PSK
    K2->>K2: Lookup PSK by psk_id
    K2->>K2: plaintext = HPKE.Decrypt(<br/>  private_key, psk, <br/>  encapsulated_key, ciphertext)
    K2->>K2: Process plaintext data
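
The "Package" step in the diagram names four fields. As a rough sketch of what such an envelope could look like on the wire, the following uses a JSON encoding; the Go type, the field types, the JSON encoding itself, and the `packEnvelope`/`unpackEnvelope` helpers are all assumptions for illustration, and only the field names come from the diagram.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// encryptedEnvelope mirrors the field names from the diagram's "Package" step.
// Everything else about this type is hypothetical.
type encryptedEnvelope struct {
	HPKEKeyID       string `json:"hpke_key_id"`
	PSKID           string `json:"psk_id"`
	EncapsulatedKey []byte `json:"encapsulated_key"` // encoding/json emits []byte as base64
	Ciphertext      []byte `json:"ciphertext"`
}

// packEnvelope serializes the identifiers and encrypted material together.
func packEnvelope(hpkeKeyID, pskID string, encapsulatedKey, ciphertext []byte) ([]byte, error) {
	return json.Marshal(encryptedEnvelope{
		HPKEKeyID:       hpkeKeyID,
		PSKID:           pskID,
		EncapsulatedKey: encapsulatedKey,
		Ciphertext:      ciphertext,
	})
}

// unpackEnvelope parses an envelope and rejects one that is missing any field
// the receiver would need in order to select keys and decrypt.
func unpackEnvelope(raw []byte) (*encryptedEnvelope, error) {
	var env encryptedEnvelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("parsing envelope: %w", err)
	}
	if env.HPKEKeyID == "" || env.PSKID == "" ||
		len(env.EncapsulatedKey) == 0 || len(env.Ciphertext) == 0 {
		return nil, errors.New("envelope is missing a required field")
	}
	return &env, nil
}

func main() {
	raw, err := packEnvelope("hpke-1", "psk-1", []byte{1, 2, 3}, []byte{4, 5})
	if err != nil {
		panic(err)
	}
	env, err := unpackEnvelope(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(env.HPKEKeyID, env.PSKID, len(env.EncapsulatedKey), len(env.Ciphertext))
}
```

Carrying both identifiers in the clear is what lets the intermediate hops forward the blob unchanged while the final consumer looks up the right private key and PSK.
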


jsonData, err := json.Marshal(payload)
if err != nil {
logger.Log(ctx, slog.LevelError,
Contributor Author

@zackattack01 Mar 11, 2026

Removed a bunch of logging in this flow because errors were being double-logged: once by the deferred logging above and again via the error returned to publish callers.
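
A minimal sketch of the log-once pattern described here, assuming a named error return and a deferred logger. The `publish` and `publishAndCountLogs` names and the log message are hypothetical, not the PR's code. Inner steps only wrap and return errors, so each failure produces one log record.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
	"strings"
)

// publish logs a failure exactly once, in the deferred function, and returns
// the error to the caller. The steps inside do not log on their own.
func publish(ctx context.Context, logger *slog.Logger, payload any) (err error) {
	defer func() {
		if err != nil {
			logger.Log(ctx, slog.LevelError, "publish failed", "err", err)
		}
	}()

	if _, err := json.Marshal(payload); err != nil {
		// No logger.Log call here: the deferred function covers it.
		return fmt.Errorf("marshaling payload: %w", err)
	}
	return nil
}

// publishAndCountLogs runs publish against an in-memory logger and reports
// how many log lines were written.
func publishAndCountLogs(payload any) (int, error) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	err := publish(context.Background(), logger, payload)
	return strings.Count(buf.String(), "\n"), err
}

func main() {
	// Channels cannot be marshaled to JSON, so this call fails.
	n, err := publishAndCountLogs(make(chan int))
	fmt.Printf("log lines: %d, returned error: %v\n", n, err)
}
```
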


hpkeKey := lpc.getHPKEKeyForEnrollment(enrollmentID)
if hpkeKey == nil {
return nil, fmt.Errorf("no HPKE key available for enrollment '%s': %w", enrollmentID, err)
Contributor

err is nil here at this point, right?

Suggested change
- return nil, fmt.Errorf("no HPKE key available for enrollment '%s': %w", enrollmentID, err)
+ return nil, fmt.Errorf("no HPKE key available for enrollment '%s'", enrollmentID)

The same suggestion applies to line 172 (the "no PSK available for enrollment" error).

Contributor Author

oh yes good catch, i'll get those fixed up. thank you!
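
For reference, a small sketch of why the nil `err` matters. The two helper functions are hypothetical stand-ins that reproduce only the `fmt.Errorf` calls from the thread. When the operand for `%w` is nil, `fmt` renders a bad-verb marker into the message and nothing is wrapped.

```go
package main

import (
	"errors"
	"fmt"
)

// missingKeyErrorWrapped reproduces the reviewed call, where err is nil.
func missingKeyErrorWrapped(enrollmentID string) error {
	var err error // nil at this point, as in the reviewed code
	return fmt.Errorf("no HPKE key available for enrollment '%s': %w", enrollmentID, err)
}

// missingKeyError reproduces the suggested change.
func missingKeyError(enrollmentID string) error {
	return fmt.Errorf("no HPKE key available for enrollment '%s'", enrollmentID)
}

func main() {
	wrapped := missingKeyErrorWrapped("abc")
	fmt.Println(wrapped)
	// prints: no HPKE key available for enrollment 'abc': %!w(<nil>)
	fmt.Println(errors.Unwrap(wrapped) == nil)
	// prints: true
	fmt.Println(missingKeyError("abc"))
	// prints: no HPKE key available for enrollment 'abc'
}
```
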

// and pings the osquery publisher to update its token cache if any updates were made.
// Any errors here are logged but this is all a best effort approach to avoid interrupting the enrollment flow-
// this data is also returned via control server, so we can retry if needed later.
func (e *kryptoEcMiddleware) persistAgentIngesterKeys(ctx context.Context, resp *callbackResponse) {
Contributor

For a future PR, not this one -- since this update logic is pretty similar to the one in extension.go, probably worth pulling out and consolidating in the knapsack? That was the direction I went with EnrollmentTracker because I also had stuff duplicated here and in the extension, and making that update made it a little easier to write tests and keep track of changes to the update logic.

It looks like the knapsack even already has access to the osquery publisher, so you could remove e.osqueryPublisher from the middleware here and have knapsack handle the Ping call, too.

Contributor Author

oh yeah that seems much cleaner! will do in the next PR

)
}
}
e.persistAgentIngesterKeys(req.Context(), &r)
Contributor

Sorry if I missed it in the updated RFD -- do we expect the HPKE/PSK to rotate? How long are they valid for?

Contributor Author

We do expect them to rotate, but there is no explicit expiration anywhere yet. Eventually we will mark keys as revoked in k2 and they will fall out of scope; I haven't implemented that yet. The key IDs are wired through everything, so we should be able to support multiple active key versions at one time.
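
As a loose illustration of how key IDs allow several active versions alongside revoked ones, here is a sketch of an ID-keyed lookup. The `keyRing` and `keyVersion` types, the error values, and the revocation flag are all hypothetical; revocation is stated above to be unimplemented, so this shows one possible shape rather than the actual design.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errUnknownKey = errors.New("unknown key id")
	errRevokedKey = errors.New("key revoked")
)

// keyVersion is one version of a key; several may be active at once.
type keyVersion struct {
	Secret  []byte
	Revoked bool
}

// keyRing maps a key ID to its version record.
type keyRing map[string]keyVersion

// lookup returns the key material for id, refusing unknown or revoked IDs.
func (r keyRing) lookup(id string) ([]byte, error) {
	v, ok := r[id]
	switch {
	case !ok:
		return nil, fmt.Errorf("key %q: %w", id, errUnknownKey)
	case v.Revoked:
		return nil, fmt.Errorf("key %q: %w", id, errRevokedKey)
	}
	return v.Secret, nil
}

func main() {
	ring := keyRing{
		"v1": {Secret: []byte("old"), Revoked: true},
		"v2": {Secret: []byte("current")},
		"v3": {Secret: []byte("next")},
	}
	for _, id := range []string{"v1", "v2", "v3", "v4"} {
		_, err := ring.lookup(id)
		fmt.Printf("%s: usable=%t\n", id, err == nil)
	}
}
```
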

Comment on lines +72 to +76
// parse the public key
pkR, err := kemID.Scheme().UnmarshalBinaryPublicKey(hpkeKey.Key)
if err != nil {
return nil, fmt.Errorf("failed to unmarshal HPKE public key: %w", err)
}
Contributor

Would this part be worth doing during refreshTokenCache and storing in another map protected by tokensMutex, so that we don't have to do this same unmarshal operation on every publish request?

Contributor Author

yeah I think that would be a good improvement! i'll add that for future work once this structure is all settled
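
One way the suggested caching could look, sketched with the standard library's `crypto/ecdh` X25519 parser standing in for the circl KEM scheme's `UnmarshalBinaryPublicKey` (a substitution so the example has no third-party dependency). The `parsedKeyCache` type and its methods are hypothetical, and the mutex here plays the role the thread assigns to `tokensMutex`.

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
	"sync"
)

// parsedKeyCache holds public keys that were parsed once at refresh time,
// keyed by enrollment ID, so publish calls can skip the unmarshal step.
type parsedKeyCache struct {
	mu   sync.RWMutex
	keys map[string]*ecdh.PublicKey
}

func newParsedKeyCache() *parsedKeyCache {
	return &parsedKeyCache{keys: make(map[string]*ecdh.PublicKey)}
}

// refresh parses every raw key and swaps the new map in under the write lock.
// If any key fails to parse, the previously cached keys are left untouched.
func (c *parsedKeyCache) refresh(raw map[string][]byte) error {
	parsed := make(map[string]*ecdh.PublicKey, len(raw))
	for id, b := range raw {
		pk, err := ecdh.X25519().NewPublicKey(b)
		if err != nil {
			return fmt.Errorf("parsing public key for enrollment %q: %w", id, err)
		}
		parsed[id] = pk
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.keys = parsed
	return nil
}

// get returns the already-parsed key for an enrollment, if one is cached.
func (c *parsedKeyCache) get(enrollmentID string) (*ecdh.PublicKey, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	pk, ok := c.keys[enrollmentID]
	return pk, ok
}

func main() {
	priv, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	cache := newParsedKeyCache()
	raw := map[string][]byte{"enrollment-1": priv.PublicKey().Bytes()}
	if err := cache.refresh(raw); err != nil {
		panic(err)
	}
	_, ok := cache.get("enrollment-1")
	fmt.Println("cached:", ok)
}
```

Parsing in `refresh` also surfaces a malformed key when it arrives rather than on the first publish that needs it.
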

@zackattack01 added this pull request to the merge queue Mar 12, 2026
Merged via the queue into main with commit 7eff1c4 Mar 12, 2026
44 checks passed
@zackattack01 deleted the zack/osq_ingester_encryption branch March 12, 2026 21:05

2 participants