Skip to content

Review Notes #4

@dennis-tra

Description

@dennis-tra

@cortze my notes for the bitswap sniffer:

  • rather use in-memory leveldb instead of the mutexwrap(mapDatastore). The latter has very bad performance. It would be even better if that was not in-memory but really on disk + a metric that monitors the datastore size. This could become a difficult to debug issue if we don’t have visibility here.
  • What does httpWorkers control? It’s set to 1. I guess it’s not relevant for us, is it?
  • I think 15s discovery interval is quite frequent. Decrease to 1m?
  • You’re adding to the bitswapStatsCount but it should be “set” to the value you get from the bitswap stats. This can only be done if the metric becomes a gauge instead of a counter I believe.
  • you can preallocate the sharedCids slice. This will release some pressure from the GC. Set it to make([]SharedCid, 0, len(x)+len(y)+…) .
  • In that same streamCid method: for haveCids and dontHaveCids you’re looping over the freshly initialized slices. However, you don’t need these extra slices per loop at all. You’re only using them for the log message where you can use the original slices.
  • nit: I’d prefer named fields when initializing the SharedCid struct. This makes it clear which value is assigned to which field instead of relying on the implicit order.
  • I would also just render the t.producer.String() once and use it in every loop.
  • Check if the sniffer appealing function really sends out a message. It very very probably does but still worth checking?
  • You’re only generating a single random CID. Doesn’t it make sense to create new ones? Maybe it doesn’t trigger a new message when the same cid is used for GetBlock? If generating new ones does indeed make sense, we should control for an unbounded wantlist and perhaps remove old CIDs eventually again
  • Some lifecycle comments:
    • To me it looks like the root context is properly passed everywhere! This makes things much easier
    • when closing the closeFlusher channel, block until all workers and the internalFlushingLoop have exited. This is usually done with an additional channel or a waitgroup. I’d prefer a channel because then you can also select on a time.After to configure a timeout and log an error message if the go routines were not cleaned up on time. That way you’ll still continue the shutdown procedure.
    • Similarly, the lifecycle of the discovery.Serve, cidConsumer and makeSnifferAppealing go routines isn’t managed on shutdown.
    • The host isn’t shut down
    • ^^ all of the above isn’t really necessary to fix (especially because it’s annoying to get right) because the process stops a few milliseconds later anyway. I just wanted to leave it here to indicate what I was looking for
  • Could you add a justfile recipe to start a local clickhouse database that the sniffer would directly write data to?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions