Skip to content

Add Louvain community detection algorithm#453

Open
Becheler wants to merge 16 commits intoboostorg:developfrom
Becheler:feature/louvain-algorithm
Open

Add Louvain community detection algorithm#453
Becheler wants to merge 16 commits intoboostorg:developfrom
Becheler:feature/louvain-algorithm

Conversation

@Becheler
Copy link
Contributor

@Becheler Becheler commented Feb 2, 2026

Implement #447

  • Multi-level modularity optimization following Blondel et al (2008)
  • Supports custom quality functions (other than modularity) with policy-based design (extensions to come to propose alternative quality functions to match gen-louvain)
  • Incremental quality tracking
  • Lazy rollback
  • Vertex shuffling
  • Competitive with established implementations (I will work on making the benchmarks reproducible)

But:

  • documentation is not ready
  • example is not ready
  • just realized I forgot to test with different graph types + boost concepts
benchmark_comparison

@jeremy-murphy
Copy link
Contributor

Not sure why this error is occurring only for OSX 11.7 C++14. I assume this is not specific to your code.

In file included from ../../../boost/container/detail/operator_new_helpers.hpp:26:
../../../boost/container/detail/aligned_allocation.hpp:87:26: error: no member named 'aligned_alloc' in the global namespace; did you mean 'aligned_allocate'?
   return rounded_size ? ::aligned_alloc(al, rounded_size) : 0;
                         ^~~~~~~~~~~~~~~
                         aligned_allocate
../../../boost/container/detail/aligned_allocation.hpp:77:14: note: 'aligned_allocate' declared here
inline void* aligned_allocate(std::size_t al, std::size_t sz)
             ^
1 error generated.

Copy link
Contributor

@jeremy-murphy jeremy-murphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, more later.

Comment on lines 11 to 12
// Revision History:
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know embedded revision histories were a thing in the past, but I would prefer to keep all this metadata in git.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not like it either, I'm glad you don't ahah :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍🏽

Comment on lines 43 to 44
std::map<VertexDescriptor, WeightType> internal_weights;
std::map<VertexDescriptor, std::set<VertexDescriptor>> vertex_mapping;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally try to avoid hard-coding the choice of data structure, especially std::map and std::set, so instead of templating VertexDescriptor and WeightType we should template the whole map type, so users can use boost::unordered_map or some other kind of property map of their own choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhhh I see. I felt it was not in the BGL spirit to do so, and Joaquin also mentioned it, but I was not sure how to solve it without making the API heavy. Can we default to a concrete type to simplify user's experience ? Also, is it ok to use boost::unordered_map if it adds the constraint of key_type being hashable ? That was my idea behind using std::map

Copy link
Contributor

@jeremy-murphy jeremy-murphy Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, we can still (and should) make the user experience nice with defaults. That's the great thing, we get both benefits, the cost is more work on the part of the library authors. :)
Again, I think astar is an example, but instead of using param = arg in the function definition, make an overload, like so:

auto user_friendly_foo(graph const &g) {
ConcreteA a;
ConcreteB b;
return generic_foo(g, a, b);
}

Sorry, typing on my phone, so please ignore random syntax errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, yeah, we probably shouldn't add new constraints into the default interface. Users will too easily assume that the constraint is mandatory.
So maybe use boost::flat_map as the default and users can always use a hash map if they want to.

Comment on lines 68 to 70
std::set<community_type> unique_communities;
std::map<community_type, vertex_descriptor> comm_to_vertex;
std::map<vertex_descriptor, std::set<vertex_descriptor>> vertex_to_originals;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should almost certainly be input parameters taken by non-const reference so that the user a) decides their type and b) automatically gets their value at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So they gain access to the whole hierarchy. Ufff it's a lot of guts leaking out haha
Will do, and tell you in case of problemsm thanks again for your time !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong. But have a look at the astar API for some examples of prior art.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And on second thought, this is not a priority, we can always do it later. Getting it correct and fast are higher priorities.

this was not supposed to be commited :)
this was not supposed to be commited :)
//=======================================================================
// Copyright 2026 Becheler Code Labs for C++ Alliance
// Authors: Arnaud Becheler
//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You retain the copyright, so no need to mention the C++ Alliance. Also, is "Becheler Code Labs" a real legal entity? If this is not the case, then assigning (C) to your physical person would be best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍🏽

auto unfold(const CommunityMap& final_partition) const
{
assert(!empty());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the BGL convention is, Boost libs generally use BOOST_ASSERT instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really mind, but yeah, better to follow whatever is common practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍🏽

current_nodes.insert(kv.first);

// From coarse to fine
for (int level = size() - 1; level >= 0; --level) {
Copy link
Member

@joaquintides joaquintides Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch, size() is a std::size_t. More idiomatic, but maybe not to your taste:

for(auto level = size(); level--; ) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍🏽


template <typename Graph, typename CommunityMap, typename WeightMap, typename QualityFunction = newman_and_girvan>
typename property_traits<WeightMap>::value_type
louvain_local_optimization(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function supposed to be public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not too sure, as it can theoretically be used as a one pass optimization step. But you're right it's obviously not the main usage, so I move it to details :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍🏽

// L_c = internal edge weight for community c
// k_c = sum of degrees in community c
// m = total edge weight / 2
struct newman_and_girvan
Copy link
Member

@joaquintides joaquintides Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If users can provide their own quality function other than newman_and_girvan, you probably need to define a concept LouvainQuality of sorts. Take a look at how's done at

https://github.com/boostorg/graph/blob/develop/include/boost/graph/astar_search.hpp#L56

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍🏽
I defined:

  • GraphPartitionQualityFunctionConcept
  • GraphPartitionQualityFunctionIncrementalConcept

bool has_rolled_back = false;

// Randomize vertex order once
std::mt19937 gen(seed);
Copy link
Member

@joaquintides joaquintides Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sould be made generic and passed by the user (with a default arg) as a URGB&&.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍🏽

// L_c = internal edge weight for community c
// k_c = sum of degrees in community c
// m = total edge weight / 2
struct newman_and_girvan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this modularity thing a general concept or does it apply to Louvain only?

Copy link
Contributor Author

@Becheler Becheler Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a yes and no situation.

  • The modularity can be used outside of louvain to assess partition quality of a graph.
  • But the current implementation with incremental computations (remove, insert, gain) is particularly suited for Louvain.
  • Making it generally useful would require to disentangle the two aspects.
  • But I would rather do it in a clustering folder so we can have:
include/boost/graph/clustering/
├── quality_functions.hpp        # 10 incremental metrics/criterions for gen-louvain
├── label_propagation.hpp      # another clustering method (future work)
├── leiden.hpp                          # Leiden algorithm (future work)
├── louvain.hpp                        # Louvain algorithm
├── girvan_newman.hpp         # Edge betweenness clustering (currently in bc_clustering.hpp)

Copy link
Contributor

@jeremy-murphy jeremy-murphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more requests, still going...


// Track hierarchy of aggregation levels for unfolding partitions.
template <typename VertexDescriptor>
struct hierarchy_t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one data member and no real invariants to speak off suggests that you don't really need this type.
I think unfold could be a free function, maybe in a private namespace depending on how reusable it is, that takes levels and final_partition as parameters.

By the way, please only use the _t suffix for type aliases, not for names of classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍🏽

auto unfold(const CommunityMap& final_partition) const
{
assert(!empty());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really mind, but yeah, better to follow whatever is common practice.

Weights must be non-negative.
</blockquote>

IN: <tt>URBG&amp;&amp; gen</tt>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to provide a default arg for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what it would be. And also it would differ from what i've seen in random.hpp utilities:
https://github.com/boostorg/graph/blob/3131c24630e42c79b43c1f32558041c219ab84b8/include/boost/graph/random.hpp

(e.g.&nbsp;<tt>std::mt19937</tt>).
</blockquote>

IN: <tt>weight_type min_improvement_inner</tt>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Louvain guaranteed to converge (i.e. to eventually stop)? If this is not clear, does it make sense to provide additional hard limits on the number of inner/outer iterations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand well it's guaranteed to terminate in theory:

  • the quality (modularity) is monotically non-decreasing through the algorithm
  • becasue the algorithm is supposed to only accept nodes moves and community merges that strictly improve (or does not decrease in some variations) the quality of the partition
  • modularity is bounded <=1
  • the number of partition is finite

That being said there is the case of large graphs and the trouble on floating point precision.

  • For very large graphs maybe it would make sense to have some sense of async task: "please dear louvain, aggregate this for some time and when I'm done with waiting give me the last aggreagated graph you had". But that sounds like a very different interface ?
  • The current API is still more flexible than igraph and genlouvain, that do not offer parametrization of the stopping condition (igraph has 0 for inner and outer thresholds and genlouvain has 10-6, fixed)

Am i making sense ?


<H3>Parameters</H3>

IN: <tt>const Graph&amp; g</tt>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm has the additional requirement that vertices are copyable, hashable etc., as they're internally stored in unordered_sets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I have been changing the vertices handling in this aspect because it was not friendly with some types of graphs. The interface now takes a VertexIndexMap but I still have to commit those changes, sorry 😓
I will update the documentation in that sense once I merged the new stuff

#include <boost/unordered/unordered_flat_set.hpp>
#include <boost/container_hash/hash.hpp>
#include <algorithm>
#include <iostream>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this #include needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hem. Nope 😄

#include <iostream>

// Hash specialization for std::pair to use with boost::unordered containers
namespace std {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big no-no :-)

  • It's forbidden to specialize hash for types other than user-defined ones, which std::pair is not.
  • Boost.Unordered does not use std::hash by default, but boost::hash. Your temp_edge_weights works because boost::hash works off the shelf with pairs, so the std specialization is not even used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooooh 🥲 Thank you, will remove

return Q_new;
}

/// @brief Fast version, requires the QualityFunction to implement GraphPartitionQualityFunctionIncrementalConcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this comment is wrong, this version does not require the quality function to be incremental.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments