Working With Dense Graphs

Many problems require us to deal to dense graphs. I’ll discuss a general technique on how to solve such problems in sub-quadratic time. While this isn’t anything mind-boggling, I find this technique to be extremely useful and surprisingly less-known among competitive programmers.

For whose who don’t know what dense graphs are, they are essentially graphs with $O(n^2)$ edges. The opposite of dense graphs are called sparse graphs, which usually contain $\approx O(n)$ edges. Now, you might be wondering how is it possible to deal with dense graphs in sub-quadratic time while there are already quadratic number of edges. The catch is this technique is only applicable in problems where the edges are not explicitely given, rather we can decide about the edges based on some given condition. Take this for example:

There are $n$ points in Cartesian plane and an integer $d$. The $i$ th point has coordinates $(x_i, y_i)$. There is an edge between point $i$ and point $j$, iff their manhattan distance is no greater than $d$. In other words, [\mid x_i - x_j \mid + \mid y_i - y_j \mid \leq d]Find out how many connected components are there in this graph.

As you can see, this will be a extremely dense graph ($n \choose 2$ edges in the worse case). However, the edges are not explicitely given, and we can find out the edges based on the given condition. For this, it is an ideal problem to apply our technique. If everything we’ve discussed is satisfied, we can run any algorithm in sub-quadratic time that uses only DFS/BFS. This includes:

Finding the number of connected components
Bicoloring the graph
Strongly connected components (SCC)
Shortest path (if the graph is unweighted)
Topological sort and many more

Okay, enough hype. Now let’s get to the actual trick. But first, we need a data structure. For convenience we will call it a black box ds, because it can be anything depending on the problem. Initially, our black box ds will contain all the nodes of the graph. For our trick to work, it must support the following operations:

Delete(u): It deletes everything about node $u$ from the DS.
Find(u): It returns any node $v$ from the DS (which is not yet deleted) such that there is an edge $(u, v)$. If there is no such edge return $-1$ (or any other sentinel value).

Easy enough, right? Note that we don’t need to insert anything, we only need the deletion operation. Now, we can write the pseudocode of our DFS like this:

def dfs(u):
	Delete(u)
	while True:
		v = Find(u)
		if v == -1: break
		else: dfs(v)

Well, that was easy. It is essentially a different way of implementing DFS. It does everything a traditional DFS can do. The Delete() operation is analogous to the visited array we keep in the traditional implementation. And instead of using adjacency list directly, we are using the Find() function to get an adjacent node.

Did you notice one thing? After we’ve found node v in Find(u), we have called dfs(v) immediately after that. So v gets deleted right away after it was discovered. In fact, all calls to our Find() function will return different nodes over the course of whole algorithm. In other words, each node will be discovered exactly once and deleted exactly once. If the complexity of Find and Delete operation is $T$, the DFS will run in $O(nT)$ time!!

Now, all we have to do is create the black box DS. For example, in the above problem, we define our DS like this:

Initially it will contain all the points
Delete(u): deletes u from the DS
Find(u): returns the closest point from u (in manhattan distance). If the closest point is further than $d$, then returns -1.

Creating a DS that does these is possible, but a little hard for our scope. For interested readers: we can first convert the manhattan distance to chebyshev distance through some inequalities. Then we can use merge sort tree for the Find() and Delete() operations.

For our purpose, let’s discuss an easier problem:

Given an undirected graph $G$ with $n$ nodes and $m$ edges, find the number of connected components in the complement graph of $G$. Both $n$ and $m$ can be up to $10^6$.

Alright, so this is definitely a dense graph, since we have $n \choose 2$ $- m$ edges. Our black box DS will be std::set! This is how it works:

Initially the set will contain all the nodes.
Delete(u): deletes node $u$ from the set. (set.erase(u) basically)
Find(u): This is a bit tricky. Let $w$ be the node with the maximum label in the adjacency list of $u$ (we are talking about the adjacency list of $G$ here, not the complement graph). Find out what is the smallest node greater than $w$ that is in the set. If you are still confused, this is basically *set.upper_bound(w). This node will definitely be connected to $u$ in the complement graph. So we can safely return this node. But what if there’s no such node: ie $w$ is the largest node present in the set. In that case, remove $w$ from the adjacency list of $u$ and repeat the same step. Note that, there will be at most $\text{deg(u)}$ removals for node $u$, therefore there will be a total of $\sum_{i=1}^{n} \text{deg}(i) = 2m$ removals over the whole algorithm. So, it doesn’t break our complexity.

Here, both of our operations has an amortized complexity of $O(\log{N})$. So we can solve the problem in $O(n \log{n})$ time.

Final Notes

This trick works in both directed and undirected graphs and in both dfs and bfs. It may be sometimes applicable in Dijkstra’s algorithms as well, although I have not found a way to generalize this for every problem. There is another cool trick for handling Dijkstra. See this problem for example.

This trick is also not that great for finding minimum spanning trees. For that, I encourage you to learn Borůvka’s algorithm. Really cool stuff!

Some exercises

Written on January 14, 2021