Markov Clustering Algorithm

Markov Clustering Algorithm

During one of my epiphanies studying subjects related to data science, I’ve stumbled across a not-so-well-known clustering algorithm. After a few researches on that topic, I realized that I would benefit from a visual tool to test its parameters and evaluate their influence.

In that occasion, I was studying web development, hence I decided to exercise my knowledge in a hands-on project. Take a look at the result !

Static image of the webpage

“The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.” - van Dongen, the author. Check the author’s page for more information.

The original method has only two parameters (Inflation and Prune Treshold):

  • The purpose of the inflation parameter is to reduce the connections across small communities - “rich get richer, poor get poorer”.

  • The prune treshold accelerates convergence by removing weak connections at each iteration.

The original algorithm has some known limitations: outputs many small clusters and does not scale well. To solve the first problem, Srinivas Parthasarathy proposed the ‘regularized segregation method’ that allows more control over cluster sizes. He has also proposed solutions to enable the algorithm to scale better, but those were not considered in my project.

Avatar
Andre Sbrocco Figueiredo
Engineer & Software Developer

Andre is a young mechanical engineer looking for an innovative project to embrace.