Unraveling the Blockchain: Examining Blockchain Anomalies with AI and Network Theory
24 Jan 2025
Setting the Scene
Over the past decade, blockchain technology has permeated various industries, from logistics and finance to healthcare and cybersecurity. As its adoption has accelerated, so too has the generation of vast amounts of data within blockchain ecosystems. This surge in data can often be intimidating and overwhelming to analyse. But fear not—we’re here to guide you on a simple journey of unraveling blockchain data in a digestible and insightful way.
What is the blockchain?
A blockchain is a decentralised digital ledger that securely records interactions across a network of computers. Cryptographic hashing and consensus mechanisms make recorded information extremely difficult to tamper with, which ensures transparency and trust across the network without a central authority.
Analysis
The aim of this investigation is to detect significant anomalies that may indicate fraudulent activity on a blockchain network. We do this in two stages: first, we use machine learning to flag unusual transaction patterns as anomalies; then we condense the network into hubs (communities) to identify which of those anomalies are most significant.
One of the most prominent applications of blockchain technology is cryptocurrency. In Australia, the ATO (Australian Taxation Office) estimates that 600,000 taxpayers have invested in cryptocurrency.
Ethereum, a major player in the blockchain space, hosts numerous programs and applications on its network. Conveniently, Ethereum transaction data is publicly available through Google BigQuery, giving us a rich dataset in which to look for anomalous transactions that could signal fraudulent activity.
Let’s start by visualising 15,000 transactions, where each node represents an address and each edge a transaction between addresses.
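As a rough sketch of that mapping (not the code used for this analysis), the snippet below turns a list of transactions into an address graph in plain Python. The `Tx` field names are assumptions loosely modelled on public Ethereum transaction data, and repeated transfers between the same pair of addresses are collapsed into one weighted edge, which is a design choice rather than something the analysis specifies.

```python
from collections import Counter
from typing import NamedTuple


class Tx(NamedTuple):
    """One transfer. Field names are illustrative assumptions."""
    from_address: str
    to_address: str
    value: float    # amount transferred
    timestamp: int  # e.g. Unix seconds


def build_graph(txs):
    """Return (nodes, edges): every address becomes a node, and each
    (sender, receiver) pair becomes a directed edge weighted by the
    number of transactions between them."""
    nodes = set()
    edges = Counter()
    for tx in txs:
        nodes.update((tx.from_address, tx.to_address))
        edges[(tx.from_address, tx.to_address)] += 1
    return nodes, edges


txs = [
    Tx("0xa", "0xb", 1.0, 100),
    Tx("0xa", "0xb", 2.0, 200),
    Tx("0xb", "0xc", 0.5, 300),
]
nodes, edges = build_graph(txs)
print(len(nodes), edges[("0xa", "0xb")])  # 3 2
```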
As expected, the graph is densely packed with information. At first glance, it’s challenging to discern significant patterns or anomalies within this vast network.
Visually, a handful of addresses transact with many different addresses, while many addresses make only one or two transactions. However, the picture doesn’t tell us which transactions are anomalies, or which are significant relative to the rest. To tackle this, we turn to data science and run anomaly detection algorithms. After some careful feature engineering, we apply Isolation Forest, a popular unsupervised algorithm, to flag statistical anomalies.
Even with anomalies flagged, the data remains difficult to interpret: it is hard to tell which anomalies are most critical once the network’s structure, transaction time and transaction value are taken into account.
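For readers curious about what happens under the hood, here is a minimal, standard-library-only sketch of both steps. The three per-address features (distinct counterparties, transaction count, total value moved) are hypothetical stand-ins, since the features used in this analysis aren’t listed. The forest follows the Isolation Forest idea: anomalies are separated by fewer random splits, so their average path length is short and their score is close to 1. In practice you would more likely use scikit-learn’s `IsolationForest`; this version only shows the mechanics.

```python
import math
import random
from collections import defaultdict


def address_features(txs):
    """Hypothetical per-address features from (sender, receiver, value) tuples:
    [distinct counterparties, transaction count, total value moved]."""
    partners = defaultdict(set)
    count = defaultdict(int)
    moved = defaultdict(float)
    for sender, receiver, value in txs:
        for addr, other in ((sender, receiver), (receiver, sender)):
            partners[addr].add(other)
            count[addr] += 1
            moved[addr] += value
    return {a: [len(partners[a]), count[a], moved[a]] for a in partners}


def _c(n):
    """Expected path length of an unsuccessful BST search over n points;
    used to normalise tree depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _grow(rows, depth, max_depth, rng):
    """Build one isolation tree from random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    spans = [(min(r[d] for r in rows), max(r[d] for r in rows))
             for d in range(len(rows[0]))]
    dims = [d for d, (lo, hi) in enumerate(spans) if lo < hi]
    if not dims:  # remaining rows are identical: cannot split further
        return ("leaf", len(rows))
    d = rng.choice(dims)
    split = rng.uniform(*spans[d])
    left = [r for r in rows if r[d] < split]
    right = [r for r in rows if r[d] >= split]
    return ("split", d, split,
            _grow(left, depth + 1, max_depth, rng),
            _grow(right, depth + 1, max_depth, rng))


def _path(x, tree, depth=0):
    """Depth at which x reaches a leaf, plus the expected extra depth
    for the unsplit points still sharing that leaf."""
    if tree[0] == "leaf":
        return depth + _c(tree[1])
    _, d, split, left, right = tree
    return _path(x, left if x[d] < split else right, depth + 1)


def isolation_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Score in (0, 1] per row; near 1 means easily isolated (anomalous)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = max(1, math.ceil(math.log2(psi)))
    trees = [_grow(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = _c(psi)
    return [2.0 ** (-sum(_path(x, t) for t in trees) / n_trees / norm)
            if norm else 0.5 for x in rows]


# Example: one wallet paying 40 recipients, alongside 40 ordinary one-off pairs.
txs = [("0xhub", f"0xr{i}", 100.0) for i in range(40)]
txs += [(f"0xu{i}", f"0xv{i}", 1.0) for i in range(40)]
feats = address_features(txs)
addrs = list(feats)
scores = dict(zip(addrs, isolation_scores([feats[a] for a in addrs])))
print(max(scores, key=scores.get))  # 0xhub
```

The hub wallet is isolated in one or two splits, while the clusters of ordinary addresses never separate, so the hub receives the highest score. Scores well above 0.5 are the usual signal of an anomaly; where to set the cut-off is a judgement call.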
To judge which anomalies matter most holistically, we break the network into hubs, or communities. The idea comes from social network analysis, where people (or, in this case, addresses) tend to interact mostly within their own groups. Using the popular Louvain community detection algorithm, we condense the network into something much easier to digest.
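Louvain works by repeatedly moving nodes between communities whenever that increases modularity, then merging each community into a single node and repeating. Modularity compares the edges inside each community with what would be expected in a random graph with the same degrees. The sketch below only computes that score for a given partition; it is not a Louvain implementation, and it treats the transaction graph as undirected and unweighted, which is a simplifying assumption. For the full algorithm, NetworkX provides `louvain_communities` in its community module.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))**2
    where m is the number of edges, L_c the edges inside c and
    d_c the total degree of the nodes in c.
    """
    edges = list(edges)
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, merged), 3))  # 0.0
```

Splitting the two triangles scores higher than lumping everything into one community, which is the kind of improvement Louvain searches for greedily.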
In the resulting community network, each node is a community: its size represents the number of addresses it contains, while edge width indicates the number of nodes connected across communities. Red nodes mark communities that contain anomalies.
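A minimal sketch of how such a condensed view could be computed, assuming the partition has already been found (for example by a Louvain implementation). The function name and inputs are hypothetical. Node size counts the addresses in each community; for edge width, this sketch counts the transactions crossing between two communities, which is one reasonable reading of the description above rather than a confirmed detail.

```python
from collections import Counter


def condense(edges, community_of, anomalies):
    """Collapse an address graph into a community graph.

    Returns (size, weight, red):
      size[c]        addresses in community c              -> node size
      weight[(a, b)] transactions between communities a, b -> edge width
      red            communities holding >= 1 anomaly      -> red nodes
    """
    size = Counter(community_of.values())
    weight = Counter()
    for u, v in edges:
        a, b = community_of[u], community_of[v]
        if a != b:  # only cross-community transactions become edges
            weight[tuple(sorted((a, b)))] += 1
    red = {community_of[n] for n in anomalies}
    return size, weight, red


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
size, weight, red = condense(edges, community_of, anomalies={4})
print(dict(size), dict(weight), red)  # {0: 3, 1: 3} {(0, 1): 1} {1}
```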
We can now examine the structure of this network to identify which communities influence it most, using several centrality measures.
Next, we take the top communities under each of these centrality measures, highlighting the most significant communities within the larger network. This reduction brings us from approximately 13,000 nodes to 3,000, with anomalies still marked in red. The resulting network comprises the nodes with the greatest influence in the overall network.
Finally, we analyse the structure of this reduced network once more and extract the nodes with the highest centrality scores. This reveals the most significant addresses and transactions in our sample of the Ethereum network over the study period, allowing us to visualise these crucial transactions as a network.
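The centrality measures used aren’t specified, so this sketch assumes two common ones on the condensed community graph: weighted degree (how much cross-community traffic a community carries) and PageRank, computed here by plain power iteration. It keeps the union of the top-k communities under each measure. The input format, a dict of `(community_a, community_b) -> count` with positive counts, and the parameter `k` are hypothetical.

```python
from collections import Counter


def strength(weight):
    """Weighted degree: total cross-community traffic per community."""
    s = Counter()
    for (a, b), w in weight.items():
        s[a] += w
        s[b] += w
    return s


def pagerank(weight, damping=0.85, iters=100):
    """PageRank by power iteration on an undirected weighted graph
    (edge weights assumed positive)."""
    s = strength(weight)
    nodes = list(s)
    nbrs = {n: [] for n in nodes}
    for (a, b), w in weight.items():
        nbrs[a].append((b, w))
        nbrs[b].append((a, w))
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            for m, w in nbrs[n]:
                # n passes rank to each neighbour in proportion to edge weight
                new[m] += damping * rank[n] * w / s[n]
        rank = new
    return rank


def top_communities(weight, k=2):
    """Union of the top-k communities under each centrality measure.
    Communities with no cross-community edges never appear here."""
    top = set()
    for scores in (strength(weight), pagerank(weight)):
        top.update(sorted(scores, key=scores.get, reverse=True)[:k])
    return top


weight = {(0, 1): 5, (0, 2): 3, (0, 3): 1, (2, 3): 1}
print(top_communities(weight, k=1))  # {0}
```

Taking a union across measures keeps a community that ranks highly on any one notion of influence; an intersection would be the stricter alternative.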
The nodes above are the most significant in the network, and the edges show the transactions between these addresses.
This tells us something about how the key nodes interact with one another, but because there are few direct transactions between them, we next build a network containing every transaction these key nodes take part in. In the network below, red nodes are anomalies and edge colour represents transaction time, from light blue (oldest) to dark blue (most recent).
This streamlined network of key-node transactions reveals the most critical transactions. Notably, some anomalous addresses at the centre of many transactions showed an unusual pattern: many of their counterparties made only a single transaction, exclusively with that wallet. This could indicate legitimate scenarios such as an airdrop or gaming rewards, but it could also indicate a scam or money laundering.
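As a sketch of this step, assuming each transaction is a hypothetical `(sender, receiver, timestamp)` tuple: keep every transaction that touches a key address, then colour each edge by interpolating linearly between the CSS colours lightblue (oldest) and darkblue (newest). The linear time scale is an assumption; a rank-based scale would spread the colours more evenly when activity is bursty.

```python
def key_node_network(txs, key_nodes):
    """Keep every (sender, receiver, timestamp) transaction touching a key
    address, colouring each edge from lightblue (oldest) to darkblue (newest)."""
    kept = [tx for tx in txs if tx[0] in key_nodes or tx[1] in key_nodes]
    if not kept:
        return []
    t0 = min(t for _, _, t in kept)
    t1 = max(t for _, _, t in kept)
    light, dark = (173, 216, 230), (0, 0, 139)  # CSS lightblue, darkblue
    edges = []
    for sender, receiver, t in kept:
        frac = (t - t0) / (t1 - t0) if t1 > t0 else 1.0  # 0 = oldest, 1 = newest
        rgb = tuple(round(start + (end - start) * frac)
                    for start, end in zip(light, dark))
        edges.append((sender, receiver, "#%02x%02x%02x" % rgb))
    return edges


txs = [("0xk", "0xa", 100), ("0xb", "0xk", 200), ("0xc", "0xd", 150)]
for edge in key_node_network(txs, key_nodes={"0xk"}):
    print(edge)
# ('0xk', '0xa', '#add8e6')
# ('0xb', '0xk', '#00008b')
```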
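The pattern described above can be checked with a simple count. This sketch, assuming hypothetical `(sender, receiver)` pairs, counts for every address how many of its counterparties have exactly one transaction in the whole dataset, made with that address. A high count flags the airdrop-or-laundering shape; what counts as "high" is left to the analyst.

```python
from collections import defaultdict


def exclusive_counterparties(txs):
    """For each address, count counterparties whose only transaction in
    the dataset is with that address. txs: (sender, receiver) pairs."""
    history = defaultdict(list)
    for sender, receiver in txs:
        history[sender].append(receiver)
        history[receiver].append(sender)
    counts = defaultdict(int)
    for partners in history.values():
        if len(partners) == 1:         # a one-off address...
            counts[partners[0]] += 1   # ...credited to its only counterparty
    return dict(counts)


txs = [("0xw", f"0xr{i}") for i in range(5)]  # wallet pays five one-off addresses
txs += [("0xw", "0xx"), ("0xx", "0xy"), ("0xy", "0xz")]
print(exclusive_counterparties(txs))  # {'0xw': 5, '0xy': 1}
```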
By simplifying and analysing blockchain data, we’ve identified key interactions and used machine learning to uncover significant anomalies. Beyond demystifying complex blockchain data, this approach provides a methodology for detecting anomalies and unusual patterns on the blockchain.
Use cases
Financial institutions can use this methodology to detect potentially fraudulent cryptocurrency transactions. Identifying anomalies in transaction patterns can help flag suspicious accounts or activities that deviate from normal behaviour.
Regulators such as AUSTRAC, as well as compliance officers, can use this approach to monitor and investigate potential money laundering schemes. By analysing transaction patterns and anomalies, they can identify illicit activities that might be concealed within the blockchain.
Healthcare systems could apply the same analysis to flag unusual changes to a patient’s medical diagnosis, or unusual patterns in how a doctor changes patient information.
Closing remarks
By combining network theory and data science, we have reduced a complex network of 15,000 transactions to a dozen key anomalies. The approach can provide insights not only into blockchain data but into interactions within any complex system.
We hope this has introduced you to some powerful methodologies for analysing complex systems, and for unravelling the blockchain to identify suspicious transactions.
Frequently Asked Questions (FAQs)
What is blockchain technology and how does it work?
Blockchain technology is a decentralized digital ledger that securely records transactions across a network of computers. Transactions are grouped into "blocks", and each block is linked to the previous one by storing its cryptographic hash, forming an effectively immutable chain. Because changing any past block would break these links, recorded information is extremely difficult to alter. Blockchain enables trust and transparency without the need for a central authority, making it suitable for industries like finance, healthcare, and logistics.
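To make the "linked chain" idea concrete, here is a toy hash chain. It is a simplification: each block stores the SHA-256 hash of its predecessor, whereas Ethereum uses Keccak-256 over much richer block headers, and there is no consensus mechanism here at all. It shows why editing an old block is detectable: its stored hash no longer matches its contents, and if the hash is recomputed, the next block's link breaks instead.

```python
import hashlib
import json


def block_hash(prev_hash, transactions):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps({"prev": prev_hash, "txs": transactions}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append_block(chain, transactions):
    prev = chain[-1]["hash"] if chain else "0" * 64  # genesis block has no parent
    chain.append({"prev": prev, "txs": transactions,
                  "hash": block_hash(prev, transactions)})


def is_valid(chain):
    """Every stored hash must match its block's contents, and every block
    must point at the hash of the block before it."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["prev"], block["txs"]):
            return False
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True


chain = []
append_block(chain, [["alice", "bob", 5]])
append_block(chain, [["bob", "carol", 2]])
print(is_valid(chain))       # True
chain[0]["txs"][0][2] = 500  # tamper with an old transaction
print(is_valid(chain))       # False
```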
How can machine learning detect anomalies in blockchain transactions?
Machine learning can detect anomalies in blockchain transactions by analyzing patterns in transaction data. Techniques like Isolation Forest and other anomaly detection algorithms identify unusual behaviors, such as irregular transaction volumes, unexpected transaction timings, or abnormal addresses. These anomalies can signal fraudulent activities, such as money laundering or scams, allowing for early detection and investigation.
How can blockchain data analysis help detect fraud and money laundering?
Blockchain data analysis helps detect fraud and money laundering by examining transaction patterns for unusual activities. By using machine learning models and network analysis, analysts can identify transactions that deviate from typical behavior, like large transfers to suspicious addresses or circular transactions. This allows financial institutions, regulators, and compliance officers to flag potentially illicit activities within the blockchain, facilitating quicker responses and investigations.
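Circular transactions, where funds leave an address and return to it through intermediaries, can be searched for directly. Below is a minimal sketch for the shortest interesting case, three-hop cycles, assuming hypothetical `(sender, receiver)` pairs; longer cycles would need a general cycle search.

```python
from collections import defaultdict


def three_cycles(txs):
    """Directed cycles a -> b -> c -> a in (sender, receiver) pairs,
    each reported once, rotated to start at its smallest address."""
    out = defaultdict(set)
    for sender, receiver in txs:
        if sender != receiver:
            out[sender].add(receiver)
    found = set()
    for a in list(out):
        for b in out[a]:
            for c in out.get(b, ()):
                if c != a and a in out.get(c, ()):
                    cycle = (a, b, c)
                    i = cycle.index(min(cycle))
                    found.add(cycle[i:] + cycle[:i])
    return sorted(found)


txs = [("0xa", "0xb"), ("0xb", "0xc"), ("0xc", "0xa"), ("0xc", "0xd")]
print(three_cycles(txs))  # [('0xa', '0xb', '0xc')]
```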
What is community detection in blockchain networks and why is it important?
Community detection in blockchain networks involves grouping related nodes (addresses or transactions) based on their connections. This helps simplify complex transaction data and identify key clusters or "hubs" within the blockchain. Using algorithms like Louvain, community detection helps highlight significant patterns and anomalies in the network, making it easier to pinpoint potential fraudulent activities or influential transactions that could require further investigation.
This post was written by Jack, with support from Matthew.
About EdgeRed
EdgeRed is an Australian boutique consultancy specialising in data and analytics. We draw value and insights through data science and artificial intelligence to help companies make faster and smarter decisions.