Kademlia

Kademlia

To join the xoken network a new node must be able to communicate with some existing nodes on the network. This process is known as Peer Discovery. In Distributed Systems usually Distributed hash tables (DHT) are used to look up other nodes in the network.

The Arivi P2P Suite uses an improved version of Kademlia, which is an implementation of distributed hash table for decentralized peer-to-peer computer networks, for Peer Discovery. Each node maintains a hash table that contains a list of NodeIDs and their routing information. Conventional implementation of Kademlia uses XOR-based distance metric to find its neighbouring nodes. But Arivi P2P Suite improves on the this conventional version of Kademlia by a system of verifications to give better resistance to common threats faced by Kademlia as discussed in the later sections.

Overview of Conventional Kademlia

Joining the network

So any new node which has yet to register itself on the network connects to what we define as a bootstrap node. A bootstrap node is a node whose ID and address is already known at the time of initialization of the new node either provided by the network or randomly found. If the joining node has not yet participated in the network, it computes a random ID number such that it is not already assigned to any other node. It uses this ID until leaving the network.

The joining node inserts the bootstrap node into its k-buckets (list of k-nearest neighbours). The joining node then does a FIND_NODE of its own ID against the bootstrap node (the only other node it knows). The "self-lookup" will populate other nodes' k-buckets with the new node ID and will populate the joining node's k-buckets with the nodes in the path between it and the bootstrap node. The peers in the k-bucket are called neighbours.

The list of neighbours should be maintained in such a way that these nodes are online and any node from the network can receive our messages.

Protocol Messages

Conventional Kademlia implements the following protocol messages which are necessary for routing in Kademlia.

PING

This is used to check if a said peer is still in the network. It is usually followed by a PONG message.

PONG

This is used to reply to a PING message.

FIND_NODE

Request network address of node with given ID. After sending this message the node would expect to receive a FN_RESP message with a list of nodes closest to the requested one (including the requested node itself). This is done by finding the closest buckets to Log2(XOR(Target_Node, SELF)) And then selecting the closest ( ie XOR(Target_Node, An)) a peers in the said buckets.

FN_RESP

Contains the list of nodes closest to the requested node in response to the FIND_NODE.

Security Issues in conventional Kademlia

Eclipse Attacks

An eclipse attack is when most (if not all) of a node’s peers are malicious and they prevent the node from being well-connected to the network to obtain information about transactions it might be interested in. To launch an eclipse attack an attacker needs launch numerous nodes whose node ids are closer to the victim node by XOR-distance measure. But in Arivi architecture it is a difficult process to generate NodeIds which are closer to the victim as the attacker will have to solve ed25519 and curve25519 ciphers to find a closer node.

Also against an elliptic curve, the fundamental operation that is performed during the attack is an elliptic curve point addition; this is significantly more expensive than testing an AES key

Sybil Attacks

A sybil attack is where a malicious actor is trying to spam the network with nodes that they control attempting to subvert the network's reputation system. Since Arivi uses kademlia only for peer discovery, sybil attacks against it can only disrupt the routing between peers. In particular two malevolent behaviors are considered: sybils asked for next hop can answer back with an empty list or they can respond with a list made of randomly chosen peers.

Possible Solutions

Proof-of-work

A Proof-of-work could be applied to the NodeId generation which will make it computationally costly for the attacker to generate multiple nodes quickly. But this is not a feasible solution as every node does not have equivalent computation power. Therefore if the POW’s difficulty is high then mobile devices will face a bottleneck for joining.

Another way can be by attaching a challenge with each request, but this will greatly decrease the network efficiency. Hence both of these are not implemented.

Centralized Authentication

A Centralized Authority to publish Authentication certificate could make sybil attacks near to impossible but this works against the principles of decentralization and a truly permission-less network cannot implement it as such.

Below we introduce our improved Kademlia which successfully defends the network against most eclipse attacks.

Verification Based Kademlia Routing

Introduction

Kademlia in Arivi is mainly used for two purposes, ie :
  1. Peer Discovery
  2. Maintaining the k-buckets with honest nodes that are used by the P2P layer.

These purposes can be served by the implementation of the following proposed system which incorporates verification of a node by some other neighbours.

Kademlia maintains a set of k-buckets with the following constraints -
  • Each node will only keep a select number of nodes from a particular IP address.
  • The P2P layer can issue KICK command for any node which it evaluates to be dishonest and Kademlia then removes it from the kbucket and adds to a blacklist.
  • Each FIND_NODE(D) is returned with a closest neighbours to D and D if it exists in the kbuckets.
  • The size of kbuckets is unbounded. But the nodes regularly purge nodes that are inactive from their k-buckets.

Protocol Message

VERIFY (Target_NodeID, For_NodeID)

This asks a particular node to issue a FIND_NODE command for the For_NodeID on the Target_NodeID and checks if it returns the requesting node’s NodeID.

V_RESP ( Found)

This is used as a response to a VERIFY message where the list of NodeIDs closest to For_NodeID signed and recieved from the Target_NodeID is returned to the node requesting VERIFY.   Note - FIND_NODE, FN_RESP, PING, PONG work the same way. - IF any node gets PING or FIND_NODE from any random arbitrary unknown node, it will add it to its kbuckets and then initiate its own verification process.

Joining the Network (CONT)

Each node A joining the network it will first do a FIND_NODE(SELF) on BOOTSTRAP NODES which are predefined. It will the fill its k-buckets with all the NodeIDs. Each entry in the k-buckets will have a STATUS { Verified | Unverified } (basically a FLAG) associated with it.

All newly added nodes will have STATUS (UNVERIFIED) at the start. BOOTSTRAP NODES will have STATUS (VERIFIED). VERIFIED nodes will be used for all outgoing requests from the P2P layer of the node whereas UNVERIFIED are not used for any outgoing P2P Layer purposes but its incoming P2P Layer requests are tolerated and answered at a lower priority

For each Unverified node Z, A will first PING it and wait for a PONG response. If PONG is not received then Z is removed from the kbuckets. If PONG is received A will select a random node K such that its STATUS(K) == VERIFIED.

A will then send a VERIFY(Z) request to K. Since A has Already ‘talked’ to Z. A should ideally get a list in the V_RESP such that it itself is also there.





IF the list contains A and nodes close to A then STATUS(Z) = VERIFIED.

ELSE IF K returns an empty message the process is repeated after choosing a new random node until the new K returns a signed message from Z for a maximum of p times.

IF after p times not even a single singed message of Z is received, STATUS(Z) = UNVERIFIED

IF the signed message from Z received through K does not contain A or is empty list or such then Z is removed from the kbuckets.

This Verification process is repeated after t time intervals and any UNVERIFIED node that remains UNVERIFIED for t0 time interval is removed from the kbuckets.