P2P For Web Devs, Part 1.1: dht-rpc
Source: Dev.to
Previous articles: P2P For Web Devs, Prologue: Why should I care? | P2P For Web Devs, Part 1: Networking

dht-rpc
dht-rpc is a Kademlia-based DHT implementation. Quick refresher: a Distributed Hash Table (DHT) stores data across different nodes in a shared key space. Kademlia is one way to implement a DHT: it defines how we store data and how we find the exact node that holds the key we are looking for. Let's build our own DHT and see what happens. I might mention some things that make no sense yet, but as you read this article we will come back to those concepts and they will click.

We have absolutely nothing now. Let's create the very first node.

```javascript
import DHT from 'dht-rpc'

const node = new DHT()
```
This single `new DHT()` statement does A LOT. It's an instance of EventEmitter that, during constructor execution, will:

- Create a Kademlia routing table.
- Open UDP sockets and start listening.
- Start watching network interfaces and health, and start ticking to react to changes and maintain the routing table.
- Track all the nodes it has seen and keep track of their liveliness.
- Probe the network to find the nodes closest to itself.

Now, at this stage this is a bit useless. It does a lot and at the same time nothing, because we are lacking some key parts. E.g. nodes identify themselves through a key that is a hash of their public IP and port number. dht-rpc does that for you: it automatically asks another node to ping you, learning your public IP and confirming that the node is accessible. But we cannot do that yet, because there are no other nodes at the moment.

Also, nodes start as ephemeral by default. Meaning that even if a node is publicly accessible, other nodes won't add it to their routing tables until it is deemed stable enough. An ephemeral node functions more like a client: it still interacts with the DHT and can query other nodes for information, but it is not discoverable by other nodes.

So our `new DHT()` is essentially a client that did not connect to anything. It is technically running, but not really functioning. Let's fix it. We need some kind of initial node that others can use to join the network. dht-rpc offers a super simple solution for this. E.g. if I want to work on localhost on port 10001, I can do this:

```javascript
const bootstrap = DHT.bootstrapper(10001, '127.0.0.1')
```
It works exactly like any other DHT node. If you look at the code, it's a basic `new DHT()` where we specify that this node:

- is not ephemeral
- is not behind a firewall, as others need to be able to access it
- gets an explicit port number and IP address, so that the node can identify itself correctly with `hash(ip + port)`
- has a static port

So this bootstrap node is now live, listening for requests on the port we specified, and identifies itself with a consistent id. At this point its routing table is empty, as there are no other nodes. Let's create a node1 that uses this bootstrap.

```javascript
const node1 = new DHT({ bootstrap: ['localhost:10001'] })
```
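To make the `hash(ip + port)` idea concrete, here is a minimal sketch of deriving a stable 32-byte id from an address. This is illustrative only: the `deriveId` name is mine, and dht-rpc's exact internal hash construction may differ.

```javascript
import crypto from 'node:crypto'

// Illustrative sketch: derive a stable 32-byte id from a public ip + port.
// dht-rpc computes its node ids internally; the exact hashing may differ.
function deriveId (ip, port) {
  return crypto.createHash('sha256').update(`${ip}:${port}`).digest()
}

const id = deriveId('127.0.0.1', 10001)
console.log(id.length) // 32 bytes, living in the same key space as query targets

// The same ip + port always yields the same id, which is why the
// bootstrap node can identify itself consistently across restarts.
```

The important property is determinism: any node that knows your public IP and port can compute the same id, so no central registry is needed.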
node1 initializes itself in several steps:

- It sets up its state, the routing table. Currently everything is empty. At this moment its own id is random, instead of `hash(ip + port)`, as it doesn't know its own IP and has no assigned port.
- It initializes networking: it opens sockets on OS-assigned ports (as we did not explicitly specify them), and starts network watchers and ticks to periodically ping known nodes.
- It sends a UDP FIND_NODE message to the bootstrap node, asking to find its own id (the randomly generated one). Once it receives a response, it puts the response sender into its routing table.
- The previously started ticks now begin periodically pinging bootstrap, because the routing table has the bootstrap node in it. We need to keep pinging to track liveliness, as nodes can go down at any time.

If you look at the bootstrap node, its routing table is still empty. Even though it is communicating with node1, the bootstrap node does not add node1 to its routing table, which keeps node1 undiscoverable by other nodes. That's because node1 starts as ephemeral.

After around 20 minutes of running, node1 will try to upgrade itself to a non-ephemeral node. It does this by:

- Sending "PING_NAT" messages to up to 5 known nodes. In our current state we know only about the single bootstrap node. The message indicates our server socket port number.
- If we receive pings on the server socket, that means we are not behind a firewall. And if our IP has been stable through all of the pings that have been happening, we upgrade from ephemeral into a proper node. We change our id from the randomly generated one into `hash(ip + port)`.
- Because our id changed, we re-bootstrap to get the nodes closest to our new id by doing the FIND_NODE request again, resulting in an up-to-date routing table.

After this, if you check the routing table of the bootstrap node, our node1 is there! Meaning bootstrap will now also track node1 and keep pinging it.
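The NAT check in that upgrade procedure can be sketched as a pure decision function. All names here are illustrative, not dht-rpc's API: each ping result carries the IP the peer observed for us and whether the ping reached our server socket.

```javascript
// Rough sketch of the ephemeral -> non-ephemeral decision (names are
// illustrative, not dht-rpc internals). A node upgrades only if every
// peer reached its server socket AND every peer saw the same public ip.
function canUpgrade (pingResults) {
  if (pingResults.length === 0) return false
  const firstIp = pingResults[0].observedIp
  return pingResults.every(
    (r) => r.reachedServerSocket && r.observedIp === firstIp
  )
}

// Consistent ip, server socket reachable: safe to upgrade.
console.log(canUpgrade([
  { observedIp: '1.2.3.4', reachedServerSocket: true },
  { observedIp: '1.2.3.4', reachedServerSocket: true }
])) // true

// Peers saw different ips, suggesting a NAT that remaps per peer: stay ephemeral.
console.log(canUpgrade([
  { observedIp: '1.2.3.4', reachedServerSocket: true },
  { observedIp: '5.6.7.8', reachedServerSocket: true }
])) // false
```

The reason for the IP consistency requirement is that the upgraded id is `hash(ip + port)`: if different peers see different IPs for you, there is no single stable id you could adopt.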
Let's add a new node. This time, instead of waiting 20 minutes, we just indicate that this node is not ephemeral.

```javascript
const node2 = new DHT({ bootstrap: ['localhost:10001'], ephemeral: false })
```
If you run this, you will see that node2 immediately learns about bootstrap and node1, and they also immediately become aware of node2. Even though our node2 was never explicitly configured to track node1, the data propagated throughout the entire network, and all 3 nodes know about each other.

Let's kill node2. After some time, you will see that node2 disappears from bootstrap and node1. So all those pings let the nodes in the network know when another node went down, so they can correctly remove it from their tracked nodes.

Let's spin up 100 nodes.

```javascript
const clients = Array.from(
  { length: 100 },
  () => new DHT({ bootstrap: ['localhost:10001'], ephemeral: false })
)
```
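The eviction behaviour we just saw with node2 (disappearing from the other routing tables once it stops answering pings) can be sketched with a tiny liveness tracker. The class and its timeout are my own illustration, not dht-rpc's internal code:

```javascript
// Minimal liveness-tracking sketch (illustrative, not dht-rpc internals):
// remember when each node last answered a ping, evict the silent ones.
class LivenessTracker {
  constructor (timeoutMs) {
    this.timeoutMs = timeoutMs
    this.lastSeen = new Map() // nodeId -> timestamp of last successful ping
  }

  markAlive (nodeId, now) {
    this.lastSeen.set(nodeId, now)
  }

  evictDead (now) {
    const dead = []
    for (const [nodeId, seen] of this.lastSeen) {
      if (now - seen > this.timeoutMs) {
        this.lastSeen.delete(nodeId)
        dead.push(nodeId)
      }
    }
    return dead
  }
}

const tracker = new LivenessTracker(5000)
tracker.markAlive('node2', 0)
console.log(tracker.evictDead(1000))  // []         - node2 answered recently
console.log(tracker.evictDead(10000)) // ['node2']  - silent too long, removed
```

Timestamps are passed in explicitly here just to keep the sketch deterministic; a real implementation would use the clock and its ping ticks.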
If you look at the bootstrap node, you will notice that its routing table does not contain all 100 nodes. And if you keep spinning up new nodes, the bootstrap's set of tracked nodes will keep growing, but slower and slower. I got up to 800 nodes on my machine, with bootstrap tracking 123 nodes, before my poor MacBook Air started overheating. Of the last 100 nodes that were added, the bootstrap added only 2 new nodes into its routing table.

Why? If every node had to keep track of every single node, ping each other, etc., this wouldn't be able to scale. Kademlia defines how we structure our routing table and how we can still find who we are looking for.

A few times I mentioned "distance". That's not geographical distance; it's a mathematical distance between the hash keys, the `hash(ip + port)` we mentioned. The routing table groups seen nodes based on this distance, and at a certain point it stops tracking newly added nodes in a given distance range. If a node gets a request for a hash, it's fine if it doesn't have that hash in its routing table; it just needs to return the closest nodes to the searched id that it does have. This way, through a few hops, you get closer and closer to what you are looking for until you find it. This is also the reason why during bootstrapping you do the FIND_NODE request on your own id: you are trying to fill your own routing table by finding nodes that are close to you. This is a very rough overview; if you want to go deeper there's a great article on it here.

So we have our distributed hash table running, we have many nodes connecting to each other, identifying themselves with a hash, and we are able to search for any hash. While that is cool, it's not very useful on its own. The good thing is, the library is called dht-rpc and not just dht. It allows us to make the nodes do whatever we want through RPC. The "trick" is that the node ids and the data or actions we execute are all in the same key space, so exactly the same mechanism for finding nodes close to yourself can be used to find the nodes closest to data.
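Concretely, Kademlia's distance is the XOR of two ids interpreted as a big number: the smaller the XOR, the "closer" the nodes. A minimal sketch (the helper is mine; dht-rpc does this internally):

```javascript
// Kademlia distance: XOR the two ids, compare the result as a number.
// Illustrative helper, using 1-byte ids so the arithmetic is easy to follow.
function xorDistance (idA, idB) {
  const out = Buffer.alloc(idA.length)
  for (let i = 0; i < idA.length; i++) out[i] = idA[i] ^ idB[i]
  return out
}

const target = Buffer.from('f0', 'hex')
const nodeA = Buffer.from('f3', 'hex') // differs from target only in low bits
const nodeB = Buffer.from('0f', 'hex') // differs from target in the high bits

// Buffer.compare orders byte-wise, which matches numeric order here:
// f0 ^ f3 = 03, while f0 ^ 0f = ff, so nodeA is much closer.
const closer = Buffer.compare(
  xorDistance(target, nodeA),
  xorDistance(target, nodeB)
) < 0 ? 'nodeA' : 'nodeB'
console.log(closer) // nodeA - small XOR means close, regardless of geography
```

This is why the routing table can be organized into distance ranges: each differing high bit doubles the distance, so "ranges" fall out of the id bits naturally.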
Two clients don't need to know anything about the current state of the entire DHT; they can just query the DHT with the same `hash(whatever)`, and those two clients will both end up interacting with the same node that is closest to this hash. As nodes enter and leave the DHT, which node is the "closest" changes over time.

Let's modify the nodes script to handle custom requests:

```javascript
const nodes = Array.from({ length: 100 }, () => {
  const node = new DHT({ bootstrap: ['localhost:10001'], ephemeral: false })

  node.on('request', function (req) {
    const myId = node.id.toString('hex').substring(0, 5)
    req.reply(Buffer.from(`${myId} did something`))
  })

  return node
})
```
And now the client:

```javascript
const client = new DHT({ bootstrap: ['localhost:10001'] })
await client.fullyBootstrapped()

const query = client.query({ target: Buffer.alloc(32), command: 0 })
query.on('data', (x) => console.log(x.value?.toString()))
await query.finished()
```

We removed `ephemeral: false`, as this is a client that queries the DHT rather than being part of it. The target is the key we are looking for, so we care about the node closest to the target. command is just a custom number that we can read on the nodes; that's how you implement branching logic for different actions. Now if we run this, we get:

```
…
46fc3 did something
1d4c6 did something
12ecc did something
0f40c did something
…
```
We wanted to find the node closest to the target, but instead we got data from every single node that was traversed. If you run this client a few times, you will get different outputs, because even though you look for the same target, the traversal changes: each run generates a unique starting routing table, and it depends on the network which node responds first. While that is not exactly what we need, it's still good to know that we can do this: you can inject code into the entire DHT traversal logic, and you can capture responses from that traversal on the client side. To actually know which response is from the closest node, we can add this line:

```javascript
console.log('closest reply', query.closestReplies[0].value.toString())
```
Now, if you run this many times, you will get different results from the query as the traversal differs, but the last line will always be the same. It's always the same node that is closest to your target. If you randomize your target, like `target: crypto.randomBytes(32)`, you will start seeing a different closest reply between runs.

Reading replies when nothing has been written is not very useful. And sometimes you need to execute an action only on the closest node, not across the entire traversal. Let's clean up the solution and implement the proper logic. New DHT nodes:

```javascript
const nodes = Array.from({ length: 100 }, () => {
  const node = new DHT({ bootstrap: ['localhost:10001'], ephemeral: false })

  const state = new Map()

  node.on('request', function (req) {
    const target = req.target.toString('hex')
    switch (req.command) {
      case 0: // Read
        req.reply(state.get(target))
        return
      case 1: { // Write
        const previousValue = state.get(target)
        state.set(target, req.value)
        req.reply(previousValue)
        return
      }
    }
    return req.error(20) // Our own defined error code
  })

  return node
})
```
We store the state in memory, as a simple Map where the key is the target. So you can imagine a single node might be closest to multiple targets and store all of them. There is now a command property, so clients can indicate what they want to do and the request handler can act on it. The Write command also responds with the previous value.

Now, let's implement our client:

```javascript
const client = new DHT({ bootstrap: ['localhost:10001'] })
await client.fullyBootstrapped()

const target = crypto.createHash('sha256').update('data key').digest()

const q = client.query({ target, command: 0 })
await q.finished()

const res = await client.request(
  { target, command: 1, value: Buffer.from('write me!') },
  q.closestNodes[0]
)

const id = res.from.id.toString('hex')
console.log(`${id} responded with ${res.value?.toString()}`)
```
First, we create the target key as a sha256 hash. Before, I used just a manual buffer of 32 bytes, which does work, but we want uniformly distributed keys. You want to make sure you are not accidentally hitting collisions and that the keys are distributed well between nodes, not accumulating among just a few of them.

Then we do exactly the same `client.query` as before. This time I do not care what it responds; I just want to get the closest node to the target. After the query, I can do `client.request`. The request body is almost the same as it was with query, except our command changed to 1 and we added a value. But the most important thing is that request allows us to pass a specific node, so we can use the `q.closestNodes[0]` from the read query. Unlike a query, which traverses the DHT, request makes just a single call to the specified node. It hits the same callback we configured in our nodes logic.

If you run the client a few times, you will get the same id each time, as it is the same node that is closest to the target. And you will get undefined as the value the first time, and write me! the second time. You can make exactly the same write request with a query, which is completely valid, but you will end up with many nodes doing that same write as the DHT is traversed.

We have strong building blocks already, but there's one issue. What happens if the node that holds the data disconnects? Or, even worse, nothing bad happens; instead a new node joins, and it happens to be closer to the target. In that case subsequent closestReplies[0] requests start returning undefined even though you had written data. Your expectations break even though none of the nodes broke. The good thing is, the DHT allows us to commit when we do a query.

```javascript
const q = client.query(
  { target, command: 0, value: Buffer.from('write me!') },
  { commit: true }
)
```
The query looks almost identical to before, except we added `{ commit: true }`, and the value that was previously on `client.request` is now in the `client.query`. When you indicate that a query should be committed, once the initial pass traverses the DHT, triggering all your traversal requests and populating closestNodes/closestReplies, it automatically executes the same request again, but only to the 20 closestReplies, and this time it passes the token.

The token is a simple hash of the client IP and a secret, returned by the node in its response during the DHT traversal. If you make a new request to the same node including the token you got from the previous request, the node validates that the hash is correct for your IP address. This gives you a super basic and lightweight auth if you want to track and control who can write or access the data (keep in mind IPs are not static and clients can share them, so it's not appropriate for anything serious). Or you can use it as a simple proof of commit, meaning your node can know that it was recognized as one of the "nearest" to the target.

Here's how our new node request handler looks:

```javascript
node.on('request', function (req) {
  switch (req.command) {
    case 0: { // Read or Write
      const target = req.target.toString('hex')
      if (req.token) {
        // this is a commit, we write
        state.set(target, req.value)
      }
      req.reply(state.get(target))
      return
    }
  }
  return req.error(20)
})
```

Compared to before, we combined our separate read and write commands into a single command. Instead, we know it is a write if a token exists. The token does not need any validation in our handler because the DHT itself has already validated it.
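The token scheme described above can be sketched in a few lines. The construction here is illustrative (the `makeToken` helper is mine; dht-rpc's exact encoding may differ), but the idea is the same: the node never stores tokens, it just recomputes the hash for the sender's IP.

```javascript
import crypto from 'node:crypto'

// Sketch of the token scheme: the node keeps a private secret and hands
// each requester hash(secret + requesterIp) as a token. Illustrative only;
// dht-rpc's exact construction may differ.
function makeToken (secret, requesterIp) {
  return crypto.createHash('sha256')
    .update(secret)
    .update(requesterIp)
    .digest()
}

const secret = crypto.randomBytes(32) // private to this node

// The response to the first (traversal) request includes the token.
const token = makeToken(secret, '1.2.3.4')

// On the commit request, the node recomputes the hash for the sender's ip
// and compares: a valid token proves this ip saw the earlier response.
console.log(makeToken(secret, '1.2.3.4').equals(token)) // true
console.log(makeToken(secret, '9.9.9.9').equals(token)) // false
```

Because validation is just a recomputation, it is stateless and cheap, which is why the DHT can do it for every commit without tracking clients.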
With this, if you run our query, 20 nodes closest to the target will store the value into their state. Now, even if the closest node dies, you still have your data in other nearby nodes.
You also no longer need a separate client.request for the write; everything is done in this one query.
Now we can also adjust how we read data: instead of looking at `q.closestReplies[0]` like we did before, we find any node that returns the data.

```javascript
for await (const res of client.query({ target, command: 0 })) {
  const value = res.value?.toString()
  if (value) {
    console.log(`${res.from.id.toString('hex')} returned ${value}`)
    break
  }
}
```
Note that in this query we do not pass commit: true and the value. We simply read the responses as they come in; once any node returns the value, we break early and are done. This way, even if a new closest node does not have the value, it does not matter. If you run the current code a few times, you will notice that between runs a different node responds with the value, but it is still within the same small subset of nodes.

We did a lot! We got a very powerful building block that allows us to connect an arbitrary number of machines without a special coordinator. And we got powerful RPC logic where we can make these nodes do whatever we want. While the examples shown here do work, this is not production code. It has problems like not necessarily getting the latest data, not dealing with malicious nodes, not having encryption, etc. The goal was to show core patterns that give you ideas of what you can do, not how to build a production-ready DHT. We will see how some of these problems are solved in the other modules that build on top of dht-rpc.