Unable to run a DHT Provide Successfully #2517

Open
jtsmedley opened this issue May 1, 2024 · 5 comments
Labels: kind/stale, need/author-input (Needs input from the original author)

@jtsmedley

  • Version: 1.4.3
  • Platform: Linux 6.5.0-28-generic 29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

  • NodeJS: v20.10.0

  • Subsystem: kad-dht (12.0.14)
  • Environment Variables: DEBUG=libp2p:kad-dht:*

Severity: High

Description:

  1. Attempted to provide a file to the DHT using Helia, but no provide to the network ever completed successfully.
    • Always crashes with the code ERR_QUERY_ABORTED.
  2. Next, I used the js-libp2p package directly and got the same error, again with no successful provides to the DHT.

Steps to reproduce the error: Here is an example repository that reproduces the ERR_QUERY_ABORTED error that I am seeing.
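
For anyone reproducing this, one way to separate a query abort from other failures is to bound the provide with an explicit timeout and inspect the error that comes back. The following is a minimal sketch, not the reporter's code: `provideWithTimeout` is a hypothetical helper, `ERR_QUERY_ABORTED` is the code reported above, and it assumes `contentRouting.provide` accepts a `signal` option the same way `findProviders` does elsewhere in this thread.

```javascript
// Hypothetical helper: bounds a provide with a timeout and classifies the failure.
// `node` is any object exposing contentRouting.provide(cid, options).
async function provideWithTimeout (node, cid, timeoutMs) {
  const start = Date.now()

  try {
    await node.contentRouting.provide(cid, {
      // assumption: provide honours an AbortSignal passed as `signal`
      signal: AbortSignal.timeout(timeoutMs)
    })

    return { ok: true, elapsedMs: Date.now() - start }
  } catch (err) {
    // treat the reported code and the standard abort/timeout errors alike
    const aborted = err.code === 'ERR_QUERY_ABORTED' ||
      err.name === 'AbortError' ||
      err.name === 'TimeoutError'

    return { ok: false, aborted, message: err.message, elapsedMs: Date.now() - start }
  }
}
```

Logging `aborted` together with `elapsedMs` should show whether the query was cut off by the timeout or failed for some other reason.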

@jtsmedley jtsmedley added the need/triage Needs initial labeling and prioritization label May 1, 2024
@achingbrain achingbrain added status/in-progress In progress and removed need/triage Needs initial labeling and prioritization labels May 7, 2024
@achingbrain
Member

I've updated the script a bit to print more stats and show the effect of a deeper routing table over time:

import { createLibp2p } from 'libp2p';
import { createFromJSON } from "@libp2p/peer-id-factory";
import { CID } from "multiformats";
import { LevelDatastore } from 'datastore-level';
import * as libp2pInfo from 'libp2p/version';
import * as fs from "node:fs";
import { toString as uint8ArrayToString } from 'uint8arrays/to-string'
import * as crypto from "node:crypto"
import * as raw from 'multiformats/codecs/raw'
import * as Digest from 'multiformats/hashes/digest'
import { sha256 } from 'multiformats/hashes/sha2'
// Transport
import { webRTC, webRTCDirect } from '@libp2p/webrtc';
import { webSockets } from "@libp2p/websockets";
import { tcp } from "@libp2p/tcp";
// Encryption
import { noise } from '@chainsafe/libp2p-noise';
import { yamux } from '@chainsafe/libp2p-yamux';
import { mplex } from '@libp2p/mplex';
// Peer Discovery
import { bootstrap } from '@libp2p/bootstrap';
// Services
import { identify } from '@libp2p/identify'
import { kadDHT, removePrivateAddressesMapper } from '@libp2p/kad-dht'
import delay from 'delay'

async function getNode (type) {
    const peerIdFile = `${type}.peer`
    const datastoreDir = `${type}.db`

    let peerId

    if (fs.existsSync(peerIdFile)) {
        peerId = await createFromJSON(JSON.parse(fs.readFileSync(peerIdFile)));
    }

    const datastore = new LevelDatastore(datastoreDir)
    await datastore.open()

    const node = await createLibp2p({
        peerId,
        addresses: {
            listen: [
                '/ip4/0.0.0.0/tcp/0'
            ],
            announce: [
                '/dns4/example.com/tcp/1234'
            ]
        },
        transports: [tcp(), webSockets(), webRTC(), webRTCDirect()],
        peerDiscovery: [
            bootstrap({
                list: [
                    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
                    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
                    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
                    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
                    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
                ],
            }),
        ],
        services: {
            dht: kadDHT({
                protocol: '/ipfs/kad/1.0.0',
                peerInfoMapper: removePrivateAddressesMapper
            }),
            identify: identify({
                agentVersion: `cre-24/1 ${libp2pInfo.name}/${libp2pInfo.version} UserAgent=${globalThis.process.version}`
            })
        },
        connectionEncryption: [noise()],
        streamMuxers: [yamux(), mplex()]
    });

    // write peer id out for reuse on next run
    fs.writeFileSync(peerIdFile, JSON.stringify({
        id: uint8ArrayToString(node.peerId.toBytes(), 'base58btc'),
        privKey: uint8ArrayToString(node.peerId.privateKey, 'base64pad'),
        pubKey: uint8ArrayToString(node.peerId.publicKey, 'base64pad'),
    }, null, 2))

    return node
}

const [
    publisherNode,
    resolverNode
] = await Promise.all([
    getNode('publisher'),
    getNode('resolver')
])

async function provide (cid) {
    const start = Date.now()
    const providers = []

    console.info('start provide of', cid.toString())
    printTableStats('publisher', publisherNode)

    try {
        await publisherNode.contentRouting.provide(cid, {
            onProgress: evt => {
                if (evt.detail.name === 'FINAL_PEER') {
                    console.info(`published provider record to ${evt.detail.peer.id} after ${Date.now() - start}ms`)

                    providers.push(evt.detail.peer.id.toString())
                }

                // uncomment to see all query steps
                //console.info('publish', evt.type, evt.detail)
            }
        })

        console.info(`stored provider records with ${providers.length} peers in ${Date.now() - start}ms`)
    } catch (err) {
        console.info(`provide failed after ${Date.now() - start}ms with message:`, err.message)
        throw err
    }

    printTableStats('publisher', publisherNode)
}

async function resolve (cid) {
    const start = Date.now()

    console.info('start resolve of', cid.toString())
    printTableStats('resolver', resolverNode)

    try {
        const providers = []
        let firstProvider

        for await (const provider of resolverNode.contentRouting.findProviders(cid, {
            signal: AbortSignal.timeout(120000)
        })) {
            console.info(`found provider ${provider.id} after ${Date.now() - start}ms`)
            providers.push(provider)

            if (firstProvider == null) {
                firstProvider = Date.now() - start
            }
        }

        console.info(`found ${providers.length} providers in ${Date.now() - start}ms, first provider found in ${firstProvider}ms`)
    } catch (err) {
        console.info(`finding providers failed after ${Date.now() - start}ms with message:`, err.message)
    }

    printTableStats('resolver', resolverNode)
}

while (true) {
    const cid = CID.createV1(raw.code, Digest.create(sha256.code, crypto.randomBytes(32)))
    await provide(cid)
    console.info('------')
    await resolve(cid)
    console.info('------')

    console.info('wait before starting provide')
    // wait 5s before providing again
    await delay(5000)
}

function printTableStats (type, node) {
    let size = 0
    let buckets = 0
    let maxDepth = 0

    function count (bucket, prefix = '') {
      prefix += bucket.prefix

      if (bucket.depth > maxDepth) {
        maxDepth = bucket.depth
      }

      if (bucket.peers != null) {
        buckets++
        size += bucket.peers.length
        return
      }

      count(bucket.left, prefix)
      count(bucket.right, prefix)
    }

    count(node.services.dht.routingTable.kb.root)

    console.info(type, 'routing table size', size, 'buckets', buckets, 'average occupancy', Math.round(size / buckets), 'max depth', maxDepth)
}

package.json:

{
  "name": "content-routing-example",
  "version": "1.0.0",
  "type": "module",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "dependencies": {
    "@chainsafe/libp2p-noise": "^15.0.0",
    "@chainsafe/libp2p-yamux": "^6.0.2",
    "@helia/delegated-routing-v1-http-api-client": "^3.0.1",
    "@libp2p/autonat": "next",
    "@libp2p/bootstrap": "next",
    "@libp2p/circuit-relay-v2": "next",
    "@libp2p/dcutr": "next",
    "@libp2p/identify": "next",
    "@libp2p/kad-dht": "next",
    "@libp2p/mplex": "next",
    "@libp2p/peer-id-factory": "next",
    "@libp2p/ping": "next",
    "@libp2p/tcp": "next",
    "@libp2p/upnp-nat": "next",
    "@libp2p/webrtc": "next",
    "@libp2p/websockets": "next",
    "datastore-level": "^10.1.8",
    "libp2p": "next",
    "multiformats": "^13.1.0"
  }
}

The changes I've made to the repro are:

  1. Start two nodes, a publisher and a resolver.
    • They are not connected so will work independently using the public network
  2. The publisher publishes provider records for a random CID
    • This means we will be requesting a different set of peers from the routing table each time
  3. The resolver then tries to resolve provider records for the CID
    • Again, because the CID is random, we will contact different peers each time
  4. Print routing table stats after each operation
  5. Goto 2

What I see is:

  1. The first provide starts with an empty routing table, so can take a minute or so to complete
  2. Subsequent provides depend on the diversity of the routing table
    • E.g. if it's well populated with peers KAD-close to the random CID that's being published it's fast, otherwise it takes longer
  3. Over time the provide times settle down as the routing table becomes more diverse (e.g. has a wider spread of KAD-ID values)
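
The notion of "KAD-close" in the observations above can be illustrated with the XOR metric that Kademlia uses: a provide has to reach the peers whose IDs have the smallest XOR distance to the key. This is a minimal sketch with made-up 8-bit IDs (real DHT keys are much longer hashes); `xorDistance` and `closestPeers` are hypothetical names and not part of the `@libp2p/kad-dht` API.

```javascript
// XOR distance between two equal-length byte arrays, returned as a BigInt.
function xorDistance (a, b) {
  let distance = 0n

  for (let i = 0; i < a.length; i++) {
    distance = (distance << 8n) | BigInt(a[i] ^ b[i])
  }

  return distance
}

// Return the `count` peers whose IDs are XOR-closest to `key`.
function closestPeers (key, peers, count) {
  return [...peers]
    .sort((p, q) => {
      const dp = xorDistance(p.id, key)
      const dq = xorDistance(q.id, key)
      return dp < dq ? -1 : dp > dq ? 1 : 0
    })
    .slice(0, count)
}

// Invented example data, not real peer IDs.
const key = Uint8Array.of(0b10110000)
const peers = [
  { name: 'a', id: Uint8Array.of(0b10110101) }, // shares 5 leading bits with the key
  { name: 'b', id: Uint8Array.of(0b00110000) }, // differs in the very first bit
  { name: 'c', id: Uint8Array.of(0b10100000) } // shares 3 leading bits with the key
]

console.info(closestPeers(key, peers, 2).map(p => p.name)) // [ 'a', 'c' ]
```

A routing table that holds few peers sharing a long prefix with the key starts the query further away, which is consistent with the slower provides seen while the table is still sparse.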

It starts with a provide that takes about a minute, then it starts to speed up, but there's still the odd outlier with 30-40 second publish times. Once there are roughly 10k or more peers in the routing table, publish times can be under a second but are mostly in the range of 3-10 seconds.

@jtsmedley can you please try the above and report back if you see the same results?

@achingbrain achingbrain reopened this May 14, 2024
@dhuseby dhuseby added the need/author-input Needs input from the original author label May 14, 2024
@achingbrain achingbrain removed the status/in-progress In progress label May 14, 2024
@jtsmedley
Author

  • Provides seem to be working much better overall.
  • When TCP is enabled as a transport, provides are much slower (~30 seconds). Is that expected?
    • Provides using only WebSockets as a transport seemed to all be in the 2-3 second range at maximum. 👍

@achingbrain
Member

Support for WebSockets on the network is very sparse, so if that's the only transport you're using, it's likely the query isn't running to completion.
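
One rough way to check this on a running node is to tally which transports the known peer addresses advertise. The sketch below works on multiaddr strings only: `countByTransport` is a hypothetical helper, the sample addresses are invented, and classifying by string components is a simplification of what a real multiaddr parser would do.

```javascript
// Hypothetical helper: tally multiaddr strings by the transport they advertise.
function countByTransport (addrs) {
  const counts = { tcp: 0, websockets: 0, other: 0 }

  for (const addr of addrs) {
    const parts = addr.split('/')

    if (parts.includes('ws') || parts.includes('wss')) {
      // WebSocket addresses also contain /tcp, so check for them first
      counts.websockets++
    } else if (parts.includes('tcp')) {
      counts.tcp++
    } else {
      counts.other++
    }
  }

  return counts
}

// Invented sample addresses using documentation IP ranges.
const sample = [
  '/ip4/192.0.2.1/tcp/4001',
  '/ip4/192.0.2.2/tcp/4001',
  '/dns4/example.com/tcp/443/wss',
  '/ip4/192.0.2.3/udp/4001/quic-v1'
]

console.info(countByTransport(sample)) // { tcp: 2, websockets: 1, other: 1 }
```

If only a small share of the addresses are WebSocket addresses, a WebSockets-only node can dial few of the peers a query wants to contact.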

When you are seeing a 30s publish, what is the size of the routing table?

Typically you'll see the query time come down as the size of the table goes up because you have to make fewer network hops to find the closest peers to a given key.
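
The relationship between table size and hop count can be sketched with a toy model. It assumes uniformly random 32-bit IDs and ignores k-buckets, so it is an illustration rather than a model of the real routing table, and all function names are hypothetical. The more leading bits the closest known peer shares with the target key, the less of the keyspace is left to cover with network hops.

```javascript
// Small deterministic PRNG (mulberry32) so repeated runs give the same numbers.
function createRandom (seed) {
  let a = seed
  return function () {
    let t = a += 0x6D2B79F5
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return (t ^ (t >>> 14)) >>> 0 // unsigned 32-bit integer
  }
}

// Number of leading bits two 32-bit IDs have in common.
function commonPrefixLength (a, b) {
  return Math.clz32(a ^ b)
}

// Average, over `trials` random targets, of the longest prefix shared
// between the target and any ID in a random table of `tableSize` peers.
function averageBestPrefix (tableSize, trials, seed) {
  const random = createRandom(seed)
  const table = Array.from({ length: tableSize }, () => random())
  let total = 0

  for (let i = 0; i < trials; i++) {
    const target = random()
    let best = 0

    for (const id of table) {
      best = Math.max(best, commonPrefixLength(id, target))
    }

    total += best
  }

  return total / trials
}

for (const size of [20, 200, 2000, 20000]) {
  const bits = averageBestPrefix(size, 200, 1).toFixed(1)
  console.info(`${size} peers: closest known peer shares about ${bits} leading bits with the target`)
}
```

In this model the shared prefix grows roughly with the logarithm of the table size, which matches the intuition that a larger table lets a query start closer to the key.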

Building a decently diverse routing table can take 20-30 minutes, though if you're using a datastore that persists between restarts, most of the routing table should be restored after a node restart.

Contributor

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

Status: 🧱Blocked