
[Bug] g.E().count() the result is not as expected #2515

Open
1 task done
dongpodu opened this issue Apr 10, 2024 · 8 comments
Labels
bug (Something isn't working) · question (Further information is requested) · rocksdb (RocksDB backend)

Comments

@dongpodu

Bug Type

others (please edit later)

Before submit

  • I have confirmed and searched the existing Issues / FAQ and found no similar or duplicate problems.

Environment

  • Server Version: 1.0.0 (Apache Release Version)
  • Backend: RocksDB x nodes, HDD or SSD
  • OS: xx CPUs, xx G RAM, Ubuntu 2x.x / CentOS 7.x
  • Data Size: xx vertices, xx edges

Expected & Actual behavior

When HugeGraph uses memory as the storage backend, is there a limit on how much data can be stored? I synced more than 1,000,000 edges, but g.E().count() in Hubble reports only about 20,000.

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

@dongpodu dongpodu added the bug Something isn't working label Apr 10, 2024

dosubot bot commented Apr 10, 2024

Hey there @dongpodu! Great to see you diving into HugeGraph's depths again. Hope you're ready for another adventure in debugging! 🚀

The discrepancy between the expected and actual edge counts when using HugeGraph with memory as the storage backend is likely due to the limitations of the InMemoryDBStore. This store is constrained by the available system memory, which means the amount of data it can handle is directly related to the memory capacity of your system. If the system's memory is insufficient for the volume of data being processed, it could result in incomplete data storage or processing, leading to the observed discrepancy in edge counts.

The g.E().count() query in Hubble, designed to count edges efficiently, interacts with the InMemoryDBStore through methods that fetch edges based on defined criteria. However, the accuracy of this count can be affected by the volume of data and the complexity of queries, potentially leading to discrepancies if the data exceeds memory capacities or if there are limitations in the query processing logic.

In essence, the nature of in-memory storage systems suggests that the amount of data HugeGraph can handle with memory as the storage backend is limited by the available system memory. This could be the reason for the discrepancy you observed. If you have further questions or need more information, feel free to ask.


@dosubot dosubot bot added question Further information is requested rocksdb RocksDB backend labels Apr 10, 2024
@JackyYangPassion
Contributor

[screenshot omitted]

As shown in the screenshot, it is recommended to debug locally and check the size of the underlying map

    protected final Map<Id, BackendEntry> store = new ConcurrentHashMap<>()

via store.size(), or to add a log line such as log.info("edge num: {}", this.store.size()).
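
For concreteness, a minimal, self-contained sketch of that probe. The class below is hypothetical scaffolding, not HugeGraph source; in the real code the field lives in InMemoryDBTable, and the log line goes wherever entries are inserted:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Hypothetical stand-in for InMemoryDBTable, just to show the probe;
    // generics avoid depending on HugeGraph's Id/BackendEntry types.
    public class StoreSizeProbe<K, V> {

        private static final Logger LOG =
                LoggerFactory.getLogger(StoreSizeProbe.class);

        // Mirrors the field named above:
        // protected final Map<Id, BackendEntry> store = new ConcurrentHashMap<>();
        protected final Map<K, V> store = new ConcurrentHashMap<>();

        public void put(K id, V entry) {
            this.store.put(id, entry);
            // Raw entry count, to compare against g.E().count() in Hubble
            LOG.info("edge num: {}", this.store.size());
        }
    }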

@JackyYangPassion JackyYangPassion changed the title [Bug] describe the main problem [Bug] g.E().count() the result is not as expected Apr 10, 2024
@dongpodu
Author

First, thanks for your reply, but the system has 40 GB of free memory, which is more than enough to hold one million edges, so that is not the reason.

@dongpodu
Author

Thanks, I will try it later!

@JackyYangPassion
Contributor

You can turn on debug logging or step through the code to check whether there are any bugs in the query path.

The g.E().count() query path in the source code:

HugeGraphStep#edgesCount()
-> StandardHugeGraph#queryNumber()
-> InMemoryDBStore#queryNumber()

    public Number queryNumber(Query query) {
        InMemoryDBTable table = this.table(InMemoryDBTable.tableType(query));
        Number result = table.queryNumber(null, query);
        LOG.debug("[store {}] get result({}) for number query: {}",
                  this.store, result, query);
        return result;
    }
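
As an independent cross-check, the same count can be run through a plain Gremlin traversal, bypassing Hubble entirely. A minimal sketch, assuming an already-opened graph instance (openGraph() is a hypothetical placeholder for however you obtain it):

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.structure.Graph;

    public class EdgeCountCheck {

        public static void main(String[] args) {
            Graph graph = openGraph(); // hypothetical: obtain your HugeGraph instance
            GraphTraversalSource g = graph.traversal();
            // The same count Hubble issues, but without the UI in between
            long edges = g.E().count().next();
            System.out.println("edge count via traversal: " + edges);
        }

        private static Graph openGraph() {
            throw new UnsupportedOperationException(
                    "plug in your graph-opening code here");
        }
    }

If the traversal count disagrees with Hubble's number, the issue is in the query path above rather than in the stored data.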

@dongpodu
Author

I made a mistake: the number of edges synced to HugeGraph is actually less than 1 million, so the problem mentioned above is not the real issue. However, I found another, bigger problem: the memory store does not seem to handle a large number of edges or vertices. While I was syncing fewer than 1 million edges to HugeGraph, the server responded more and more slowly and young GC happened frequently; once the count reached about 0.6 million, the server could no longer serve other requests.
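
One way to quantify that GC pressure is the JDK's standard GarbageCollectorMXBean API. A minimal watcher sketch, assuming it can be started inside the server JVM (e.g. in a local debug build; the MXBeans only see the current process, so running jstat against the server PID is an alternative):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Sketch: a watcher thread that periodically logs cumulative GC
    // counts and times for the JVM it runs in.
    public class GcWatch implements Runnable {

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    for (GarbageCollectorMXBean gc :
                             ManagementFactory.getGarbageCollectorMXBeans()) {
                        System.out.printf("%s: count=%d time=%dms%n",
                                gc.getName(), gc.getCollectionCount(),
                                gc.getCollectionTime());
                    }
                    Thread.sleep(5_000);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        public static void main(String[] args) {
            new Thread(new GcWatch(), "gc-watch").start();
        }
    }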

@imbajin
Member

imbajin commented Apr 23, 2024

> I made a mistake: the number of edges synced to HugeGraph is actually less than 1 million, so the problem mentioned above is not the real issue. However, I found another, bigger problem: the memory store does not seem to handle a large number of edges or vertices. While I was syncing fewer than 1 million edges to HugeGraph, the server responded more and more slowly and young GC happened frequently; once the count reached about 0.6 million, the server could no longer serve other requests.

@dongpodu use RocksDB for the stand-alone mode (you could mount a tmpfs/ramfs as a device/disk for RocksDB if you aim to use it in memory mode; see the sketch below). The memory backend is mainly used to run CI and small datasets, not for production environments.

@dosu you could learn the context & tell other users next time
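
A minimal sketch of that setup, assuming a Linux host and the usual conf/hugegraph.properties layout (the mount point and tmpfs size below are placeholders; rocksdb.data_path and rocksdb.wal_path are the standard RocksDB backend options):

    # 1) Mount a RAM-backed filesystem first (run as root);
    #    size and mount point are placeholders:
    #      mount -t tmpfs -o size=8g tmpfs /mnt/hugegraph-ram

    # 2) Point the RocksDB backend at it in conf/hugegraph.properties:
    backend=rocksdb
    serializer=binary
    rocksdb.data_path=/mnt/hugegraph-ram/rocksdb-data
    rocksdb.wal_path=/mnt/hugegraph-ram/rocksdb-wal

Note that data on tmpfs does not survive a reboot, which matches the volatility of the memory backend while keeping RocksDB's much better scalability.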


dosubot bot commented Apr 23, 2024

Switching to RocksDB for the stand-alone mode and considering mounting a tmpfs/ramfs as a device/disk for RocksDB if the goal is to use it in memory-mode is a practical solution. The memory backend is primarily intended for CI and handling small datasets, not for production environments. This approach can help mitigate issues related to server responsiveness and frequent young garbage collection (GC) occurrences when dealing with large volumes of data.

