
[Bug] g.E().count() the result is not as expected #2515

Open
1 task done
dongpodu opened this issue Apr 10, 2024 · 8 comments
Labels
bug (Something isn't working) · question (Further information is requested) · rocksdb (RocksDB backend)

Comments

@dongpodu

Bug Type

others (please edit later)

Before submit

  • I have confirmed and searched the existing Issues / FAQ and found no similar or duplicate problems.

Environment

  • Server Version: 1.0.0 (Apache Release Version)
  • Backend: RocksDB x nodes, HDD or SSD
  • OS: xx CPUs, xx G RAM, Ubuntu 2x.x / CentOS 7.x
  • Data Size: xx vertices, xx edges

Expected & Actual behavior

When HugeGraph uses memory as the storage backend, is there a limit on how much data can be stored? I synced more than 1,000,000 edges, but g.E().count() in Hubble reports only about 20,000.

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

@dongpodu dongpodu added the bug Something isn't working label Apr 10, 2024

dosubot bot commented Apr 10, 2024

Hey there @dongpodu! Great to see you diving into HugeGraph's depths again. Hope you're ready for another adventure in debugging! 🚀

The discrepancy between the expected and actual edge counts when using HugeGraph with memory as the storage backend is likely due to the limitations of the InMemoryDBStore. This store is constrained by the available system memory, which means the amount of data it can handle is directly related to the memory capacity of your system. If the system's memory is insufficient for the volume of data being processed, it could result in incomplete data storage or processing, leading to the observed discrepancy in edge counts.

The g.E().count() query in Hubble, designed to count edges efficiently, interacts with the InMemoryDBStore through methods that fetch edges based on defined criteria. However, the accuracy of this count can be affected by the volume of data and the complexity of queries, potentially leading to discrepancies if the data exceeds memory capacities or if there are limitations in the query processing logic.

In essence, the nature of in-memory storage systems suggests that the amount of data HugeGraph can handle with memory as the storage backend is limited by the available system memory. This could be the reason for the discrepancy you observed. If you have further questions or need more information, feel free to ask.


@dosubot dosubot bot added question Further information is requested rocksdb RocksDB backend labels Apr 10, 2024
@JackyYangPassion
Contributor

[screenshot omitted]

As shown in the screenshot, it is recommended to debug locally and check the size of the underlying map

    protected final Map<Id, BackendEntry> store = new ConcurrentHashMap<>()

via store.size(), or to add a log line such as log.info("edge num: {}", this.store.size()).
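
For concreteness, a minimal, self-contained sketch of that probe. The class below is hypothetical scaffolding, not HugeGraph source; in the real code the field lives in InMemoryDBTable, and the log line goes wherever entries are inserted:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Hypothetical stand-in for InMemoryDBTable, just to show the probe;
    // generics avoid depending on HugeGraph's Id/BackendEntry types.
    public class StoreSizeProbe<K, V> {

        private static final Logger LOG =
                LoggerFactory.getLogger(StoreSizeProbe.class);

        // Mirrors the field named above:
        // protected final Map<Id, BackendEntry> store = new ConcurrentHashMap<>();
        protected final Map<K, V> store = new ConcurrentHashMap<>();

        public void put(K id, V entry) {
            this.store.put(id, entry);
            // Raw entry count, to compare against g.E().count() in Hubble
            LOG.info("edge num: {}", this.store.size());
        }
    }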

@JackyYangPassion JackyYangPassion changed the title [Bug] describe the main problem [Bug] g.E().count() the result is not as expected Apr 10, 2024
@dongpodu
Author

First, thanks for your reply, but the system has 40 GB of free memory, which is more than enough to hold one million edges, so that is not the reason.

@dongpodu
Author

Thanks, I will try it later!

@JackyYangPassion
Contributor

You can turn on debug logging or step through the code to check whether there are any bugs in the query path.

The g.E().count() query path in the source code:

HugeGraphStep#edgesCount()
-> StandardHugeGraph#queryNumber()
-> InMemoryDBStore#queryNumber()

    public Number queryNumber(Query query) {
        InMemoryDBTable table = this.table(InMemoryDBTable.tableType(query));
        Number result = table.queryNumber(null, query);
        LOG.debug("[store {}] get result({}) for number query: {}",
                  this.store, result, query);
        return result;
    }
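
As an independent cross-check, the same count can be run through a plain Gremlin traversal, bypassing Hubble entirely. A minimal sketch, assuming an already-opened graph instance (openGraph() is a hypothetical placeholder for however you obtain it):

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.structure.Graph;

    public class EdgeCountCheck {

        public static void main(String[] args) {
            Graph graph = openGraph(); // hypothetical: obtain your HugeGraph instance
            GraphTraversalSource g = graph.traversal();
            // The same count Hubble issues, but without the UI in between
            long edges = g.E().count().next();
            System.out.println("edge count via traversal: " + edges);
        }

        private static Graph openGraph() {
            throw new UnsupportedOperationException(
                    "plug in your graph-opening code here");
        }
    }

If the traversal count disagrees with Hubble's number, the issue is in the query path above rather than in the stored data.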

@dongpodu
Author

I made a mistake: the number of edges synced to HugeGraph is actually less than 1 million, so the problem mentioned above is not the real issue. However, I found another, bigger problem: the memory store does not seem to handle a large number of edges or vertices. While I was syncing fewer than 1 million edges to HugeGraph, the server responded more and more slowly and young GC happened frequently; once the count reached about 0.6 million, the server could no longer serve other requests.
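
One way to quantify that GC pressure is the JDK's standard GarbageCollectorMXBean API. A minimal watcher sketch, assuming it can be started inside the server JVM (e.g. in a local debug build; the MXBeans only see the current process, so running jstat against the server PID is an alternative):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Sketch: a watcher thread that periodically logs cumulative GC
    // counts and times for the JVM it runs in.
    public class GcWatch implements Runnable {

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    for (GarbageCollectorMXBean gc :
                             ManagementFactory.getGarbageCollectorMXBeans()) {
                        System.out.printf("%s: count=%d time=%dms%n",
                                gc.getName(), gc.getCollectionCount(),
                                gc.getCollectionTime());
                    }
                    Thread.sleep(5_000);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        public static void main(String[] args) {
            new Thread(new GcWatch(), "gc-watch").start();
        }
    }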

@imbajin
Member

imbajin commented Apr 23, 2024

> I made a mistake: the number of edges synced to HugeGraph is actually less than 1 million, so the problem mentioned above is not the real issue. However, I found another, bigger problem: the memory store does not seem to handle a large number of edges or vertices. While I was syncing fewer than 1 million edges to HugeGraph, the server responded more and more slowly and young GC happened frequently; once the count reached about 0.6 million, the server could no longer serve other requests.

@dongpodu use RocksDB for the stand-alone mode (you could mount a tmpfs/ramfs as a device/disk for RocksDB if you aim to use it in memory mode; see the sketch below). The memory backend is mainly used to run CI and small datasets, not for production environments.

@dosu you could learn the context & tell other users next time
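
A minimal sketch of that setup, assuming a Linux host and the usual conf/hugegraph.properties layout (the mount point and tmpfs size below are placeholders; rocksdb.data_path and rocksdb.wal_path are the standard RocksDB backend options):

    # 1) Mount a RAM-backed filesystem first (run as root);
    #    size and mount point are placeholders:
    #      mount -t tmpfs -o size=8g tmpfs /mnt/hugegraph-ram

    # 2) Point the RocksDB backend at it in conf/hugegraph.properties:
    backend=rocksdb
    serializer=binary
    rocksdb.data_path=/mnt/hugegraph-ram/rocksdb-data
    rocksdb.wal_path=/mnt/hugegraph-ram/rocksdb-wal

Note that data on tmpfs does not survive a reboot, which matches the volatility of the memory backend while keeping RocksDB's much better scalability.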


dosubot bot commented Apr 23, 2024

Switching to RocksDB for the stand-alone mode and considering mounting a tmpfs/ramfs as a device/disk for RocksDB if the goal is to use it in memory-mode is a practical solution. The memory backend is primarily intended for CI and handling small datasets, not for production environments. This approach can help mitigate issues related to server responsiveness and frequent young garbage collection (GC) occurrences when dealing with large volumes of data.

