Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reuse vector from pool #15733

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Conversation

ouyuanning
Copy link
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue https://github.com/matrixorigin/MO-Cloud/issues/1803

What this PR does / why we need it:

https://github.com/matrixorigin/MO-Cloud/issues/1803

@matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Apr 26, 2024
@matrix-meow
Copy link
Contributor

@ouyuanning Thanks for your contributions!

Here are review comments for file pkg/container/batch/batch_test.go:

Pull Request Review:

Title:

The title of the pull request "reuse vector from pool" is clear and indicates that the changes involve reusing vectors from a pool.

Body:

The body of the pull request is minimal and lacks detailed information. It mentions the type of PR as an improvement and references the related issue #1803. However, it does not provide a clear explanation of what the PR does or why it is needed. It would be beneficial to include more context in the body to help reviewers understand the purpose of the changes.

Changes in pkg/container/batch/batch_test.go:

  1. Problem - Potential Resource Leak:

    • The changes in the TestBatch_ReplaceVector function show the modification of vector creation from &vector.Vector{} to vector.NewVecFromReuse().
    • While reusing vectors from a pool can be efficient, it is crucial to ensure that the vectors are properly released or reset after use to prevent resource leaks.
    • The code snippet does not show how the vectors are handled after being used, which could lead to memory leaks if not managed correctly.
  2. Suggestion - Resource Management:

    • Ensure that after using the vectors obtained from the pool, they are properly reset or released back to the pool to avoid memory leaks.
    • Add comments or documentation to clarify the lifecycle of the vectors obtained from the pool and how they should be handled after use.
  3. Optimization - Code Consistency:

    • Consider refactoring the code to ensure consistency in how vectors are obtained and managed throughout the test cases.
    • If there are multiple test cases that require vector reuse, consider centralizing the logic for obtaining and releasing vectors to improve maintainability and reduce code duplication.
  4. Security Concern - Data Sensitivity:

    • If the vectors being reused contain sensitive data or state, ensure that proper measures are in place to prevent data leakage between test cases.
    • Implement safeguards to sanitize or reset the vectors to a clean state before reuse to avoid potential data exposure.

General Suggestions:

  • Provide a more detailed description in the pull request body to explain the rationale behind the changes and the impact on the codebase.
  • Consider adding unit tests to cover the scenarios where vectors are reused from the pool to ensure the correctness of the implementation.
  • Conduct thorough testing, including edge cases, to validate the behavior of reusing vectors from the pool.

By addressing the resource management concerns, ensuring code consistency, and considering data sensitivity, the pull request can be enhanced to improve the efficiency and reliability of vector reuse in the codebase.

Here are review comments for file pkg/container/batch/types.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating the intention to reuse a vector from a pool.

Body:

The body of the pull request provides information about the type of PR (Improvement), the specific issue it addresses (issue #1803), and a link to the related GitHub issue for more context.

Changes in pkg/container/batch/types.go:

  1. Addition of Import Statement:

    • An import statement for vector package seems to be missing in the changes. It is referenced in the code but not imported. This could lead to compilation errors.
  2. Code Modification:

    • The code modification in the unmarshalBinaryWithAnyMp function replaces the creation of a new vector with vector.NewVecFromReuse(). This change suggests reusing vectors from a pool instead of creating new ones every time.

Suggestions for Improvement:

  1. Import Statement:

    • Ensure that the vector package is imported in the file to resolve any potential compilation issues. Add github.com/matrixorigin/matrixone/pkg/container/vector to the import statements.
  2. Error Handling:

    • It's important to handle errors that may occur during the reuse of vectors from the pool. Make sure to check and appropriately handle any errors returned by vector.NewVecFromReuse().
  3. Code Optimization:

    • Consider adding comments to explain the purpose of reusing vectors from the pool for better code readability.
  4. Testing:

    • Verify that the reuse of vectors from the pool does not introduce any unexpected behavior or performance issues. Consider adding test cases to cover this functionality.
  5. Security Concerns:

    • Ensure that reusing vectors from the pool does not introduce any security vulnerabilities, especially related to data integrity or memory safety.

By addressing the suggestions above, the pull request can be improved in terms of code correctness, maintainability, and reliability.

Here are review comments for file pkg/container/vector/reuse.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the goal is to reuse a vector from a pool.

Body:

The body of the pull request is minimal, providing a link to the related GitHub issue. It would be beneficial to include a brief description of the changes made and why they are necessary for better context.

Changes in pkg/container/vector/reuse.go:

  1. Lack of Error Handling: The NewVecFromReuse function allocates a vector from the pool but does not handle potential errors that may occur during the allocation process. It's important to consider error handling to ensure robustness in the code.

    Suggestion: Add error handling mechanism to manage any errors that may arise during the allocation of the vector from the pool.

  2. Commented Out Code: There are commented-out sections of code within the NewVecFromReuse function. While it's common to leave commented code for reference, it's essential to remove unnecessary commented code before merging changes to maintain code cleanliness.

    Suggestion: Remove commented-out code sections that are no longer needed for clarity and maintainability.

  3. Unused Fields: The Vector struct has commented-out fields like AllocMsg, FreeMsg, PutMsg, GetMsg, and OnUsed. If these fields are no longer required, it's better to remove them to avoid confusion and reduce unnecessary code complexity.

    Suggestion: Remove unused fields from the Vector struct to streamline the code and improve readability.

  4. Missing Documentation: The NewVecFromReuse function lacks documentation comments explaining its purpose, parameters, and return values. Proper documentation is essential for understanding the function's behavior and usage.

    Suggestion: Add meaningful documentation comments to the NewVecFromReuse function to describe its functionality, parameters, and return values clearly.

  5. Code Optimization:

    • Consider optimizing the NewVecFromReuse function for better performance if there are any redundant operations or allocations that can be avoided.
    • Ensure that the code follows consistent formatting and adheres to the project's coding standards for maintainability.

Overall Suggestions:

  • Enhance error handling in the NewVecFromReuse function to handle potential errors during vector allocation.
  • Remove unnecessary commented-out code and unused fields from the Vector struct to declutter the codebase.
  • Add comprehensive documentation comments to the NewVecFromReuse function for better code understanding.
  • Optimize the code for performance and maintainability by reviewing and refining the implementation.

By addressing these issues and suggestions, the pull request can be improved in terms of code quality, readability, and maintainability.

Here are review comments for file pkg/container/vector/tools.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating the intention to reuse a vector from a pool.

Body:

The body of the pull request provides information about the type of PR (Improvement), the specific issue it addresses (issue #1803), and a link to the related issue for more context.

Changes in pkg/container/vector/tools.go:

  1. Problem - Code Duplication:

    • The code for creating a new Vector struct in the ProtoVectorToVector function is duplicated.

    Suggestion:

    • To avoid code duplication, consider refactoring the code by creating a function that initializes a Vector struct with the provided values.
  2. Security Concern - Memory Management:

    • The rvec struct is being initialized with cantFreeData and cantFreeArea set to true.

    Suggestion:

    • Ensure that setting cantFreeData and cantFreeArea to true is necessary for the specific use case. If not, review the memory management strategy to prevent memory leaks.
  3. Optimization - Reusing Vector from Pool:

    • The change introduces the use of NewVecFromReuse() to reuse a vector from a pool instead of creating a new one each time.

    Suggestion:

    • Verify that reusing vectors from a pool improves performance and memory usage. Consider benchmarking to measure the impact of this change.
  4. Code Readability:

    • The code formatting could be improved for consistency, such as ensuring proper indentation and alignment of code blocks.

Overall Suggestions:

  • Refactor the code to eliminate duplication and improve maintainability.
  • Review the memory management strategy to ensure efficient resource utilization.
  • Validate the performance impact of reusing vectors from a pool.
  • Enhance code readability by maintaining consistent formatting.

By addressing these issues and suggestions, the pull request can contribute to a more efficient and maintainable codebase.

Here are review comments for file pkg/container/vector/vector_test.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating the intention to reuse vectors from a pool.

Body:

The body of the pull request provides information about the type of PR, the related issue, and a link to the issue for more context. It would be beneficial to have a more detailed explanation of why reusing vectors from a pool is necessary or how it improves the codebase.

Changes in pkg/container/vector/vector_test.go:

  1. Code Duplication:

    • The changes in the file vector_test.go show repetitive code where NewVecFromReuse() is assigned to variable w multiple times. This leads to code duplication and reduces maintainability.
    • Suggestion: Instead of assigning NewVecFromReuse() to w multiple times, consider assigning it once and reusing the same variable throughout the test functions.
  2. Variable Naming:

    • The variable name w is used for the vector, which is not very descriptive. It would be better to use a more meaningful variable name to improve readability and understanding of the code.
  3. Reuse of Vectors:

    • The pull request aims to reuse vectors from a pool by calling NewVecFromReuse() instead of creating a new vector instance every time. This is a good practice to optimize memory usage and improve performance.
  4. Testing:

    • The changes seem to be focused on testing the functionality related to marshaling and unmarshaling vectors. It's important to ensure that the tests cover all relevant scenarios and edge cases.

Security Concerns:

  • The changes in the pull request do not introduce any apparent security concerns based on the provided information.

Overall Suggestions:

  • Refactor the code to remove code duplication by reusing the same variable for NewVecFromReuse() assignment.
  • Consider using more descriptive variable names to enhance code readability.
  • Add more detailed explanations in the PR body to provide a better understanding of the changes and their benefits.

By addressing the code duplication and improving variable naming, the codebase can become more maintainable and easier to understand. Additionally, providing a more detailed explanation in the PR body can help reviewers and future developers grasp the significance of the changes.

Here are review comments for file pkg/vm/engine/memoryengine/operations.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating the intention to reuse a vector from a pool.

Body:

The body of the pull request is minimal, providing a link to the related issue and mentioning the purpose of the changes made.

Changes in operations.go:

  1. Problem - Memory Leak Potential:

    • The change introduces a potential memory leak issue. The code snippet shows the creation of a new vector using vector.NewVecFromReuse(), but it seems that the vector is not properly handled or released after its use.
  2. Suggestion - Memory Management:

    • To address the memory leak issue, ensure that the vector is properly managed and released after its use. It's important to follow memory management best practices to avoid memory leaks.
  3. Optimization - Reuse Mechanism:

    • Consider implementing a proper reuse mechanism for vectors to efficiently utilize memory resources. This can involve returning the vector to the pool after its use instead of creating a new one each time.
  4. Code Readability:

    • While the change itself is straightforward, consider adding comments or documentation to explain the purpose of reusing vectors from the pool for better code readability and maintainability.

Overall Suggestions:

  • Ensure proper memory management practices are followed to prevent memory leaks.
  • Implement a robust mechanism for reusing vectors to optimize memory usage.
  • Enhance code readability by adding comments or documentation where necessary.

By addressing the memory management issue and optimizing the reuse mechanism, the codebase can be improved in terms of efficiency and maintainability.

@ouyuanning
Copy link
Contributor Author

本地tpcc 10仓10线程,会有20%的性能回退。要再看看

@jensenojs
Copy link
Contributor

左边就是reuse-vector, 右边是main, 然后跑的是mo-tpcc的10 warehouse, 100 terminals, 3 runMins

基于如下的commit对比, 跑在深圳的台式机上

d912eea72 (HEAD, upstream/main) fix deadlock by log wait tool long (#16159)
企业微信截图_15d58025-7ae2-41ae-8934-57c334e2da82

@ouyuanning
Copy link
Contributor Author

ouyuanning commented May 17, 2024

参考每日回归的tpcc
企业微信截图_f2169d86-d2cf-4431-bbe9-a356fb96ecf1

根据这里跑daily中tpcc的情况看。
https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9109140447/job/25041954746
tpcc_10_10: 3776.47 跟daily 中5月13日的高点基本一致
tpcc_10_100: 10710.73 比daily 中5月13日的高点低20%
tpcc_100_100: 12696.64 比daily中5月13日的高点略高
由于跑我的分支的版本是基于5月15日左右的main的,daily中那天的分数都很低。会不太好与5月15日左右的daily比较

@ouyuanning ouyuanning removed the request for review from nnsgmsone May 23, 2024 01:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement size/M Denotes a PR that changes [100,499] lines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants