Niu/multiprovider merge #14373

Merged
merged 111 commits into from
May 16, 2024

NiuYawei
Contributor

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that the user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

kjacque and others added 30 commits April 23, 2024 14:06
A change further up in the stack revealed that "ERROR" wasn't
accepted as a log mask string at the engine level.

Signed-off-by: Kris Jacque <kris.jacque@intel.com>
#14126)

The test creates 50 containers for each of the 10 pools, 20 times
(10 x 50 x 20). Creating many containers serially takes a significant
amount of time, so use threads to create the containers in parallel.

Tested the speedup of run_test_create_delete() (just this method,
not the entire test) with 3 x 50 x 3: it took 732 sec serially, but
only 261 sec in parallel.

Also reduce the iteration count to 2 and reduce the timeout.

Signed-off-by: Makito Kano <makito.kano@intel.com>
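A minimal Python sketch of the serial-to-parallel switch described in this commit, assuming a hypothetical `create_container(pool, index)` helper in place of the test's real container-creation call. It fans the 50 creates for one pool out to a thread pool and returns the results in index order. For scale, the measurement quoted above (732 sec serial vs 261 sec parallel) is roughly a 2.8x speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def create_container(pool, index):
    """Hypothetical stand-in for the test's per-container create call."""
    return f"{pool}-cont{index}"


def create_containers_parallel(pool, count, max_workers=10):
    """Create `count` containers in `pool` concurrently, preserving index order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in submission order, not completion order.
        return list(executor.map(lambda i: create_container(pool, i), range(count)))


if __name__ == "__main__":
    # 10 pools x 50 containers, matching one iteration of the test's shape.
    pools = [f"pool{n}" for n in range(10)]
    created = {pool: create_containers_parallel(pool, 50) for pool in pools}
    print(sum(len(c) for c in created.values()))  # 500
```

Threads suit this workload because each create mostly waits on a remote call; the `max_workers` value here is an arbitrary illustrative choice.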
Create a separate pre-read buffer so that it is not tied to the
kernel buffer size, and use 4Mb as the size threshold for pre-read.
Pre-allocate buffers at startup, not on first read.
Change to on-by-default for fresh directories.
Do not disable if a file is opened but not read from; this could be
the kernel cache doing its job.

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Update actions/upload-artifact used in ossf-scorecard due to deprecation
notice.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Signed-off-by: Joseph Moore <joseph.moore@intel.com>
to fix coverity issue 2555533

Signed-off-by: Lei Huang <lei.huang@intel.com>
…14190)

Use TestPool.get_space_per_target instead of the pydaos.raw API call.
Remove the no longer used pydaos.raw target_query and supporting code.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
1. Return the real error rather than ENOMEM, which is very confusing.
2. Skip pools that have not started when creating migrating pools.
3. Skip up targets when updating container properties.

Signed-off-by: Wang Shilong <shilong.wang@intel.com>
Remove references to wiki, jira and other links that are now
on daos.io. Merge cloud content to installation section.
Update Copyright to 2024.

Signed-off-by: Johann Lombardi <johann.lombardi@gmail.com>
Fix path walk of the pil4dfs's dentry cache.
Fix pil4dfs rename() function.
Add enable/disable feature of the pil4dfs's dentry cache.
Add a new functional test of the pil4dfs's dentry cache.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
Co-authored-by: Lei Huang <lei.huang@intel.com>
required_src was added to avoid conflicts on the file
during feature development. It is not necessary any longer
(and wrong since ddb has moved from src to src/utils now).

Signed-off-by: Johann Lombardi <johann.lombardi@gmail.com>
When setting these previously, I thought they only appeared in
debugger output, so they have names which are only meaningful
in that context. However, the thread names are also visible in
ps and top, and having a process called "main" does not make
sense there.

Do not rename the main dfuse thread, and use a dfuse prefix
for other thread names.

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
That will drop partial modifications, remove the pinned DTX entry,
and evict the related stale cache.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Avoid using storage: auto on vm tests until DAOS-15233 can be addressed.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Register a container destroy call for each TestContainer object
created by the test. Using the register cleanup method ensures the
proper order of operations when tearing down the test case.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
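The ordering guarantee comes from running cleanups last-in, first-out: a container registered after its pool is destroyed before that pool. A minimal Python sketch of that idea; `CleanupRegistry` and `register_cleanup` here are hypothetical names, not the DAOS ftest framework's actual implementation (Python's own `unittest.TestCase.addCleanup` follows the same LIFO rule).

```python
class CleanupRegistry:
    """Hypothetical LIFO registry: cleanups run in reverse registration order."""

    def __init__(self):
        self._stack = []

    def register_cleanup(self, func, *args, **kwargs):
        self._stack.append((func, args, kwargs))

    def run_all(self):
        """Run every registered cleanup, newest first; return any errors raised."""
        errors = []
        while self._stack:
            func, args, kwargs = self._stack.pop()
            try:
                func(*args, **kwargs)
            except Exception as error:  # keep tearing down even if one step fails
                errors.append(error)
        return errors


if __name__ == "__main__":
    order = []
    registry = CleanupRegistry()
    registry.register_cleanup(order.append, "destroy pool")       # registered first
    registry.register_cleanup(order.append, "destroy container")  # registered second
    registry.run_all()
    print(order)  # ['destroy container', 'destroy pool']
```

Collecting errors instead of stopping at the first failure means one broken destroy does not leave later resources leaked.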
vos: Add version param to pool create

In a DAOS pool using the old pool global version, we need to create new
VOS pools using the old DF version. See the Jira ticket for the details.
This patch adds a version parameter to vos_pool_create and
vos_pool_create_ex.

rsvc: Create rsvc with VOS DF version (#14156)

If a pool with an old layout version is served by a DAOS version with a
new default layout version, for instance, a 2.4-layout pool served by
DAOS 2.5, then any new VOS pools created for this DAOS pool must use the
old layout, or downgrading back to the old DAOS version would become
impossible.

Signed-off-by: Li Wei <wei.g.li@intel.com>
- SEP is currently not supported by any active provider.
- Stop exposing SEP, as its setting is based on sockets provider limitations.

Signed-off-by: Alexander A Oganezov <alexander.a.oganezov@intel.com>
Co-authored-by: Kris Jacque <kris.jacque@intel.com>
…ce stats (#14168)

NEW devices should be ignored rather than causing a failure. This situation
occurs when the number of targets is less than the number of SSDs.

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Add missing ':avocado: recursive' from test class docstrings.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Support filenames with spaces when generating stack traces from core
files detected after running tests.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
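One common way to make such a command survive spaces is to quote each argument before joining it into a single shell string. A minimal Python sketch, assuming the stack traces are produced by running gdb through a shell; the function name and the exact gdb invocation are illustrative, not taken from the DAOS launch scripts.

```python
import shlex


def gdb_backtrace_command(executable, core_file):
    """Build a shell-safe gdb command that prints a backtrace from a core file.

    shlex.quote() wraps any argument containing spaces (or other shell
    metacharacters) in single quotes so the shell sees it as one word.
    """
    args = ["gdb", "-batch", "-ex", "thread apply all bt", executable, core_file]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(gdb_backtrace_command("/usr/bin/daos_engine", "/tmp/core dir/core.123"))
```

Passing the argument list straight to `subprocess.run(args)` without `shell=True` avoids quoting altogether; quoting is only needed when one command string is handed to a shell.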
- Fix mem leak for coverity 2555536

Signed-off-by: Alexander A Oganezov <alexander.a.oganezov@intel.com>
The coverity tool gets confused about the use of assert in the
debug version of the logging macros and can think a lock is
being unlocked twice, which it reports as an API usage error.

Disable the complex macros for coverity to reduce the instances
of false positives in the tool.

Fixes coverity ID 1975167 and others.

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Inject more faults in the non-baseline workload loops
(10% / 20% fault rate change to 33% / 50%), so there is
more separation in baseline loop timing compared to the
fault-injection loops timing.

Also, turn down engine logging during execution of the timed
metadata workloads in co_op_dup_timing(). Restore to the
originally-configured setting after the timed operations.
This is done with the addition of a new test dmg helper function,
dmg_server_set_logmasks(), called from co_op_dup_timing().

Signed-off-by: Kenneth Cain <kenneth.c.cain@intel.com>
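The turn-down-then-restore step follows a save/lower/restore pattern. The real helper, dmg_server_set_logmasks(), lives in the C test suite; the Python sketch below only illustrates the pattern, with a hypothetical `FakeDmg` object standing in for dmg and an assumed mask string.

```python
from contextlib import contextmanager


class FakeDmg:
    """Hypothetical stand-in for a dmg wrapper that tracks the engine log mask."""

    def __init__(self, masks):
        self.masks = masks

    def server_set_logmasks(self, masks):
        self.masks = masks


@contextmanager
def reduced_logging(dmg, quiet_masks):
    """Lower engine logging for a timed block, then restore the original masks."""
    original = dmg.masks
    dmg.server_set_logmasks(quiet_masks)
    try:
        yield
    finally:
        # Restore even if the timed workload raises.
        dmg.server_set_logmasks(original)


if __name__ == "__main__":
    dmg = FakeDmg("DEBUG")
    with reduced_logging(dmg, "ERR"):  # "ERR" is an assumed mask value
        print(dmg.masks)  # ERR
    print(dmg.masks)  # DEBUG
```

Putting the restore in `finally` keeps a failed timing run from leaving the engines at the reduced log level for the rest of the test.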
The original daos_gen_io_conf parameters "-g 11 -t 7 -o 3 -a 3 -d 3"
generate 437 command lines, including 54 exclude/add commands, each of
which triggers one rebuild. The total timeout of 2100 seconds is possibly
not enough to run those commands (most of the time is spent in the 54 rebuilds).
This patch reduces the parameters to "-g 11 -t 4 -o 3 -a 2 -d 2", which
generates 181 command lines including 24 exclude/rebuild commands, to reduce
testing time.
Reduce the timeout value accordingly.

Signed-off-by: Xuezhao Liu <xuezhao.liu@intel.com>
The recovery/container_list_consolidation.py test orphans a container so
we need to indicate to the TestContainer object that we don't need to
call a daos container destroy during tearDown.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Fixes coverity ID 2555535

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
"dmg system cleanup" will clean up the pools and containers, so skip
teardown cleanup.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
CID: 2555531 Unchecked return value

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
…#14239)

In certain situations a zero-value NVMe namespace ID will be returned
in dmg output; in this case it should be omitted from display output, as
valid values are non-zero.

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
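NVMe namespace IDs start at 1, so a zero ID can only mean "unknown" and is safe to drop from display. A minimal Python sketch of that display rule; the real change is in the Go dmg code, and the function name and output format below are hypothetical.

```python
def format_namespace(ns_id, size_bytes):
    """Render an NVMe namespace line, omitting the ID when it is zero.

    Valid namespace IDs are non-zero, so 0 is treated as "not reported"
    and left out rather than printed as a misleading value.
    """
    size = f"{size_bytes} B"
    if ns_id == 0:
        return f"Namespace size: {size}"
    return f"Namespace {ns_id} size: {size}"


if __name__ == "__main__":
    print(format_namespace(1, 4096))  # Namespace 1 size: 4096 B
    print(format_namespace(0, 4096))  # Namespace size: 4096 B
```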
In the SV overwrite case, btr_update_record() will defer freeing
the original record and allocate a new record to replace it;
however, btr_node_tx_add() is mistakenly skipped in btr_update(),
which leads to:
1. In md-on-ssd mode, tree node changes are missed in WAL.
2. In pmem mode, tree node snapshot is missed in undo log.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
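To see why the skipped btr_node_tx_add() matters, consider a toy undo log in Python. The real code is C and also covers the md-on-ssd WAL path, which this sketch does not model; `Tx`, `tx_add`, and `update_record` are hypothetical names. The point is only that a node not added to the transaction before it changes cannot be rolled back.

```python
class Tx:
    """Toy undo-log transaction: tx_add() must snapshot a node before it changes."""

    def __init__(self):
        self._undo = []

    def tx_add(self, node):
        # Record the node's current contents so abort() can restore them.
        self._undo.append((node, dict(node)))

    def abort(self):
        # Roll back in reverse order of snapshots.
        while self._undo:
            node, snapshot = self._undo.pop()
            node.clear()
            node.update(snapshot)


def update_record(tx, node, key, value, add_to_tx=True):
    """Modify a tree node; add_to_tx=False mimics the skipped tx_add call."""
    if add_to_tx:
        tx.tx_add(node)
    node[key] = value


if __name__ == "__main__":
    node, tx = {"rec": "old"}, Tx()
    update_record(tx, node, "rec", "new", add_to_tx=False)
    tx.abort()
    print(node)  # {'rec': 'new'}: the change survives the abort
```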
daltonbohning and others added 3 commits May 14, 2024 11:00
The group argument was removed by #14201.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Adds a new /svc group under each pool which contains
the following set of metrics:
  * leader (gauge): Current pool service leader rank
  * map_version (counter): Current pool map version
  * open_pool_handles (gauge): Current count of open handles
  * total_ranks (gauge): Number of ranks in pool map
  * degraded_ranks (gauge): Number of ranks with disabled targets
  * total_targets (gauge): Number of targets in pool map
  * disabled_targets (gauge): Number of targets marked disabled
  * draining_targets (gauge): Number of targets in draining state

For non-leader ranks, the service metrics will have zero
values. Telemetry consumers may positively identify the
current leader by checking the value of map_version, which
will always be non-zero for the leader.

Signed-off-by: Michael MacDonald <mjmac@google.com>
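A telemetry consumer could apply the map_version rule like this. The `leader` gauge alone is ambiguous because rank 0 is a valid leader rank and non-leaders also report zero, which is why map_version is the safer key. A minimal Python sketch, assuming the per-rank /svc metrics for one pool have already been scraped into a dict; the function name and input shape are hypothetical.

```python
def find_leader(svc_metrics):
    """Return the rank whose pool-service map_version is non-zero.

    `svc_metrics` maps rank -> dict of that rank's /svc metrics for one pool.
    Non-leader ranks report zero values, so exactly one rank should match.
    """
    leaders = [rank for rank, m in svc_metrics.items() if m.get("map_version", 0) != 0]
    if len(leaders) != 1:
        raise ValueError(f"expected exactly one leader, found {leaders}")
    return leaders[0]


if __name__ == "__main__":
    metrics = {
        0: {"leader": 0, "map_version": 0},
        1: {"leader": 1, "map_version": 7},
        2: {"leader": 0, "map_version": 0},
    }
    print(find_leader(metrics))  # 1
```

Raising on zero or multiple matches is a design choice for this sketch; a consumer polling during a leader change might prefer to retry instead.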
@NiuYawei NiuYawei requested review from a team as code owners May 15, 2024 08:02

Errors are: component not formatted correctly, ticket number prefix incorrect, PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments. Unable to load ticket data
https://daosio.atlassian.net/browse/Niu/multiprovider

Contributor

@tanabarr tanabarr left a comment

Most Go changes seem to be related to recent PRs that have landed on master. No strange conflict-related issues noticed. We should be careful that no unintended reverts slip through when the feature branch is merged into master. Go changes LGTM.

Contributor

@daltonbohning daltonbohning left a comment

-1 just so we make a decision.
@jolivier23 This is a good candidate for "Create a merge commit" instead of "Squash and merge", right?
The benefit to a merge commit is feature/multiprovider will contain the exact same commit SHAs as master, so doing a diff between feature/multiprovider and master makes more sense.

@kjacque
Contributor

kjacque commented May 15, 2024

-1 just so we make a decision. @jolivier23 This is a good candidate for "Create a merge commit" instead of "Squash and merge", right? The benefit to a merge commit is feature/multiprovider will contain the exact same commit SHAs as master, so doing a diff between feature/multiprovider and master makes more sense.

This is what I was planning to do. Glad to see that ability was added to our repo!

Contributor

@frostedcmos frostedcmos left a comment

cart changes are minimal and lgtm

@kjacque kjacque merged commit 4d1db74 into feature/multiprovider May 16, 2024
61 of 69 checks passed
@kjacque kjacque deleted the niu/multiprovider-merge branch May 16, 2024 00:13
Labels
None yet