Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3select: replacing a "naked loop" with std::copy_n #42

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ronen-fr
Copy link

... and fixing some extra copying

Signed-off-by: Ronen Friedman rfriedma@redhat.com

@yuvalif
Copy link
Collaborator

yuvalif commented Dec 14, 2020

  • we dont have CI for this repo, but please make sure that unit tests are passing
  • please make sure that the submodule reference is updated in the ceph repo after the fix is merged
  • the "extra copy" issue exists in many other places in the code. to minimize the number of ceph submodule updates, could you please go through the rest of them?
    at least the easy ones like const std::string& functions args and const auto& in loops?

@galsalomon66
Copy link

  • we dont have CI for this repo, but please make sure that unit tests are passing
  • please make sure that the submodule reference is updated in the ceph repo after the fix is merged
  • the "extra copy" issue exists in many other places in the code. to minimize the number of ceph submodule updates, could you please go through the rest of them?
    at least the easy ones like const std::string& functions args and const auto& in loops?

there are a lot of tests, and it keeps growing.

  1. unit-tests which cover the s3select-engine itself.
  2. s3tests which validate integration with RGW.

the plan is to complete with s3select functionalities(as describe in AWS) , and to have first RGW-integration with parquet.
later, as separate effort, a performance "session".
one of the issues is to implement zero-copy (including string_view).

@yuvalif
Copy link
Collaborator

yuvalif commented Dec 14, 2020

  • we dont have CI for this repo, but please make sure that unit tests are passing
  • please make sure that the submodule reference is updated in the ceph repo after the fix is merged
  • the "extra copy" issue exists in many other places in the code. to minimize the number of ceph submodule updates, could you please go through the rest of them?
    at least the easy ones like const std::string& functions args and const auto& in loops?

there are a lot of tests, and it keeps growing.

  1. unit-tests which cover the s3select-engine itself.
  2. s3tests which validate integration with RGW.

the plan is to complete with s3select functionalities(as describe in AWS) , and to have first RGW-integration with parquet.
later, as separate effort, a performance "session".
one of the issues is to implement zero-copy (including string_view).

when making changes to this repo you can only run the unit tests.
after this PR is merged, and the submodule is updated in the ceph rtepo, you can also build the new ceph and run the s3select tests

@galsalomon66
Copy link

  • we dont have CI for this repo, but please make sure that unit tests are passing
  • please make sure that the submodule reference is updated in the ceph repo after the fix is merged
  • the "extra copy" issue exists in many other places in the code. to minimize the number of ceph submodule updates, could you please go through the rest of them?
    at least the easy ones like const std::string& functions args and const auto& in loops?

there are a lot of tests, and it keeps growing.

  1. unit-tests which cover the s3select-engine itself.
  2. s3tests which validate integration with RGW.

the plan is to complete with s3select functionalities(as describe in AWS) , and to have first RGW-integration with parquet.
later, as separate effort, a performance "session".
one of the issues is to implement zero-copy (including string_view).

when making changes to this repo you can only run the unit tests.
after this PR is merged, and the submodule is updated in the ceph rtepo, you can also build the new ceph and run the s3select tests

indeed.
with small correction, the PR merge and sub-module update is necessary for teuthology process , it's also possible to run private branch for s3-test in teuthology, before updating sub-module into ceph-repo.

it's always possible to test RGW integration (including valgrind) before merging the sub-module into ceph-repo.

include/s3select.h Outdated Show resolved Hide resolved
include/s3select_oper.h Outdated Show resolved Hide resolved
include/s3select.h Outdated Show resolved Hide resolved
include/s3select.h Outdated Show resolved Hide resolved
@@ -1451,27 +1451,27 @@ bool base_statement::is_function()
}
}

bool base_statement::is_aggregate_exist_in_expression(base_statement* e) //TODO obsolete ?
bool base_statement::is_aggregate_exist_in_expression() const //TODO obsolete ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed,
it better to change this interface, to avoid conflicts.
(this specific recursion function, should probably remove)

@@ -1191,7 +1188,6 @@ class csv_object : public base_s3object
bool m_aggr_flow = false; //TODO once per query
bool m_is_to_aggregate;
bool m_skip_last_line;
size_t m_stream_length;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you remove that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it appears not in use, (using other variable to mark end of-stream)

{
l.__val.dbl = __op((double)l.__val.num, r.__val.dbl);
l.type = value_En_t::FLOAT;
if (l.type != r.type) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what actually changed from here to the end of the function?
if its only indentation, please revert. this is difficult to review and does not add value here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was that the indentation was wrong (not just a style issue).
Line 824 started a block with ind' of 4 instead of 6, which affected the rest of the function.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned previously i'm using a utility which arrange those issues.
astyle --style=stroustrup --style=break --align-pointer=type --align-reference=type --pad-comma --add-braces --indent=spaces=2

it will be done on dedicated PR, since it create a lot of differences

@ronen-fr
Copy link
Author

In the latest version I hope I have restored old formatting for all lines I did not change
for other reasons

@@ -1087,20 +1087,20 @@ void push_data_type::operator()(s3select* self, const char* a, const char* b) co

if(cast_operator("int"))
{
self->getAction()->dataTypeQ.push_back("int");
self->getAction()->dataTypeQ.emplace_back("int");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it performs better.
should note, those functions (AST builder) used for building the AST and are not intensive.

@galsalomon66
Copy link

there are currently several PR's on work, i will wait with those changes until one of them is merge.

s3select_functions* m_s3select_functions;
variable m_result;

void _resolve_name()
void _resolve_name() // todo: separate the object types used when constructing the __function vs when m_func_impl should be fixed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functions are created upon first use (first row) , kind of late-binding.
thus, it can not be const.
to avoid late-binding, function should be created upon building the AST, it means also to move semantic analysis to earlier stage.
this may happen on later phase, it's quite comprehensive change.

@yuvalif
Copy link
Collaborator

yuvalif commented Dec 31, 2020

@ronen-fr can you please resolve conflicts?

... and fixing some extra copying

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
const-correctness, overrides, extra copying, ...

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
@ronen-fr ronen-fr force-pushed the wip-ronenf-s3-sug branch 3 times, most recently from 24113a1 to 342b2b8 Compare December 31, 2020 14:42
... and some undoing of formatting changes

(and rebasing...)

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
@ronen-fr
Copy link
Author

@yuvalif rebased.
Also - I've undone more formatting changes.

galsalomon66 added a commit that referenced this pull request Jan 1, 2021
…e /reolve_name/ of functions is now done upon semnatic phase before execution; semantic phase should be more the name-resolving(will be on later phase); [ #42 ]

Signed-off-by: gal salomon <gal.salomon@gmail.com>
galsalomon66 added a commit that referenced this pull request Jan 1, 2021
#46)

* The semantic analysis for nested-calls of non-aggregate functions with aggregate function was wrong, this PR fix several issues. first it rejects illegal queries, second it executes correctly these use-cases.

Signed-off-by: gal salomon <gal.salomon@gmail.com>

* semantic analysis rejects illegal aggregation queries; it identifying wrong combinations of aggregation projection and non-aggregation projection

Signed-off-by: gal salomon <gal.salomon@gmail.com>

* marking subtree as runnable; changes to execution flow to support correct execution of nested&complex aggregation queries

Signed-off-by: gal salomon <gal.salomon@gmail.com>

* adding const qualifier to methods/variables; as part of this effort the /reolve_name/ of functions is now done upon semantic phase before execution; semantic phase should be more the name-resolving(will be on later phase);  [ #42 ]

Signed-off-by: gal salomon <gal.salomon@gmail.com>
@yuvalif
Copy link
Collaborator

yuvalif commented Jan 3, 2021

@yuvalif rebased.
Also - I've undone more formatting changes.

changes of the last commit looks good.
but it seems like another rebase is needed :-(
lets merge after rebase is done, so code won't diverge again.

there are several active PR's in parallel (at least 3), those merged PR are blocker for some.
as for this PR, i already merged great part of it, manually.
if its possible to use the clang-utility, it could save effort.

{
_resolve_name();
const_cast<__function*>(this)->_resolve_name();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed the call to /resolve_name/

@yuvalif
Copy link
Collaborator

yuvalif commented Jan 3, 2021

@yuvalif rebased.
Also - I've undone more formatting changes.

changes of the last commit looks good.
but it seems like another rebase is needed :-(
lets merge after rebase is done, so code won't diverge again.

there are several active PR's in parallel (at least 3), those merged PR are blocker for some.
as for this PR, i already merged great part of it, manually.

@galsalomon66 this is usually not recommended, it makes the rebase work much harder. ideally, you merge the urgent bug fix PRs, and then we rebase this PR and merge it

if its possible to use the clang-utility, it could save effort.

once this PR is merged I'll create a PR to add some "github actions" to this repo.
this could serve as an example on adding clang (or another) static analysis tool.

@galsalomon66
Copy link

@yuvalif rebased.
Also - I've undone more formatting changes.

changes of the last commit looks good.
but it seems like another rebase is needed :-(
lets merge after rebase is done, so code won't diverge again.
there are several active PR's in parallel (at least 3), those merged PR are blocker for some.
as for this PR, i already merged great part of it, manually.

@galsalomon66 this is usually not recommended, it makes the rebase work much harder. ideally, you merge the urgent bug fix PRs, and then we rebase this PR and merge it

if its possible to use the clang-utility, it could save effort.

once this PR is merged I'll create a PR to add some "github actions" to this repo.
this could serve as an example on adding clang (or another) static analysis tool.

@yuvalif , lets done with it, sooner the better.
(this PR was not in plan, it must be reviewed, some parts were under discussions, the #39 "waited" for a month)

@@ -399,7 +400,7 @@ struct s3select : public bsc::grammar<s3select>
}
}

if (aggr_flow == true)
if (aggr_flow)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== true/false is style issue

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'== true' is a bit over the line dividing 'style' and 'irregular and strange'...

And, BTW - the 'if' statement semantics is 'if the expression in parens evaluates to 'true' then do ...'.
No new semantic meaning is conveyed by the '== true'. It just adds to the reader's burden.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed.
there is nothing logical or practical about /x == true/
i probably wanted to remind myself its a boolean and not some number
will change that.

this style appears in our code-base.
[ find ./src -name '.cc' -o -name '.h' | xargs egrep '== false|== true' | grep -v s3select | wc ] ;

galsalomon66 added a commit that referenced this pull request Feb 21, 2021
Signed-off-by: gal salomon <gal.salomon@gmail.com>
galsalomon66 added a commit that referenced this pull request Feb 21, 2021
Signed-off-by: gal salomon <gal.salomon@gmail.com>
@@ -474,42 +470,43 @@ struct _fn_to_int : public base_function

struct _fn_to_float : public base_function

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge to master #64

galsalomon66 added a commit that referenced this pull request Feb 22, 2021
* fixing text to float convertion (ronen-fr #42)

Signed-off-by: gal salomon <gal.salomon@gmail.com>

* fix integer conversion; add compile-warning flag

Signed-off-by: gal salomon <gal.salomon@gmail.com>
galsalomon66 added a commit that referenced this pull request Feb 22, 2021
* adding object to handle awscli response,first phase. adding warning flags;

Signed-off-by: gal salomon <gal.salomon@gmail.com>

* fixing text to float conversion (ronen-fr #42) (#64)

Signed-off-by: gal salomon <gal.salomon@gmail.com>

* fix integer conversion; add compile-warning flag

Signed-off-by: gal salomon <gal.salomon@gmail.com>

* fix stdin mode; fix compile warnings

Signed-off-by: gal salomon <gal.salomon@gmail.com>
{
{"null",reserve_word_en_t::S3S_NULL},{"NULL",reserve_word_en_t::S3S_NULL},
{"nan",reserve_word_en_t::S3S_NAN},{"NaN",reserve_word_en_t::S3S_NAN}
};

bool is_reserved_word(std::string & token)
static bool is_reserved_word(std::string & token)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for what purpose is the static qualifier ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to indicate that this is actually a standalone function that does not need an instance of the object.
in theory, you could have taken the function declaration outside if the class.
if this is likely to change (e.g. in the future this function will look into data members of the class) than keep it without the "static" keyword.
if not, than either add "static" or take out of the class.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm aware for that role of static.
the is_reserved_word is not stand for itself (should be a member)

{
return m_reserved_words.find(token) != m_reserved_words.end() ;
}

reserve_word_en_t get_reserved_word(std::string & token)
static reserve_word_en_t get_reserved_word(std::string & token) // todo consider returning std::optional

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for what purpose is the static qualifier ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants