Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typo in kth order statistic #1215

Open
lrvideckis opened this issue Jan 1, 2024 · 7 comments
Open

typo in kth order statistic #1215

lrvideckis opened this issue Jan 1, 2024 · 7 comments

Comments

@lrvideckis
Copy link
Contributor

lrvideckis commented Jan 1, 2024

"deterministic linear solution is implemented in C++ standard library as std::nth_elemen"

https://cp-algorithms.com/sequences/k-th.html

but I think the STL implements the nondeterministic one/deterministic nlogn one

https://github.com/gcc-mirror/gcc/blob/9693459e030977d6e906ea7eb587ed09ee4fddbd/libstdc%2B%2B-v3/include/bits/stl_algo.h#L1913,L1933

  • looks like kth order on random pivots, then if depth is too large, switch to make heap; pop min element k times
@lrvideckis
Copy link
Contributor Author

On a side note, I read up about the deterministic linear median of medians; it's pretty cool

@lrvideckis
Copy link
Contributor Author

Also I was wondering if it's possible to implement median of medians iteratively. It seems pretty hard

@lrvideckis
Copy link
Contributor Author

I think it would be cool to switch the code on https://cp-algorithms.com/sequences/k-th.html to median of medians

I mean is anyone using the current code, noting that the stl already implements quick select? Also the stl's has a better worst case, which can matter: hacks on codeforces

@lrvideckis
Copy link
Contributor Author

@lrvideckis
Copy link
Contributor Author

#include <bits/stdc++.h>
using namespace std;

//swaps (k-le)th smallest number of a[le...ri) to a[k], i.e. (k-le)th index in range
void kth_order(vector<int>& a, int le, int k, int ri) {
    assert(le <= k && k < ri);
    if(ri - le <= 5) {
        nth_element(begin(a) + le, begin(a) + k, begin(a) + ri);
        return;
    }
    int siz = 0;
    for(int i = le; i < ri; i += 5) {
        int block_siz = min(5, ri - i);
        int block_mid = i + block_siz / 2;
        int block_ri = i + block_siz;
        nth_element(begin(a) + i, begin(a) + block_mid, begin(a) + block_ri);
        swap(a[block_mid], a[le + siz++]);
    }
    kth_order(a, le, le + siz / 2, le + siz);
    int pivot = a[le + siz / 2];
    int middle1 = partition(begin(a) + le, begin(a) + ri, [pivot](int elem) -> bool {
        return elem < pivot;
    }) - begin(a);
    int middle2 = partition(begin(a) + middle1, begin(a) + ri, [pivot](int elem) -> bool {
        return elem == pivot;
    }) - begin(a);
    if(k < middle1) kth_order(a, le, k, middle1);
    if(middle2 <= k) kth_order(a, middle2, k, ri);
}

mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
template <class T> inline T get_rand(T le, T ri) {
    assert(le <= ri);
    return uniform_int_distribution<T>(le, ri)(rng);
}

int main() {
    cin.tie(0)->sync_with_stdio(0);

    while(1) {
        int n = get_rand(0, 1000);
        cout << "test, n: " << n << endl;
        vector<int> a(n);
        for(int i = 0; i < n; i++) a[i] = get_rand(INT_MIN, INT_MAX);
        vector<int> sorted(a);
        sort(begin(sorted), end(sorted));
        for(int i = 0; i < n; i++) {
            vector<int> cpy(a);
            kth_order(cpy, 0, i, n);
            assert(cpy[i] == sorted[i]);
        }
    }

    return 0;
}

@mhayter
Copy link
Contributor

mhayter commented Jan 7, 2024

You seem to be correct gcc does a worst case $O(n \log n)$. Your other addition may also be valuable with some context (explanation). Please consider a pull request for your credit. (I'll probably submit one in a week for the text change if you don't.)

@lrvideckis
Copy link
Contributor Author

lrvideckis commented Jan 25, 2024

another interesting point: if you do quicksort as in https://en.cppreference.com/w/cpp/algorithm/partition where you have 2 middles: middle1 and middle2(where every value in range [middle1, middle2) equals the pivot; and you recurse on ranges [first, middle1) and [middle2, last))

and if you use median of medians to get exact median in deterministic linear time,

then quicksort is now bounded by O(n log(number of distinct values)) (<= O(nlogn))

so for example, sorting std::string's (of lower case letters only) is now O(n log(26))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants