Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpacking in reverse #510

Open
chris-martin opened this issue Feb 21, 2023 · 5 comments
Open

Unpacking in reverse #510

chris-martin opened this issue Feb 21, 2023 · 5 comments

Comments

@chris-martin
Copy link

chris-martin commented Feb 21, 2023

The best way I've come up with so far to make a list that traverses a Text backwards is:

import Data.Text (Text, unsnoc)

unpackReverse :: Text -> String
unpackReverse = maybe [] (\(xs, x) -> x : unpackReverse xs) . unsnoc

Is there a way to make this substantially more efficient? Would it be worth adding to the library?

@Lysxia
Copy link
Contributor

Lysxia commented Feb 21, 2023

Maybe with foldr?

@chris-martin
Copy link
Author

chris-martin commented Feb 21, 2023

I've been very confused by the foldr documentation.

from right to left [...] evaluation actually traverses the Text from left to right

If I have a Text of length n and I force the first m characters from the reverse-unpacked String, would that be O(n) or O(m) with a foldr approach? Internally, foldr is using stream, not reverseStream, so I would think it's not actually decoding from the right.

@chris-martin
Copy link
Author

chris-martin commented Feb 21, 2023

I suppose I have the same question/request about packing as well. We have unfoldr but not unfoldl, and so I'm not sure how better to reverse-pack any more efficiently than Text.reverse . Text.pack. Since packing is strict anyway, I don't think it matters as much as the unpacking question, but still it seems like one ought to be able to do better.

@chris-martin
Copy link
Author

chris-martin commented Feb 21, 2023

I don't see any substantial time difference between forward/reverse unpacking, so I suppose you're right about foldr.

import Criterion.Main
import qualified Data.Text as Text

forward = take 200 . Text.unpack

backward = take 200 . Text.foldr (:) []

main = defaultMain
  [ bgroup "forward"
      [ env (pure $ Text.replicate 1000 "abc") $ \t -> bench "1000" $ nf forward t
      , env (pure $ Text.replicate 1000000 "abc") $ \t -> bench "1000000" $ nf forward t
      , env (pure $ Text.replicate 1000000000 "abc") $ \t -> bench "1000000000" $ nf forward t
      ]
  , bgroup "backward"
      [ env (pure $ Text.replicate 1000 "abc") $ \t -> bench "1000" $ nf backward t
      , env (pure $ Text.replicate 1000000 "abc") $ \t -> bench "1000000" $ nf backward t
      , env (pure $ Text.replicate 1000000000 "abc") $ \t -> bench "1000000000" $ nf backward t
      ]
  ]
benchmarking forward/1000
time                 2.441 μs   (2.396 μs .. 2.496 μs)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 2.418 μs   (2.396 μs .. 2.454 μs)
std dev              94.17 ns   (67.51 ns .. 130.5 ns)
variance introduced by outliers: 52% (severely inflated)

benchmarking forward/1000000
time                 2.529 μs   (2.482 μs .. 2.586 μs)
                     0.997 R²   (0.995 R² .. 0.999 R²)
mean                 2.520 μs   (2.489 μs .. 2.582 μs)
std dev              139.6 ns   (97.44 ns .. 199.7 ns)
variance introduced by outliers: 69% (severely inflated)

benchmarking forward/1000000000
time                 2.466 μs   (2.435 μs .. 2.501 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 2.455 μs   (2.440 μs .. 2.476 μs)
std dev              59.03 ns   (41.70 ns .. 77.29 ns)
variance introduced by outliers: 29% (moderately inflated)

benchmarking backward/1000
time                 2.499 μs   (2.449 μs .. 2.552 μs)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 2.505 μs   (2.469 μs .. 2.574 μs)
std dev              160.0 ns   (97.61 ns .. 236.9 ns)
variance introduced by outliers: 74% (severely inflated)

benchmarking backward/1000000
time                 2.490 μs   (2.454 μs .. 2.529 μs)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 2.480 μs   (2.458 μs .. 2.515 μs)
std dev              91.78 ns   (59.58 ns .. 151.2 ns)
variance introduced by outliers: 49% (moderately inflated)

benchmarking backward/1000000000
time                 2.467 μs   (2.447 μs .. 2.495 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 2.469 μs   (2.453 μs .. 2.491 μs)
std dev              60.28 ns   (46.57 ns .. 75.09 ns)
variance introduced by outliers: 30% (moderately inflated)

Perhaps all that I needed here was more clear documentation.

@Lysxia
Copy link
Contributor

Lysxia commented Feb 21, 2023

I had a brainfart moment. For a lazy reverse-into-string function, right now, the best safe solution seems to be using unsnoc as you did. Another way is to use Data.Text.Unsafe.reverseIter, which might be faster, or maybe not.

Ideally, I think a better solution would be provided by foldl (not foldr), but right now because all the folds go through stream, they all do left-to-right traversals. My point is that we could implement a lazy foldl as a right-to-left traversal, and that would give you lazy reverse-into-string without having to write a recursive function yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants