Optimize encodeUtf8 by checking if the byte array is already pinned #471

Open
noughtmare opened this issue Sep 29, 2022 · 2 comments

@noughtmare
Contributor

noughtmare commented Sep 29, 2022

The byte array underlying Text might already be pinned:

A byte array [and by extension Text] can be pinned as a result of three possible causes:

  1. It was allocated by newPinnedByteArray#. [Text is never constructed using this primitive]
  2. It is large. Currently, GHC defines large object to be one that is at least as large as 80% of a 4KB block (i.e. at least 3277 bytes).
  3. It has been copied into a compact region. The documentation for ghc-compact and compact describes this process.

Cause 2 in particular seems likely, and it is a case where we could save a significant amount of time by avoiding the copy. Could we use isByteArrayPinned# to check whether the byte array is already pinned and, if so, skip the copy?
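To make the check concrete, here is a minimal sketch of querying isByteArrayPinned# on a freshly allocated array. It uses only GHC.Exts primitives, not the text library's internals; the BA wrapper and the newBA and isPinned helpers are hypothetical names introduced for illustration.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts
  ( ByteArray#, Int (I#), isByteArrayPinned#, isTrue#
  , newByteArray#, unsafeFreezeByteArray# )
import GHC.IO (IO (IO))

-- Hypothetical lifted wrapper so an unlifted ByteArray# can be returned from IO.
data BA = BA ByteArray#

-- Allocate n bytes WITHOUT asking for pinning, then freeze the array.
newBA :: Int -> IO BA
newBA (I# n) = IO $ \s0 ->
  case newByteArray# n s0 of
    (# s1, mba #) -> case unsafeFreezeByteArray# mba s1 of
      (# s2, ba #) -> (# s2, BA ba #)

-- Ask the runtime whether the array is pinned, whatever the cause.
isPinned :: BA -> Bool
isPinned (BA ba) = isTrue# (isByteArrayPinned# ba)

main :: IO ()
main = do
  small <- newBA 64    -- well below the large-object threshold
  large <- newBA 4096  -- above the quoted 3277-byte threshold
  print (isPinned small)
  print (isPinned large)
```

Given the threshold quoted above, the 64-byte array should report as unpinned and the 4096-byte array as pinned, even though neither was allocated with newPinnedByteArray#. The exact threshold is a GHC implementation detail and may differ between versions.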

@Bodigrim
Contributor

encodeUtf8 :: Text -> ByteString
encodeUtf8 (Text arr off len)
  | len == 0  = B.empty
  -- It would be easier to use Data.ByteString.Short.fromShort and slice later,
  -- but this is undesirable when len is significantly smaller than length arr.
  | otherwise = unsafeDupablePerformIO $ do
    marr@(A.MutableByteArray mba) <- unsafeSTToIO $ A.newPinned len
    unsafeSTToIO $ A.copyI len marr 0 arr off
    let fp = ForeignPtr (byteArrayContents# (unsafeCoerce# mba))
                        (PlainPtr mba)
    pure $ B.fromForeignPtr fp 0 len

Might be worth a try, but take care not to retain the original pinned ByteArray if Text has been sliced.

@noughtmare
Contributor Author

Ah, I didn't think about slicing. I guess it is best to only reuse the original byte array if it is both pinned and not sliced. Although perhaps it would be acceptable to reuse the original byte array if the slice is still like 90% of the original size, but I don't think that is particularly common.
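The "pinned and not sliced" condition could be sketched roughly as follows. This is only an illustration on raw GHC.Exts primitives: BA, newBA and canReuse are hypothetical names, and it assumes that "not sliced" means the offset is 0 and the length equals the size of the whole underlying array. It does not build the ByteString itself.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts
  ( ByteArray#, Int (I#), isByteArrayPinned#, isTrue#
  , newByteArray#, sizeofByteArray#, unsafeFreezeByteArray# )
import GHC.IO (IO (IO))

-- Hypothetical lifted wrapper around an unlifted ByteArray#.
data BA = BA ByteArray#

-- Allocate n bytes without requesting pinning, then freeze the array.
newBA :: Int -> IO BA
newBA (I# n) = IO $ \s0 ->
  case newByteArray# n s0 of
    (# s1, mba #) -> case unsafeFreezeByteArray# mba s1 of
      (# s2, ba #) -> (# s2, BA ba #)

-- Hypothetical predicate: reuse the array only if it is already pinned
-- AND the (off, len) window covers the whole array, so that no unused
-- bytes are kept alive by the resulting ByteString.
canReuse :: BA -> Int -> Int -> Bool
canReuse (BA ba) off len =
  isTrue# (isByteArrayPinned# ba)
    && off == 0
    && len == I# (sizeofByteArray# ba)

main :: IO ()
main = do
  big <- newBA 8192
  print (canReuse big 0 8192)   -- whole large array
  print (canReuse big 16 1024)  -- slice of a large array
  tiny <- newBA 32
  print (canReuse tiny 0 32)    -- whole array, but small
```

The 90% variant mentioned above would replace the last two conjuncts with a comparison of len against a fraction of the array size, at the cost of retaining some unused bytes.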
