Skip to content

An efficient compact, immutable byte string type (both strict and lazy) suitable for binary or 8-bit character data.

License

Notifications You must be signed in to change notification settings

dcoutts/bytestring

 
 

Repository files navigation

------------------------------------------------------------------------
               ByteString : Fast, packed strings of bytes
------------------------------------------------------------------------

This library provides the Data.ByteString library -- strict and lazy
byte arrays manipulable as strings -- providing very time and space
efficient string and IO operations.

For very large data requirements, or constraints on heap size,
Data.ByteString.Lazy is provided, a lazy list of bytestring chunks.
Efficient processing of multi-gigabyte data can be achieved this way.

Requirements:
        > Cabal
        > GHC 6.4 or greater, or hugs

Building:
        > runhaskell Setup.lhs configure --prefix=/f/g
        > runhaskell Setup.lhs build
        > runhaskell Setup.lhs install

After installation, you can run the testsuite as follows:
    
        > cd tests ; make
    or
        > cd tests ; make hugs

For the full test and benchmark suite, you need GHC and Hugs:

        > cd tests ; make everything

Authors:
    ByteString was derived from the GHC PackedString library,
    originally written by Bryan O'Sullivan, and then by Simon Marlow.
    It was adapted, and greatly extended for darcs by David Roundy, and
    others. Don Stewart cleaned up and further extended the implementation.
    Duncan Coutts wrote much of the .Lazy code. Don, Duncan and Roman
    Leshchinskiy wrote the fusion system.

------------------------------------------------------------------------

Performance, some random numbers (with GHC):

This table compares the performance of common operations ByteString,
from various string libraries.

Size of test data: 21256k, Linux 3.2Ghz P4

                          FPS7       SPS     PS      [a]    
++                        0.028      !       !       1.288   
length                    0.000      0.000   0.000   0.131   
pack                      0.303      0.502   0.337   -       
unpack                    3.319*     1.630   7.445   -       
compare                   0.000      0.000   0.000   0.000   
index                     0.000      0.000   0.000   0.000   
map                       2.762*     2.917   4.813   7.286   
filter                    0.304      2.805   0.954   0.305   
take                      0.000      0.000   0.024   0.005   
drop                      0.000      0.000   11.768  0.130   
takeWhile                 0.000      1.498   0.000   0.000   
dropWhile                 0.000      1.985   8.447   0.130   
span                      0.000      9.289   11.144  0.131   
break                     0.000      9.383   11.268  0.133   
lines                     0.052      1.114   1.367   2.790   
unlines                   0.048      !       !       10.950  
words                     1.344      2.128   5.644   4.184   
unwords                   0.016      !       !       1.305   
reverse                   0.024      12.997  13.018  1.622   
concat                    0.000      12.701  11.459  1.163   
cons                      0.016      2.064   8.358   0.131   
empty                     0.000      0.000   0.000   0.000   
head                      0.000      0.000   0.000   0.000   
tail                      0.000      0.000   14.490  0.130   
elem                      0.000      1.490   0.001   0.000   
last                      0.000      -       -       0.143   
init                      0.000      -       -       1.147   
inits                     0.414      -       -       !       
tails                     0.460      -       -       1.136   
intersperse               0.040      -       -       10.517  
any                       0.000      -       -       0.000   
all                       0.000      -       -       0.000   
sort                      0.168      -       -       !
maximum                   0.024      -       -       0.183
minimum                   0.025      -       -       0.185
replicate                 0.000      -       -       0.053   
findIndex                 0.096
find                      0.120      -       -       0.000   
elemIndex                 0.000      -       -       0.000   
elemIndicies              0.008      -       -       0.314   
foldl                     0.148
spanEnd                   0.000
snoc                      0.016
filterChar                0.031      
filterNotChar             0.124
join                      0.016      
split                     0.032      
findIndices               0.408      
splitAt                   0.000      
lineIndices               0.029      
breakOn                   0.000      
breakSpace                0.000 
splitWith                 0.329 
dropSpace                 0.000 
dropSpaceEnd              0.000 
joinWithChar              0.017
join /                    0.016 
zip                       0.960 
zipWith                   0.892 
isSubstringOf             0.039 
isPrefixOf                0.000 
isSuffixOf                0.000
count                     0.021

Key: FPS6 = FPS 0.6
     SPS  = Simon Marlow's packedstring prototype
     PS   = Data.PackedString
     [a]  = [Char]

     -    = no function exists
     !    = stack or memory exhaustion

------------------------------------------------------------------------

== Stress testing really big strings

Doing some stress testing of FPS, here are some results for 0.5G strings.

3.2Ghz box, 2G physical mem.

Size of test data: 524288k
Size of test data: 524288k
                Char8   Word8

Effectively O(1) or O(m) where m < n
    all             0.000   0.000   
    any             0.000   0.004   
    break           0.000   0.000   
    breakChar       0.000   0.000   
    breakSpace      0.000   
    compare         0.000   
    concat          0.000   
    drop            0.000   
    dropSpace       0.000   
    dropSpaceEnd    0.000   
    dropWhile       0.000   0.000   
    elem            0.000   0.000   
    elemIndex       0.000   0.000   
    elemIndexLast   0.000   0.000   
    empty           0.000   
    head            0.000   0.000   
    index           0.000   0.000   
    init            0.000   
    last            0.000   0.000   
    length          0.000   
    notElem         0.000   0.000   
    span            0.000   0.000   
    spanChar        0.000   0.000   
    spanEnd         0.000   0.000   
    splitAt         0.000   
    tail            0.000   
    take            0.000   
    takeWhile       0.000   0.000   
    isPrefixOf      0.000   
    isSuffixOf      0.000   
    addr1           0.000   
    addr2           0.000   

O(n)
    ++              0.676   
    map             6.080   5.868   
    cons            0.396   0.396   
    snoc            0.400   0.400   
    find            3.240   
    split           1.204   1.200   
    lines           2.000   
    foldl           3.804   
    unwords         0.552   
    reverse         0.884   
    findIndex       3.128   
    filterChar      0.756   0.732   
    filter/='f'     8.265   7.012   
    filterNotChar   4.456   3.388   
    join            0.400   
    sort            4.344   
    maximum         0.776   0.764   
    minimum         0.772   0.776   
    replicate       0.008   0.000   
    elemIndices     0.240   0.240   
    lineIndices     1.092   
    joinWithChar    0.400   0.400   
    isSubstringOf   0.052   
    count           0.748   

slow O(n)
    words           38.722  
    group           77.261  
    groupBy         96.226  
    inits           32.430  
    tails           23.225  
    findIndices     13.841  15.825  
    splitWith       18.445  19.225  
    zip             33.926  
    zipWith         33.562  


About

An efficient compact, immutable byte string type (both strict and lazy) suitable for binary or 8-bit character data.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Haskell 98.9%
  • C 1.1%