Python faster than compiled Haskell?


The Original Haskell Code

There are two issues with the Haskell version:

  • You're using string IO, which builds linked lists of characters
  • You're using a non-quicksort that looks like quicksort.
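For reference, the "non-quicksort" idiom translates to Python roughly as follows. This is a sketch of the usual filter-based definition (the question's actual code isn't reproduced here), and it shows why the name is misleading: every recursive call allocates fresh sublists instead of partitioning in place.

```python
def filter_quicksort(xs):
    """Filter-style 'quicksort': correct output, but each level of
    recursion copies the remaining elements into two new lists."""
    if not xs:
        return []
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]   # copy #1
    larger = [x for x in rest if x >= pivot]   # copy #2
    return filter_quicksort(smaller) + [pivot] + filter_quicksort(larger)
```

A true quicksort swaps elements within one array; the allocation and concatenation here are what the in-place vector sort later in this answer avoids.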

This program takes 18.7 seconds to run on my Intel Core2 2.5 GHz laptop. (GHC 7.4 using -O2)

Daniel's ByteString Version

This is much improved, but notice it still uses the inefficient built-in merge sort.

His version takes 8.1 seconds (and doesn't handle negative numbers, but that's a non-issue for this exploration).

Note

From here on this answer uses the following packages: Vector, attoparsec, text and vector-algorithms. Also notice that kindall's version using timsort takes 2.8 seconds on my machine (edit: and 2 seconds using pypy).

A Text Version

I ripped off Daniel's version, translated it to Text (so it handles various encodings) and added better sorting using a mutable Vector in an ST monad:

```haskell
import Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TIO
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as I
import Control.Applicative
import Control.Monad.ST
import System.Environment (getArgs)

parser = many (decimal <* char '\n')

main = do
    numbers <- TIO.readFile =<< fmap head getArgs
    case parse parser numbers of
        Done t r | T.null t -> writeFile "sorted" . unlines
                                                  . map show . vsort $ r
        x -> error $ Prelude.take 40 (show x)

vsort :: [Int] -> [Int]
vsort l = runST $ do
        let v = V.fromList l
        m <- V.unsafeThaw v
        I.sort m
        v' <- V.unsafeFreeze m
        return (V.toList v')
```

This runs in 4 seconds (and also doesn't handle negatives).

Return to the Bytestring

So now that we know we can make a more general program that's faster, what about making the ASCII-only version fast? No problem!

```haskell
import qualified Data.ByteString.Lazy.Char8 as BS
import Data.Attoparsec.ByteString.Lazy (parse, Result(..))
import Data.Attoparsec.ByteString.Char8 (decimal, char)
import Control.Applicative ((<*), many)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as I
import Control.Monad.ST

parser = many (decimal <* char '\n')

main = do
    numbers <- BS.readFile "rands"
    case parse parser numbers of
        Done t r | BS.null t -> writeFile "sorted" . unlines
                                                   . map show . vsort $ r

vsort :: [Int] -> [Int]
vsort l = runST $ do
        let v = V.fromList l
        m <- V.unsafeThaw v
        I.sort m
        v' <- V.unsafeFreeze m
        return (V.toList v')
```

This runs in 2.3 seconds.

Producing a Test File

Just in case anyone's curious, my test file was produced by:

```haskell
import Control.Monad.CryptoRandom
import Crypto.Random

main = do
  g <- newGenIO :: IO SystemRandom
  let rs = Prelude.take (2^20) (map abs (crandoms g) :: [Int])
  writeFile "rands" (unlines $ map show rs)
```
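An equivalent generator is easy to sketch in Python if you want to reproduce the benchmark. The parameters here (2^20 numbers, the upper bound, the fixed seed) mirror or approximate the Haskell version above; the original draws from a system RNG rather than a seeded one.

```python
import random

def random_lines(n=2**20, hi=2**62, seed=42):
    """Produce n non-negative random ints as decimal strings.
    hi and seed are illustrative choices, not from the thread."""
    rng = random.Random(seed)
    return [str(rng.randrange(hi)) for _ in range(n)]

def write_test_file(path="rands", n=2**20):
    # One number per line, matching the format the parsers above expect.
    with open(path, "w") as f:
        f.write("\n".join(random_lines(n)) + "\n")
```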

If you're wondering why vsort isn't packaged in some easier form on Hackage... so am I.


In short, don't use read. Replace read with a function like this:

```haskell
import Numeric

fastRead :: String -> Int
fastRead s = case readDec s of [(n, "")] -> n
```

I get a pretty fair speedup:

```shell
~/programming% time ./test.slow
./test.slow  9.82s user 0.06s system 99% cpu 9.901 total
~/programming% time ./test.fast
./test.fast  6.99s user 0.05s system 99% cpu 7.064 total
~/programming% time ./test.bytestring
./test.bytestring  4.94s user 0.06s system 99% cpu 5.026 total
```

Just for fun, the above results include a version that uses ByteString (and hence fails the "ready for the 21st century" test by totally ignoring the problem of file encodings) for ULTIMATE BARE-METAL SPEED. It also has a few other differences; for example, it ships out to the standard library's sort function. The full code is below.

```haskell
import qualified Data.ByteString as BS
import Data.Attoparsec.ByteString.Char8
import Control.Applicative
import Data.List

parser = many (decimal <* char '\n')

reallyParse p bs = case parse p bs of
    Partial f -> f BS.empty
    v -> v

main = do
    numbers <- BS.readFile "data"
    case reallyParse parser numbers of
        Done t r | BS.null t -> writeFile "sorted" . unlines . map show . sort $ r
```


More a Pythonista than a Haskellite, but I'll take a stab:

  1. There's a fair bit of overhead in your measured runtime just reading and writing the files, which is probably pretty similar between the two programs. Also, be careful that you've warmed up the cache for both programs.

  2. Most of your time is spent making copies of lists and fragments of lists. Python list operations are heavily optimized, being one of the most-frequently used parts of the language, and list comprehensions are usually pretty performant too, spending much of their time in C-land inside the Python interpreter. There is not a lot of the stuff that is slowish in Python but wicked fast in static languages, such as attribute lookups on object instances.

  3. Your Python implementation throws away numbers that are equal to the pivot, so by the end it may be sorting fewer items, giving it an obvious advantage. (If there are no duplicates in the data set you're sorting, this isn't an issue.) Fixing this bug probably requires making another copy of most of the list in each call to qs(), which would slow Python down a little more.

  4. You don't mention what version of Python you're using. If you're using 2.x, you could probably get Haskell to beat Python just by switching to Python 3.x. :-)
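Point 3 can be made concrete with a sketch. The `qs` shown here is my guess at the shape of the question's code, not its actual text: with strict comparisons on both sides, every element equal to the pivot (other than the pivot itself) is silently dropped, and the fix costs an extra pass over the list.

```python
def qs_buggy(xs):
    # Both comparisons are strict, so duplicates of the pivot vanish.
    if not xs:
        return []
    pivot = xs[0]
    return (qs_buggy([x for x in xs if x < pivot])
            + [pivot]
            + qs_buggy([x for x in xs if x > pivot]))

def qs_fixed(xs):
    # Collect pivot-equal elements explicitly: correct, but a third
    # list comprehension per call, i.e. another copy of most of the list.
    if not xs:
        return []
    pivot = xs[0]
    equal = [x for x in xs if x == pivot]
    return (qs_fixed([x for x in xs if x < pivot])
            + equal
            + qs_fixed([x for x in xs if x > pivot]))
```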

I'm not too surprised the two languages are basically neck-and-neck here (a 10% difference is not noteworthy). Using C as a performance benchmark, Haskell loses some performance for its lazy functional nature, while Python loses some performance due to being an interpreted language. A decent match.

Since Daniel Wagner posted an optimized Haskell version using the built-in sort, here's a similarly optimized Python version using list.sort():

```python
mylist = [int(x.strip()) for x in open("data")]
mylist.sort()
open("sorted", "w").write("\n".join(str(x) for x in mylist))
```

3.5 seconds on my machine, vs. about 9 for the original code. Pretty much still neck-and-neck with the optimized Haskell. Reason: it's spending most of its time in C-programmed libraries. Also, TimSort (the sort used in Python) is a beast.
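One way to see why TimSort is a beast: it detects runs that already exist in the input, so sorting already-ordered data costs only n - 1 comparisons. A small counting sketch (the `Counted` wrapper is an ad hoc illustration, not from the thread):

```python
class Counted:
    """Wraps a value and counts every < comparison list.sort makes."""
    count = 0  # shared comparison counter

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Counted.count += 1
        return self.v < other.v

def comparisons_to_sort(values):
    Counted.count = 0
    items = [Counted(v) for v in values]
    items.sort()  # CPython's sort only ever uses <
    return Counted.count

n = 1000
already_sorted = comparisons_to_sort(range(n))
# TimSort finds one ascending run covering the whole list, so this is
# exactly n - 1 comparisons; shuffled input needs ~n log n instead.
```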