Efficient way to shuffle lists within a list in Python
@TerryH - you need not
.shuffle()
the RAM-memory-content ofaListOfSTRINGs
at all, it ought be just enough to generate anp.random.permutation( len( aListOfListsOfSTRINGs[ ith ] ) )
so as to create ad-hoc, at a cost of butO(1)
~260 [us]
per list, spent ALAP, a new random order, right-sized for an indirect access to thestr
-members of the ith-aListOfSTRINGs
( why moving RAM-I/O expensive data so as to "read"-in-order somewhere later, when no data need ever be touched, until ALAP "reading" from cache-served block(s) using an indirect-addressing of the components? )
For the Real-World costs of a wish to go parallel, you may like this post, with an interactive graph-tool.
As @user2357112 supports Monica commented below,
shuffling was aimed to take place rather inside aListOfSTRINGs
, not on aListOfListsOfSTRINGs
, Mea Culpa
Q : "can I achieve the shuffling faster"?
Yes. A lot. ...150 x
times - well under 2.5 [s]
are achievable with the right-enough tools
Q : "... how I can make it shuffling operation more efficient ?"
The in-place .shuffle()
takes less than ~ 23 [s]
on list( L )
over 16,000,000 items in plain Py2.7 tools
from zmq import Stopwatch; aClk = Stopwatch() #_______________________ a [us] Stopwatchpass; import random#_____________L creation ~ 2.7 [s]___________________________________________aClk.start(); L = [ strID for strID in xrange( int( 16E6 ) ) ]; aClk.stop()2721084print L[:5] #___________________________________________________________proof[0, 1, 2, 3, 4]#_____________random.shuffle( L )______________________________________+proofaClk.start(); random.shuffle( L ); aClk.stop(); print "0:5\t", L[:5]214732610:5 [13868243, 13087869, 13207292, 9344202, 1853783]#_____________random.shuffle( L )______________________________________+proofaClk.start(); random.shuffle( L ); aClk.stop(); print "0:5\t", L[:5]225739220:5 [837396, 15032889, 10942767, 14571341, 4867854]#_______________________________________________________________________proof>>> len( L )16000000
The in-place .shuffle()
takes under ~ 48 [s]
on list( L )
over 16,000,000 items in plain Py3.5 tools.
$ conda activate py3$ python...aClk.start(); L = [ strID for strID in range( int( 16E6 ) ) ]; aClk.stop()1959052#_____________random.shuffle( L )______________________________________+proofaClk.start(); random.shuffle( L ); aClk.stop(); print( "0:5\t", L[:5] )451048060:5 [15744525, 10635923, 14530509, 10535840, 1465987]#_____________random.shuffle( L )______________________________________+proofaClk.start(); random.shuffle( L ); aClk.stop(); print( "0:5\t", L[:5] )471393580:5 [884437, 15420153, 9957947, 8118734, 11960914]
Let's go get The Real Performance boosted :
import numpy as np#____________L_as_a32______________16E6________________________~ 74 [ms]>>> aClk.start(); a32 = np.arange( 16E6, dtype = np.int32 ); aClk.stop()74054#_____________np.random.shuffle( a32-bit )______________________________+proofaClk.start(); np.random.shuffle( a32 ); aClk.stop(); print "0:5\t", a32[:5]24007860:5 [ 2487493 14646705 13717283 5602561 7934593]aClk.start(); np.random.shuffle( a32 ); aClk.stop(); print "0:5\t", a32[:5]23683810:5 [ 4841042 12882529 12298351 2198866 7054284]aClk.start(); np.random.shuffle( a32 ); aClk.stop(); print "0:5\t", a32[:5]23690110:5 [14595649 7239135 3339593 9517600 6506681]#_____________np.random.shuffle( a64-bit )______________________________+proofaClk.start(); np.random.shuffle( a64 ); aClk.stop(); print "0:5\t", a64[:5]24244870:5 [ 3234133 9224551 971604 13027484 806393]aClk.start(); np.random.shuffle( a64 ); aClk.stop(); print "0:5\t", a64[:5]23868730:5 [ 3212124 10644428 8192909 2234984 13103406]aClk.start(); np.random.shuffle( a64 ); aClk.stop(); print "0:5\t", a64[:5]23760650:5 [ 5624301 7741070 8859092 12287465 11721315]
If indeed going to get The Ultimate Performance :
- maintain all
str
data as-is, stored in justaListOfSTRINGs
- each new
aListOfSTRINGs
append at a constant cost ofO(1)
to a non-re-shuffled, linearly growing, constant-order storage -aListOfListsOfSTRINGs
- instead of paying awfully high memory-I/O costs of shuffling that storage making
aListOfListsOfSTRINGs
, rather maintain aaListOfORDINALs
( be it a plain-list
or anumpy
-array, where just appending alen( aListOfListsOfSTRINGs )
, whenever a new member-aListOfSTRINGs
got in ) - enjoy very fast and very efficient in-place
aListOfORDINALs.shuffle()
, well under23 [s]
in Py2.7 or< 50 [s]
in Py3.5 - access all
str
-s as
aListOfListsOfSTRINGs[aListOfORDINALs[Nth_L_inLoLoStr]][Nth_str_inLoStr]
at superfast times at costs ofO(1)
to get the actualstr
-s