Understanding this line: list_of_tuples = [(x,y) for x, y, label in data_one]


    list_of_tuples = [(x,y) for x, y, label in data_one]

(x, y) is a tuple.

This is a list comprehension:

        [(x,y) for x, y, label in data_one]
    #   ^                                 ^
    #   |       ^comprehension syntax^    |
    # begin list                       end list

data_one is an iterable, which a list comprehension requires. Under the covers, comprehensions are loops, so they must iterate over something.

x, y, label in data_one tells me that I can "unpack" these three items from every element that is delivered by the data_one iterable. They behave just like the loop variables of a for loop: they change on each iteration.
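To see the unpacking on its own, here is a minimal sketch written as a plain for loop. The contents of data_one are made up for illustration (the original data is not shown); the only assumption is that every element holds exactly three items.

```python
# Hypothetical sample data: each element holds exactly three items.
data_one = [(1.0, 2.0, "a"), (3.0, 4.0, "b")]

seen = []
for x, y, label in data_one:
    # On each iteration x, y and label are rebound to the next element's items.
    seen.append((x, y, label))

print(seen)  # one (x, y, label) triple per iteration

# Unpacking fails if an element does not have exactly three items.
try:
    for x, y, label in [(1.0, 2.0)]:
        pass
except ValueError as exc:
    print("unpacking failed:", exc)
```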

In total, this says:

Make a list of tuples that look like (x, y) where I get x, y, and label from each item delivered by the iterable data_one. Put each x and y into a tuple inside a list called list_of_tuples. Yes I know I "unpacked" label and never used it, I don't care.
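Putting it together on some made-up data (the values in data_one below are hypothetical stand-ins). The second form uses `_` for the ignored item; that is only a common naming convention for "unused", not special syntax, and it gives the same result.

```python
# Hypothetical sample data standing in for data_one.
data_one = [(1, 2, "cat"), (3, 4, "dog"), (5, 6, "cat")]

list_of_tuples = [(x, y) for x, y, label in data_one]
print(list_of_tuples)  # [(1, 2), (3, 4), (5, 6)]

# Same result; "_" merely signals that the third item is deliberately unused.
list_of_tuples_alt = [(x, y) for x, y, _ in data_one]
assert list_of_tuples_alt == list_of_tuples
```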


Both ways are correct and work. You could probably relate the first way to the way things are done in C and other languages. That is, you basically run a for loop to go through all of the values and append each one to your list of tuples.

The second way is more Pythonic but does the same thing. If you take a look at [(x,y) for x, y, label in data_one] (this is a list comprehension) you will see that you are also running a for loop on the same data, but each result will be (x, y) and all of those results will form a list. So it achieves the same thing.

The third way (added in response to the comments) uses slicing.

I've prepared a small example similar to yours:

    data = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]

    def load_data():
        list_of_tuples = []
        for x, y, label in data:
            list_of_tuples.append((x,y))
        return list_of_tuples

    def load_data_2():
        return [(x,y) for x, y, label in data]

    def load_data_3():
        return [t[:2] for t in data]

They all do the same thing and return [(1, 2), (2, 3), (4, 5)] but their runtime is different. This is why a list comprehension is a better way to do this.

When I run the first method load_data() I get:

    %%timeit
    load_data()
    1000000 loops, best of 3: 1.36 µs per loop

When I run the second method load_data_2() I get:

    %%timeit
    load_data_2()
    1000000 loops, best of 3: 969 ns per loop

When I run the third method load_data_3() I get:

    %%timeit
    load_data_3()
    1000000 loops, best of 3: 981 ns per loop

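The second way, the list comprehension with unpacking, is the fastest in this run, with the slicing version (also a list comprehension) only marginally behind. Both beat the explicit loop!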
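The %%timeit cells above are IPython/Jupyter magic. Outside a notebook, a rough equivalent can be sketched with the standard-library timeit module. The call count of 50,000 is an arbitrary choice, and absolute numbers will vary with machine and Python version, so no expected timings are shown.

```python
import timeit

data = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]

def load_data():
    list_of_tuples = []
    for x, y, label in data:
        list_of_tuples.append((x, y))
    return list_of_tuples

def load_data_2():
    return [(x, y) for x, y, label in data]

def load_data_3():
    return [t[:2] for t in data]

# All three must agree before their timings are worth comparing.
assert load_data() == load_data_2() == load_data_3()

CALLS = 50_000  # arbitrary number of calls per measurement
timings = {}
for func in (load_data, load_data_2, load_data_3):
    # Best of 3 repeats, converted to seconds per single call.
    best = min(timeit.repeat(func, number=CALLS, repeat=3))
    timings[func.__name__] = best / CALLS
    print(f"{func.__name__}: {timings[func.__name__] * 1e9:.0f} ns per call")
```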


The "improved" version uses a list comprehension. This makes the code declarative (describing what you want) rather than imperative (describing how to get what you want).

The advantages of declarative programming are that the implementation details are mostly left out, and the underlying classes and data structures can perform the operations in an optimal way. For example, one optimisation that the Python interpreter could make in your example above would be to pre-allocate the correct size for the list list_of_tuples rather than having to continually resize it during the append() operations.

To get you started with list comprehensions, I'll explain the way I normally start to write them. For a list L write something like this:

    output = [x for x in L]

For each element in L, a variable is extracted (the centre x) and can be used to form the output list (the x on the left). The above expression effectively does nothing but copy L, so output has the same contents as L. Imperatively, it is akin to:

    output = []
    for x in L:
        output.append(x)
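A quick check of that equivalence, using an arbitrary sample list: the comprehension builds a new list with the same elements rather than returning L itself.

```python
L = [1, 2, 3]  # arbitrary sample values

output = [x for x in L]

print(output == L)  # True: same elements in the same order
print(output is L)  # False: a new list object was built
```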

From here though, you could realise that each x is actually a tuple that could be unpacked using tuple assignment:

    output = [x for x, y, label in L]

This will create a new list, containing only the x element from each tuple in the list.

If you wanted to pack a different tuple in the output list, you just pack it on the left-hand side:

    output = [(x,y) for x, y, label in L]

This is basically what you end up with in your optimised version.

You can do other useful things with list comprehensions, such as only inserting values that conform to a specific condition:

    output = [(x,y) for x, y, label in L if x > 10]
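Here are those steps run side by side on some made-up rows (the values in L are hypothetical). The last part spells out the filtered comprehension as an explicit loop, to show where the condition ends up.

```python
# Hypothetical rows of (x, y, label).
L = [(5, 50, "a"), (15, 150, "b"), (25, 250, "a")]

xs = [x for x, y, label in L]
pairs = [(x, y) for x, y, label in L]
big_pairs = [(x, y) for x, y, label in L if x > 10]

print(xs)         # [5, 15, 25]
print(pairs)      # [(5, 50), (15, 150), (25, 250)]
print(big_pairs)  # [(15, 150), (25, 250)]

# The filtered comprehension corresponds to an "if" inside the loop body.
expected = []
for x, y, label in L:
    if x > 10:
        expected.append((x, y))
assert big_pairs == expected
```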

Here is a useful tutorial about list comprehensions that you might find interesting: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/