pandas DataFrame: select a set of columns including a sequence of columns
UPDATE: No need to use `numpy.hstack`; you can just call `numpy.r_` as shown below.
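The following sketch shows why `numpy.hstack` is unnecessary. It assumes the earlier approach stacked separate `np.arange` ranges with `hstack`; that earlier code is not shown here. Both forms produce the same integer indexer:

```python
import numpy as np

# Assumed older approach: build each integer range separately, then stack them
with_hstack = np.hstack([np.arange(0, 2), np.arange(2, 6, 2)])

# np.r_ performs the same concatenation directly from slice notation
with_r = np.r_[0:2, 2:6:2]

print(with_hstack)  # [0 1 2 4]
print(with_r)       # [0 1 2 4]
assert (with_hstack == with_r).all()
```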
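Use `iloc` together with `numpy.r_`. The session below assumes `DataFrame`, `randn` and `r_` are already imported (for example `from pandas import DataFrame`, `from numpy.random import randn`, `from numpy import r_`):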
```python
In [20]: df = DataFrame(randn(10, 3), columns=list('abc'))

In [21]: df
Out[21]:
          a         b         c
0  0.228163 -1.311485 -1.335604
1  0.292547 -1.636901  0.001765
2  0.744605 -0.325580  0.205003
3 -0.580471 -0.531553 -0.740697
4  0.250574  1.076019 -0.594915
5 -0.148449  0.076951 -0.653595
6 -1.065314 -0.166018 -1.471532
7  1.133336 -0.529738 -1.213841
8 -1.715281 -2.058831  0.113237
9 -0.382412 -0.072540  0.294853

[10 rows x 3 columns]

In [22]: df.iloc[:, r_[:2]]
Out[22]:
          a         b
0  0.228163 -1.311485
1  0.292547 -1.636901
2  0.744605 -0.325580
3 -0.580471 -0.531553
4  0.250574  1.076019
5 -0.148449  0.076951
6 -1.065314 -0.166018
7  1.133336 -0.529738
8 -1.715281 -2.058831
9 -0.382412 -0.072540

[10 rows x 2 columns]
```
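Here is a self-contained version of the session above, offered as a sketch. It imports everything explicitly and uses a seeded `numpy.random.default_rng` instead of the unseeded `randn`, so the numbers will differ from the output shown:

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=list("abc"))

# np.r_ turns slice notation into an integer array: r_[:2] -> array([0, 1])
positions = np.r_[:2]
print(positions)             # [0 1]

# iloc accepts that integer array as a column indexer
subset = df.iloc[:, positions]
print(list(subset.columns))  # ['a', 'b']

# For a single contiguous range, a plain slice selects the same columns
assert subset.equals(df.iloc[:, :2])
```

`r_` only starts to pay off once several ranges have to be combined, as in the next example.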
To concatenate integer ranges, use `numpy.r_`:
```python
In [35]: df = DataFrame(randn(10, 6), columns=list('abcdef'))

In [36]: df.iloc[:, r_[:2, 2:df.columns.size:2]]
Out[36]:
          a         b         c         e
0 -1.358623 -0.622909  0.025609 -1.166303
1  0.527027  0.310530  2.892384  0.190451
2 -0.251138 -1.246113  0.738264  0.062078
3 -1.716028  0.419139  0.060225 -1.191527
4 -1.308635  0.045396 -0.599367 -0.202491
5 -0.620343  0.796364 -0.008802  0.160020
6  0.199739  0.111816 -0.278119  1.051317
7 -0.311206  0.090348 -0.237887  0.958215
8  0.363161  2.449031  1.023352  0.743853
9  0.039451 -0.855733 -0.836921 -0.835078

[10 rows x 4 columns]
```
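Evaluating the indexer on its own shows the concatenation directly. This is a minimal sketch with the same six-column shape. The data is seeded, so the values differ from the session above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 6)), columns=list("abcdef"))

# Each comma-separated piece of r_ is expanded, then the pieces are concatenated:
#   :2                   -> 0, 1
#   2:df.columns.size:2  -> 2, 4   (start 2, stop 6, step 2)
positions = np.r_[:2, 2:df.columns.size:2]
print(positions)             # [0 1 2 4]

subset = df.iloc[:, positions]
print(list(subset.columns))  # ['a', 'b', 'c', 'e']

# Single integers (including negative ones, counted from the end) can be mixed in
print(list(df.iloc[:, np.r_[:2, -1]].columns))  # ['a', 'b', 'f']
```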
With datar, you can now use similar syntax in Python:
```python
>>> from datar.all import c, f, select
>>> from datar.datasets import starwars
>>>
>>> starwars
             name    height      mass hair_color  skin_color eye_color birth_year      sex    gender homeworld  species
         <object> <float64> <float64>   <object>    <object>  <object>  <float64> <object>  <object>  <object> <object>
0  Luke Skywalker     172.0      77.0      blond        fair      blue       19.0     male masculine  Tatooine    Human
1           C-3PO     167.0      75.0        NaN        gold    yellow      112.0     none masculine  Tatooine    Droid
2           R2-D2      96.0      32.0        NaN white, blue       red       33.0     none masculine     Naboo    Droid
3     Darth Vader     202.0     136.0       none       white    yellow       41.9     male masculine  Tatooine    Human
..            ...       ...       ...        ...         ...       ...        ...      ...       ...       ...      ...
4     Leia Organa     150.0      49.0      brown       light     brown       19.0   female  feminine  Alderaan    Human
82            Rey       NaN       NaN      brown       light     hazel        NaN   female  feminine       NaN    Human
83    Poe Dameron       NaN       NaN      brown       light     brown        NaN     male masculine       NaN    Human
84            BB8       NaN       NaN       none        none     black        NaN     none masculine       NaN    Droid
85 Captain Phasma       NaN       NaN    unknown     unknown   unknown        NaN      NaN       NaN       NaN      NaN
86  Padmé Amidala     165.0      45.0      brown       light     brown       46.0   female  feminine     Naboo    Human

[87 rows x 11 columns]
>>>
>>> starwars >> select(c(1, f[3:5], 7))
             name      mass hair_color  skin_color birth_year
         <object> <float64>   <object>    <object>  <float64>
0  Luke Skywalker      77.0      blond        fair       19.0
1           C-3PO      75.0        NaN        gold      112.0
2           R2-D2      32.0        NaN white, blue       33.0
3     Darth Vader     136.0       none       white       41.9
..            ...       ...        ...         ...        ...
4     Leia Organa      49.0      brown       light       19.0
82            Rey       NaN      brown       light        NaN
83    Poe Dameron       NaN      brown       light        NaN
84            BB8       NaN       none        none        NaN
85 Captain Phasma       NaN    unknown     unknown        NaN
86  Padmé Amidala      45.0      brown       light       46.0

[87 rows x 5 columns]
>>>
>>> # even with column names
>>> starwars >> select(c(f.name, f[f.mass:f.skin_color], f.birth_year))
             name      mass hair_color  skin_color birth_year
         <object> <float64>   <object>    <object>  <float64>
0  Luke Skywalker      77.0      blond        fair       19.0
1           C-3PO      75.0        NaN        gold      112.0
2           R2-D2      32.0        NaN white, blue       33.0
3     Darth Vader     136.0       none       white       41.9
..            ...       ...        ...         ...        ...
4     Leia Organa      49.0      brown       light       19.0
82            Rey       NaN      brown       light        NaN
83    Poe Dameron       NaN      brown       light        NaN
84            BB8       NaN       none        none        NaN
85 Captain Phasma       NaN    unknown     unknown        NaN
86  Padmé Amidala      45.0      brown       light       46.0

[87 rows x 5 columns]
```
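For comparison, here is a rough plain-pandas equivalent of the two `select` calls, without datar. This is a sketch under several assumptions. It uses a tiny stand-in DataFrame with the same 11 column names and the first two rows shown in the output, not the real datar dataset. The index conversion is also an assumption, inferred from the output above: there, `c(1, f[3:5], 7)` picks `name`, `mass` through `skin_color`, and `birth_year`, i.e. 1-based positions with an inclusive range. Check the indexing convention of your installed datar version.

```python
import numpy as np
import pandas as pd

# Stand-in for datar's starwars dataset: same 11 columns, first two rows only
cols = ["name", "height", "mass", "hair_color", "skin_color", "eye_color",
        "birth_year", "sex", "gender", "homeworld", "species"]
starwars = pd.DataFrame(
    [["Luke Skywalker", 172.0, 77.0, "blond", "fair", "blue", 19.0,
      "male", "masculine", "Tatooine", "Human"],
     ["C-3PO", 167.0, 75.0, np.nan, "gold", "yellow", 112.0,
      "none", "masculine", "Tatooine", "Droid"]],
    columns=cols,
)

# Positional: 1-based inclusive c(1, f[3:5], 7) becomes
# 0-based, end-exclusive numpy positions 0, 2..4, 6
by_position = starwars.iloc[:, np.r_[0, 2:5, 6]]

# By name: translate labels to positions with Index.get_loc;
# the "+ 1" makes the mass..skin_color range include its end column
loc = starwars.columns.get_loc
by_name = starwars.iloc[:, np.r_[loc("name"),
                                 loc("mass"):loc("skin_color") + 1,
                                 loc("birth_year")]]

print(list(by_name.columns))
# ['name', 'mass', 'hair_color', 'skin_color', 'birth_year']
assert by_position.equals(by_name)
```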
I am the author of the datar package. Feel free to submit issues if you have any questions.