Auto Search list and Scrape table


I presume you already have the names from the Excel sheet, so I used a plain list of names here. The code fetches each search page with the Python requests module, uses Beautiful Soup to pick out the table, and then uses pandas to load the table into a DataFrame.

Code:

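import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

playernames = ['Dominique Jones', 'Joe Young', 'Darius Adams', 'Lester Hudson', 'Marcus Denmon', 'Courtney Fortson']

for name in playernames:
    # Build the search URL from the first and last name.
    fname = name.split(" ")[0]
    lname = name.split(" ")[1]
    url = "https://basketball.realgm.com/search?q={}+{}".format(fname, lname)
    print(url)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    # Take the first element with the "tablesaw" class; skip this name if there is none.
    table = soup.select_one(".tablesaw")
    if table is None:
        continue
    # Wrap the HTML in StringIO, since newer pandas versions deprecate passing a raw HTML string.
    dfs = pd.read_html(StringIO(str(table)))
    for df in dfs:
        print(df)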

Output:

https://basketball.realgm.com/search?q=Dominique+Jones
            Player Pos   HT  ...  Draft Year          College               NBA
0  Dominique Jones   G  6-4  ...        2010    South Florida  Dallas Mavericks
1  Dominique Jones   G  6-2  ...        2009          Liberty                 -
2  Dominique Jones  PG  5-9  ...        2011  Fort Hays State                 -

[3 rows x 8 columns]
https://basketball.realgm.com/search?q=Joe+Young
      Player Pos   HT  ... Draft Year           College             NBA
0  Joe Young   F  6-6  ...       2007        Holy Cross               -
1  Joe Young   G  6-0  ...       2009          Canisius               -
2  Joe Young   G  6-2  ...       2015            Oregon  Indiana Pacers
3  Joe Young   G  6-2  ...       2009  Central Missouri               -

[4 rows x 8 columns]
https://basketball.realgm.com/search?q=Darius+Adams
         Player Pos   HT  ...  Draft Year              College  NBA
0  Darius Adams  PG  6-1  ...        2011         Indianapolis    -
1  Darius Adams   G  6-0  ...        2018  Coast Guard Academy    -

[2 rows x 8 columns]
https://basketball.realgm.com/search?q=Lester+Hudson
      Season       Team  GP  GS   MIN  ...   STL   BLK    PF   TOV    PTS
0  2009-10 *  All Teams  25   0   5.3  ...  0.32  0.12  0.48  0.56   2.32
1  2009-10 *        BOS  16   0   4.4  ...  0.19  0.12  0.44  0.56   1.38
2  2009-10 *        MEM   9   0   6.8  ...  0.56  0.11  0.56  0.56   4.00
3    2010-11        WAS  11   0   6.7  ...  0.36  0.09  0.91  0.64   1.64
4  2011-12 *  All Teams  16   0  20.9  ...  0.88  0.19  1.62  2.00  10.88
5  2011-12 *        CLE  13   0  24.2  ...  1.08  0.23  2.00  2.31  12.69
6  2011-12 *        MEM   3   0   6.5  ...  0.00  0.00  0.00  0.67   3.00
7    2014-15        LAC   5   0  11.1  ...  1.20  0.20  0.80  0.60   3.60
8     CAREER        NaN  57   0  10.4  ...  0.56  0.14  0.91  0.98   4.70

[9 rows x 23 columns]
https://basketball.realgm.com/search?q=Marcus+Denmon
    Season Team        Location  GP  GS  ...  STL  BLK    PF   TOV    PTS
0  2012-13  SAN       Las Vegas   5   0  ...  0.4  0.0  1.60  0.20   5.40
1  2013-14  SAN       Las Vegas   5   1  ...  0.8  0.0  2.20  1.20  10.80
2  2014-15  SAN       Las Vegas   6   2  ...  0.5  0.0  1.50  0.17   5.00
3  2015-16  SAN  Salt Lake City   2   0  ...  0.0  0.0  0.00  0.00   0.00
4   CAREER  NaN             NaN  18   3  ...  0.5  0.0  1.56  0.44   6.17

[5 rows x 24 columns]
https://basketball.realgm.com/search?q=Courtney+Fortson
      Season       Team  GP  GS   MIN   FGM  ...   AST  STL  BLK    PF   TOV   PTS
0  2011-12 *  All Teams  10   0   9.5  1.10  ...  1.00  0.3  0.0  0.50  1.00  3.50
1  2011-12 *        HOU   6   0   8.2  1.00  ...  0.83  0.5  0.0  0.33  0.83  3.00
2  2011-12 *        LAC   4   0  11.5  1.25  ...  1.25  0.0  0.0  0.75  1.25  4.25
3     CAREER        NaN  10   0   9.5  1.10  ...  1.00  0.3  0.0  0.50  1.00  3.50

[4 rows x 23 columns]
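Note that the output mixes two table shapes. Names with several matches print a search-results table (columns such as Player, Pos, HT), while names such as Lester Hudson print a per-season stats table (Season, Team, GP), presumably because the site goes straight to the player's page when there is only one match. A minimal sketch for telling the two apart by column names follows; the helper name `table_kind` and its return labels are made up for illustration and are not part of the answer's code.

```python
def table_kind(columns):
    """Classify a scraped table by its column names (hypothetical helper)."""
    cols = {str(c) for c in columns}
    if "Player" in cols:
        # Several players matched the query: this is the search-results list.
        return "search_results"
    if "Season" in cols:
        # Per-season rows: this looks like a single player's stats table.
        return "player_stats"
    return "unknown"


print(table_kind(["Player", "Pos", "HT"]))   # search_results
print(table_kind(["Season", "Team", "GP"]))  # player_stats
```

Inside the loop this could be called as `table_kind(df.columns)` to decide whether a name still needs to be disambiguated.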


You need a list of URLs, one per player, and then you scrape each page with Beautiful Soup.

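from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

# Fetch one page and parse it; repeat this for every URL in the list.
html = urlopen('http://example.com').read()
soup = BeautifulSoup(html, 'html.parser')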
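The URL list this answer asks for could be built from the player names along these lines. This is only a sketch: it assumes the realgm search endpoint used in the other answer, and `BASE_URL` and `build_search_urls` are names chosen here for illustration.

```python
from urllib.parse import urlencode

# Assumption: the search endpoint from the other answer.
BASE_URL = "https://basketball.realgm.com/search"


def build_search_urls(names):
    """Return one search URL per player name (hypothetical helper)."""
    # urlencode escapes the query value and turns spaces into '+'.
    return ["{}?{}".format(BASE_URL, urlencode({"q": name})) for name in names]


urls = build_search_urls(["Dominique Jones", "Joe Young"])
print(urls[0])  # https://basketball.realgm.com/search?q=Dominique+Jones
```

Using `urlencode` instead of splitting on spaces also copes with names that have more than two parts or contain characters that need escaping.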