R How to read a file from google drive using R


Try

    temp <- tempfile(fileext = ".zip")
    # mode = "wb" writes the download as binary, which keeps the zip intact on Windows
    download.file("https://drive.google.com/uc?authuser=0&id=1AiZda_1-2nwrxI8fLD0Y6e5rTg7aocv0&export=download",
      temp, mode = "wb")
    out <- unzip(temp, exdir = tempdir())
    bank <- read.csv(out[14], sep = ";")
    str(bank)
    # 'data.frame': 4119 obs. of  21 variables:
    # $ age           : int  30 39 25 38 47 32 32 41 31 35 ...
    # $ job           : Factor w/ 12 levels "admin.","blue-collar",..: 2 8 8 8 1 8 1 3 8 2 ...
    # $ marital       : Factor w/ 4 levels "divorced","married",..: 2 3 2 2 2 3 3 2 1 2 ...
    # <snip>

The URL should correspond to the URL that you use to download the file using your browser.
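If you only have a share link or the bare file ID, a small helper can assemble a URL of the same shape as the one above. This is a minimal sketch: the function names are made up for illustration, and the two link patterns it recognises (/d/<id>/ and id=<id>) are assumptions about how Drive links usually look, not a documented contract.

```r
# Hypothetical helper: pull the file ID out of a Drive link.
# Falls back to returning the input unchanged, assuming it is already a bare ID.
extract_drive_id <- function(x) {
  # Share-style links (assumed form): .../file/d/<id>/view
  m <- regmatches(x, regexec("/d/([A-Za-z0-9_-]+)", x))[[1]]
  if (length(m) == 2) return(m[2])
  # Download-style links (as in the answer above): ...?id=<id> or ...&id=<id>
  m <- regmatches(x, regexec("[?&]id=([A-Za-z0-9_-]+)", x))[[1]]
  if (length(m) == 2) return(m[2])
  x
}

# Hypothetical helper: build a download URL in the same form as the one above.
drive_direct_url <- function(x) {
  paste0("https://drive.google.com/uc?export=download&id=", extract_drive_id(x))
}

# The share-style link below is an assumed format wrapped around the ID from the answer.
url <- drive_direct_url(
  "https://drive.google.com/file/d/1AiZda_1-2nwrxI8fLD0Y6e5rTg7aocv0/view?usp=sharing"
)
# download.file(url, tempfile(fileext = ".zip"), mode = "wb")
```

The download call is left commented out so the snippet runs without network access.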

As @Mako212 points out, you can also make use of the googledrive package, substituting drive_download for download.file:

    library(googledrive)

    temp <- tempfile(fileext = ".zip")
    dl <- drive_download(
      as_id("1AiZda_1-2nwrxI8fLD0Y6e5rTg7aocv0"), path = temp, overwrite = TRUE)
    out <- unzip(temp, exdir = tempdir())
    bank <- read.csv(out[14], sep = ";")
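Picking the file with out[14] depends on the order in which the archive happens to be extracted. A slightly more robust option is to select the extracted file by name. The sketch below assumes you know a pattern that matches exactly one file; the helper name and the example paths are made up for illustration.

```r
# Hypothetical helper: return the single extracted path whose file name
# matches `pattern`, and fail loudly if there are zero or several matches.
pick_extracted <- function(paths, pattern) {
  hits <- paths[grepl(pattern, basename(paths))]
  if (length(hits) != 1) {
    stop("expected exactly one match for '", pattern, "', found ", length(hits))
  }
  hits
}

# Stand-in for the vector returned by unzip(); these paths are invented.
out <- c("/tmp/x/readme.txt", "/tmp/x/data/bank.csv", "/tmp/x/data/notes.pdf")
csv_path <- pick_extracted(out, "\\.csv$")
# bank <- read.csv(csv_path, sep = ";")
```

With a real archive, unzip(temp, list = TRUE) lists the contents without extracting them, which helps when choosing the pattern.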


  • A Google Drive share link is not a direct link to the file. As a result, both download.file and the first RCurl method in the accepted answer download only the web page that displays the file, not the file itself. Open the downloaded file in a text editor and you will see that it is an HTML file.

  • You can find out the actual direct link to the file with this. With the direct link, all the regular download methods will work.

  • For very detailed discussions of getting the direct link or downloading the file, see this question.

  • The Google Drive API requires the client to sign in, so the googledrive package will also ask you to sign in to Google if you are not already signed in.
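A quick way to tell whether you received the archive or just the web page is to look at the first bytes of the downloaded file, since zip archives start with the two bytes "PK". This is a minimal sketch; is_zip_file is a made-up name, and the two temporary files only simulate the two outcomes.

```r
# Hypothetical helper: TRUE if the file starts with the zip signature bytes "PK".
is_zip_file <- function(path) {
  sig <- readBin(path, what = "raw", n = 2L)
  identical(sig, as.raw(c(0x50, 0x4b)))
}

# Simulate a share link that returned a web page instead of the archive.
html_path <- tempfile(fileext = ".zip")
writeLines("<!DOCTYPE html><html><body>Sign in</body></html>", html_path)

# Simulate a real archive by writing only a zip local-file header signature.
zip_like_path <- tempfile(fileext = ".zip")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04)), zip_like_path)

got_html <- !is_zip_file(html_path)
got_zip <- is_zip_file(zip_like_path)
```

Running a check like this before unzip() makes the failure obvious, instead of producing a confusing error from the extraction step.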


You can do all this with the googledrive package.

It's a two-step process: first find the folder in order to get its ID, then query for all files that have that folder as their parent.

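    library(googledrive)

    # Step 1: look up the folder to get its ID
    dir <- drive_find(pattern = "my_folder", type = "folder")
    # Step 2: list every file whose parent is that folder
    query <- paste('"', dir$id, '"', ' in parents', sep = '')
    drive_find(q = query)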

Note that drive_find may return several folders if you have more than one folder named "my_folder" in different parts of Drive, so you may need to make the query more specific (for example, by restricting the search to a parent folder). I would suggest throwing in a check that only one folder is returned, by just doing nrow(dir) == 1. You can also change the pattern to a regular expression that only matches the exact folder name. In that case, replace the first drive_find command with

    drive_find(pattern = "^my_folder$", type = "folder")
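The single-folder check suggested above can be folded into the step that builds the query. This is only a sketch: folder_children_query is a made-up name, and a plain data frame with an invented id stands in for the result of drive_find so that the snippet runs without signing in.

```r
# Hypothetical helper: build the "in parents" query, but only when the
# folder search returned exactly one row.
folder_children_query <- function(dir) {
  if (nrow(dir) != 1) {
    stop("expected exactly one matching folder, found ", nrow(dir))
  }
  paste0('"', dir$id, '" in parents')
}

# Plain data frame standing in for the result of drive_find(); the id is invented.
dir <- data.frame(name = "my_folder", id = "abc123", stringsAsFactors = FALSE)
query <- folder_children_query(dir)
# drive_find(q = query)
```

Stopping with an error here is a design choice; you could instead narrow the search and retry when several folders match.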

You can find more details on the parameters for drive_find in the documentation.