How to read a file from Google Drive using R
Try
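```r
# Download the zip archive from Google Drive to a temporary file
temp <- tempfile(fileext = ".zip")
download.file(
  "https://drive.google.com/uc?authuser=0&id=1AiZda_1-2nwrxI8fLD0Y6e5rTg7aocv0&export=download",
  temp,
  mode = "wb"  # binary mode, so the zip is not corrupted on Windows
)

# Extract into the temp directory; `out` holds the paths of the extracted files
out <- unzip(temp, exdir = tempdir())

# The 14th extracted file is the semicolon-separated CSV used here
bank <- read.csv(out[14], sep = ";")
str(bank)
# Output from R < 4.0 (since R 4.0.0, strings are read as character, not factors)
# 'data.frame': 4119 obs. of 21 variables:
#  $ age    : int  30 39 25 38 47 32 32 41 31 35 ...
#  $ job    : Factor w/ 12 levels "admin.","blue-collar",..: 2 8 8 8 1 8 1 3 8 2 ...
#  $ marital: Factor w/ 4 levels "divorced","married",..: 2 3 2 2 2 3 3 2 1 2 ...
#  <snip>
```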
The URL should be the one your browser uses when it actually downloads the file (the `uc?...&export=download` form used above), not the page you see when you open the file in Drive.
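If you only have a share link, you can build the download URL from the file ID it contains. A minimal sketch, using a hypothetical helper `drive_direct_url`; it assumes the link has one of the common shapes `https://drive.google.com/file/d/<ID>/view?usp=sharing` or `...?id=<ID>`, and it produces the `uc?export=download&id=<ID>` form without the `authuser` parameter:

```r
# Hypothetical helper: turn a Google Drive share link (or a bare file ID)
# into a direct-download URL of the form uc?export=download&id=<ID>.
drive_direct_url <- function(link) {
  # Share links usually contain .../file/d/<ID>/...
  m <- regmatches(link, regexpr("/d/[A-Za-z0-9_-]+", link))
  if (length(m) == 1) {
    id <- sub("^/d/", "", m)
  } else if (grepl("[?&]id=", link)) {
    # Links of the form ...?id=<ID> or ...&id=<ID>
    id <- sub(".*[?&]id=([A-Za-z0-9_-]+).*", "\\1", link)
  } else {
    # Otherwise assume the input is already a bare file ID
    id <- link
  }
  paste0("https://drive.google.com/uc?export=download&id=", id)
}
```

You would then call something like `download.file(drive_direct_url(share_link), temp, mode = "wb")`, where `share_link` is your own link.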
As @Mako212 points out, you can also use the `googledrive` package, substituting `drive_download` for `download.file`:
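```r
library(googledrive)

# drive_download() goes through the Drive API, so you may be asked to sign in
temp <- tempfile(fileext = ".zip")
dl <- drive_download(
  as_id("1AiZda_1-2nwrxI8fLD0Y6e5rTg7aocv0"),  # the file ID from the URL above
  path = temp,
  overwrite = TRUE
)
out <- unzip(temp, exdir = tempdir())
bank <- read.csv(out[14], sep = ";")
```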
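Both snippets pick the CSV by position (`out[14]`), which silently reads the wrong file if the archive's contents ever change. A minimal sketch of selecting it by name instead, using a hypothetical helper `pick_extracted`; the file name in the usage line below is an assumption, so list the real names first with `unzip(temp, list = TRUE)`:

```r
# Hypothetical helper: pick exactly one extracted file whose base name
# matches `pattern`, instead of relying on its position in `paths`.
pick_extracted <- function(paths, pattern) {
  hits <- paths[grepl(pattern, basename(paths))]
  if (length(hits) != 1) {
    stop("Expected exactly one file matching '", pattern,
         "', found ", length(hits))
  }
  hits
}
```

For example, assuming the CSV is called `bank-additional.csv`: `bank <- read.csv(pick_extracted(out, "^bank-additional\\.csv$"), sep = ";")`. Failing loudly when zero or several files match is deliberate, since a wrong `read.csv` target would otherwise go unnoticed.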
The Google Drive share link is not a direct link to the file, so

1. `download.file`
2. `RCurl`
3. the first method in the accepted answer

only download the web page that shows the file, not the file itself. If you open the downloaded file in a text editor, you can see that it is an HTML file. You can find the actual direct link to the file with this. With the direct link, all the regular download methods will work.
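To catch this case automatically instead of opening the file by hand, you can check whether the downloaded file starts like an HTML document. A minimal sketch, using a hypothetical helper `looks_like_html`; it only inspects the first 512 bytes and assumes an HTML page will contain `<!DOCTYPE html` or `<html` near the start:

```r
# Hypothetical helper: TRUE if the file begins like an HTML page, which is
# what you get when a share link returns the Drive viewer instead of the file.
looks_like_html <- function(path) {
  # Read only the first bytes; enough to see an HTML header
  b <- as.integer(readBin(path, what = "raw", n = 512))
  # Keep printable ASCII plus tab/newline/CR so rawToChar() never sees NUL bytes
  b <- b[(b >= 32 & b < 127) | b %in% c(9L, 10L, 13L)]
  txt <- rawToChar(as.raw(b))
  grepl("<!doctype html|<html", txt, ignore.case = TRUE)
}
```

Right after `download.file(...)`, a line such as `if (looks_like_html(temp)) stop("Downloaded the Drive web page, not the file")` makes the failure explicit before `unzip` tries to read an HTML page.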
For a very detailed discussion of how to get the direct link or download the file, see this question.
The Google Drive API requires the client to sign in, so the `googledrive` package will also ask you to sign in to Google if you are not already signed in.
You can do all of this with the `googledrive` package.
It's a two-step process: first find the folder to get its ID, then query for all files that have that folder as a parent.
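```r
library(googledrive)

# Step 1: find the folder by name to get its ID
dir <- drive_find(pattern = 'my_folder', type = 'folder')

# Step 2: list every file whose parent is that folder
query <- paste('"', dir$id, '"', ' in parents', sep = '')
drive_find(q = query)
```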
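Step 2 can be wrapped in a small function that refuses to build the query unless exactly one folder was found. A minimal sketch, using a hypothetical helper `parent_query`; it assumes `dir` is the data frame returned by `drive_find` with an `id` column, and it keeps the double-quoted query form used above:

```r
# Hypothetical helper: build the "files inside this folder" search query,
# stopping with an error unless exactly one folder matched.
parent_query <- function(dir) {
  if (nrow(dir) != 1) {
    stop("Expected exactly one folder, found ", nrow(dir))
  }
  # Same string as paste('"', id, '"', ' in parents', sep = '')
  sprintf('"%s" in parents', as.character(dir$id))
}
```

Used as `drive_find(q = parent_query(dir))`, this fails early instead of sending a query built from several folder IDs at once.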
Note that `drive_find` may return multiple folders if you have several folders named "my_folder" in different parts of Drive, so you may need to make the query more specific (e.g. by also searching by a parent folder). I would suggest adding a check that only one folder is returned, simply by testing `nrow(dir) == 1`. You can also change the pattern to a regex that only matches the folder name exactly. In that case, replace the `drive_find` command with
```r
dir <- drive_find(pattern = '^my_folder$', type = 'folder')
```
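Anchoring with `^...$` gives an exact match only when the name contains no regex metacharacters; a folder called `my.folder`, for instance, would also match `myXfolder`. A minimal sketch of a hypothetical helper, `exact_name_pattern`, that escapes those characters first, relying on `drive_find` treating `pattern` as a regular expression (as its documentation describes):

```r
# Hypothetical helper: build an anchored regex that matches `name` literally,
# backslash-escaping characters that are special in regular expressions.
exact_name_pattern <- function(name) {
  # PCRE is used only for this substitution; the resulting pattern is plain ERE
  escaped <- gsub("([][{}()^$.|*+?\\\\])", "\\\\\\1", name, perl = TRUE)
  paste0("^", escaped, "$")
}
```

For a plain name, `drive_find(pattern = exact_name_pattern("my_folder"), type = "folder")` behaves like the command above, and names containing dots or parentheses are matched literally.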
You can find more details on the parameters of `drive_find` in its documentation.