
How to recursively download all sharepoint doc files from folders and subfolders? #98

Closed
AakashBasu opened this issue Apr 2, 2019 · 11 comments

@AakashBasu

I need to recursively download all the files from the root folder, its subfolders, and their subfolders, down to the Nth level.

How can I go about it? Is there a method to list folders in a particular folder? Also, how can I list folders in the root Document Library? @vgrem @Bachatero

@AakashBasu
Author

I found a link which says the entire folder and file tree can be retrieved with a single query.

First answer of this link: https://sharepoint.stackexchange.com/questions/159105/with-rest-recursively-retrieve-file-and-folder-directory-structure

I am trying to replicate this query from the link above using your API: /_api/web/Lists/GetByTitle('Documents')/Items?$select=FileLeafRef,FileRef

But when I try it with the code below:

folder = ctx.web.lists.get_by_title('Documents')
folder = folder.get_items('$select=FileLeafRef,FileRef')

It fails with an error: "'str' object has no attribute 'payload'"

What to do?
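For reference, the REST query quoted above is just a URL under the site. A minimal sketch of composing it; `build_items_url` and the contoso site URL are hypothetical, and the function only builds the string (no authentication, no request). SharePoint returns list items in pages, so large libraries may also need `$top` or following the next-page link in the response.

```python
def build_items_url(site_url, list_title, fields, top=None):
    """Compose the list-items REST URL from the StackExchange answer (hypothetical helper)."""
    # OData string literals escape an embedded single quote by doubling it
    title = list_title.replace("'", "''")
    base = site_url.rstrip("/")
    url = "{0}/_api/web/Lists/GetByTitle('{1}')/Items?$select={2}".format(
        base, title, ",".join(fields))
    if top is not None:
        # Request a page size other than the server default
        url += "&$top={0}".format(top)
    return url
```

For example, `build_items_url("https://contoso.sharepoint.com/sites/team", "Documents", ["FileLeafRef", "FileRef"])` returns `https://contoso.sharepoint.com/sites/team/_api/web/Lists/GetByTitle('Documents')/Items?$select=FileLeafRef,FileRef`.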

@Bachatero

Hi,

you could use a procedure that recursively calls itself, e.g.:

import sys

def printAllContents(ctx, relativeUrl):
    """Recursively print all folders and files below relativeUrl."""
    try:
        libraryRoot = ctx.web.get_folder_by_server_relative_url(relativeUrl)
        ctx.load(libraryRoot)
        ctx.execute_query()

        # Subfolders of the current folder
        folders = libraryRoot.folders
        ctx.load(folders)
        ctx.execute_query()

        for myfolder in folders:
            print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))
            # Recurse into the subfolder
            printAllContents(ctx, relativeUrl + '/' + myfolder.properties["Name"])

        # Files directly inside the current folder
        files = libraryRoot.files
        ctx.load(files)
        ctx.execute_query()

        for myfile in files:
            # use myfile.properties["Name"] instead to print just the file name
            print("File name: {0}".format(myfile.properties["ServerRelativeUrl"]))
    except Exception as e:
        # Catching Exception (not a bare except) lets sys.exit from a nested call propagate
        print('Problem printing out list of folders:', e)
        sys.exit(1)

m.
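The same traversal can also be written without recursion by keeping an explicit stack of folders still to visit. A sketch, assuming a hypothetical `list_children(folder_url)` callback that returns `(subfolder_urls, file_urls)` for one folder; in real use it would wrap the `get_folder_by_server_relative_url` / `load` / `execute_query` calls above, while here a dict stands in for SharePoint so it runs offline.

```python
def walk_files(root_url, list_children):
    """Yield every file URL below root_url, visiting folders depth-first without recursion.

    list_children(folder_url) is a hypothetical callback returning
    (subfolder_urls, file_urls) for one folder.
    """
    stack = [root_url]  # folders still to visit
    while stack:
        folder_url = stack.pop()
        subfolder_urls, file_urls = list_children(folder_url)
        for file_url in file_urls:
            yield file_url
        # Push in reverse so subfolders are popped in their original order
        stack.extend(reversed(subfolder_urls))


# Offline stand-in for SharePoint: folder URL -> (subfolders, files)
tree = {
    "/Shared Documents": (["/Shared Documents/A", "/Shared Documents/B"],
                          ["/Shared Documents/root.docx"]),
    "/Shared Documents/A": (["/Shared Documents/A/A1"], ["/Shared Documents/A/a.docx"]),
    "/Shared Documents/A/A1": ([], ["/Shared Documents/A/A1/deep.docx"]),
    "/Shared Documents/B": ([], []),
}

for url in walk_files("/Shared Documents", tree.__getitem__):
    print(url)
```

Unlike the proc above, this yields a folder's files before descending into its subfolders. Keeping the traversal separate from the SharePoint calls makes the loop testable offline and sidesteps Python's recursion limit.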

@Bachatero

... you can then, for instance, download each file using the ServerRelativeUrl that gets printed out.
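When downloading by ServerRelativeUrl, you can also keep the SharePoint folder layout locally instead of flattening it. A sketch, assuming `library_root` is the library's server-relative URL (e.g. `/sites/team/Shared Documents`) and `output_dir` is your local target; `local_path_for` is a hypothetical helper. You would still need to create each parent directory (e.g. `os.makedirs(..., exist_ok=True)`) before writing the file.

```python
import os
import posixpath


def local_path_for(server_relative_url, library_root, output_dir):
    """Map a server-relative file URL to a local path that keeps its subfolders."""
    # SharePoint URLs always use '/', so compute the relative part with posixpath
    rel = posixpath.relpath(server_relative_url, library_root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("{0} is not under {1}".format(server_relative_url, library_root))
    # Rebuild the path with the local OS separator
    return os.path.join(output_dir, *rel.split("/"))
```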

@AakashBasu
Author

I posted my query here: in a more structured way.

@AakashBasu
Author

FYI: the JSON there is only for illustration, to show the structure I'm after.

@Bachatero

I'm not sure what you are getting at. I think the proc I posted does exactly that: it recursively lists all folders and subfolders, and the files within them.

@Bachatero

Example of downloading the files as you go down the tree recursively...

import os
import sys

outputDir = r"d:\output"  # raw string so the backslash is never read as an escape

# Requires downloadFile(ctx, fileDest, serverRelativeUrl) to be defined as well
def printAllContents(ctx, relativeUrl):

    try:
        libraryRoot = ctx.web.get_folder_by_server_relative_url(relativeUrl)
        ctx.load(libraryRoot)
        ctx.execute_query()

        folders = libraryRoot.folders
        ctx.load(folders)
        ctx.execute_query()

        for myfolder in folders:
            print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))
            printAllContents(ctx, relativeUrl + '/' + myfolder.properties["Name"])

        files = libraryRoot.files
        ctx.load(files)
        ctx.execute_query()

        for myfile in files:
            print("File name: {0}".format(myfile.properties["ServerRelativeUrl"]))
            # Keep only the file name: every file is saved directly into outputDir
            pathList = myfile.properties["ServerRelativeUrl"].split('/')
            fileDest = os.path.join(outputDir, pathList[-1])
            downloadFile(ctx, fileDest, myfile.properties["ServerRelativeUrl"])

    except Exception as e:
        print('Problem printing out list of folders:', e)
        sys.exit(1)
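Note that this example saves every file directly into outputDir under its bare name, so same-named files from different folders overwrite each other. A sketch for spotting that before downloading, assuming you first collect the ServerRelativeUrl values into a list; `find_name_collisions` is hypothetical, and names are compared case-insensitively on the assumption that outputDir is on a Windows drive (as `d:\output` suggests).

```python
import posixpath
from collections import defaultdict


def find_name_collisions(server_relative_urls):
    """Return {file name: [urls]} for names shared by more than one file.

    Names are compared case-insensitively, assuming a Windows output
    directory where report.docx and Report.DOCX are the same file.
    """
    by_name = defaultdict(list)
    for url in server_relative_urls:
        by_name[posixpath.basename(url).casefold()].append(url)
    return {name: urls for name, urls in by_name.items() if len(urls) > 1}
```

An empty result means flattening into a single directory is safe for that set of files.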

@lihtiandxc
Contributor

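def get_folder_relativeUrl(context, folder_relativeUrl, folder_list=None):
    """Return a flat list of the server-relative URLs of all folders below folder_relativeUrl."""
    # A fresh list per top-level call, so repeated calls don't accumulate old results
    if folder_list is None:
        folder_list = []

    libraryRoot = context.web.get_folder_by_server_relative_url(folder_relativeUrl)
    folders = libraryRoot.folders
    context.load(folders)
    context.execute_query()

    for cur_folder in folders:
        folder_list.append(cur_folder.properties["ServerRelativeUrl"])
        # Recurse, appending into the same list
        get_folder_relativeUrl(context, cur_folder.properties["ServerRelativeUrl"], folder_list)

    return folder_list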

This gives you a flat list containing the top-level folders and all nested subfolders down to the Nth level.

However, it is slower in terms of performance, since every folder costs a separate round trip to the server.
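Since the cost here is one round trip per folder, the single-query approach from the StackExchange link earlier in the thread may scale better: fetch every item once, then rebuild the hierarchy locally. A sketch of the local part, assuming the items were requested with FSObjType added to $select (SharePoint uses it to mark folders as 1 and files as 0) and parsed into dicts; `build_tree` and the sample paths are hypothetical.

```python
def build_tree(items, root):
    """Nest flat list items under root into {name: subtree}; files map to None.

    Each item is assumed to be a dict with "FileRef" and "FSObjType"
    (1 = folder, 0 = file), e.g. parsed from the REST response.
    """
    root = root.rstrip("/")
    tree = {}
    for item in items:
        ref = item["FileRef"]
        if not ref.startswith(root + "/"):
            continue  # not below the library root
        parts = ref[len(root) + 1:].split("/")
        node = tree
        # Create (or reuse) each parent folder on the way down
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # int() in case the response delivers FSObjType as a string
        if int(item["FSObjType"]) == 1:
            node.setdefault(parts[-1], {})  # folder, possibly already created by a child
        else:
            node[parts[-1]] = None  # file
    return tree


items = [
    {"FileRef": "/s/Docs/A/A1/deep.docx", "FSObjType": 0},
    {"FileRef": "/s/Docs/A", "FSObjType": 1},
    {"FileRef": "/s/Docs/A/A1", "FSObjType": 1},
    {"FileRef": "/s/Docs/B", "FSObjType": 1},
    {"FileRef": "/s/Docs/root.docx", "FSObjType": 0},
]
print(build_tree(items, "/s/Docs"))
```

Items can arrive in any order: a file seen before its folder item simply creates the folder entry early, and the later folder item reuses it.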

@vgrem vgrem added the question label Feb 22, 2020
@vgrem
Owner

vgrem commented Feb 22, 2020

Greetings,

since this question has been answered, I propose closing it.

@adpatil3

I ran the "download the files as you go down the tree recursively" example from the comment above as-is.

Error message: NameError: name 'downloadFile' is not defined

@Bachatero

Bachatero commented Mar 16, 2023

Here is an example of the downloadFile proc you are missing:

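import sys
# Module path varies by library version; older releases used office365.sharepoint.file
from office365.sharepoint.files.file import File

def downloadFile(ctx, fileDest, relativeUrl):

    """
        Downloads the file at server-relative relativeUrl to the local path fileDest.
        The signature matches the call downloadFile(ctx, fileDest, ServerRelativeUrl) above.
    """

    try:
        #check local directory if exists 1st
        #createLocalDirectory(os.path.dirname(fileDest))
        #check file exists on sharepoint
        #myExitCode = checkFileExists(ctx, relativeUrl)
        #if myExitCode == 0:
        # Fetch first, so a failed request doesn't leave an empty local file behind
        response = File.open_binary(ctx, relativeUrl)
        with open(fileDest, "wb") as localFile:
            localFile.write(response.content)

    except Exception as e:
        print('Problem downloading file:', relativeUrl, e)
        sys.exit(1)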
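The commented-out createLocalDirectory helper isn't shown in the thread. A minimal sketch of what it might look like, assuming it takes a directory path and creates it (with any missing parents) when needed; only the name comes from the comment above, the body is hypothetical.

```python
import os


def createLocalDirectory(directory):
    """Create directory and any missing parents; return True if it was newly created.

    Hypothetical body for the helper named in the commented-out line above.
    """
    existed = os.path.isdir(directory)
    # exist_ok=True makes repeated calls harmless
    os.makedirs(directory, exist_ok=True)
    return not existed
```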
