Unable to list all files in Sharepoint #307

Closed
noppGithub opened this issue Dec 16, 2020 · 7 comments

noppGithub commented Dec 16, 2020

I can print the web title, but I cannot list the files in the "Documents" folder using the method found in the link below.

System info:
  os: MacOSX
  python: Python 3.8.5
  lib version: 2.3.1

#98

What could I change to get a list of all files in "Documents" or "Folder1"?

The error is below

('-2147024894, System.IO.FileNotFoundException', 'File Not Found.', "404 Client Error: Not Found for url: https://abc.sharepoint.com/sites/AProjectName/_api/Web/getFolderByServerRelativeUrl('sites%2FAProjectName%2F')")

Here is my script, which prints the web title but cannot list the files:

from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext

def printAllContents(ctx, relativeUrl):
    try:
        libraryRoot = ctx.web.get_folder_by_server_relative_url(relativeUrl)
        ctx.load(libraryRoot)
        ctx.execute_query()

        folders = libraryRoot.folders
        ctx.load(folders)
        ctx.execute_query()

        for myfolder in folders:
            #print("Folder name: {0}".format(myfolder.properties["Name"]))
            print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))
            printAllContents(ctx, relativeUrl + '/' + myfolder.properties["Name"])
            
        files = libraryRoot.files
        ctx.load(files)
        ctx.execute_query()

        for myfile in files:
            print("File name: {0}".format(myfile.properties["ServerRelativeUrl"]))
            # pathList = myfile.properties["ServerRelativeUrl"].split('/')
            # fileDest = outputDir + "/"+ pathList[-1]
            # downloadFile(ctx, fileDest, myfile.properties["ServerRelativeUrl"])
            
    except Exception as e:
        print(e)
        pass 

site_url = "https://abc.sharepoint.com/sites/AProjectName"
username = "myusername@abc.com"
password = "PGK=x12345677"
ctx = ClientContext(site_url).with_credentials(UserCredential(username,password))
web = ctx.web
ctx.load(web)
ctx.execute_query()
print("Web title: {0}".format(web.properties['Title']))

# try to find the right relativeurl
# Failed here
urls_try = [
    '/sites/AProjectName/',
    '/sites/AProjectName/Documents/',
    '/Documents',
    '/sites/team/Shared Documents/'
]
for relative_url in urls_try:
    printAllContents(ctx,relative_url)
vgrem added the question label Dec 16, 2020
@noppGithub
Author

@vgrem Could you please share some guidance on how to solve this?

vgrem closed this as completed in 16d115f Jan 7, 2021
@vgrem
Owner

vgrem commented Jan 7, 2021

Greetings @noppGithub!

First and foremost, I would like to confirm that your example for enumerating and printing files (printAllContents) looks good, and that either of the following URLs should be valid for addressing the library:

  • option 1: /sites/team/Shared Documents/ - server-relative URL in the format /site-or-web-path/list-url
  • option 2: Shared Documents - URL of the list without the site/web path included

Note: this assumes the URL of the Documents library is Shared Documents, which is the case for the default library.
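
As a rough illustration of option 1, here is a minimal sketch of how the server-relative URL could be assembled from the site URL and the library URL. The helper name server_relative_url is hypothetical (it is not part of the library), and Shared Documents is assumed to be the library URL, as in the note above.

```python
import posixpath
from urllib.parse import urlparse


def server_relative_url(site_url: str, list_url: str) -> str:
    """Hypothetical helper: join the site's path with a library URL.

    site_url is the full site address passed to ClientContext;
    list_url is the library URL, e.g. 'Shared Documents'.
    """
    # '/sites/AProjectName' for the site used in this issue; '' for a root site
    site_path = urlparse(site_url).path.rstrip("/")
    return posixpath.join(site_path or "/", list_url.strip("/"))


print(server_relative_url("https://abc.sharepoint.com/sites/AProjectName",
                          "Shared Documents"))
# /sites/AProjectName/Shared Documents
```

Note that the result uses Shared Documents rather than Documents; the latter is only the display title in the default setup assumed here.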

Now you might be wondering why /sites/team/Shared Documents/ is not working as expected. It turns out that the method Web.get_folder_by_server_relative_url was broken in version 2.3.1: a library could no longer be addressed by a URL in the format /site-or-web-path/list-url, because the leading / delimiter was ignored by this method, and a 404 error was returned.

It has been resolved in the referenced commit, and the error should no longer occur (you need to grab the latest version from GitHub). Alternatively, the library can still be addressed via option 2:

libraryRoot = ctx.web.get_folder_by_server_relative_url('Shared Documents')

And last but not least, there are other options available for enumerating files in a library. For example, I recommend considering this approach:

def printAllContents(ctx, relativeUrl):
    library = ctx.web.get_list(relativeUrl)
    all_items = library.items.filter("FSObjType eq 0").expand(["File"]).get().execute_query()
    for item in all_items:  # type: ListItem
        cur_file = item.file
        print("File name: {0}".format(cur_file.serverRelativeUrl))
  • It targets ListItem in List instead of File in Folder, but File is retrieved as an associated property. The filter FSObjType eq 0 ensures that only items for files are returned and folders are excluded.
  • From a performance perspective it can be more beneficial, since there is no need to perform a request per folder.
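
To make the performance point concrete, here is a toy model (not library code) that counts the execute_query() round trips a recursive folder walk would make. The Folder class and round_trips_per_folder_walk are hypothetical names; the figure of three calls per folder mirrors the script in the original post, which loads the folder, its subfolders, and its files separately. The folder names are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Folder:
    """In-memory stand-in for a SharePoint folder (illustration only)."""
    name: str
    files: List[str] = field(default_factory=list)
    folders: List["Folder"] = field(default_factory=list)


def round_trips_per_folder_walk(root: Folder) -> int:
    """Count execute_query() calls made by a recursive folder walk.

    Each visited folder costs three calls: load the folder itself,
    load its subfolders, load its files.
    """
    return 3 + sum(round_trips_per_folder_walk(sub) for sub in root.folders)


library = Folder("Shared Documents", folders=[
    Folder("test", files=["test12.txt"]),
    Folder("test3", files=["test13.txt"]),
])
print(round_trips_per_folder_walk(library))  # 9
```

By comparison, the list-item snippet above makes a single execute_query() call regardless of how many folders the library contains.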

Thank you for providing the detailed description; it helped to pinpoint the issue!

Cheers,
Vadim

vgrem reopened this Jan 7, 2021
vgrem closed this as completed Jan 7, 2021
@noppGithub
Author

@vgrem Thank you so much for the update, I will try to use your guide with your version 2.3.1

Cheers,
nopp

@xiaosagemisery

> @vgrem Thank you so much for the update, I will try to use your guide with your version 2.3.1
>
> Cheers,
> nopp

any enhancement in version 2.3.1 for enumerating files?

@anais-surrusca

Hello,
I'm stuck on this problem too :/ Is there any change in the newest version, @vgrem?

@milosz-k

@vgrem Your solution works! I set relativeUrl='Shared Documents' and the result of my print looks like this:
File name: /sites/{mysite}/{folder1}/{folder2}/Shared Documents/test3/test13.txt
File name: /sites/{mysite}/{folder1}/{folder2}/Shared Documents/test/test12.txt

(I have 2 folders in Shared Documents, which contain <test12.txt> and <test13.txt> respectively.)
Now it is easy to manage my SharePoint files, e.g. using split functions. Thanks a lot.
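
The split-based handling mentioned above might look something like the sketch below. The helper name path_inside_library is hypothetical, and the library URL is assumed to be Shared Documents; it simply cuts a server-relative file URL at the library name and returns the remaining path segments.

```python
from typing import List


def path_inside_library(server_relative_url: str,
                        library_url: str = "Shared Documents") -> List[str]:
    """Hypothetical helper: return the path segments below the library.

    Raises ValueError if the URL does not point inside the library.
    """
    marker = "/" + library_url + "/"
    _, found, tail = server_relative_url.partition(marker)
    if not found:
        raise ValueError("not inside library: " + server_relative_url)
    return tail.split("/")


print(path_inside_library("/sites/mysite/Shared Documents/test3/test13.txt"))
# ['test3', 'test13.txt']
```

The last element is the file name and everything before it is the folder path inside the library.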

@KanrongYu

The printAllContents() function @noppGithub has created works great in my case!

But I am not able to use the printAllContents() one from @vgrem as my file names are like /sites/{mysite}/Shared Documents/{folder1}/{folder2}/file_names, and setting relativeUrl='Shared Documents/{folder1}/{folder2}/' gives me '-2147024860, Microsoft.SharePoint.SPQueryThrottledException', 'The attempted operation is prohibited because it exceeds the list view threshold.', "500 Server Error: Internal Server Error for url" error.

There might be 5000+ files under 'Shared Documents', is this the reason library.items.filter() failed here? (though there are only 10ish files under 'Shared Documents/{folder1}/{folder2}/') Is there any workaround?
