Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: proposal for filtering WalkerHandler directories #154

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

aturley
Copy link
Member

@aturley aturley commented Aug 14, 2019

This proposal proposes that `WalkerHandler.apply` should return a list
of directories to walk, rather than modifying the list that is passed
to it.
@dipinhora
Copy link

Have you considered changing the dir_entries from an array to a list (in WalkerHandler, Directory.entries and other places) as an alternative?

@aturley
Copy link
Member Author

aturley commented Aug 15, 2019

@dipinhora using a list would get around the awkwardness around removing items from the array. Two things come to mind with that approach:

  1. What would the performance impact be? I think using an array is probably going to be faster than a list in terms of creation and traversal, though not necessarily in terms of removal.
  2. The idea of modifying a structure that is passed into the function rather than returning a value feels a little strange. As far as I can recall, this is the only place in the stdlib where we do something like that. So whether we're passing in an array or a list, I don't think the handler code should be modifying it.

@dipinhora
Copy link

@aturley

  1. re: performance, the list create code in Directory.entries isn't written with performance in mind currently as it is not pre-allocating an array of whatever number of entries it will need (https://github.com/ponylang/ponyc/blob/master/packages/files/directory.pony#L92). Given that, I don't think that the list creation using an array will necessarily be faster. Agreed that traversal using an array would be faster. Either way, I'm not sure that how performant this code needs to be but it would likely require thinking more holistically about it.
  2. I find it amusing that you mentioned the performance concern in item 1 and then didn't want to modify the list in place for this item which would likely be more performant when using a list as compared to creating a new list or array. Regardless, I'm ambivalent to whether the handler code modifies it in place or has to make a copy but I would prefer to keep the option of being able to modify in place.

Probably a better option altogether would be to have a function that provides a true/false or similar for filtering each entry when passed to it instead of passing the full list to the handler so there's no modification of the list nor the need to create a new list either. When the function returns a true the walk happens and when the function returns a false the walk is skipped.

@aturley
Copy link
Member Author

aturley commented Aug 16, 2019

@dipin

There are tradeoffs here between performance and various abstractions. If we truly wanted the best performance we would write it in assembly. But, since we've chose to use Pony I'm guessing we're all at least implicitly willing to make certain performance tradeoffs. One of the questions here is "which tradeoffs between performance and ease of use and consistency are we willing to make?"

Can you provide an example of what you're thinking of with the true/false filtering? I have an idea of how that might work but I'd like to see if we're thinking of the same thing.

@dipinhora
Copy link

@aturley Yes, of course. Sorry if my earlier comment came across as antagonistic. That wasn't my intent.

The following is something along the lines of what I meant by the true/false filtering:

class MyWalkerHandler is WalkerHandler
  fun ref apply(dir_path: FilePath, dir_entry: String): Bool =>
    let skip_directories_with_string = "IGNORE_ME"
    if dir_entry.contains(skip_directories_with_string) then
      false
    else
      true
    end

with the following versions of WalkerHandler interface:

interface WalkerHandler
  """
  A handler for `FilePath.walk`.
  """
  fun ref apply(dir_path: FilePath, dir_entry: String): Bool

and Filepath.walk:

fun val walk(handler: WalkHandler ref, follow_links: Bool = false) =>
    """
    Walks a directory structure starting at this.

    `handler(dir_path, dir_entry)` will be called for each directory
    starting with this one. The handler returns true/false for which subdirectories
    to walk or skip.
    """
    try
      with dir = Directory(this)? do
        var entries: Array[String] ref = dir.entries()?
        for e in entries.values() do
          if WalkHandler(this, e) then
            let p = this.join(e)?
            let info = FileInfo(p)?
            if info.directory and (follow_links or not info.symlink) then
              p.walk(handler, follow_links)
            end
          end
        end
      end
    else
      return
    end

@SeanTAllen
Copy link
Member

I've never used the API in question. I know that @aturley has used it a lot. I trust his judgment on the changes that are needed. I'm sure we have a lot of APIs in the standard library that could use some love.
Particularly in the file related sections of the standard library.

Base automatically changed from master to main February 8, 2021 22:15
@SeanTAllen
Copy link
Member

@aturley are you interested in continuing with this RFC?

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Feb 9, 2022
@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Feb 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants