Azcopy sync on IOT edge blob #1031

Closed
abhaynahar opened this issue Jun 1, 2020 · 63 comments

Comments

@abhaynahar

Which version of the AzCopy was used? - 10.4.3

Which platform are you using? Mac and Linux

What command did you run?

./azcopy sync "/Home/data" "http://127.0.0.1:11002/exptest/demo?[SAS]"

What problem was encountered?

When trying to sync files from the local system into IoT Edge blob storage using azcopy sync, I get the following error:
INFO: Cannot infer destination location of http://127.0.0.1:11002/exptest/demo?[SAS]. Please specify the --from-to switch. Valid values are two-word phases of the form BlobLocal, LocalBlob etc. Use the word 'Blob' for Blob Storage, 'Local' for the local file system, 'File' for Azure Files, and 'BlobFS' for ADLS Gen2. If you need a combination that is not supported yet, please log an issue on the AzCopy GitHub issues list.

error parsing the input given by the user. Failed with error Unable to infer the source '/Home/data' / destination 'http://127.0.0.1:11002/exptest/demo?[SAS]

I tried adding --from-to switch then the error I get is
Error: unknown flag: --from-to

Then I created a hosts file entry on the Linux VM
vi /etc/hosts
and added the following line
127.0.0.1 exptest.blob.core.windows.net

Then the sync command I used was
./azcopy sync "/Home/data" "https://exptest.blob.core.windows.net:11002/demo?[SAS]"

the output was
3 Files Scanned at Source, 0 Files Scanned at Destination, and the program hung at this point indefinitely.

How can we reproduce the problem in the simplest way?

Follow the instructions at https://docs.microsoft.com/en-us/azure/iot-edge/how-to-deploy-blob to launch IoT Edge blob storage on a Linux VM.
Then log into the VM and try to sync a file into the blob storage running on the VM:

./azcopy sync "/Home/data" "http://127.0.0.1:11002/exptest/demo?[SAS]"

Have you found a mitigation/solution?

NO

Any help is much appreciated

@InteXX

InteXX commented Jun 24, 2020

@abhaynahar

Any luck on this? I'm running into the same problem. It may be because we're trying to sync to the emulator and not a real storage account.

I'll test for that tomorrow and let you know what I find out.

@InteXX

InteXX commented Jun 24, 2020

@abhaynahar

It's definitely a bug when connecting to an emulator:

func inferArgumentLocation(arg string) common.Location {
  if arg == pipeLocation {
    return common.ELocation.Pipe()
  }
  if startsWith(arg, "http") {
    // Let's try to parse the argument as a URL
    u, err := url.Parse(arg)
    // NOTE: sometimes, a local path can also be parsed as a url. To avoid thinking it's a URL, check Scheme, Host, and Path
    if err == nil && u.Scheme != "" && u.Host != "" {
      // Is the argument a URL to blob storage?
      switch host := strings.ToLower(u.Host); true {
      // Azure Stack does not have the core.windows.net
      case strings.Contains(host, ".blob"):
        return common.ELocation.Blob()
      case strings.Contains(host, ".file"):
        return common.ELocation.File()
      case strings.Contains(host, ".dfs"):
        return common.ELocation.BlobFS()
      case strings.Contains(host, benchmarkSourceHost):
        return common.ELocation.Benchmark()
        // enable targeting an emulator/stack
      case IPv4Regex.MatchString(host):               // <-- BUG here
        return common.ELocation.Unknown()             // <-- This is what gets returned in case of storage emulator URL
      }

      if common.IsS3URL(*u) {
        return common.ELocation.S3()
      }
    }
  }

  return common.ELocation.Local()
}

Source

For a moment I was hopeful that AzureCLI might work for this, but it turns out that AzureCLI uses AzCopy under the hood.

So we wait.
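
To see how the host checks quoted above treat an emulator URL, they can be reduced to a small standalone sketch. This is a simplification and not AzCopy's actual code: the `inferLocation` name, the plain string return values, and the `ipv4` pattern are stand-ins introduced here for illustration (the real `IPv4Regex` may differ), and the pipe, benchmark, and S3 branches are left out.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// ipv4 is a hypothetical stand-in for AzCopy's IPv4Regex; the real pattern may differ.
var ipv4 = regexp.MustCompile(`^\d{1,3}(\.\d{1,3}){3}(:\d+)?$`)

// inferLocation mirrors only the host-based checks from the quoted function.
func inferLocation(arg string) string {
	if strings.HasPrefix(arg, "http") {
		u, err := url.Parse(arg)
		if err == nil && u.Scheme != "" && u.Host != "" {
			host := strings.ToLower(u.Host)
			switch {
			case strings.Contains(host, ".blob"):
				return "Blob"
			case strings.Contains(host, ".file"):
				return "File"
			case strings.Contains(host, ".dfs"):
				return "BlobFS"
			case ipv4.MatchString(host):
				return "Unknown" // emulator addressed by IP
			}
		}
	}
	return "Local" // anything unrecognized falls through to Local
}

func main() {
	for _, arg := range []string{
		"https://exptest.blob.core.windows.net/demo",
		"http://127.0.0.1:11002/exptest/demo",
		"http://server5:10000/test/data",
		"/Home/data",
	} {
		fmt.Printf("%-45s -> %s\n", arg, inferLocation(arg))
	}
}
```

In this sketch the IP-addressed emulator URL comes back as Unknown, while a URL with a bare hostname such as `server5` matches none of the cases and falls through to Local.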

@thisiscmt

FYI, the code in validators.go is also the cause of issue 713. It sure would be nice to get this fixed; it could potentially clear up several emulator-related issues.

@promisepreston

I am currently experiencing this exact same issue. Really frustrating.

@gapra-msft
Member

Hi, apologies for the delayed response here. Are you still unable to sync to the emulator? I just attempted to run a similar command and did not hit the failure to parse the URL.

@InteXX

InteXX commented Nov 17, 2023

@gapra-msft

I'm still getting an error when attempting to sync to Azurite on a remote server (on my LAN):

azcopy sync "D:\Dev\Data" "http://server5:10000/test/data"

INFO: The parameters you supplied were Source: 'd:\Dev\Data' of type Local, and Destination: 'http://server5:10000/test/data' of type Local
INFO: Based on the parameters supplied, a valid source-destination combination could not automatically be found

I also tried:

azcopy sync "D:\Dev\Data" "http://server5:10000/devstoreaccount1/test/data"

...as that syntax is indicated in the auto-generated code by Storage Explorer. Same error.

Note that the usage guidance doesn't provide an example for syncing to an emulator, i.e. what the URL should contain:

azcopy sync

Sync an entire directory including its subdirectories (note that recursive is by default on):

  • azcopy sync "/path/to/dir" "https://[account].blob.core.windows.net/[container]/[path/to/virtual/dir]"
    or
  • azcopy sync "/path/to/dir" "https://[account].blob.core.windows.net/[container]/[path/to/virtual/dir]" --put-md5

In any case, the relevant source code hasn't changed since my original post on this thread of Jun 24, 2020. The line numbers have changed, yes, but not the code surrounding the bug.

@gapra-msft
Member

@InteXX could you try manually specifying the --from-to CLI parameter? In this case it would be LocalBlob

@InteXX

InteXX commented Dec 11, 2023

@gapra-msft

azcopy sync --from-to "D:\Dev\Data" LocalBlob

Results in:

unknown flag: --from-to

@gapra-msft
Member

@InteXX sorry I was a little unclear. Could you run this?

azcopy sync "D:\Dev\Data" "http://server5:10000/devstoreaccount1/test/data" --from-to=LocalBlob

@InteXX

InteXX commented Dec 11, 2023

Same result:

unknown flag: --from-to

@InteXX

InteXX commented Dec 11, 2023

It appears I'm running an old version: 10.4.3

I'm working out how to update...

@InteXX

InteXX commented Dec 11, 2023

Cannot perform sync due to error: Login Credentials missing. No SAS token or OAuth token is present and the resource is not public

@gapra-msft
Member

@InteXX if you are running against Azurite, you need to provide SAS or OAuth token or make your resource in Azurite public

@InteXX

InteXX commented Dec 11, 2023

Cannot perform sync due to error: cannot list files due to reason GET http://server5:10000/devstoreaccount1
--------------------------------------------------------------------------------
RESPONSE 400: 400 The value for one of the HTTP headers is not in the correct format.
ERROR CODE: InvalidHeaderValue
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
  <Code>InvalidHeaderValue</Code>
  <Message>The value for one of the HTTP headers is not in the correct format.
RequestId:efe0df55-5790-4985-8b9e-856b23cb109c
Time:2023-12-11T22:18:10.816Z</Message>
  <HeaderName>x-ms-version</HeaderName>
  <HeaderValue>2023-08-03</HeaderValue>
</Error>
--------------------------------------------------------------------------------

I seem to remember this from a while back, from somewhere else.

How to fix that header value?
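
The error body above already names the header and the value the server rejected. As a minimal sketch, assuming one wants to pull those fields out programmatically, the XML can be decoded with Go's standard library. The `storageError` type and `parseStorageError` function are hypothetical names introduced here, and the struct models only the fields visible in this response.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// storageError models only the fields present in the error body shown in this thread.
type storageError struct {
	Code        string `xml:"Code"`
	Message     string `xml:"Message"`
	HeaderName  string `xml:"HeaderName"`
	HeaderValue string `xml:"HeaderValue"`
}

// parseStorageError decodes an <Error> document returned by the blob endpoint.
func parseStorageError(body string) (storageError, error) {
	var e storageError
	err := xml.Unmarshal([]byte(body), &e)
	e.Message = strings.TrimSpace(e.Message)
	return e, err
}

// sample is a shortened copy of the response above (request ID and time omitted).
const sample = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
  <Code>InvalidHeaderValue</Code>
  <Message>The value for one of the HTTP headers is not in the correct format.</Message>
  <HeaderName>x-ms-version</HeaderName>
  <HeaderValue>2023-08-03</HeaderValue>
</Error>`

func main() {
	e, err := parseStorageError(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s: server rejected %s=%s\n", e.Code, e.HeaderName, e.HeaderValue)
}
```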

@gapra-msft
Member

Are you running the latest Azurite version? I believe it's v3.

@InteXX

InteXX commented Dec 11, 2023

I'm running v3.28.0. Is that the latest?

I found that error message problem: Azure/azure-sdk-for-net#14257

@gapra-msft
Member

Yes, this should be the latest version.

@gapra-msft
Member

gapra-msft commented Dec 12, 2023

I run my Go code in GoLand, but VSCode has extensions to build and run go code as well. Since Azurite has the same static key, for now I should just be able to generate the URL for you to test. This should last a day. Just replace the 127.0.0.1 with server5
http://127.0.0.1:10000/devstoreaccount1/data/?se=2023-12-13T00%3A41%3A49Z&sig=L3lnQmsfXMYk6gOB6gLSfyOYLSCWOOIuxtZHCU5axQQ%3D&sp=rwl&spr=https%2Chttp&srt=sco&ss=b&st=2023-12-12T00%3A41%3A49Z&sv=2020-02-10

@InteXX

InteXX commented Dec 12, 2023

Thanks, that got me started. At least there's no error message now.

But the sync hangs on the last of the three files in that folder—and it's always a different file. I made sure the files aren't in use, and I even copied them to a different source folder. Same result.

Odd...

@gapra-msft
Member

I can try to take a look at the log to see if there's anything causing a hang.

@InteXX

InteXX commented Dec 12, 2023

I'm getting the memory error again.

Could you regenerate that SAS and this time exclude Service and Object from the resources?

@InteXX

InteXX commented Dec 12, 2023

@gapra-msft
Member

Is this the extension?

https://marketplace.visualstudio.com/items?itemName=golang.go

Yes that should be right. You need the code I pasted above saved as a file named main.go, then also a file called go.mod with the following contents to grab dependencies.

module awesomeProject

go 1.20

require github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.1.0

require (
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.6.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.3.0 // indirect
golang.org/x/net v0.10.0 // indirect
golang.org/x/text v0.9.0 // indirect
)

@gapra-msft
Member

This should also have more info on how to set up VSCode for Go https://code.visualstudio.com/docs/languages/go

@InteXX

InteXX commented Dec 12, 2023

could not import github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob (no required module provides package "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob"

@InteXX

InteXX commented Dec 12, 2023

image

@gapra-msft
Member

Could you run 'go mod tidy' in the directory where the go.mod and main.go files are?

@InteXX

InteXX commented Dec 12, 2023

Got it. The code is running (thanks), but now I'm getting the auth error again.

@InteXX

InteXX commented Dec 12, 2023

What's the difference between the SAS generated with this code and the SAS generated by Azure Storage Explorer?

@gapra-msft
Member

What's the difference between the SAS generated with this code and the SAS generated by Azure Storage Explorer?

The only obvious difference I see is the SAS version (sv query parameter) being used to generate the SAS, Go SDK uses 2020-02-10 and Storage Explorer is 2023-01-03
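
A comparison like this can be done mechanically by parsing both SAS query strings and listing the keys whose values differ. The sketch below assumes two inputs: the first is the query string from the URL posted earlier in this thread with the signature replaced by a placeholder, and the second is a hypothetical variant in which only `sv` is changed to the 2023-01-03 value mentioned above. `diffSAS` is a helper name introduced here for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// diffSAS returns the query keys whose values differ between two SAS strings.
// The sig parameter is skipped because it differs whenever anything else does.
func diffSAS(a, b string) ([]string, error) {
	qa, err := url.ParseQuery(a)
	if err != nil {
		return nil, err
	}
	qb, err := url.ParseQuery(b)
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	var keys []string
	for _, q := range []url.Values{qa, qb} {
		for k := range q {
			if k == "sig" || seen[k] {
				continue
			}
			seen[k] = true
			if qa.Get(k) != qb.Get(k) {
				keys = append(keys, k)
			}
		}
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	// From the URL earlier in the thread, signature replaced with a placeholder.
	goSDK := "se=2023-12-13T00%3A41%3A49Z&sig=REDACTED&sp=rwl&spr=https%2Chttp&srt=sco&ss=b&st=2023-12-12T00%3A41%3A49Z&sv=2020-02-10"
	// Hypothetical: identical except for the sv value.
	explorer := "se=2023-12-13T00%3A41%3A49Z&sig=REDACTED&sp=rwl&spr=https%2Chttp&srt=sco&ss=b&st=2023-12-12T00%3A41%3A49Z&sv=2023-01-03"

	keys, err := diffSAS(goSDK, explorer)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println("differing parameters:", keys)
}
```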

@InteXX

InteXX commented Dec 12, 2023

It's worth noting that the one time I did get it to run without an error (link here), azcopy sync displayed the same symptom as in the original report by @abhaynahar, to wit: application hang.

When I try now using that same SAS, though, I get the memory error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0xe5bd12]

goroutine 109 [running]:
github.com/Azure/azure-storage-azcopy/v10/cmd.(*blobTraverser).parallelList.func1({0xf21d00?, 0xc0005a0260}, 0xc0005a0270, 0xc000088620)
        D:/a/1/s/cmd/zc_traverser_blob.go:332 +0x2b2
github.com/Azure/azure-storage-azcopy/v10/common/parallel.(*crawler).processOneDirectory(0xc000318780, {0x132b990, 0xc0000922c0}, 0xf)
        D:/a/1/s/common/parallel/TreeCrawler.go:177 +0x338
github.com/Azure/azure-storage-azcopy/v10/common/parallel.(*crawler).workerLoop(0xc000318780, {0x132b990, 0xc0000922c0}, 0x0?, 0x0?)
        D:/a/1/s/common/parallel/TreeCrawler.go:104 +0xb1
created by github.com/Azure/azure-storage-azcopy/v10/common/parallel.(*crawler).runWorkersToCompletion
        D:/a/1/s/common/parallel/TreeCrawler.go:93 +0x50

Log:

2023/12/12 20:00:51 AzcopyVersion  10.22.0
2023/12/12 20:00:51 OS-Environment  windows
2023/12/12 20:00:51 OS-Architecture  amd64
2023/12/12 20:00:51 Log times are in UTC. Local time is 12 Dec 2023 11:00:51
2023/12/12 20:00:51 ==> REQUEST/RESPONSE (Try=1/15.7136ms, OpTime=29.5754ms) -- RESPONSE STATUS CODE ERROR
   HEAD http://server5:10000/devstoreaccount1/data?se=2023-12-13T00%3A41%3A49Z&sig=-REDACTED-&sp=rwl&spr=https%2Chttp&srt=sco&ss=b&st=2023-12-12T00%3A41%3A49Z&sv=2020-02-10
   Accept: application/xml
   User-Agent: AzCopy/10.22.0 azsdk-go-service.Client/v1.2.0 (go1.19.12; Windows_NT)
   X-Ms-Client-Request-Id: b4e90dc7-2772-449d-7639-790d342be167
   x-ms-version: 2023-08-03
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
   Connection: keep-alive
   Date: Tue, 12 Dec 2023 20:00:51 GMT
   Keep-Alive: timeout=5
   Server: Azurite-Blob/3.28.0
Response Details: 

2023/12/12 20:00:51 ==> REQUEST/RESPONSE (Try=1/3.5246ms, OpTime=3.6248ms) -- RESPONSE SUCCESSFULLY RECEIVED
   GET http://server5:10000/devstoreaccount1?comp=list&delimiter=%2F&include=metadata&prefix=data%2F&restype=container&se=2023-12-13T00%3A41%3A49Z&sig=-REDACTED-&sp=rwl&spr=https%2Chttp&srt=sco&ss=b&st=2023-12-12T00%3A41%3A49Z&sv=2020-02-10
   X-Ms-Request-Id: [f3c64d5f-dfe1-4970-a10f-e4590d76cae3]

@gapra-msft
Member

@InteXX I took a look at the Azurite thread above, were you able to resolve your issue?

@InteXX

InteXX commented Jan 23, 2024

Yes, it works now.

If I may, I'd like to suggest an update to the documentation—both online and command line usage—so that others will be less likely to encounter this problem in the future.

Thank you.

@gapra-msft
Member

Absolutely, could you share more details on what you'd like updated? I can create a new issue and tag it appropriately.

@InteXX

InteXX commented Jan 25, 2024

Certainly, I'd love to.

Firstly, the SAS generator in Azure Storage Explorer generates incorrect URL syntax for a successful azcopy sync session when connecting to Azurite. Granted that's probably beyond the scope of this repo, but users still need to know this fact. The URL that's generated is in the form:

http://<hostname>:10000/devstoreaccount1/data?<SAS>

This is incorrect. The syntax should be instead:

http://devstoreaccount1.<hostname>:10000/data?<SAS>

For this to work the DNS record must exist, e.g. assuming we're hosting Azurite on server5 the hostname will be devstoreaccount1.server5. This can be an entry in the local hosts file as well—as long as the supplied hostname resolves to the machine that's hosting Azurite.
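
The rewrite from the first URL form to the second can be sketched in a few lines of Go. This is only an illustration of the transformation described above, not something AzCopy or Storage Explorer provides: `toHostStyle` is a hypothetical helper that treats the first path segment as the account name and moves it in front of the host.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// toHostStyle moves the leading path segment (the account name) into the host:
// http://host:port/account/container?sas -> http://account.host:port/container?sas
func toHostStyle(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	trimmed := strings.TrimPrefix(u.Path, "/")
	account, rest, _ := strings.Cut(trimmed, "/")
	if account == "" {
		return "", fmt.Errorf("no account segment in path %q", u.Path)
	}
	u.Host = account + "." + u.Host
	u.Path = "/" + rest
	u.RawPath = "" // let String() re-derive the escaped path from Path
	return u.String(), nil
}

func main() {
	// The SAS portion is shortened here; the query string is passed through unchanged.
	in := "http://server5:10000/devstoreaccount1/data?sv=2018-03-28&sp=rwl"
	out, err := toHostStyle(in)
	if err != nil {
		fmt.Println("rewrite failed:", err)
		return
	}
	fmt.Println(out)
}
```

The rewritten host name (here `devstoreaccount1.server5`) still has to resolve to the machine hosting Azurite, via DNS or a hosts file entry.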

Also, this CLI switch is mandatory:

--from-to=LocalBlob

In the end, my complete successful command was:

azcopy sync "D:\Dev\Data" "http://devstoreaccount1.server5:10000/data/?sv=2018-03-28&spr=https%2Chttp&st=2024-01-25T00%3A26%3A20Z&se=2024-01-26T00%3A26%3A20Z&sr=c&sp=rwl&sig=a0xAslzjcTzZQ55z6RRrNULXGGp2af90xXF42lm3Z2M%3D" --from-to=LocalBlob
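
For anyone scripting this, the argument order matters: earlier attempts in this thread put `--from-to` before the positional arguments and without a value attached. A minimal sketch, assuming `azcopy` is on the PATH and using a hypothetical `syncArgs` helper, builds the same command shape as above. The `<SAS>` text is a placeholder, and the command is printed rather than executed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// syncArgs builds the argument list for azcopy sync: the two positional
// arguments (source, destination) come first, followed by --from-to.
func syncArgs(src, dst string) []string {
	return []string{"sync", src, dst, "--from-to=LocalBlob"}
}

func main() {
	// <SAS> is a placeholder; substitute a real token before running.
	dst := "http://devstoreaccount1.server5:10000/data?<SAS>"
	cmd := exec.Command("azcopy", syncArgs(`D:\Dev\Data`, dst)...)
	// Printed only; call cmd.Run() to actually start the sync.
	fmt.Println(cmd.Args)
}
```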

That's what is unclear in the documentation, in both online and CLI usage sources... the proper URL syntax to use when connecting locally to Azurite. All of the provided examples assume a remote connection with Azure.

I've searched the online documentation here for any mention of Azurite and I'm coming up empty. Nor am I finding Azurite syntax in the CLI usage text.

These documentation oversights may extend to other azcopy commands as well; I haven't looked. Perhaps a significant overhaul is in order—I'll of course leave that judgment to you and your team.

Let me know if you have any questions about my findings on the matter. Thank you for your consideration.

@gapra-msft
Member

Hi @InteXX, I've created issue #2565 to track the documentation request. I will get this item prioritized appropriately after speaking with the team. Closing this particular issue as it's resolved. Please subscribe to the other issue in case you'd like updates.

@InteXX

InteXX commented Feb 5, 2024

That looks good, and thank you, but my supplied URL syntax didn't make it through for some reason. A minor edit is in order.

These URLs:

http://:10000/devstoreaccount1/data?
http://devstoreaccount1.:10000/data?

...should instead be:

http://<hostname>:10000/devstoreaccount1/data?<SAS>
http://devstoreaccount1.<hostname>:10000/data?<SAS>

It appears the brackets are throwing things off. Perhaps wrapping them in code formatting will allow the bracketed text.

@InteXX

InteXX commented Feb 5, 2024

Gauri, I've been thinking...

Maybe this isn't a documentation issue at all. Maybe it's an issue internal to azcopy sync itself.

I'll concede that I haven't tested my theory here against every azcopy command, but all of the Azure Storage Explorer-generated syntax I've encountered uses this URL syntax, even when connecting to Azurite:

http://<hostname>:10000/devstoreaccount1/data?<SAS>

azcopy sync is the only command I've seen that requires the alternative:

http://devstoreaccount1.<hostname>:10000/data?<SAS>

Perhaps the documentation (and Azure Storage Explorer) are correct and azcopy sync's internal logic is incorrect? Shouldn't they be consistent?
