Erroneous string format using mosquitto_sub/_pub #15123
Comments
It is the encoding stored in Unfortunately, console / Windows Terminal windows for PowerShell still default to the active OEM code page (even though the |
If I now add in that line to my UTF-8 script:
the output now looks like this:
but when I do the following in another terminal window:
I still receive this:
What is curious here, for me, is the inconsistency between the 'echo' (which is in fact superfluous, we could change that line to $message) and what I receive from the MQTT subscription using mosquitto_sub. In Linux this does not happen - the equivalent script:
gives me the same from the echo as it does from the mosquitto_sub. Is it the mosquitto_sub/pub the issue here? If so I will get in touch with them. |
What I meant is: |
Modified script:
mosquitto_sub is still giving me erroneous output:
I mentioned that in Linux this inconsistency between 'echo' and what I receive from mosquitto_sub doesn't occur - I would also like to mention that when I run pwsh on Linux using the above script (without the call to [Console]) I obtain the same, that is, inconsistent and erroneous mosquitto_sub output. I would expect the same binaries on Linux (mosquitto_sub, mosquitto_pub) to behave identically in the two cases of using bash and pwsh to call them. |
On Windows (only), display output can work properly even when captured output due to an encoding mismatch does not (this difference across platforms is outside PowerShell's control and won't go away). For correct programmatic processing (capturing in a variable, sending through the pipeline to another command), the program's actual output encoding must match So the questions are (I know nothing about
The fact that your output shows If true, it would share this - nonstandard - behavior with Python - or perhaps it
Anyway, to (temporarily) use ANSI encoding, run the following (you should restore the original settings afterwards):
[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage)
# To switch to ANSI in *all* aspects
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage)
Note, however, that if you had truly switched |
The short of it:
In the meantime, you can use helper function
# Download and define advanced function Invoke-WithEncoding in the current session.
irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw/Invoke-WithEncoding.ps1 | iex
Using a Python command as an example, you could then use the following, which - thanks to the
# Outputs *already-decoded* output, so if the output *prints* fine, then *decoding* worked fine too.
PS> Invoke-WithEncoding { python -c "print('ºC')" } -Encoding Ansi -WindowsOnly
ºC
Note that A similar function focused on diagnostic output is As an aside: You may have just used this char. as an example or perhaps you chose it for better appearance, but note that the symbol you're using is |
I should mention one more solution, available since Windows 10 but still in beta as of this writing: You can switch to UTF-8 system-wide, which effectively sets both the OEM and the ANSI code page to UTF-8 ( |
Finally, note that it is possible to make Python use UTF-8, namely by either setting environment variable
[Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.Utf8Encoding]::new()
$env:PYTHONUTF8=1
(python -c "print('ºC')") # properly decodes the output to 'ºC'
|
It seems that mosquitto_sub and mosquitto_pub are written in C - here is the source for the former, and here for the latter. Part of my issue has been resolved by using your ANSI suggestion - my mosquitto_pub.ps1 file looks like this:
Notepad++ reports this file as Encoding -> UTF-8 My mosquitto_sub.ps1 file looks like this:
Again, Notepad++ reports this file as Encoding -> UTF-8. I run mosquitto_pub.ps1 in a PowerShell 7.1.3 tab of Windows Terminal, and I run mosquitto_sub.ps1 in a second PowerShell 7.1.3 tab of Windows Terminal. The output of mosquitto_pub.ps1 - the echo - looks like this:
While the output of mosquitto_sub.ps1 looks like this:
This seems to have resolved the 'º' issue, however, it does seem as though I'm still losing the quotation marks as the string moves from one to the other. |
I see, @chris-steema. The double quotes disappearing is a separate problem, which, unfortunately, has been a problem with PowerShell's argument-passing to external programs since v1, due to lack of escaping of embedded " characters.
The workaround for now is to manually \-escape them:
mosquitto_pub -h test.mosquitto.org -t tofol/test -m ($message -replace '"', '\"')
I presume that this fundamental problem hasn't been fixed to date so as not to break such existing workarounds, but a fix is finally coming:
Relevant issues and comments:
|
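For readers who want to see the escaping rule in isolation, here is a minimal Python sketch. It relies on `subprocess.list2cmdline`, a standard-library helper that builds a Windows command line following the Microsoft C runtime quoting rules. The JSON payload is a made-up example modelled on the message in this thread. This only illustrates why an embedded `"` needs a preceding `\` on Windows; it is not how PowerShell itself assembles command lines.

```python
import subprocess

# Hypothetical payload, modelled on the message published in this thread.
message = '{"label":"ºC"}'

# What a Windows command line must contain for a C program to receive
# the argument with its double quotes intact.
escaped = subprocess.list2cmdline([message])
print(escaped)

# The manual workaround from the comment above, expressed in Python:
# put a backslash in front of every embedded double quote.
manual = message.replace('"', '\\"')
print(manual == escaped)
```

Because the payload contains no spaces, the helper adds no enclosing quotes, so the two strings can be compared directly.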
Thank you @mklement0. I closed my instance of Windows Terminal, and when I reopened it the code in my last message didn't work. It seems as though setting the $OutputEncoding is a requirement - for posterity then, correctly working ps1 files in a new instance of Windows Terminal with two open tabs: mosquitto_pub.ps1:
mosquitto_sub.ps1:
|
I'm not sure I understand how
As for a persistent encoding change:
|
No, I was wrong to suggest $OutputEncoding comes into play. Looking closer, I see I get the results I expect with the following two pub/sub ps1 files (saved as UTF-8):
Note that in a new instance of Windows Terminal with two PowerShell 7 tabs, setting OutputEncoding only in the sub ps1 file - with no Console settings at all in the pub file - is sufficient for me to receive the output I expect. I'm not sure of the significance of this with respect to attributions of erroneous behavior to the mosquitto_pub/mosquitto_sub executable files. |
Makes sense, @chris-steema.
I wouldn't call it erroneous, just nonstandard - I presume it was a deliberate decision, as for Python, to use the relatively more widely used ANSI code page over the OEM code page, whose use is limited to consoles. As an aside: It's important to remember that there's no such thing as the ANSI or the OEM code page, given that multiple, language-specific varieties exist and that it is the host system's configuration that determines the active variety. Unfortunately, though, that decision to use ANSI makes use in consoles problematic. As for where the workaround is necessary in your scenario:
|
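The ANSI-versus-OEM mismatch described above can be reproduced without Mosquitto. The following Python sketch assumes code page 1252 as the ANSI code page and 850 as the OEM code page; both are assumptions, since the active code pages depend on the host system's configuration. The helper name `misdecode` is made up for illustration.

```python
# Assumed code pages; the real ones depend on the system's configuration.
ANSI = "cp1252"
OEM = "cp850"


def misdecode(text: str, written_as: str, read_as: str) -> str:
    """Encode text the way the program writes it, then decode it
    the way the console (or PowerShell) reads it."""
    return text.encode(written_as).decode(read_as)


# Program writes ANSI bytes, reader expects OEM: the 'º' is garbled.
print(misdecode("ºC", ANSI, OEM))

# Reader switched to ANSI as well: the text survives.
print(misdecode("ºC", ANSI, ANSI))
```

Under these assumptions the first call yields the same `║C` that appears in the issue's unexpected output, which is consistent with an ANSI-encoded byte being decoded as OEM.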
Good point. I'll see if I can find anything. Thanks.
Yes, the conversion of byte streams to .NET strings I can imagine very clearly. Great stuff, thanks to your patient explanations I think I now have a pretty clear idea of what's going on. PowerShell's inability to escape embedded quotes was a red herring for me, as I had imagined that that and what turned out to be an encoding issue were related, which they aren't. As far as I'm concerned we can close this issue, but I won't do so now just in case you'd prefer to keep it open for whatever reason. |
I'm glad to hear the explanations were helpful - this is tricky business, for sure. I think it's fine to close this issue, as I've just posted a question & answer on Stack Overflow that summarizes the problem and the solution. It's probably mostly a hypothetical concern, but note that the solution there uses By contrast, |
A curiosity is the same test done from pwsh (Linux) to PowerShell 7 (Windows) and the reverse.
So from Linux using pwsh I can run mosquitto_pub.ps1 as it is in this message of mine, but when I run mosquitto_sub.ps1 on Windows I get mangled output unless I do:
This is different to the case in which both ps1 files are running on Windows. However, I can't get the reverse to show me correct output - that is, running mosquitto_pub.ps1 on Windows and mosquitto_sub.ps1 on Linux. I've tried a good number of combinations of [Console]::OutputEncoding/InputEncoding, but the 'º' always gets mangled - in fact it gets mangled to the same character you can see in the above $PSVersionTable output after '4.0' of PSCompatibleVersions. P.S. the following output run on Linux using pwsh is interesting:
|
It makes sense to me that Mosquitto uses UTF-8 on Unix-like platforms, which
Unix-like platforms don't use code pages, so this information is purely informative there, I think: the active locale, as reflected in
If you run
Invoke-WithEncoding.ps1 -Encoding Ansi { mosquitto_sub -h test.mosquitto.org -t tofol/test }
If not, what is it? |
That is curious - does it hang with
I suggest inspecting the raw byte output; can you infer what actual encoding is used?
sh -c 'mosquitto_sub -h test.mosquitto.org -t tofol/test > out.txt' && Format-Hex out.txt
|
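When reading such a hex dump, it helps to know which bytes to expect for 'º' under each candidate encoding. The Python sketch below prints them. The choice of cp1252 for ANSI and cp850 for OEM is an assumption, as the actual code pages vary by system, and `hex_bytes` is a hypothetical helper name. It requires Python 3.8 or later for the separator argument of `bytes.hex`.

```python
# Candidate encodings; the ANSI and OEM code pages are assumed values.
CANDIDATES = {
    "UTF-8": "utf-8",
    "ANSI (assumed cp1252)": "cp1252",
    "OEM (assumed cp850)": "cp850",
}


def hex_bytes(text: str, encoding: str) -> str:
    """Return the bytes of text in the given encoding as spaced hex."""
    return text.encode(encoding).hex(" ").upper()


for label, codec in CANDIDATES.items():
    print(f"{label:24} {hex_bytes('º', codec)}")
```

A two-byte sequence starting with C2 in the dump points to UTF-8, while a single byte points to one of the legacy code pages.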
It is strange behavior, however, using a different Windows machine (but the same Ubuntu one) it seems to have disappeared, and everything now works as expected without any modifications to |
Intriguing - would be good to understand what that configuration is. Also: if you run everything on the other Windows machine alone, does One possible explanation is that the other Windows machine has system-wide UTF-8 support turned on, which you can verify by opening a |
Yes, this is the case: one machine returns |
That is to be expected: the system-wide UTF-8 support sets both the OEM and the ANSI code page to UTF-8. Activating system-wide UTF-8 is generally advisable to make encoding problems go away, but it also has the potential to break existing code. For instance, Windows PowerShell scripts that rely on BOM-less text files getting read as ANSI-encoded suddenly interpret such files as UTF-8-encoded, in effect causing all non-ASCII characters to turn into |
As for |
Great! Then the only unexplained event is mosquitto_pub.ps1 running (without Not sure how much energy you have left to work out what's going on here 🙂 |
I've run this now, and the contents of out.txt look like this:
Which is to say, correct and expected format. |
Re most recent comment: that suggests that the publisher was a Windows machine with system-wide UTF-8 support. Re the comment before that: 😁 In other words: you need to know the publisher's encoding in order to decode properly. And since Mosquitto appears to have no way to explicitly control the encoding on Windows, you're left with two choices:
|
It may suggest that, but I've run the test a number of times now, and as you can see the pwsh instance in the Ubuntu terminal is reading 'º' as
The machine that returns 850 using cmd.exe |
To narrow this down, let's eliminate incidental factors:
In the first case, PowerShell's behavior makes sense to me, In the second case, make sure that you restore However, I now realize why it didn't work for you: it tried to wait for I've fixed Please try |
|
Thanks. So it looks like everything works as expected now, correct? |
Yes, in these circumstances it does, thank you. We have an app written in .NET Core/5 that runs on a 'Unix-like system' and which collates information by subscribing to MQTT brokers. Our clients use MQTT to relay information from their sensors to our system via that route. In the case that our clients use Windows systems that are not UTF-8 enabled - or in fact any other system which isn't - this issue could present problems to us. At least now we understand exactly where such problems could be coming from 🙂 |
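For a subscriber that cannot know each publisher's encoding, one possible defensive approach is to try strict UTF-8 first and fall back to a legacy code page only when that fails. The Python sketch below shows the idea; it is not taken from this thread, the function name `decode_payload` is hypothetical, and cp1252 as the fallback is an assumption. The same logic could be ported to .NET.

```python
def decode_payload(payload: bytes, fallback: str = "cp1252") -> str:
    """Decode a raw message payload, preferring UTF-8.

    The fallback code page is an assumption; a publisher on another
    locale may use a different ANSI code page.
    """
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8.
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return payload.decode(fallback, errors="replace")


# Payload from a UTF-8 publisher.
print(decode_payload("ºC".encode("utf-8")))

# Payload from a publisher that wrote a single ANSI byte for 'º'.
print(decode_payload(b"\xbaC"))
```

This is a heuristic: some legacy-encoded byte sequences happen to be valid UTF-8, so an agreed encoding between publisher and subscriber remains the more reliable fix.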
Understood, @chris-steema. Personally, I suggest asking the Mosquitto people to implement UTF-8 at least as an opt-in on Windows, via an environment variable and/or command-line option, analogous to what Python has done. I think it's fine to close this issue now. |
Yes, I will consider getting in touch with the Eclipse team. Meanwhile, thank you very much again @mklement0 for all your help. |
My pleasure, @chris-steema; I certainly learned a few things myself. |
Steps to reproduce
Run the following *.ps1 script, saved as a text file with UTF-8 encoding:
Now in another terminal window, subscribe to the mosquitto topic:
mosquitto_sub -h test.mosquitto.org -t tofol/test
The output is unexpected:
{time:2021-03-30T12:30:24.0266957+02:00, value:3, label:║C}
Expected behavior
Expected behavior is seen by running the following script:
This outputs:
{"time":"2021-03-30T12:31:48.2728626+02:00", "value":33, "label":"ºC"}
Environment data