Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support NetBsd and OpenIndiana #2811

Open
wants to merge 10 commits into
base: devel
Choose a base branch
from

Conversation

mstone2001
Copy link

This will be a combined 2 + 3 for my OS Pull request. After testing it out, NetBsd requires all of the changes I made for OpenIndiana regarding the SIGINT problem with waitpid, so the actual differences in the changes for the two is minimal. So for these two it's best to perform the changes in one pull request to avoid a huge overlap and possible breakage. I wanted to get the initial branch done, then will look at some of the previous comments. I know of two that I need to address, the formatting issues in the code in some places, will run the script to clean those up, and also change the default pam option back to enabled, and add a --enable-native flag to disable pam. This will keep the current default behavior.

Common:

  • g_sck_get_peer_cred: Added NetBSDs and OpenIndiana's flavor of this.
  • Getting strange EINTRs, standard way of handling this is to retry on EINTR. Modified the g_waitpid function to do the automatic retry. This resolved a number of strange errors. This should be a standard way to handle EINTR, so I did not #ifdef it. I tried it on Linux platforms and did not see any issue with it. Also modified a few calls doing a direct waitpid to call the g_waitpid instead.
  • Modified alarm registration to also handle SIGINT. Again, believe this should not need to be #ifdef, but could be.
  • While testing, added an extra parameter to the new xwait program to allow the wait time to be configurable.

NetBSD:

  • g_sleep: NetBSD has a quirk where you can't usleep for more than 1,000,000 microseconds( 1 second ), so created a hybrid sleep, usleep method for netbsd.
  • service scripts are the same as for freebsd, added checks for this to install.
  • Note I used pkgsrc to install all required libraries. Also note that building against native BSD pam libraries causes undefined symbols, so I built against openpam.

OpenIndiana:

  • Have to re-register for SIGCHLD after signal is handled.
  • Build flags required -D_XOPEN_SOURCE=600 so posix calls are allowed. Also need this for xorgxrdp.
  • Added manifest and method files for SMF for services.
  • Note this code should work on Solaris, but I do not have access to any Solaris environments so can't test.
  • Commented out line specifying config file when running Xorg, doesn't work on OpenIndiana.
  • One issue. I could not get PAM to work on OpenIndiana, I had numerous problems. It looks like enable-pam defaults to yes, and I couldn't find a way to turn it off. So I changed enable-pam to default to no.

@matt335672
Copy link
Member

@mstone2001 - thanks for this.

While I was reading down, I remembered the mess that the signals are in at the moment, and a while ago thinking how hard it would be to fit Sys-V semantics into xrdp.

Your comments above are a good reminder that this really needs to be fixed, in particular, your paragraph starting Getting strange EINTRs.

My apologies if you know a lot of the details below - I suspect you do. I'm trying to fill in the gaps for other readers of this PR.

xrdp uses signal() to specify signal handlers. This is a really old interface specified by ANSI C and isn't portable. A reasonably good summary of UNIX signal handling flavours can be found here.

Linux signal handling (see system(7)) tends to follow BSD semantics by default, which means that porting from Linux to FreeBSD has been relatively straightforward. Solaris is not however BSD-based (not since version 1.x at any rate). The default action of signal() is different and requires workarounds such as the re-establishing of SIGCHLD handlers as commented above. Note that these workarounds may not be reliable as there is often a gap between a signal being recieved and the handler being re-established where the signal handler is set to the default.

I think a better way to resolve this, although it will have an impact on this PR, is to replace the use of signal() within xrdp with sigaction() instead. This is part of POSIX 2008 and is supported by Linux, *BSD, Solaris and MacOS. It's also well-specified, so all platforms should behave in the same way.

@mstone2001 - if I get rid of signal() in the devel branch, would you be happy to rebase this work on that instead? I think it may make things a lot easier on the future, particularly in terms of maintenance.

@matt335672
Copy link
Member

See also #2813 which turned out to be a lot simpler than I expected.

@matt335672
Copy link
Member

Hi @mstone2001

I've got an OpenIndiana VM up and running. It looks a lot like the Solaris 10 I remember, including some old friends like the svcs and svcadm commands.

Your patches above don't seem to have the defines in to get the right features from the includes. I've got a reasonable distance with this patch, which I've based on your mega-patch in #2803, and the OpenIndiana standards(7) manpage.

diff --git a/common/os_calls.c b/common/os_calls.c
index 9c23d8dd..ecec28a3 100644
--- a/common/os_calls.c
+++ b/common/os_calls.c
@@ -36,6 +36,7 @@
 #if defined(sun) || defined(__sun)
 #define ctid_t id_t
 #endif
+#include <limits.h>
 #include <unistd.h>
 #include <errno.h>
 #include <netinet/in.h>
diff --git a/configure.ac b/configure.ac
index 414ce3b9..5ca73a15 100644
--- a/configure.ac
+++ b/configure.ac
@@ -39,6 +39,9 @@ case $host_os in
        *darwin*)
                macos=yes
                ;;
+       *solaris*)
+               solaris=yes
+               ;;
 esac
 
 AM_CONDITIONAL(LINUX, [test "x$linux" = xyes])
@@ -46,6 +49,7 @@ AM_CONDITIONAL(FREEBSD, [test "x$freebsd" = xyes])
 AM_CONDITIONAL(OPENBSD, [test "x$openbsd" = xyes])
 AM_CONDITIONAL(NETBSD, [test "x$netbsd" = xyes])
 AM_CONDITIONAL(MACOS, [test "x$macos" = xyes])
+AM_CONDITIONAL(SOLARIS, [test "x$solaris" = xyes])
 
 AC_CHECK_SIZEOF([int])
 AC_CHECK_SIZEOF([long])
@@ -205,6 +209,10 @@ AX_GCC_FUNC_ATTRIBUTE([format])
 AX_TYPE_SOCKLEN_T
 AX_CFLAGS_WARN_ALL
 AX_APPEND_COMPILE_FLAGS([-Wwrite-strings])
+if test x$solaris = xyes; then
+    AX_APPEND_COMPILE_FLAGS([-D_XOPEN_SOURCE=600 -D__EXTENSIONS__])
+fi
+
 
 AM_COND_IF([LINUX],
   [AX_APPEND_COMPILE_FLAGS([-Werror])]) # bsd has warnings that have not been fixed yet

I'm out of time today, but I'll look further next week.

@mstone2001
Copy link
Author

mstone2001 commented Oct 6, 2023

@matt335672 Wow, thats great. It will be good to have a second pair of eyes looking at this. Has been a long time since I developed on Solaris. Updating to the Posix calls should definitely help. Strangely I found the SIGINT behavior to occur on NetBSD as well after I retested it without the OpenIndiana fixes. As for the flags, I will review them. I had experimented a couple of times during the patch. I'll look at rebasing to your fix.

@matt335672
Copy link
Member

It's all coming back to me, although somewhat slowly. I remember struggling a lot with the Solaris 10 gcc 3.0.3 compiler and the other antiquated tools on Solaris 10. That's quite a while ago now!

I'll see what minimum set of changes I need to get to a point where make check works OK on Solaris/Open Indiana. It shouldn't be much I hope. Could we maybe rebase on that and then retry NetBSD?

Regardless of how this pans out, I want to make sure you get most of the credit for this work.

Can you explain the SIGINT problem a bit more? I don't follow what you've experienced. It may be a bit more fiddling with the signal setup will fix it.

@matt335672
Copy link
Member

I've added tests for the updated signal calls now which should test for the problems you've encountered above with signals. This is now merged into devel

I've also got a branch in my repo which is based on devel:-

https://github.com/matt335672/xrdp/tree/oi_changes

In this branch, make check works fine on Openindiana, so I think that should address all your signal issues above. If it doesn't, we should be able to devise failing tests for the failure conditions. The fixes can then be verified as being necessary and sufficient.

I'm going to look into your PAM issues too. I've got some dim memories in my head of necessary changes in this area, related to the conversation function. There's a tool xrdp-authtest which is built in the build. This might help to sort this out.

I'm currently grappling with NetBSD. It's a bit alien to me.

@mstone2001
Copy link
Author

@matt335672 Thanks, I will take a look at rebasing to your changes. Regarding the SIGINT problem, what I found was the call to waitpid and alarm would get interrupted. Since it didn't appear the loop expected this to happen, it would get into a weird state. I checked a bunch of forums, and the solution that seemed to be standard in all cases was just to retry the op after getting interrupted. So I modified the method such that if the call returns error EINT, just retry the operation. This seemed to eliminate all of the problems I was seeing, and I believe it is always safe to do this. I found this to occur both on NetBSD and OpenIndiana. Really not sure why they exhibit similar behavior, it surprised me when I discovered it.
Just to explain my NetBSD setup, a manilla NetBSD install gives you practically nothing. After setting up NetBSD, I use pkgsrc in order to get a real package manager and install all of the things required for development. Getting Xorg to work is a bit of a trick as well. They use modular-Xorg, so there is no monolithic Xserver package. I have a list of packages I installed to get my Xorg setup working. I didn't have much luck with the bundled Xorg shipped with NetBSD. If you need some help with this let me know, NetBSD doesn't have as large a user/developer base as FreeBSD does, so the edges are rough. Once I take a look at the OpenIndiana part of things I'll rebuild everything on NetBSD and see if it works.

@matt335672
Copy link
Member

Thanks - that's useful.

I understand the waitpid() instance and I've now added a test which invokes precisely this situation - a SIGALRM is received while the program is sat in a waitpid() call. The POSIX docs are a bit muddy on this, but if a signal is received with a SA_RESTART handler, waitpid() should be restarted. The test makes this explicit.

I'm not keen on adding a blanket restart where not required. If an unexpected signal is received we need to know about it, and a retry in this case may not be helpful.

I still don't understand what is happening with alarm() though, and why you've modified g_set_alarm(). Any help on that would be useful.

I will most definitely need your help with NetBSD I think!

@matt335672
Copy link
Member

I've sorted out PAM, I think. At least the authtest program works OK now. I haven't tried running up yet.

Log an info message instead of error if waitpid returns ECHILD.

Need to re-register for SIGCHLD on Solaris.

Retry on EINTR.

Retry SIGINT in waitpid.

Fix issues with SIGINT on solaris.

Add -w option to waitforx to configure wait time. Fix call to usleep for NetBSD, where value must be < 1000000

Update flags for Solaris.

Support Additional OS.

Update xrdp to support netbsd, openindiana.

Support Additional OS.

Update xrdp to support netbsd.
@mstone2001
Copy link
Author

So I synced my branch with the latest development branch and have pulled in the sigaction changes. I'll retest and see if the behavior I ran into no longer occurs. If so, then the change to retry the SIGINT should not be necessary, although I'd still like to have all the waitpid calls make one call into os_calls.c. I like the pattern you have established having the os call wrappers as it really helps to localize any platform specific changes. And in case I do run into any problems I can localize the fix here.

@matt335672
Copy link
Member

See how the testing goes. The retry in g_waitpid() should not be necessary - any problems in this area should be picked up by the unit tests as I think it's important with signal handling to really understand what is going on. Having the unit tests codifies the way we are using signals.

Agreed that it makes more sense to call into wrappers where present, although we are making an effort to move away from unnecessary wrappers for functions in the C99 standard (e.g. malloc, strlen, etc).

A couple more comments:-

  1. As we all do, you've added a few unnecessary spaces at the end of lines in places while editing. The github diffs make these reasonably clear but you can also use something like find . \( -name Makefile.am -o -name \*.[ch] \) -exec grep -n ' $' {} +
  2. You've added a fair few logging calls to os_calls.c This is fine generally (I'm sure some are missing anyway), but it shouldn't be necessary to have a LOG() and LOG_DEVEL() for the same message(?)
  3. Looking at the NetBSD sleep got me thinking. Using sleep for this is probably a bad idea due to possible interactions with SIGALRM (see this page [opengroup.org]). Also the usleep we are using has similar issues and is deprecated in POSIX 2001. Could you consider providing a common implementation for all platforms using nanosleep? BTW this function is interruptible by signals, but can be coded to cope with this. I'll be happy to add some unit tests for it for you.

Particularly on 2) and 3) above feel free to come back at me.

@mstone2001
Copy link
Author

mstone2001 commented Oct 13, 2023

I retested the changes based on the updated code. The change I made for SIGALRM is indeed no longer required, so that is now removed. However, I am still having a problem with the waitpid. What appears to happen is, shortly after the waitpid call is made, it returns EINT, and this causes xrdp-sesman to think that the xrdp session died. This only seems to occur on netbsd/openindiana. I will add a platform specific #ifdef here for now for the retry operation since it only seems to be necessary on these platforms. Here is a log segment of what I see:

[2023-10-13T15:54:06.172-0400]` [INFO ] Found X server running at /tmp/.X11-unix/X10
[2023-10-13T15:54:06.196-0400] [INFO ] Starting X server on display 11: Xorg :11 -auth .Xauthority -config xrdp/xorg.conf -noreset -nolisten tcp
[2023-10-13T15:54:07.209-0400] [ERROR] waitpid returned Interrupted system call
[2023-10-13T15:54:07.217-0400] [WARN ] wait for pid 28230 returned unknown result Interrupted system call
[2023-10-13T15:54:07.222-0400] [ERROR] waitforx failed with unknown reason
[2023-10-13T15:54:07.226-0400] [ERROR] An error occurred waiting for the X server
[2023-10-13T15:54:07.264-0400] [ERROR] waitpid returned Interrupted system call
[2023-10-13T15:54:07.271-0400] [INFO ] Session on display 11 has finished.
[2023-10-13T15:54:07.276-0400] [INFO ] waitpid returned No child processes

As for the other items, for 1, yes, I need to setup a better dev environment on these platforms, right now I just use vi, and that seems to cause formatting problems like you saw. I will clean that up. For 2, Yes, I added a lot of logging when I was trying to figure out exactly where the point of failure was. I will review. For 3, yes, that makes sense. I can look at updating the code to use nano sleep instead.

@matt335672
Copy link
Member

I'll try to reproduce this from your branch next week and track down what is going on there. I've got a test for SIGHUP and waitpid in the make check stuff. That's passing on my branch, so I've either got the test wrong or something else is going on. I'm not happy with adding the retry when we have a test which indicates it's not necessary.

I use gvim a lot and that's pretty easy to set up to work with xrdp code. I can post my config if that would help. It's not big.

@mstone2001
Copy link
Author

So I believe I have taken care of 1 + 2 + 3. I also ran the script to run the astyle program as documented in the scripts directory to cleanup the source indenting/formatting issues. The nano sleep should take care of the netbsd issue with usleep. I kept the g_sleep call taking milliseconds, and did math to populate the timespec struct required by the nanosleep call. It is much cleaner that the original workaround I had for netbsd. I will also look at the test and see if I can figure out exactly what is going on. It appears something causes an Interrupt on these platforms, but I'm not sure what it is.

@mstone2001
Copy link
Author

mstone2001 commented Oct 14, 2023

Oh I should add if you are looking at the repository, I have put an #ifdef in the g_waitpid to only do the retry on NetBsd and OpenIndiana( sun ). So if you want to run without the retry, you can just comment that additional logic out.

@mstone2001
Copy link
Author

mstone2001 commented Oct 15, 2023

As per the previous comments, I have also changed the --enable-pam option to --enable-nopam, since pam should be the default. --enable-nopam will now use the verify_user.c for user authentication. If nothing is specified, pam will be used. If you worked out the pam issues and feel you can commit them, I can merge them into my branch and test on my config.

Michael Stone and others added 8 commits October 15, 2023 12:35
Run astyle to fix code formatting.
Use nanosleep instead of usleep for cross platform compatibility.
Remove extraneous devel logging.
…x code formatting. Use nanosleep instead of usleep for cross platform compatibility. Remove extraneous devel logging.

Check = xno instead of != xyes

Change --enable-pam to --enable-nopam.  Pam is the default.
@mstone2001
Copy link
Author

My #ifdef caused a compile error due to an unused label. Committed a fix.

@matt335672
Copy link
Member

Thanks,

I'd hoped to look at this yesterday, but I got somewhat bogged down in the improvements I'm making to UTF processing in #2794. Hopefully I'll manage later in the week.

At some point we'll need to do some re-arranging and rebasing of these commits prior to merging. It should be obvious when we're in a place to do that.

@mstone2001
Copy link
Author

No problem. When comes time you can let me know what you'd like me to do to cleanup the branch, this was my first time contributing to a project, so I am sure I did not do things in the most efficient manner. Also, my other two pull requests should be ready to go. Let me know if they need additional work or if you think they can be merged.

@matt335672
Copy link
Member

In terms of getting a PR ready for committing, the best way if possible is to get things re-organised so that :-

  1. each commit in the PR does one thing which is clearly explained in the commit comment.
  2. The commit sequence is based on the head of devel, or very near it.

I find git rebase -i is an extremely useful tool for this.

In terms of your PRs, let's look at the Gentoo one first as it's quite simple, and then move back to this one. The MacOS one will require inputs from others. I'll jump over to #2807 for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants