Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--device parameter on the command line not recognized #34

Open
cyclaero opened this issue Jul 28, 2017 · 5 comments
Open

--device parameter on the command line not recognized #34

cyclaero opened this issue Jul 28, 2017 · 5 comments

Comments

@cyclaero
Copy link
Contributor

cyclaero commented Jul 28, 2017

I am testing ASL on a Mac mini late 2014 with Intel Iris graphics hardware running macOS 10.12.6.

./asl-flow --devices outputs:

flow 1.0

Default computation device:
platform = Apple�
device = Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz�

List of all available platforms and their devices:
Platform: Apple�
Number of devices: 2
	Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz�
	Iris�

Now, trying to invoke the flow example with ./asl-flow --device Iris gives:

ASL WARNING: Requested combination of platform(Apple�) and device(Iris) not found! Using:
platform = Apple�
device = Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz�.
Data initialization... Finished
Numerics initialization... Finished
Computing...Finished
Computation statistic:
Real Time = 26.8742; Processor Time = 98.7167; Processor Load = 367.329%

I found out, that the strings of the platform/device information contain trailing '\0' chars, while the parameter strings as delivered by boost do not, and the comparison (device == getDeviceName(queues[i])) on line 106 in aclHardware.cxx fails, since "Iris\0" is different from "Iris".

I changed the lines 501/502 in aslParametersManager.cxx to:

			string pf = vm["platform"].as<string>(); if (pf.back() != '\0') pf += '\0';
			string dv = vm["device"  ].as<string>(); if (dv.back() != '\0') dv += '\0';
			acl::hardware.setDefaultQueue(pf, dv);

And with that in place I can have ASL simulation run by the GPU.

Data initialization... Finished
Numerics initialization... Finished
Computing...Finished
Computation statistic:
Real Time = 13.3571; Processor Time = 1.27338; Processor Load = 9.53341%

EDIT: Now, seeing this issue message, I guess it would be better to remove the trailing '\0' chars from the strings of the platform/device information, since, quite obviously these nul's made it into the diagnostic output of the simulation tools.

BTW: The locomotive example with the default parameters 0.08/1.0/10001 crashes when run on the GPU, while it finishes on the CPU. If I change dx to 0.09, then it finishes on the GPU as well after 1200 s, while it took 20 times more on the CPU at a load of 400 %.

@AvtechScientific
Copy link
Owner

Those trailing '\0' chars are delivered by buggy OpenCL drivers. The problem is that a bad driver produces them and a good driver doesn't - with the same hardware and on the same OS. So you have to check whether they exist and if yes - strip them. We are not sure, however, that we should correct issues caused by bad drivers by making our code more clumsy.

@cyclaero
Copy link
Contributor Author

cyclaero commented Jul 30, 2017

Those trailing '\0' chars are delivered by buggy OpenCL drivers.

It is quite hard for me to understand how this could happen, since any OpenCL driver is supposed to return a C string, i.e. a C array of chars terminated by a nul char, isn´t it?

Do OpenCL drivers on Linux really provide non-terminated arrays of chars? Isn´t it rather the C++ side which gives rise for a lot of confusion about subtle differences between character arrays and strings. My impression comes from searching Google about "C++ string trailing nul". Seems that the confusion started with the switch from C++11 to C++14. On macOS 10.12.6 the code is compiled with the latest clang++ (which is C++11) are you working with a C++14-compiler?

@cyclaero
Copy link
Contributor Author

cyclaero commented Jul 31, 2017

I changed line 1128 of cl.hpp to:

        param->assign(value.begin(), (value.back() != '\0') ? value.end() : value.end()-1);

This takes care of a trailing nuls in case any buggy OpenCL driver insists on delivering it. Doesn't look too clumsy, does it?

The alternative is not to advertise the macOS as a working platform anymore, because without taking care of the nuls in the one or the other form, ASL would be useless on the Mac.

@AvtechScientific
Copy link
Owner

Hi @cyclaero,

an elegant solution, indeed. And most importantly - cl.hpp is not actually ASL, but rather an C++wrapper to OpenCL - so it is the right place to take care of the issue. Do you want to make a Pull Request with this fix (and maybe your first solved issue as well)? We'll need to test them on a non Mac platform though...

Thank you!

@cyclaero
Copy link
Contributor Author

cyclaero commented Jul 31, 2017

I had a closer look on the upstream C++-API headers on https://github.com/KhronosGroup/OpenCL-CLHPP. And it turned out, that the issue has been resolved in cl2.hpp on exactly the same place which I already suggested, however with a less C'ish and more C++'ish solution. So, I decided to go for back porting the upstream cl2 fix.

I cloned your repository to my GitHub account, I updated cl.hpp to the upstream version 1.2.9 and then back ported 2 fixes from cl2.hpp to the present cl.hpp for the specialized GetInfoHelper for string params, namely fix the trailing '\0' issue and the empty string result issue. Chances are that this would simply work on the vast majority of platforms since these fixes contain only code snippets which came from the Khronos Group.

see: cyclaero@a38f1d4

If this approach is OK for you, then I would submit a pull request, if not, then please inform.

AvtechScientific added a commit that referenced this issue Aug 2, 2017
updated cl.hpp and back ported fixes from cl2.hpp; fixes issue #34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants