Skip to content

demonstrators of different threading technologies for CMS framework

Notifications You must be signed in to change notification settings

amundson/toy-mt-framework

 
 

Repository files navigation

This project is a simple demonstration of using libdispatch as the basis of a HEP multi-threaded framemework

--------------------------------------
COMPILING
1) If you are on linux, you'll need to get and install a version of libdispatch
	https://www.heily.com/trac/libdispatch
which need libkqueue
	http://mark.heily.com/book/export/html/52
2) You will need to modify the Makefile and Makefile_single to point to your copy of the needed third party libraries
3) Then just compile
	make 
	make -f Makefile_single
	
You will wind up with three executables

BusyWaitCalibrate: 
Used to calibrate how long it takes to run a simple integration function. This is used for tests where you want a module to actually do work (as apposed to just sleep).

DispatchDemo:
Processes configuration files to run trivial modules in a way analogous to a full HEP framework. Useful for testing scalability of the system.

SingleThreadedDemo:
Similar to DispatchDemo exception only uses one thread and is implemented in the most trivial (and fastest) way possible. Used for timing comparisons with the same configurations passed to DispatchDemo.

--------------------------------------
TESTING
To see that the program works at all, you can run DispatchDemo using the simple test configuration

./DispatchDemo DispatchProcessingDemo/test_config.json

This should generate an output like (From a Core 2 Duo MacBookPro)

demo::PassFilter
demo::PassFilter
demo::PassFilter
demo::ThreadSaferGetterFilter
demo::BusyWaitProducer
demo::BusyWaitProducer
# simultaneous events:4 total # events:3000 cpu time:90.5964 real time:47.41 events/sec:63.2778

--------------------------------------
CALIBRATING

In order to simulate the effect of actually doing HEP type work, the code does a simple integration to eat up 100% of a CPU. However, how long a given loop takes needs to be calibrated for the machine being used.  This is done by running 'BusyWaitCalibrate'

Running this program gives an output similar to the following

./BusyWaitCalibrate 
43.3487
100000 0.004535 2.20507e+07
195.367
1000000 0.042997 2.32574e+07
199.928
10000000 0.441726 2.26385e+07
6.4598
100000000 5.10827 1.95761e+07

The grouping of numbers is the results from running 4 different iteration values, each differing by a factor of 10. The lone number on a line can be ignored. The the numbers on one line are
	- the number of iterations used
	- the length of time it took to do that many iterations
	- the ratio of (# of iterations)/(length of time)
From experimentation, I've found that the first 3 iterations tend to be converging to a reasonable common number but the longest iteration tends to be slower, probably because the operating system is taking some time from the job in order to do some other work.  Therefore I tend to take the largest value of the ration from the first three values.

The ratio is then used to set the value in the various configuration files. The value is
	process.options.busyWaitScaleFactor
The easy way to change the value is to just search for 'busyWaitScaleFactor' in the configuration file since it only appears once.


--------------------------------------
CONFIGURATIONS

The configuration files are JSON format files which contain the information such as
- What is the busy wait scale factor to use (process.options.busyWaitScaleFactor)
- How many events should the job 'process' (process.source.iterations)
- How many events should be processed simultaneously (process.options.nSimultaneousEvents)
- What 'modules' to run and how the various 'modules' are related

The demo framework uses standard CMS concepts:
	-Tasks are encapsulated in 'modules' where 'modules' only communicate by adding object to and reading objects from the Event
	-Modules which only read data from the Event and decide if we should continue processing an Event are called Filters
	-Filters are placed in sequence into a Path
	-The job allows multiple Paths, where each Path is considered independent and can be run in parallel
	-Modules which add data to an event are called Producers. Producers can depend on data in the Event. A Producer will be run the first time a data item is requested from a given Event.
	-Data in the event is identified by the 'label' which was assigned to the Producer which makes the data.

The list of Filters is defined in the 'process.filters' value.
The list of Producers is defined in the 'process.producers' value.
The list of Paths is defined in the 'process.paths' value.

Filters and Producers are configured with the mandatory parameters
	'@label': the label used to uniquely defined the module (and the data the module makes)
	'@type':  what C++ class to instantiate when creating the module
Each C++ class type can then have its own parameters, although most accept
	'eventTimes': how long the module runs for a given event. Specifying time for each event allows one to measure effects of correlations of times between modules. The units are seconds.
	'threadType': used by DispatchDemo to decide how thread safe to treat the module. The allowed values are
		ThreadSafeBetweenInstances: the same instance of a module can be passed multiple events simultaneously
		ThreadSafeBetweenModules: the same instance of a module can only handle one event at a time but it does not interact with any other module
		ThreadUnsafe: the module can not be run at the same time as other 'ThreadUnsafe' modules
	'toGet': A list of the data to acquire from the event. The list is pair of 'label' (which is the label of the Producer) and a 'product' which is not used now.

--------------------------------------
EXPERIMENTS

The experimental configurations are found in cms-data:
reco_minbias_busywait.config: 
This simulates doing event Reconstruction on a common and quick to process event type where each module is using real CPU time. The job has one Filter which is simulating the behavior of the output module which would write a ROOT file. This Filter is set to be 'ThreadUnsafe' to simulate present ROOT behavior. The source is set to be ThreadSafeBetweenModules so only one request is processed at a time in the module. All other modules are configured with ThreadSafeBetweenInstances so are assumed to be perfectly thread safe.

reco_minbias_sleeping.config
Similar to reco_minbias_busywait.config but instead of having modules use real CPU time, they 'sleep' instead. The libdispatch is smart enough to see when a thread is sleeping and then assign more work to the system. This configuration allows one to push the nSimultaneousEvents to much larger levels than the actual number of CPUs on the machine. I was able to run 700 nSimultaneousEvents on a 2 core machine.

reco_minbias_sleeping_1000.config
The same as reco_minbias_sleeping.config, however instead of each module having specified timing for 100 events, this has timing for 1000 events. This is only here for historic reasons.

reco_minbias_sleeping_betweenmodules.config
The same as reco_minbias_sleeping.config except the Producers have been set to only be able to process one event at a time (ThreadSafeBetweenModules). This allows scaling test for what happens if modules are not fully thread safe.

reco_minbias_sleeping_perfectIO.config
The same as reco_minbias_sleeping.config except both the source and output (i.e. the Filter) are set to completely thread safe (ThreadSafeBetweenInstances). This tests the scaling limits of the framework itself.

reco_multijet_sleeping.config
The same as reco_minbias_sleeping.config except the events take much longer to process.

reco_multijet_sleeping_1000.config
The same as reco_multijet_sleeping.config exception each module specifies 1000 event times rather than just 100. This is only here for historic reasons.

So the suggested way to proceed with a experimenting is
1) Run reco_minbias_busywait.config while changing both 'nSimultaneousEvents' and 'iterations' (trying to keep the total job time about constant) until you see the events/sec results plateau
2) Run reco_minbias_sleeping.config while changing both 'nSimultaneousEvents' and 'iterations' (trying to keep the total job time about constant) until you see the events/sec results plateau. You should be able to go to much higher nSimultaneousEvents. Also, results from 1) with the same 'nSimultaneousEvents' should be the same as 2)
3) Same as 2) but using reco_minbias_sleeping_betweenmodules.config
4) Same as 2) but using reco_minbias_sleeping_perfectIO.config

About

demonstrators of different threading technologies for CMS framework

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 90.4%
  • CMake 4.8%
  • Makefile 2.0%
  • Python 1.5%
  • TeX 1.1%
  • C 0.2%