Adaptive Execution Model
BOSP provides a library (bbque_rtlib) for implementing applications according to the run-time managed Adaptive Execution Model (AEM).
This execution model drives the application through a managed execution flow, characterized by:
- Resource awareness: the application can configure itself according to the assigned computing resources.
- Run-time performance monitoring and negotiation: the application can observe its current throughput and ask for more resources.
From the implementation perspective, the application is asked to implement a class derived from BbqueEXC, defined in bbque_exc.h.
The typical approach is then to instantiate an object of this class in the main thread and invoke its Start() member function, which starts a control thread. This thread is responsible for synchronizing the execution of the application with the management actions of the BarbequeRTRM. The derived class implements the following member functions:
- onSetup
Initialization code. Here you should perform memory allocations, variable initializations and so on.
- onConfigure
Called when the BarbequeRTRM assigns a new Application Working Mode (AWM), i.e., the set of resources allocated to the application. Place here whatever is required to reconfigure the application (number of threads, application parameters, data structures, …) so that it runs properly with the resources assigned through the AWM.
- onSuspend
No resources are assigned, and the application must be stopped. Implement whatever is needed to bring it into a safe and consistent state.
- onRun
The entry point of the task: implement here the code that executes one computational run. We strongly suggest keeping the duration of each run within a few hundred milliseconds, so that the task can be interrupted with a “reasonable” time granularity. This prevents the application from being killed by the BarbequeRTRM.
- onMonitor
After a computational run, the application may check whether the level of QoS/performance/accuracy is acceptable. If it is not, some corrective action can be taken.
- onRelease
Optional, but recommended member function. It is expected to contain cleanup code (e.g., freeing the memory allocated in onSetup).
Tip
The setup and release methods (onSetup, onRelease) are called once during the execution of an application. The configure method (onConfigure) is called each time the application receives a new AWM. Therefore, the AEM can usually be thought of as the following loop: onRun -> onMonitor -> onRun -> onMonitor …
Overall, the application continuously runs and monitors its execution until a termination condition is encountered.
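To make the callback ordering concrete, the following is a minimal, self-contained sketch that mimics the loop described above without the RTLib. MockEXC, Exit, RunLifecycle and TwoCycles are hypothetical names introduced only for illustration: the real control thread in BbqueEXC also handles AWM changes, suspension and the synchronization with the BarbequeRTRM, none of which is modeled here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the RTLib exit codes (NOT the real RTLIB_ExitCode_t).
enum class Exit { Ok, WorkloadNone };

// Hypothetical mock of the callback interface; real applications derive from BbqueEXC.
class MockEXC {
public:
    virtual ~MockEXC() = default;

    std::vector<std::string> trace;  // order in which the callbacks were invoked
    int cycles = 0;                  // completed onRun()/onMonitor() cycles

    virtual Exit onSetup() { trace.push_back("onSetup"); return Exit::Ok; }
    virtual Exit onConfigure(int awm_id) {
        trace.push_back("onConfigure(" + std::to_string(awm_id) + ")");
        return Exit::Ok;
    }
    virtual Exit onRun() = 0;
    virtual Exit onMonitor() { trace.push_back("onMonitor"); return Exit::Ok; }
    virtual Exit onRelease() { trace.push_back("onRelease"); return Exit::Ok; }
};

// Simplified control loop: a single AWM assignment, no suspension or reconfiguration.
void RunLifecycle(MockEXC& exc, int awm_id) {
    assert(awm_id >= 0);  // AWM IDs start from 0
    exc.onSetup();
    exc.onConfigure(awm_id);
    while (exc.onRun() == Exit::Ok) {  // stop when onRun() reports no more workload
        exc.onMonitor();
        ++exc.cycles;
    }
    exc.onRelease();
}

// Example application: stops after two cycles, as the template's onRun() stops after five.
class TwoCycles : public MockEXC {
public:
    Exit onRun() override {
        if (cycles >= 2)
            return Exit::WorkloadNone;
        trace.push_back("onRun");
        return Exit::Ok;
    }
};
```

Running RunLifecycle on a TwoCycles instance records onSetup, onConfigure(0), two onRun/onMonitor pairs and finally onRelease, which is the loop shape described in the tip.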
Application structure
We provide a template representing the basic structure of an AEM-integrated application at https://github.com/HEAPLab/aem-template.
../aem_template/
├── build
├── CMakeLists.txt
├── include
│ └── AEMTemplate_exc.h
├── LICENSE
├── README
├── recipes
│ └── aem-template.recipe
└── src
├── AEMTemplate_exc.cc
├── AEMTemplate_main.cc
└── CMakeLists.txt
As mentioned above, the application needs to implement a class derived from BbqueEXC.
Following the content of the template, we have:
AEMTemplate_exc.h
#ifndef AEM_TEMPLATE_EXC_H_
#define AEM_TEMPLATE_EXC_H_
#include <bbque/bbque_exc.h>
using bbque::rtlib::BbqueEXC;
class AEMTemplate : public BbqueEXC {
public:
    AEMTemplate(std::string const & name,
            std::string const & recipe,
            RTLIB_Services_t *rtlib);
private:
    RTLIB_ExitCode_t onSetup();
    RTLIB_ExitCode_t onConfigure(int8_t awm_id);
    RTLIB_ExitCode_t onRun();
    RTLIB_ExitCode_t onMonitor();
    RTLIB_ExitCode_t onSuspend();
    RTLIB_ExitCode_t onRelease();
};
#endif // AEM_TEMPLATE_EXC_H_
AEMTemplate_exc.cc
#include "AEMTemplate_exc.h"
#include <iostream>
using namespace std;
AEMTemplate::AEMTemplate(std::string const & name,
        std::string const & recipe,
        RTLIB_Services_t *rtlib) :
    BbqueEXC(name, recipe, rtlib, RTLIB_LANG_CPP)
{
    cout << "New AEMTemplate::AEMTemplate() UID=" << GetUniqueID() << endl;
}
RTLIB_ExitCode_t AEMTemplate::onSetup()
{
    cout << "AEMTemplate::onSetup()" << endl;
    return RTLIB_OK;
}
RTLIB_ExitCode_t AEMTemplate::onConfigure(int8_t awm_id)
{
    // Query the number of CPU cores assigned with the current AWM
    int32_t proc_nr = 0;
    GetAssignedResources(PROC_NR, proc_nr);
    cout << "AEMTemplate::onConfigure(): AWM=" << static_cast<int>(awm_id)
         << " proc_nr=" << proc_nr << endl;
    return RTLIB_OK;
}
RTLIB_ExitCode_t AEMTemplate::onRun()
{
    // Parameters of the current working mode (not used in this example)
    RTLIB_WorkingModeParams_t const wmp = WorkingModeParams();
    // Example: return after 5 cycles
    if (Cycles() >= 5)
        return RTLIB_EXC_WORKLOAD_NONE;
    cout << "AEMTemplate::onRun(): Hello AEM! cycle=" << Cycles() << endl;
    return RTLIB_OK;
}
RTLIB_ExitCode_t AEMTemplate::onMonitor()
{
    cout << "AEMTemplate::onMonitor(): CPS=" << GetCPS() << endl;
    return RTLIB_OK;
}
RTLIB_ExitCode_t AEMTemplate::onSuspend()
{
    cout << "AEMTemplate::onSuspend()" << endl;
    return RTLIB_OK;
}
RTLIB_ExitCode_t AEMTemplate::onRelease()
{
    cout << "AEMTemplate::onRelease()" << endl;
    return RTLIB_OK;
}
The main file will typically have a structure similar to the provided example:
AEMTemplate_main.cc
#include <libgen.h>
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <memory>
#include "AEMTemplate_exc.h"
using namespace std;
int main(int argc, char *argv[])
{
    // Initialize the RTLib
    RTLIB_Services_t *rtlib = nullptr;
    auto ret = RTLIB_Init(basename(argv[0]), &rtlib);
    if (ret != RTLIB_OK) {
        cerr << "ERROR: Did you start the BarbequeRTRM daemon?" << endl;
        return RTLIB_ERROR;
    }
    assert(rtlib);
    // Instantiate the derived class
    std::string recipe("aem-template");
    auto pexc = std::make_shared<AEMTemplate>("AEMTemplate", recipe, rtlib);
    if (!pexc->isRegistered()) {
        cerr << "ERROR: Register failed (missing the recipe file?)" << endl;
        return RTLIB_ERROR;
    }
    // Start the control thread (the managed application will wait
    // for the resource assignment)
    pexc->Start();
    // Wait for the termination of the managed application
    pexc->WaitCompletion();
    return EXIT_SUCCESS;
}
The samples sub-directory includes a first set of AEM-integrated sample applications.
If the user aims at developing an additional sample to include in BOSP, an alternative option is the BOSPShell command bbque-layapp. It launches a script through which a new application template is created under samples and, therefore, added to the overall BOSP configuration and build system.
Compilation
The templates mentioned above already come with suitable CMake files for building the application.
However, if the application developer needs to proceed manually, the necessary header files and libraries are located under the BOSP installation path, as follows:
$ tree -L 2 out/usr/
├── bin
...
├── include
│ └── bbque
│ ├── bbque_exc.h
│ ├── config.h
...
│ ├── rtlib.h
...
├── lib
│ └── bbque
│ ├── bindings
│ ├── libbbque_rtlib.so
...
Therefore, a GCC-based compilation line should look like the following:
$ g++ <source files> -o <application name> -I <BOSP_PREFIX>/usr/include \
-L <BOSP_PREFIX>/usr/lib/bbque -L <BOSP_PREFIX>/usr/lib/ \
-lbbque_rtlib
The Recipe file
An AEM-integrated application is asked to provide a recipe file. This is an XML file providing some general information, plus a set of profiled Application Working Modes (AWMs) that the BarbequeRTRM may or may not take into account, depending on the specific resource allocation policy.
The recipe must meet the following requirements:
- The file name must end with the .recipe extension.
- The file must be installed under <BOSP_PREFIX>/etc/bbque/recipes.
- The AWM IDs must be sequentially numbered, starting from 0.
- At least one <platform> section with the id matching the target platform must be provided.
Tip
By convention, the higher the AWM ID, the greater the resource requirements it specifies.
Warning
When the application instantiates the EXC object, it provides the recipe name as an argument of the object constructor. The BarbequeRTRM checks the availability and the validity of the recipe. This means that the recipe can be one of the reasons for a failed application launch.
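Before installing a recipe, some of the rules listed above can be checked with a small helper. The sketch below is hypothetical (HasRecipeExtension and AwmIdsAreSequential are not part of BOSP) and covers only the file extension and the AWM numbering; parsing the XML and checking the <platform> sections are left out. It sorts the IDs first, assuming that the order in which AWMs appear in the file does not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if file_name ends with ".recipe" and has a non-empty stem.
bool HasRecipeExtension(const std::string& file_name) {
    const std::string ext = ".recipe";
    return file_name.size() > ext.size() &&
           file_name.compare(file_name.size() - ext.size(), ext.size(), ext) == 0;
}

// Hypothetical helper: true if the AWM IDs are exactly 0, 1, ..., N-1
// (at least one AWM, no gaps, no duplicates), regardless of their order in the file.
bool AwmIdsAreSequential(std::vector<int> awm_ids) {
    if (awm_ids.empty())
        return false;  // a valid recipe defines at least one AWM
    std::sort(awm_ids.begin(), awm_ids.end());
    for (std::size_t i = 0; i < awm_ids.size(); ++i) {
        if (awm_ids[i] != static_cast<int>(i))
            return false;
    }
    return true;
}
```

For the example recipe below, both platform sections use the IDs 0, 1 and 2, so AwmIdsAreSequential({0, 1, 2}) returns true, while a recipe starting from 1 would be rejected.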
Example:
<?xml version="1.0"?>
<BarbequeRTRM recipe_version="0.8">
  <application name="MyApplication" priority="4">
    <!-- Generic Linux -->
    <platform id="bq.linux.*">
      <awms>
        <!-- AWM 0 -->
        <awm id="0" name="LowQ" value="1" config-time="5">
          <resources>
            <cpu>
              <pe qty="100"/>
              <mem qty="100" units="M"/>
            </cpu>
          </resources>
        </awm>
        <!-- AWM 1 -->
        <awm id="1" name="MedQ" value="2" config-time="5">
          <resources>
            <cpu>
              <pe qty="200"/>
              <mem qty="100" units="M"/>
            </cpu>
          </resources>
        </awm>
        <!-- AWM 2 -->
        <awm id="2" name="HighQ" value="4" config-time="5">
          <resources>
            <cpu>
              <pe qty="400"/>
              <mem qty="150" units="M"/>
            </cpu>
          </resources>
        </awm>
      </awms>
    </platform>
    <platform id="bq.linux.*" hw="exynos_5420">
      <awms>
        <!-- AWM 0 -->
        <awm id="0" name="LowQ" value="1" config-time="7">
          <resources>
            <cpu>
              <pe qty="100"/>
              <mem qty="100" units="M"/>
            </cpu>
          </resources>
        </awm>
        <!-- AWM 1 -->
        <awm id="1" name="MedQ" value="2" config-time="7">
          <resources>
            <cpu>
              <pe qty="200"/>
              <mem qty="100" units="M"/>
            </cpu>
          </resources>
        </awm>
        <!-- AWM 2 -->
        <awm id="2" name="HighQ" value="4" config-time="8">
          <resources>
            <cpu>
              <pe qty="400"/>
              <mem qty="150" units="M"/>
            </cpu>
          </resources>
        </awm>
      </awms>
    </platform>
  </application>
</BarbequeRTRM>
The complete set of tags and attributes is listed below. Some tags or attributes are optional. For the hierarchy of the XML elements, refer to the example above.
BarbequeRTRM
The root tag, including general attributes.
recipe_version
Since the format of the recipes can change in future versions of the framework, the first validation step requires specifying the reference version of the recipe.
application
Application/EXC properties.
name
[optional] A descriptive name of the application/EXC.
priority
Static priority assigned. Generally, this is taken into account at run-time by the resource allocation policy. A value between 0 and N must be provided, where 0 denotes the highest priority level (critical application). The lowest possible level N can be specified in the BOSP build configuration.
platform
The target system. The recipe can contain more than one platform section.
id
The string identifier of the platform. The recipe must contain at least the platform section whose id matches the system platform (see the next section below).
hw
[optional] The target platform from the hardware point of view (e.g., an SoC). The current BarbequeRTRM version supports the following hardware identifiers: “exynos_5410”, “exynos_5420”, “omap_4470”.
awms
The section listing the set of AWMs.
awm
Definition of a single AWM. A valid recipe must define at least one AWM.
id
Each AWM is identified by a number. The numbering must start from 0 and continue as a sequence of consecutive integers.
name
A descriptive name for the AWM (e.g., “high-quality”, “mid-quality”, “low-quality”).
value
A preference value associated with the AWM. If the value expresses a performance level, in most cases the higher the value, the greater the resource requirements. In the example provided, the value is directly proportional to the amount of processing resources requested.
config-time
[optional] The time spent to configure the application in the given Application Working Mode.
resources
The section listing the resource requirements of the AWM. All the child tags nested into this section are considered resource names. The nesting hierarchy and the specified IDs are used to build the “resource path”, i.e., a namespace-style string identifying the specific resource. For instance, the recipe in the example produces the following resource paths: cpu.pe and cpu.mem.
sys
Used to group resources into a two-level hierarchical partitioning. Specifically, sys refers to a system, thought of as a single working machine or board. This tag is optional if the target platform is not a distributed computing system; in other words, on a common desktop or embedded board it can be omitted.
cpu
General-purpose processor, usually featuring a set of multiple cores sharing one or more cache memory levels.
mem
The amount of memory required. Please consider the hierarchical position to reference the correct level of memory.
pe
The processing requirement, in terms of CPU time quota (percentage). For instance, in AWM 1 the recipe requires 200%, meaning the full usage of 2 CPU cores.
units
A qualifier for the attribute qty. The values currently supported are % for the processors and KB, MB, GB for the memories.
qty
The amount required.
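The way nested tags turn into resource paths can be illustrated with a small tree walk. In the sketch below, ResourceNode and CollectResourcePaths are hypothetical names; the sketch only joins tag names with dots and ignores the resource IDs that the BarbequeRTRM also takes into account when building paths.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory view of a <resources> subtree: a tag name and its nested tags.
struct ResourceNode {
    std::string name;
    std::vector<ResourceNode> children;
};

// Depth-first walk that emits one dot-separated path per leaf resource,
// e.g. <cpu><pe/><mem/></cpu> -> "cpu.pe", "cpu.mem".
void CollectResourcePaths(const ResourceNode& node, const std::string& prefix,
                          std::vector<std::string>& out) {
    const std::string path = prefix.empty() ? node.name : prefix + "." + node.name;
    if (node.children.empty()) {
        out.push_back(path);
        return;
    }
    for (const auto& child : node.children)
        CollectResourcePaths(child, path, out);
}
```

With the <cpu> block of the example recipe, the walk yields cpu.pe and cpu.mem; wrapping the same block in a sys tag would simply prepend sys to each path.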
The platform identification string
The attribute <platform id=""> specifies the target hardware configuration for which the section of the application is intended.
The value of the attribute is a string that hierarchically identifies the host system and the acceleration platform. It should match the target configuration options selected in Build Configuration with Kconfig.
Focusing on the host system part only, the current version of the BarbequeRTRM supports the following options:
- bq.linux: Linux-based system with cgroup support
- bq.android: Android-based system
- bq.test: emulated host
If the application includes kernels to accelerate, the target hardware likely includes GPUs or HW accelerators. In such a case, we may want to specify that the given <platform> section is intended for a BarbequeRTRM configuration built with OpenCL or NVIDIA support. For example:
- bq.linux.opencl
- bq.linux.nvidia
Note that the value of the attribute can include a regular expression pattern. Therefore, the following options are perfectly acceptable:
- bq.*.opencl : any host, as long as the OpenCL acceleration matches
- bq.linux.* : Linux-based host with any acceleration platform
- bq.* : any combination of host system and acceleration platform
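The effect of these patterns can be checked with std::regex. The sketch below assumes that a pattern must match the whole platform string with ordinary (ECMAScript) regular-expression semantics; this is an assumption for illustration only, since the exact matching code of the BarbequeRTRM is not described here. PlatformMatches is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole platform_id matches the pattern
// written in <platform id="...">, using std::regex (ECMAScript) semantics.
bool PlatformMatches(const std::string& pattern, const std::string& platform_id) {
    return std::regex_match(platform_id, std::regex(pattern));
}
```

Under these semantics an unescaped dot matches any character, so bq.linux.* also matches the plain string bq.linux, while bq.*.opencl does not match bq.linux.nvidia.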
Resource awareness
The onConfigure() execution is typically the right moment for checking the resources assigned by the BarbequeRTRM. Accordingly, the application can determine the right number of threads to spawn or set any other parameter that can be considered “resource-sensitive”.
To this aim, the RTLib provides the GetAssignedResources() function. The following example shows a possible usage of the function. In particular, we imagine the application setting the number of threads equal to the number of CPU cores assigned by the resource manager.
RTLIB_ExitCode_t MyApp::onConfigure(int8_t awm_id) {
    int32_t cpu_quota, nr_cpu_cores, nr_gpus, mem;
    // CPU time quota (100 = one full core) and number of assigned CPU cores
    GetAssignedResources(PROC_ELEMENT, cpu_quota);
    GetAssignedResources(PROC_NR, nr_cpu_cores);
    // nr_threads is assumed to be a data member of MyApp
    nr_threads = nr_cpu_cores;
    ...
    GetAssignedResources(MEMORY, mem);
    ...
    GetAssignedResources(GPU, nr_gpus);
    if (nr_gpus > 0) {
        // Cool... we have some GPUs!
        ...
    }
    return RTLIB_OK;
}
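As a variant of the example above, the thread count could also take the CPU time quota into account. The helper below is a hypothetical sketch (ThreadsFor is not an RTLib function): it converts the PROC_ELEMENT quota, where 100 corresponds to one full core as in the recipe's pe attribute, into a number of threads, rounding partial cores up and never exceeding the PROC_NR value.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of worker threads for a given CPU quota
// (percentage, 100 = one full core) and number of assigned cores.
int ThreadsFor(int cpu_quota_pct, int nr_cpu_cores) {
    const int from_quota = (cpu_quota_pct + 99) / 100;        // ceil(quota / 100) for quota >= 0
    const int threads = std::min(from_quota, nr_cpu_cores);  // no more threads than assigned cores
    return std::max(1, threads);                             // keep at least one worker
}
```

In onConfigure() it could replace the direct assignment, e.g. nr_threads = ThreadsFor(cpu_quota, nr_cpu_cores). Rounding up keeps a partially assigned core busy; rounding down would be the more conservative choice.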
Run-time monitoring and negotiation
The execution model allows the application to monitor its own performance (throughput) at run-time. The throughput is measured as the number of processing cycles (onRun()/onMonitor() executions) per second (CPS).
The RTLib provides the GetCPS() function, which returns the mean cycles-per-second value, computed as an exponential mean over the last cycles. The onMonitor() function is a reasonable place for this call.
RTLIB_ExitCode_t MyApp::onMonitor() {
    float curr_cps = GetCPS();
    ...
    return RTLIB_OK;
}
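For intuition about how such a smoothed CPS value behaves, here is a minimal exponential moving average computed over cycle durations. It is a sketch, not the RTLib implementation: the CpsEma class and the smoothing factor alpha are assumptions, and the actual averaging parameters used by the RTLib are not documented here.

```cpp
#include <cassert>

// Hypothetical exponential moving average of cycles-per-second.
class CpsEma {
public:
    // alpha in (0, 1]: weight given to the most recent cycle
    explicit CpsEma(double alpha) : alpha_(alpha) { assert(alpha > 0.0 && alpha <= 1.0); }

    // Feed the duration of one onRun()+onMonitor() cycle, in milliseconds.
    void AddCycle(double cycle_ms) {
        assert(cycle_ms > 0.0);
        const double cps = 1000.0 / cycle_ms;  // instantaneous rate of this cycle
        value_ = has_value_ ? alpha_ * cps + (1.0 - alpha_) * value_ : cps;
        has_value_ = true;
    }

    double Value() const { return value_; }

private:
    double alpha_;
    double value_ = 0.0;
    bool has_value_ = false;
};
```

A larger alpha reacts faster to changes in cycle duration, while a smaller one filters out more jitter between consecutive runs.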
The performance monitoring can lead the application to react in two possible ways:
- Reconfigure itself, for example, by tuning the amount of data to process during each onRun() execution.
- Make the resource manager aware of the current performance goal, so that the resource assignment can follow the application-specific requests.
The second option is implemented through the SetCPSGoal(cps_min, cps_max) function. Through this function, the application specifies the range of CPS that is considered the current performance goal. Note that setting a performance goal has a twofold objective: on one side, we may want to push the resource manager to let the application run as fast as it can; on the other side, the application may find itself in scenarios where all it needs is the minimum amount of resources. The latter is a possible way to explicitly contribute to reducing the power consumption of the system, which matters, for instance, on mobile devices.
In general, SetCPSGoal calls can reasonably be placed in the onSetup() body, to set an initial goal, and in onMonitor(), to redefine this goal according to application-specific conditions.
It is worth remarking that this function enables a performance monitoring and resource assignment negotiation process, automatically driven by the RTLib threads, without requiring further code in the application.
Example:
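RTLIB_ExitCode_t MyApp::onSetup() {
    // Initial goal: between 2.5 and 3.5 cycles per second
    SetCPSGoal(2.5, 3.5);
    ...
    return RTLIB_OK;
}
...
RTLIB_ExitCode_t MyApp::onMonitor() {
    // low_power_mode is an application-specific flag (not part of the RTLib)
    if (low_power_mode) {
        SetCPSGoal(0.75, 1.25);
    }
    ...
    return RTLIB_OK;
}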
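The first approach, self-reconfiguration, can be sketched as a small controller that changes the amount of data processed in each onRun() until the measured CPS falls into the desired range. NextChunkSize and its halving/doubling policy are hypothetical, chosen only for illustration; they are not part of the RTLib.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical controller: returns the amount of data (e.g. items) to process
// in the next onRun(), given the current CPS and the target range [cps_min, cps_max].
std::size_t NextChunkSize(std::size_t chunk, double cps, double cps_min, double cps_max,
                          std::size_t min_chunk, std::size_t max_chunk) {
    assert(min_chunk > 0 && min_chunk <= max_chunk && cps_min <= cps_max);
    if (cps < cps_min)
        chunk /= 2;  // too slow: do less work per cycle
    else if (cps > cps_max)
        chunk *= 2;  // faster than needed: do more work per cycle
    return std::min(std::max(chunk, min_chunk), max_chunk);  // keep within bounds
}
```

In an application it could be called from onMonitor(), for instance as chunk_size = NextChunkSize(chunk_size, GetCPS(), 2.5, 3.5, 64, 4096), where chunk_size is a hypothetical data member. Unlike SetCPSGoal, this adapts the workload to the resources instead of asking for different resources.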
Profiling
The BBQUE_RTLIB_OPTS environment variable enables a number of features that are very helpful during application characterization.
Unmanaged mode
Application profiling may require running the AEM application while bypassing the BarbequeRTRM policy. We define unmanaged mode as the execution of an AEM-integrated application while the resource manager daemon is not running. It is enabled as follows:
$ BBQUE_RTLIB_OPTS='U' ./my-aem-application
This feature can also be exploited, for example, to assign a specific Application Working Mode. In such a case, given the recipe file provided to the constructor of the BbqueEXC-derived object, we can specify the AWM to select as follows:
$ BBQUE_RTLIB_OPTS='U0' ./my-aem-application
where 0 is the identification number of the working mode.
Warning
The assignment of the working mode, in this case, does not lead to the allocation of the related set of resources, since the BarbequeRTRM daemon is not running.
Statistics
A set of statistics is dumped at the end of the application execution, even without exporting the BBQUE_RTLIB_OPTS variable.
Cumulative execution stats for 'exc_00':
TotCycles : 3
StartLatency : 511 [ms]
AwmWait : 511 [ms]
Configure : 0 [ms]
Process : 2247 [ms]
# EXC AWM Uses Cycles Total | Min Max | Avg Var
#==================================+===================+==================
exc_00 002 1 3 2247 | 749.303 749.360 | 749.333 0.001
#-------------------------+ +-------------------+------------------
exc_00 002 onRun 2247 | 749.264 749.296 | 749.287 0.000
exc_00 002 onMonitor 0 | 0.038 0.064 | 0.045 0.000
#-------------------------+--------+-------------------+------------------
exc_00 002 onConfigure 0 | 0.466 0.466 | 0.466 0.000
- TotCycles is the total number of executed cycles.
- StartLatency is the time elapsed from the application invocation to the execution of its first cycle. It mainly comprises cgroup creation, recipe parsing and the scheduling choice. The cgroup creation is by far the heaviest contribution in terms of elapsed time (tens of milliseconds).
- Configure is the time spent in the onConfigure() function.
- Process is the time spent in the onRun() function.
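The cumulative figures can be combined to estimate the average cycle duration and the resulting throughput. With the statistics shown above, Process / TotCycles = 2247 / 3 = 749 ms per onRun() execution, that is, roughly 1.3 cycles per second; this is consistent with the per-cycle average of about 749.3 ms in the table, which also includes the short onMonitor() time. The helper below (Summarize is a hypothetical name) performs the same computation; since the dump reports whole milliseconds, the result is approximate.

```cpp
#include <cassert>

// Average onRun() time and the corresponding cycles per second,
// derived from the "Process" and "TotCycles" statistics.
struct CycleSummary {
    double avg_cycle_ms;
    double cps;
};

CycleSummary Summarize(double process_ms, int tot_cycles) {
    assert(tot_cycles > 0 && process_ms > 0.0);
    const double avg_ms = process_ms / tot_cycles;
    return {avg_ms, 1000.0 / avg_ms};  // 1000 ms per second
}
```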
Performance counters support
To retrieve the CPU performance counters, we can rely on the perf tool integration and launch the application with the value pN, where N is 1, 2 or 3, depending on the set of performance counters to read.
Example:
$ BBQUE_RTLIB_OPTS='p2' ./my-aem-application
[...]
11:22:04,279 - NOTICE rpc : Execution statistics:
Cumulative execution stats for 'my-aem-application':
TotCycles : 8
StartLatency : 511 [ms]
AwmWait : 511 [ms]
Configure : 0 [ms]
Process : 5992 [ms]
...
Perf counters stats for 'my-aem-application-2' (8 cycles):
0 L1-icache-loads # 0.000 M/sec ( +- 0.00% )
0.199142 task-clock # 0.000 CPUs utilized ( +- 25.68% )
0 context-switches # 0.000 M/sec ( +- 0.00% )
0 CPU-migrations # 0.000 M/sec ( +- 0.00% )
0 page-faults # 0.000 M/sec ( +- 0.00% )
136706 cycles # 0.686 GHz ( +- 20.94% )
112249 stalled-cycles-frontend # 563.664 M/sec ( +- 24.61% )
100733 stalled-cycles-backend # 505.835 M/sec ( +- 26.31% )
41892 instructions # 0.31 insns per cycle
# 2.68 stalled cycles per insn ( +- 1.56% )
9038 branches # 45.387 M/sec ( +- 1.23% )
0 branch-misses # 0.00% of all branches ( +- 0.00% ) [ 0.00%]
0 L1-dcache-loads # 0.000 M/sec ( +- 0.00% ) [ 0.00%]
0 L1-dcache-load-misses ( +- 0.00% ) [ 0.00%]
0 LLC-loads # 0.000 M/sec ( +- 0.00% ) [ 0.00%]
0 L1-icache-load-misses ( +- 0.00% ) [ 0.00%]
0 dTLB-loads # 0.000 M/sec ( +- 0.00% ) [ 0.00%]
0 dTLB-load-misses ( +- 0.00% ) [ 0.00%]
0 iTLB-loads # 0.000 M/sec ( +- 0.00% ) [ 0.00%]
0 iTLB-load-misses ( +- 0.00% ) [ 0.00%]
749.510718 cycle time [ms] ( +- 0.01% )
Alternatively, we can specify raw performance counters using the following syntax:
rN, label_1-counter_1, label_2-counter_2, ..., label_N-counter_N
In practice, you have to provide the number of counters and append a sequence of label-counter_code pairs. Note that you can also exploit a unit mask to select sub-events.
Let’s see what happens if we need to sample the number of L2 accesses (EV_COUNTER F0H) of a certain processor, for which we found the following codes:
- Demand Data Read requests that access L2 cache (UMASK 01H)
- RFO requests that access L2 cache (UMASK 02H)
- L2 cache accesses when fetching instructions (UMASK 04H)
- L2 or LLC HW prefetches that access L2 cache (UMASK 08H)
- L1D writebacks that access L2 cache (UMASK 10H)
- L2 fill requests that access L2 cache (UMASK 20H)
- L2 writebacks that access L2 cache (UMASK 40H)
- Transactions accessing L2 pipe (UMASK 80H)
Example:
$ export BBQUE_RTLIB_OPTS="r8, l2ddr-01f0, l2rfo-02f0, l2if-04f0, l2pref-08f0, l1dwb-10f0, l2fr-20f0, l2wb-40f0, l2p-80f0"
$ ./my-aem-application
$ ...
Perf counters stats for 'ps21_btrack-1' (260 cycles):
1473205 raw 0x1f0 ( +- 18.30% ) [53.86%]
225353 raw 0x2f0 ( +- 25.71% ) [54.01%]
421636 raw 0x4f0 ( +- 25.95% ) [54.57%]
975746 raw 0x8f0 ( +- 18.75% ) [54.27%]
313085 raw 0x10f0 ( +- 21.47% ) [54.35%]
886052 raw 0x20f0 ( +- 20.47% ) [53.96%]
160015 raw 0x40f0 ( +- 27.02% ) [53.78%]
4521546 raw 0x80f0 ( +- 17.30% ) [54.03%]
160.892158 cycle time [ms] ( +- 53.36% )
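Comparing the option string with the output shows how each counter code is formed: the unit mask byte is followed by the event byte (UMASK 01H and event F0H give 01f0, reported by perf as raw 0x1f0). The sketch below builds the option string from these pieces; RawCode and BuildRawOpts are hypothetical helpers, and the encoding is inferred from the example above rather than taken from a specification.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: hex code "<umask><event>", e.g. (0x01, 0xF0) -> "01f0".
std::string RawCode(std::uint8_t umask, std::uint8_t event) {
    char buf[5];  // 4 hex digits + terminating NUL
    std::snprintf(buf, sizeof(buf), "%02x%02x",
                  static_cast<unsigned>(umask), static_cast<unsigned>(event));
    return std::string(buf);
}

// Hypothetical helper: builds "rN, label_1-code_1, ..., label_N-code_N"
// for N unit masks of the same event.
std::string BuildRawOpts(std::uint8_t event,
                         const std::vector<std::pair<std::string, std::uint8_t>>& umasks) {
    std::string opts = "r" + std::to_string(umasks.size());
    for (const auto& entry : umasks)
        opts += ", " + entry.first + "-" + RawCode(entry.second, event);
    return opts;
}
```

Feeding it event 0xF0 and the eight unit masks listed above, with the same labels, reproduces the value exported in the example.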
Programming languages
The Adaptive Execution Model also provides wrappers for the following languages/environments.
Heterogeneous programming
For applications that combine heterogeneous programming with the Adaptive Execution Model, we can proceed as follows.
First, we need to specify the programming library or extension used as the fourth argument of the superclass (BbqueEXC) constructor. This helps the resource manager select the correct set of resources, especially on systems featuring a mix of runtimes and related devices.
The currently supported options are the following:
- RTLIB_LANG_TASKGRAPH: for applications using programming libraries based on task-graph constructs (e.g., libmango)
- RTLIB_LANG_CUDA: for applications coming with CUDA kernels targeting NVIDIA GPU devices
- RTLIB_LANG_OPENCL: for applications using OpenCL function calls to access the computing devices, set up memory buffers, perform data transfers and offload kernels onto heterogeneous devices, including GPUs and accelerators
This is an example of object construction for the OpenCL case:
class MyApp : public BbqueEXC {
public:
    MyApp(std::string const & name,
          std::string const & recipe,
          RTLIB_Services_t *rtlib):
        BbqueEXC(name, recipe, rtlib, RTLIB_LANG_OPENCL) {
    }
    ...
};
The OpenCL case
Most of the setup code, from platform and device selection to buffer and kernel setup, must be placed in the onConfigure() implementation. This allows the application to adapt itself to possible changes in the device assignment performed by the resource manager. The values returned by clGetPlatformIDs and clGetDeviceIDs are actually set by the resource manager and may include only a subset of the devices installed in the system.
With OpenCL, a change of the assigned devices requires redoing a sequence of platform- and device-dependent initialization steps.
Note that, in this case, you don’t need GetAssignedResources() calls to determine the type of the assigned devices, since this information comes with the array of devices returned by the clGetDeviceIDs function.
However, GetAssignedResources() can still be useful to check the number of CPU cores (PROC_NR) when no GPUs have been assigned and we want to configure the execution of the kernel instances (work items) accordingly.
RTLIB_ExitCode_t MyApp::onConfigure(int8_t awm_id) {
    // Requires <CL/cl.h> and <vector>; error checking omitted for brevity
    // Platform selection
    cl_uint np = 0;
    clGetPlatformIDs(0, nullptr, &np);
    std::vector<cl_platform_id> platforms(np);
    clGetPlatformIDs(np, platforms.data(), nullptr);
    ...
    cl_platform_id platform = platforms[0];
    // Device selection: the returned devices are those set by the resource manager
    cl_uint nd = 0;
    cl_device_id dev;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, nullptr, &nd);
    std::vector<cl_device_id> devices(nd);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, nd, devices.data(), nullptr);
    ...
    // Initialize buffers and kernels...
    return RTLIB_OK;
}
Then, as in the other cases, each onRun() execution should process a subset of the overall input data and be synchronized with the termination of the kernels.
Check out the OpenCL samples already integrated in BOSP for more examples.