.. _app-guide:

Application Development Guide
=============================

This guide is intended for developers who want to run their applications under
the management of the BarbequeRTRM.

In general, we distinguish between managing :ref:`generic-processes` and
*Adaptive Execution Model integrated* applications, as introduced in the User
Guide.


.. toctree::
   :maxdepth: 1
   :hidden:

   guide-app-process


.. _aem-applications:

Adaptive Execution Model 
------------------------

BOSP provides a library (``bbque_rtlib``) for implementing applications
according to the run-time managed *Adaptive Execution Model* (AEM).
This execution model drives the application through a managed execution flow,
characterized by:

  - *Resource-awareness*: the application can configure itself according to the
    assigned computing resources.
  - *Runtime performance monitoring and negotiation*: the application can
    observe its current throughput and ask for more resources.

From the implementation perspective, the application is asked to implement a
class derived from ``BbqueEXC``, defined in ``bbque_exc.h``.

Then, the typical approach is to instantiate an object of such a class in the
main thread and invoke its ``Start()`` member function, which triggers the
execution of a control thread. This thread is responsible for synchronizing the
execution of the application with the BarbequeRTRM management actions.

.. image:: img/aem.svg


- **onSetup** 
      Initialization code. Here you should perform memory allocations, variable
      initializations, and so on.

- **onConfigure**
      Called when the BarbequeRTRM assigns a new *Application Working Mode
      (AWM)*, i.e., the set of resources allocated to the application. Place
      here whatever is required to reconfigure the application (number of
      threads, application parameters, data structures, ...) so that it runs
      properly with the resources assigned through the AWM.

- **onSuspend**
      No resources are assigned, so the application must be stopped. Implement
      whatever is needed to bring it into a safe and consistent state.

- **onRun**
      This is the entry point of our task: here goes the code executing a
      single computational run. We strongly suggest keeping the duration of a
      run within a few hundred milliseconds, so that the task can be interrupted
      with a "reasonable" time granularity. This helps prevent the application
      from being killed by the BarbequeRTRM.

- **onMonitor**
      After a computational run, the application may check whether the level
      of QoS/performance/accuracy is acceptable. If it is not, some corrective
      action can be taken.

- **onRelease** 
      Optional, but recommended, member function. It is expected to contain
      cleanup code (e.g., freeing the memory allocated in onSetup).

.. tip::

	The setup and release methods (onSetup, onRelease) are called once
	during the execution of an application. The configure method (onConfigure) is
	called each time the application receives a new AWM. Therefore, between two
	reconfigurations, the AEM can usually be thought of as the following loop:
	onRun -> onMonitor -> onRun -> onMonitor ...

Overall, the application keeps running and monitoring its execution until a
termination condition is encountered.
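
The call order described above can be summarized with a small, self-contained
C++ model. This is *not* the RTLib control thread: ``AemModel``, ``Step``,
``Drive()`` and ``ThreeCycles`` are hypothetical names used only for
illustration, suspension (``onSuspend``) is omitted, and a single AWM
assignment is assumed before the first run.

.. code-block:: cpp

	#include <cassert>
	#include <string>
	#include <vector>

	// Hypothetical return codes mimicking the semantics used in this guide:
	// OK keeps the loop going, WORKLOAD_NONE ends it.
	enum class Step { OK, WORKLOAD_NONE };

	// Simplified model of an AEM-integrated task (NOT the real BbqueEXC).
	class AemModel {
	public:
		virtual ~AemModel() = default;
		std::vector<std::string> trace;  // order of the callbacks invoked

		virtual void onSetup() { trace.push_back("onSetup"); }
		virtual void onConfigure(int awm) {
			trace.push_back("onConfigure:" + std::to_string(awm));
		}
		virtual Step onRun() = 0;
		virtual void onMonitor() { trace.push_back("onMonitor"); }
		virtual void onRelease() { trace.push_back("onRelease"); }
	};

	// Setup once, configure on the (single) AWM assignment, then run/monitor
	// until no workload is left, and finally release once.
	void Drive(AemModel & task, int assigned_awm) {
		task.onSetup();
		task.onConfigure(assigned_awm);
		while (task.onRun() == Step::OK)
			task.onMonitor();
		task.onRelease();
	}

	// Example task: performs three cycles, then reports no workload left.
	class ThreeCycles : public AemModel {
	public:
		int cycles = 0;
		Step onRun() override {
			if (cycles >= 3)
				return Step::WORKLOAD_NONE;
			trace.push_back("onRun");
			++cycles;
			return Step::OK;
		}
	};

For instance, after ``ThreeCycles task; Drive(task, 1);`` the ``task.trace``
vector lists ``onSetup``, ``onConfigure:1``, three ``onRun``/``onMonitor``
pairs and, finally, ``onRelease``.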


.. _aem-structure:

Application structure
^^^^^^^^^^^^^^^^^^^^^

We provide a template representing the basic structure of an AEM-integrated
application at https://github.com/HEAPLab/aem-template.

.. code-block:: bash

	 ../aem_template/
	├── build
	├── CMakeLists.txt
	├── include
	│   └── AEMTemplate_exc.h
	├── LICENSE
	├── README
	├── recipes
	│   └── aem-template.recipe
	└── src
	    ├── AEMTemplate_exc.cc
	    ├── AEMTemplate_main.cc
	    └── CMakeLists.txt


As we said, the application needs to implement a class derived from ``BbqueEXC``.
Following the content of the template, we have:

**AEMTemplate_exc.h**


.. code-block:: cpp

	#ifndef AEM_TEMPLATE_EXC_H_
	#define AEM_TEMPLATE_EXC_H_

	#include <string>
	#include <bbque/bbque_exc.h>

	using bbque::rtlib::BbqueEXC;

	class AEMTemplate : public BbqueEXC {

	public:
		AEMTemplate(std::string const & name,
			    std::string const & recipe,
			    RTLIB_Services_t *rtlib);

	private:

		RTLIB_ExitCode_t onSetup();
		RTLIB_ExitCode_t onConfigure(int8_t awm_id);
		RTLIB_ExitCode_t onRun();
		RTLIB_ExitCode_t onMonitor();
		RTLIB_ExitCode_t onSuspend();
		RTLIB_ExitCode_t onRelease();
	};

	#endif // AEM_TEMPLATE_EXC_H_


**AEMTemplate_exc.cc**

.. code-block:: cpp

	#include "AEMTemplate_exc.h"
	#include <iostream>

	using namespace std;

	AEMTemplate::AEMTemplate(std::string const & name,
			std::string const & recipe,
			RTLIB_Services_t *rtlib) :
		BbqueEXC(name, recipe, rtlib, RTLIB_LANG_CPP) 
	{
		cout << "New AEMTemplate::AEMTemplate() UID=" << GetUniqueID() << endl;
	}

	RTLIB_ExitCode_t AEMTemplate::onSetup() 
	{
		cout << "AEMTemplate::onSetup()" << endl;
		return RTLIB_OK;
	}

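	RTLIB_ExitCode_t AEMTemplate::onConfigure(int8_t awm_id)
	{
		// Query the number of CPU cores assigned with the new AWM
		int32_t proc_nr = 0;
		GetAssignedResources(PROC_NR, proc_nr);
		// awm_id is an int8_t: cast it to print a number, not a character
		cout << "AEMTemplate::onConfigure(): AWM=" << static_cast<int>(awm_id)
		     << " proc_nr=" << proc_nr << endl;
		return RTLIB_OK;
	}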

	RTLIB_ExitCode_t AEMTemplate::onRun() {
		RTLIB_WorkingModeParams_t const wmp = WorkingModeParams();

		// Example: return after 5 cycles
		if (Cycles() >= 5)
			return RTLIB_EXC_WORKLOAD_NONE;

		cout << "AEMTemplate::onRun(): Hello AEM! cycle="<< Cycles() << endl;
		return RTLIB_OK;
	}

	RTLIB_ExitCode_t AEMTemplate::onMonitor() 
	{
		cout << "AEMTemplate::onMonitor(): CPS=" << GetCPS() << endl;
		return RTLIB_OK;
	}

	RTLIB_ExitCode_t AEMTemplate::onSuspend()
	{
		cout << "AEMTemplate::onSuspend()" << endl;
		return RTLIB_OK;
	}

	RTLIB_ExitCode_t AEMTemplate::onRelease() 
	{
		cout << "AEMTemplate::onRelease()" << endl;
		return RTLIB_OK;
	}

The main file will typically have a structure similar to the provided example:

**AEMTemplate_main.cc**

.. code-block:: cpp

	#include <libgen.h>
	#include <cassert>
	#include <cstdlib>
	#include <iostream>
	#include <memory>
	#include <string>
	#include "AEMTemplate_exc.h"

	using namespace std;

	int main(int argc, char *argv[])
	{
		// Initialize the RTLib
		RTLIB_Services_t *rtlib = nullptr;
		auto ret = RTLIB_Init(basename(argv[0]), &rtlib);
		if (ret != RTLIB_OK) {
			cerr << "ERROR: Did you start the BarbequeRTRM daemon?" << endl;
			return RTLIB_ERROR;
		}
		assert(rtlib);

		// Instantiate the derived class
		std::string recipe("aem-template");
		auto pexc = std::make_shared<AEMTemplate>("AEMTemplate", recipe, rtlib);
		if (!pexc->isRegistered()) {
			cerr << "ERROR: Register failed (missing the recipe file?)" << endl;
			return RTLIB_ERROR;
		}

		// Start the control thread (the managed application will wait
		// for the resource assignment)
		pexc->Start();

		// Wait for the termination of the managed application
		pexc->WaitCompletion();
		return EXIT_SUCCESS;
	}


The BOSP sub-directory **samples** includes a first set of AEM-integrated sample
applications. Developers who want to add a new sample to BOSP can alternatively
use the BOSPShell command ``bbque-layapp``. This launches a script through which
a new application template is created under **samples** and, therefore, added to
the overall BOSP configuration and build system.

Compilation
^^^^^^^^^^^

The templates mentioned above already come with suitable *CMake* files for
properly building the application.

However, if the application developer needs to proceed manually, note that the
required header files and libraries are located under the BOSP installation
path as follows:

.. code-block:: bash

	$ tree -L 2 out/usr/
	├── bin
	...
	├── include
	│   └── bbque
	│       ├── bbque_exc.h
	│       ├── config.h
	...
	│       ├── rtlib.h
	...
	├── lib
	│   └── bbque
	│       ├── bindings
	│       ├── libbbque_rtlib.so
	...


Therefore, a GCC-based compilation command should look like the following:

.. code-block:: bash

	$ g++ <source files> -o <application name> -I <BOSP_PREFIX>/usr/include \
		-L <BOSP_PREFIX>/usr/lib/bbque -L <BOSP_PREFIX>/usr/lib/ \
		-lbbque_rtlib


.. _aem-recipe:

The *Recipe* file
^^^^^^^^^^^^^^^^^

An AEM-integrated application must provide a *recipe* file. This is an XML
file providing some general information, plus a set of profiled *Application
Working Modes (AWMs)* that the BarbequeRTRM may or may not take into account,
depending on the specific resource allocation policy.

The recipe must meet some requirements:

 - The file name must end with the ``.recipe`` extension.
 - The file must be installed under ``<BOSP_PREFIX>/etc/bbque/recipes``.
 - The AWM IDs must be sequentially numbered, starting from 0.
 - At least one ``<platform>`` section with an ``id`` matching the target platform must be provided.

.. tip::
	By convention, the higher the AWM ID, the larger the amount of
	resources required.
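
Two of the requirements above are easy to check before installing a recipe.
The following sketch uses hypothetical helpers (``RecipeNameIsValid`` and
``AwmIdsAreSequential`` are not part of the RTLib) and assumes the AWM IDs of
one ``<platform>`` section have already been extracted from the XML; it does
not parse the recipe and does not replace the validation performed by the
BarbequeRTRM.

.. code-block:: cpp

	#include <algorithm>
	#include <cassert>
	#include <cstddef>
	#include <string>
	#include <vector>

	// Hypothetical helper: the recipe file name must end with ".recipe"
	// (and have a non-empty stem).
	bool RecipeNameIsValid(std::string const & file_name) {
		std::string const ext = ".recipe";
		return file_name.size() > ext.size() &&
		       file_name.compare(file_name.size() - ext.size(), ext.size(), ext) == 0;
	}

	// Hypothetical helper: the AWM IDs of one <platform> section, in any
	// order, must be exactly 0, 1, ..., N-1 (with at least one AWM).
	bool AwmIdsAreSequential(std::vector<int> ids) {
		if (ids.empty())
			return false;
		std::sort(ids.begin(), ids.end());
		for (std::size_t i = 0; i < ids.size(); ++i)
			if (ids[i] != static_cast<int>(i))
				return false;
		return true;
	}

Since the same IDs (0, 1, 2) are repeated in every ``<platform>`` section of the
example below, the check is meant to be applied section by section.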

.. warning::

	When the application instantiates the EXC object, it passes the
	recipe name as an argument to the object constructor. The BarbequeRTRM checks
	the availability and the validity of the recipe. This means that a missing or
	invalid recipe is one of the possible reasons for a failed application launch.

Example:

.. code-block:: xml

	<?xml version="1.0"?>
	<BarbequeRTRM recipe_version="0.8">
	    <application name="MyApplication" priority="4">

	        <!-- Generic Linux -->
	        <platform id="bq.linux.*">
	            <awms>
	                <!-- AWM 0 -->
	                <awm id="0" name="LowQ" value="1" config-time="5">
	                    <resources>
	                        <cpu>
	                            <pe qty="100"/>
	                            <mem qty="100" units="M"/>
	                        </cpu>
	                    </resources>
	                </awm>
	                <!-- AWM 1 -->
	                <awm id="1" name="MedQ" value="2" config-time="5">
	                    <resources>
	                        <cpu>
	                            <pe qty="200"/>
	                            <mem qty="100" units="M"/>
	                        </cpu>
	                    </resources>
	                </awm>
	                <!-- AWM 2 -->
	                <awm id="2" name="HighQ" value="4" config-time="5">
	                    <resources>
	                        <cpu>
	                            <pe qty="400"/>
	                            <mem qty="150" units="M"/>
	                        </cpu>
	                    </resources>
	                </awm>
	            </awms>
	        </platform>

	        <!-- Linux on Exynos 5420 -->
	        <platform id="bq.linux.*" hw="exynos_5420">
	            <awms>
	                <!-- AWM 0 -->
	                <awm id="0" name="LowQ" value="1" config-time="7">
	                    <resources>
	                        <cpu>
	                            <pe qty="100"/>
	                            <mem qty="100" units="M"/>
	                        </cpu>
	                    </resources>
	                </awm>
	                <!-- AWM 1 -->
	                <awm id="1" name="MedQ" value="2" config-time="7">
	                    <resources>
	                        <cpu>
	                            <pe qty="200"/>
	                            <mem qty="100" units="M"/>
	                        </cpu>
	                    </resources>
	                </awm>
	                <!-- AWM 2 -->
	                <awm id="2" name="HighQ" value="4" config-time="8">
	                    <resources>
	                        <cpu>
	                            <pe qty="400"/>
	                            <mem qty="150" units="M"/>
	                        </cpu>
	                    </resources>
	                </awm>
	            </awms>
	        </platform>
	    </application>
	</BarbequeRTRM>



The complete set of tags and attributes is listed below; some of them are
optional. For the hierarchy of the different XML elements, please refer to the
example above.

 - ``BarbequeRTRM`` 
	The root tag, including general attributes.

	- ``recipe_version``
		Since the recipe format can change in future versions of the
		framework, the first validation step requires the reference version
		of the recipe format to be specified.

 - ``application`` 
	Application/EXC properties.

	- ``name`` [optional] 
		A descriptive name of the application/EXC.
	- ``priority``
		The statically assigned priority, generally taken into account at
		run-time by the resource allocation policy. It is mandatory to provide
		a value between 0 and N, where 0 denotes the highest priority level
		(critical application). The lowest possible level N can be specified
		in the BOSP build configuration.

 - ``platform``
	The target system. The recipe can contain more than one platform section.

	- ``id``
		The string identifier of the platform. The recipe must contain at
		least one platform section whose id matches the one of the system
		platform (see the next section below).
	- ``hw`` [optional]
		This specifies the target platform from the hardware point of view
		(e.g., an SoC). The current BarbequeRTRM version supports the
		following hardware identifiers: "exynos_5410", "exynos_5420",
		"omap_4470".

 - ``awms``
	 The section listing the set of AWMs.
 - ``awm`` 
	Definition of a single AWM. A valid recipe must define at least one AWM.

	- ``id``
		Each AWM is identified by a number. The numbering must start
		from 0 and continue as a sequence of consecutive integers.
	- ``name``
		A descriptive name for the AWM (e.g., "high-quality","mid-quality", "low-quality").
	- ``value``
		A preference value associated with the AWM. If the *value*
		expresses a performance level, in most cases the higher the *value*,
		the greater the resource requirements. In the example provided, the
		*value* has been set directly proportional to the amount of CPU
		resources requested.
	- ``config-time`` [optional]
		The time spent configuring the application in the given
		Application Working Mode.

 - ``resources``
	The section listing the resource requirements of the AWM. All the
	child tags nested in this section are interpreted as resource names.
	The nesting hierarchy and the specified IDs are used to build the
	"resource path", i.e., a namespace-style string identifying the specific
	resource. For instance, the example recipe produces the following
	resource paths: ``cpu.pe`` and ``cpu.mem``.
 - ``sys``
	Used to group resources into a two-level hierarchical partitioning.
	Specifically, ``sys`` refers to a system, thought of as a single
	working machine or board. This tag is optional if the target
	platform is not a distributed computing system; in other words, on a
	common desktop or embedded board it can be omitted.
 - ``cpu``
	General-purpose processor, usually featuring a set of multiple cores
	sharing one or more cache memory levels. 
 - ``mem``
	The amount of memory required. Please consider the hierarchical
	position to reference the correct level of memory.
 - ``pe``
	The processing requirement, expressed as a CPU time quota
	(percentage). For instance, in AWM 1 the recipe requires 200%, meaning that
	we need the full usage of 2 CPU cores.

	 - ``units``
		A qualifier for the attribute ``qty``. Currently supported values
		are *%* for the processors and *KB*, *MB*, *GB* for the memories.
	 - ``qty``
		The amount required.
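
The resource-path construction and the ``pe`` quota semantics can be
illustrated with a small sketch. ``ResourcePath`` and ``QuotaToCores`` are
hypothetical helpers, the resource IDs mentioned above are left out of the
path for simplicity, and rounding a partial quota up to a whole core is an
assumption made for illustration, not documented BarbequeRTRM behavior.

.. code-block:: cpp

	#include <cassert>
	#include <string>
	#include <vector>

	// Hypothetical helper: joins the nested tag names found under
	// <resources> into a namespace-style path, e.g. {"cpu", "pe"} -> "cpu.pe".
	std::string ResourcePath(std::vector<std::string> const & nesting) {
		std::string path;
		for (auto const & tag : nesting) {
			if (!path.empty())
				path += '.';
			path += tag;
		}
		return path;
	}

	// Hypothetical helper: converts a <pe> quota (percentage of CPU time)
	// into the number of cores it spans, rounding a partial core up.
	int QuotaToCores(int quota_percent) {
		assert(quota_percent >= 0);
		return (quota_percent + 99) / 100;
	}

With the example recipe, ``QuotaToCores(200)`` gives the 2 cores of AWM 1 and
``QuotaToCores(400)`` the 4 cores of AWM 2.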


The platform identification string
""""""""""""""""""""""""""""""""""

The attribute ``<platform id="">`` specifies the target hardware configuration
for which the section of the application is intended.

The value of the attribute is a string that hierarchically identifies the
*host system* and the *acceleration platforms* (see the *Configure BOSP*
documentation).

Focusing on the host system part only, according to the current version of the
BarbequeRTRM, we can specify one of the following options:

- *bq.linux*: Linux-based system with cgroup support
- *bq.android*: Android-based system
- *bq.test*: Host is emulated

If the application includes kernels to accelerate, the target hardware will
likely include GPUs or hardware accelerators. In such a case, we may want to
specify that the given ``<platform>`` section is intended for a BarbequeRTRM
configuration built with OpenCL or NVIDIA support. For example:

- *bq.linux.opencl*
- *bq.linux.nvidia*

One interesting aspect to consider is that the value of the attribute can
include a regular expression pattern. Therefore, the following options are
perfectly acceptable:

- ``bq.*.opencl``: any host system, with OpenCL acceleration
- ``bq.linux.*``: Linux-based host, with any acceleration platform
- ``bq.*``: any combination of host system and acceleration platform
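
The matching semantics can be sketched with ``std::regex``, assuming the
pattern is applied as a full match (``std::regex_match``) against the platform
string of the running BarbequeRTRM; ``PlatformMatches`` is a hypothetical
helper, and the matching code inside the BarbequeRTRM may differ. Note that, in
a regular expression, an unescaped ``.`` matches any character and ``.*``
matches any (possibly empty) sequence, so ``bq.linux.*`` also matches
``bq.linux`` alone.

.. code-block:: cpp

	#include <cassert>
	#include <regex>
	#include <string>

	// Hypothetical helper: true if the recipe's <platform id="..."> pattern
	// matches the whole platform string of the running system.
	bool PlatformMatches(std::string const & recipe_pattern,
	                     std::string const & system_platform) {
		std::regex const re(recipe_pattern);
		return std::regex_match(system_platform, re);
	}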


Resource awareness
^^^^^^^^^^^^^^^^^^

The **onConfigure()** execution is typically the right moment for checking the
resources assigned by the BarbequeRTRM. Accordingly, the application can
determine the right number of threads to spawn, or set any other parameter that
can be considered "resource-sensitive".

To this aim, the RTLib provides the ``GetAssignedResources()`` function. The
following example shows a possible usage of the function. In particular, the
application sets the number of threads equal to the number of CPU cores
assigned by the resource manager.

.. code-block:: cpp

	RTLIB_ExitCode_t MyApp::onConfigure(int8_t awm_id) {

		int32_t cpu_quota, nr_cpu_cores, nr_gpus, mem;

		// CPU time quota (%) and number of CPU cores assigned
		GetAssignedResources(PROC_ELEMENT, cpu_quota);
		GetAssignedResources(PROC_NR, nr_cpu_cores);
		nr_threads = nr_cpu_cores;  // nr_threads: a data member of MyApp
		...
		// Amount of memory assigned
		GetAssignedResources(MEMORY, mem);
		...
		// Number of GPUs assigned, if any
		GetAssignedResources(GPU, nr_gpus);
		if (nr_gpus > 0) {
			// Cool... we have some GPUs!
			...
		}

		return RTLIB_OK;
	}
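
The "one thread per assigned core" policy can be kept separate from the RTLib
calls, which makes it easy to test in isolation. The sketch below assumes the
core count has already been read with ``GetAssignedResources(PROC_NR, ...)``;
``ThreadsForAssignment`` and ``ParallelSum`` are hypothetical helpers, and
summing a vector merely stands in for the application's real work.

.. code-block:: cpp

	#include <algorithm>
	#include <cassert>
	#include <cstddef>
	#include <cstdint>
	#include <numeric>
	#include <thread>
	#include <vector>

	// Hypothetical policy: one worker thread per assigned CPU core, and
	// never fewer than one thread.
	int ThreadsForAssignment(int32_t nr_cpu_cores) {
		return std::max<int>(1, nr_cpu_cores);
	}

	// Splits the input into nr_threads contiguous chunks, sums each chunk on
	// its own thread, and combines the partial results.
	long long ParallelSum(std::vector<int> const & data, int nr_threads) {
		assert(nr_threads > 0);
		std::vector<long long> partial(nr_threads, 0);
		std::vector<std::thread> workers;
		std::size_t const chunk = (data.size() + nr_threads - 1) / nr_threads;
		for (int t = 0; t < nr_threads; ++t) {
			std::size_t const begin = std::min(data.size(), t * chunk);
			std::size_t const end = std::min(data.size(), begin + chunk);
			// Each worker writes only its own slot of partial: no data race
			workers.emplace_back([&, t, begin, end] {
				partial[t] = std::accumulate(data.begin() + begin,
				                             data.begin() + end, 0LL);
			});
		}
		for (auto & w : workers)
			w.join();
		return std::accumulate(partial.begin(), partial.end(), 0LL);
	}

Inside ``onConfigure()``, the result of ``ThreadsForAssignment(nr_cpu_cores)``
would be stored (e.g., in ``nr_threads``) and then used by ``onRun()`` when
spawning the workers.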



Run-time monitoring and negotiation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The execution model allows the application to monitor its own performance
(throughput) at run-time. The throughput is measured as the number of
processing cycles (``onRun()``/``onMonitor()`` executions) per second (CPS).

The RTLib provides the ``GetCPS()`` function, which returns the mean
cycles-per-second value, computed as an exponential mean over the last cycles.
The ``onMonitor()`` function is a reasonable place for this call.

.. code-block:: cpp

	RTLIB_ExitCode_t MyApp::onMonitor() {

		float curr_cps = GetCPS();
		...
		
		return RTLIB_OK;
	}
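
The exact filter used by ``GetCPS()`` is internal to the RTLib. The sketch below
only illustrates the idea of an exponentially weighted moving average of the
cycle rate; the class name ``CpsEstimator`` and the smoothing factor ``alpha``
are assumptions of this example.

.. code-block:: cpp

	#include <cassert>

	// Illustrative exponentially weighted moving average of the cycle rate.
	// Each sample is the duration of one onRun()/onMonitor() cycle, in seconds.
	class CpsEstimator {
	public:
		explicit CpsEstimator(double alpha) : alpha_(alpha) {
			assert(alpha > 0.0 && alpha <= 1.0);
		}

		// Feeds the duration of the last cycle; returns the updated mean CPS.
		double AddCycle(double cycle_seconds) {
			assert(cycle_seconds > 0.0);
			double const inst_cps = 1.0 / cycle_seconds;  // instantaneous rate
			if (!initialized_) {
				mean_cps_ = inst_cps;  // the first sample seeds the mean
				initialized_ = true;
			} else {
				mean_cps_ = alpha_ * inst_cps + (1.0 - alpha_) * mean_cps_;
			}
			return mean_cps_;
		}

		double Cps() const { return mean_cps_; }

	private:
		double alpha_;
		double mean_cps_ = 0.0;
		bool initialized_ = false;
	};

With ``alpha`` close to 1 the estimate closely follows the latest cycles;
smaller values smooth out occasional slow cycles.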


The performance monitoring can lead the application to react in one of two
ways:

  - Reconfiguring itself, for example, by tuning the amount of data to process
    during each ``onRun()`` execution.
  - Making the resource manager aware of the current performance goal, so that
    the resource assignment can follow the application-specific requests.
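
The first approach can be sketched as a simple controller that rescales the
amount of data processed per ``onRun()`` to bring the measured CPS back into a
target range. ``NextBatchSize`` and its proportional rule are hypothetical and
purely illustrative: the rule assumes that the cycle duration grows linearly
with the batch size. Such a function would typically be called from
``onMonitor()`` with the value returned by ``GetCPS()``.

.. code-block:: cpp

	#include <algorithm>
	#include <cassert>
	#include <cmath>

	// Hypothetical controller: rescales the per-cycle batch size so that,
	// assuming cycle time proportional to batch size, the CPS moves back into
	// [cps_min, cps_max]. Inside the range, the batch size is unchanged.
	int NextBatchSize(int batch, double cps, double cps_min, double cps_max,
	                  int min_batch, int max_batch) {
		assert(batch > 0 && cps_min > 0.0 && cps_min <= cps_max);
		assert(min_batch <= max_batch);
		double target = batch;
		if (cps < cps_min)
			target = batch * cps / cps_min;  // too slow: less data per cycle
		else if (cps > cps_max)
			target = batch * cps / cps_max;  // too fast: more data per cycle
		int const next = static_cast<int>(std::lround(target));
		return std::max(min_batch, std::min(next, max_batch));
	}

For example, with a goal of [3.0, 4.0] CPS and a measured 1.5 CPS, a batch of
100 items is halved to 50.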

The second option is implemented through the ``SetCPSGoal(cps_min, cps_max)``
function, by which the application specifies the range of CPS considered its
current performance goal. Note that setting a performance goal serves a
two-fold objective: on one side, we may want to push the resource manager to
let the application run as fast as it can; on the other side, the application
may find itself in scenarios where it only needs the minimum amount of
resources. In the latter case, the application explicitly contributes to
reducing the power consumption of the system, which is relevant, for example,
on mobile devices.

In general, ``SetCPSGoal()`` calls can reasonably be placed in the
``onSetup()`` body, to set an initial goal, and in ``onMonitor()``, to
redefine this goal according to application-specific conditions.

It is worth remarking that this function enables a performance monitoring and
resource negotiation process that is automatically driven by the RTLib threads,
without requiring any further code in the application.

Example:

.. code-block:: cpp

	RTLIB_ExitCode_t MyApp::onSetup() {

		SetCPSGoal(2.5, 3.5);
		...
		
		return RTLIB_OK;
	}

	...

	RTLIB_ExitCode_t MyApp::onMonitor() {

		if (low_power_mode) {
		    SetCPSGoal(0.75, 1.25);
		}

		...
		return RTLIB_OK;
	}


.. include:: guide-app-prof.rst


Programming languages
^^^^^^^^^^^^^^^^^^^^^

The Adaptive Execution Model also provides wrapper solutions for the following
languages/environments.

.. toctree::
   :maxdepth: 1

   guide-app-python
   guide-app-android

Heterogeneous programming
^^^^^^^^^^^^^^^^^^^^^^^^^

To integrate applications based on heterogeneous programming into the Adaptive
Execution Model, we can proceed as follows.

First, we need to specify, as the fourth argument of the superclass constructor
(``BbqueEXC``), the programming library or language extension in use. This
helps the resource manager select the correct set of resources, especially on
systems featuring a mix of runtimes and related devices.

The currently supported options are the following:

  - *RTLIB_LANG_TASKGRAPH*: For applications using programming libraries
    based on task-graph constructs (e.g., the **libmango**)
  - *RTLIB_LANG_CUDA*: For applications coming with CUDA kernels targeting
    NVIDIA GPU devices.
  - *RTLIB_LANG_OPENCL*: For applications using OpenCL function calls to
    access the computing devices, set up memory buffers, perform data
    transfers and offload kernels onto heterogeneous devices, including GPUs and
    accelerators.

This is an example of object construction for the OpenCL case:

.. code-block:: cpp

	class MyApp : public BbqueEXC {

	public:

		MyApp(std::string const & name,
		      std::string const & recipe,
		      RTLIB_Services_t *rtlib):
		    BbqueEXC(name, recipe, rtlib, RTLIB_LANG_OPENCL) {

		}
	...
	};



The OpenCL case
"""""""""""""""

Most of the setup code, from the platform and device selection to the buffer
and kernel setup, must be placed in the ``onConfigure()`` implementation. This
allows the application to adapt itself to possible changes in the device
assignment performed by the resource manager. The values returned by
``clGetPlatformIDs`` and ``clGetDeviceIDs`` are actually set by the resource
manager and may include a subset of the devices installed in the system.

With OpenCL, a change of the assigned devices requires repeating a sequence of
platform- and device-dependent initialization steps.

Please note that, in this case, you don't need to include the
``GetAssignedResources()`` function calls to determine the type of assigned
devices, since this comes with the array of devices returned by the
``clGetDeviceIDs`` function.

However, the ``GetAssignedResources()`` can still be useful to check the number
of CPU cores (``PROC_NR``), in case no GPUs have been assigned, and we want to
configure the execution of the kernel instances (work items) accordingly.

.. code-block:: cpp

	RTLIB_ExitCode_t MyApp::onConfigure(int8_t awm_id) {

		// Platform selection: only the platforms made visible by the
		// resource manager are returned
		cl_uint np = 0;
		clGetPlatformIDs(0, nullptr, &np);
		std::vector<cl_platform_id> platforms(np);
		clGetPlatformIDs(np, platforms.data(), nullptr);
		// ... (e.g., check that np > 0)
		cl_platform_id platform = platforms[0];

		// Device selection: the returned devices may be a subset of those
		// installed in the system
		cl_uint nd = 0;
		clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, nullptr, &nd);
		std::vector<cl_device_id> devices(nd);
		clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, nd, devices.data(), nullptr);
		// ...

		// (Re)initialize buffers and kernels for the assigned devices...

		return RTLIB_OK;
	}


Then, similarly to the other cases, the ``onRun()`` execution should be
synchronized on the termination of the kernels, processing a subset of data
from the overall input set.
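
One way to structure this is to let each cycle process the next chunk of the
input and return ``RTLIB_EXC_WORKLOAD_NONE`` (as in the template above) once the
whole input has been consumed. The chunk arithmetic is sketched below in plain
C++; ``Chunk``, ``ChunkForCycle`` and the fixed chunk size are hypothetical,
and the OpenCL enqueue and wait calls are deliberately left out.

.. code-block:: cpp

	#include <algorithm>
	#include <cassert>
	#include <cstddef>

	// Half-open range [begin, end) of input elements handled in one cycle.
	struct Chunk {
		std::size_t begin;
		std::size_t end;
		bool empty() const { return begin >= end; }
	};

	// Hypothetical helper: slice of an input of `total` elements processed in
	// cycle number `cycle` (0-based) when each cycle handles `chunk_size`
	// items. An empty chunk means that no workload is left.
	Chunk ChunkForCycle(std::size_t total, std::size_t chunk_size,
	                    std::size_t cycle) {
		assert(chunk_size > 0);
		std::size_t const begin = std::min(total, cycle * chunk_size);
		std::size_t const end = std::min(total, begin + chunk_size);
		return Chunk{begin, end};
	}

Inside ``onRun()``, the chunk index would be derived from ``Cycles()``, the
kernels enqueued on that range, and ``clFinish()`` called on the command queue
before returning ``RTLIB_OK``.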

Check out the `already integrated OpenCL samples
<https://bitbucket.org/jumanix/bosp-samples/src/master/opencl/>`_ for more.