.. _app-guide:

Application Development Guide
=============================

This guide is for developers who want to run their applications under the management of the BarbequeRTRM. In general, we distinguish between managing :ref:`generic-processes` and *Adaptive Execution Model integrated* applications, as introduced in the User Guide.

.. toctree::
   :maxdepth: 1
   :hidden:

   guide-app-process

.. _aem-applications:

Adaptive Execution Model
------------------------

BOSP provides a library (``bbque_rtlib``) for implementing applications according to the run-time managed *Adaptive Execution Model* (AEM). This execution model drives the application through a managed execution flow, characterized by:

- *Resource-awareness*: the application can configure itself according to the assigned computing resources.
- *Runtime performance monitoring and negotiation*: the application can observe the current throughput and ask for more resources.

From the implementation perspective, the application is asked to implement a class derived from ``BbqueEXC``, defined in ``bbque_exc.h``. The typical approach is then to instantiate an object of that class in the main thread and invoke the ``Start()`` member function, which triggers the execution of a control thread. This thread is responsible for synchronizing the execution of the application with the BarbequeRTRM management actions.

.. image:: img/aem.svg

- **onSetup**
  Initialization code. Here you should perform memory allocations, variable initializations and so on.
- **onConfigure**
  Called when the BarbequeRTRM assigns a new *Application Working Mode (AWM)*, i.e., the set of resources allocated for the application. The code to place here is whatever is required to reconfigure the application (number of threads, application parameters, data structures, ...) so that it runs properly with the resources assigned through the AWM.
- **onSuspend**
  No resources assigned. The application must be stopped.
  Implement whatever is needed to leave the application in a safe and consistent state.
- **onRun**
  The entry point of the task. The code that executes a computational run must be implemented here. We strongly suggest keeping the duration of each run within a few hundred milliseconds, so that the task can be interrupted with a "reasonable" time granularity. This prevents the application from being killed by the BarbequeRTRM.
- **onMonitor**
  After a computational run, the application may check whether the level of QoS/performance/accuracy is acceptable. If it is not, some action can be taken.
- **onRelease**
  Optional, but recommended, member function. It is expected to contain cleanup code (e.g., freeing allocated memory).

.. tip::
   The setup and release methods (onSetup, onRelease) are called once during the execution of an application. The configure method (onConfigure) is called each time the application receives a new AWM. Therefore, the AEM can usually be thought of as the following loop: onRun -> onMonitor -> onRun -> onMonitor ...

Overall, the application continuously runs and monitors its execution until a termination condition is met.

.. _aem-structure:

Application structure
^^^^^^^^^^^^^^^^^^^^^

We provide a template representing the basic structure of an AEM-integrated application at https://github.com/HEAPLab/aem-template.

.. code-block:: bash

   ../aem_template/
   ├── build
   ├── CMakeLists.txt
   ├── include
   │   └── AEMTemplate_exc.h
   ├── LICENSE
   ├── README
   ├── recipes
   │   └── aem-template.recipe
   └── src
       ├── AEMTemplate_exc.cc
       ├── AEMTemplate_main.cc
       └── CMakeLists.txt

As we said, the application needs to implement a class derived from ``BbqueEXC``. Following the content of the template, we have:

**AEMTemplate_exc.h**
.. code-block:: cpp

   #ifndef AEM_TEMPLATE_EXC_H_
   #define AEM_TEMPLATE_EXC_H_

   #include <bbque/bbque_exc.h>

   using bbque::rtlib::BbqueEXC;

   class AEMTemplate : public BbqueEXC {
   public:
       AEMTemplate(std::string const & name,
                   std::string const & recipe,
                   RTLIB_Services_t *rtlib);

   private:
       RTLIB_ExitCode_t onSetup();
       RTLIB_ExitCode_t onConfigure(int8_t awm_id);
       RTLIB_ExitCode_t onRun();
       RTLIB_ExitCode_t onMonitor();
       RTLIB_ExitCode_t onSuspend();
       RTLIB_ExitCode_t onRelease();
   };

   #endif // AEM_TEMPLATE_EXC_H_

**AEMTemplate_exc.cc**

.. code-block:: cpp

   #include "AEMTemplate_exc.h"
   #include <iostream>

   using namespace std;

   AEMTemplate::AEMTemplate(std::string const & name,
                            std::string const & recipe,
                            RTLIB_Services_t *rtlib) :
       BbqueEXC(name, recipe, rtlib, RTLIB_LANG_CPP) {
       cout << "New AEMTemplate::AEMTemplate() UID=" << GetUniqueID() << endl;
   }

   RTLIB_ExitCode_t AEMTemplate::onSetup() {
       cout << "AEMTemplate::onSetup()" << endl;
       return RTLIB_OK;
   }

   RTLIB_ExitCode_t AEMTemplate::onConfigure(int8_t awm_id) {
       // Query the number of CPU cores assigned with the new AWM
       int proc_nr = 0;
       GetAssignedResources(PROC_NR, proc_nr);
       cout << "AEMTemplate::onConfigure(): proc_nr= " << proc_nr << endl;
       return RTLIB_OK;
   }

   RTLIB_ExitCode_t AEMTemplate::onRun() {
       RTLIB_WorkingModeParams_t const wmp = WorkingModeParams();

       // Example: return after 5 cycles
       if (Cycles() >= 5)
           return RTLIB_EXC_WORKLOAD_NONE;

       cout << "AEMTemplate::onRun(): Hello AEM! cycle=" << Cycles() << endl;
       return RTLIB_OK;
   }

   RTLIB_ExitCode_t AEMTemplate::onMonitor() {
       cout << "AEMTemplate::onMonitor(): CPS=" << GetCPS() << endl;
       return RTLIB_OK;
   }

   RTLIB_ExitCode_t AEMTemplate::onSuspend() {
       cout << "AEMTemplate::onSuspend()" << endl;
       return RTLIB_OK;
   }

   RTLIB_ExitCode_t AEMTemplate::onRelease() {
       cout << "AEMTemplate::onRelease()" << endl;
       return RTLIB_OK;
   }

The main file will typically have a structure similar to the provided example:

**AEMTemplate_main.cc**
.. code-block:: cpp

   #include <cassert>
   #include <cstdlib>
   #include <iostream>
   #include <libgen.h>
   #include <memory>
   #include "AEMTemplate_exc.h"

   using namespace std;

   int main(int argc, char *argv[]) {

       // Initialize the RTLib
       RTLIB_Services_t *rtlib;
       auto ret = RTLIB_Init(basename(argv[0]), &rtlib);
       if (ret != RTLIB_OK) {
           cerr << "ERROR: Did you start the BarbequeRTRM daemon? " << endl;
           return RTLIB_ERROR;
       }
       assert(rtlib);

       // Instantiate the derived class
       std::string recipe("aem-template");
       auto pexc = std::make_shared<AEMTemplate>("AEMTemplate", recipe, rtlib);
       if (!pexc->isRegistered()) {
           cerr << "ERROR: Register failed (missing the recipe file?)" << endl;
           return RTLIB_ERROR;
       }

       // Start the control thread (the managed application will wait
       // for the resource assignment)
       pexc->Start();

       // Wait for the termination of the managed application
       pexc->WaitCompletion();

       return EXIT_SUCCESS;
   }

The sub-directory **samples** includes a first set of AEM-integrated sample applications. If the user aims at developing an additional sample to include in BOSP, the alternative option is to use the BOSPShell command ``bbque-layapp``. This launches a script through which a new application template is created under **samples** and, therefore, added to the overall BOSP configuration and build system.

Compilation
^^^^^^^^^^^

The templates mentioned above already come with suitable *CMake* files for building the application. However, if the application developer needs to proceed manually, the necessary header files and libraries are located under the BOSP installation path as follows:

.. code-block:: bash

   $ tree -L 2 out/usr/
   ├── bin
   ...
   ├── include
   │   └── bbque
   │       ├── bbque_exc.h
   │       ├── config.h
   ...
   │       ├── rtlib.h
   ...
   ├── lib
   │   └── bbque
   │       ├── bindings
   │       ├── libbbque_rtlib.so
   ...

Therefore, a GCC-based compilation line should look like the following:

.. code-block:: bash

   $ g++ -o -I /usr/include \
         -L /usr/lib/bbque -L /usr/lib/ \
         -lbbque_rtlib
.. _aem-recipe:

The *Recipe* file
^^^^^^^^^^^^^^^^^

An AEM-integrated application is asked to provide a *recipe* file. This is an XML file providing some general information, plus a set of profiled *Application Working Modes (AWMs)* that the BarbequeRTRM may or may not take into account, depending on the specific resource allocation policy. The recipe must meet some requirements:

- The file name must end with the ``.recipe`` extension.
- The file must be installed under ``/etc/bbque/recipes``.
- The AWM IDs must be sequentially numbered, starting from 0.
- At least one ``platform`` section with the ``id`` matching the target platform must be provided.

.. tip::
   As a convention, the higher the AWM ID number, the greater the amount of resources required.

.. warning::
   When the application instantiates the EXC object, it provides the recipe name as an argument of the object constructor. The BarbequeRTRM checks the availability and the validity of the recipe. This means that the recipe can be one of the reasons for a failed application launch.

Example:

.. code-block:: xml

The complete set of tags and attributes is listed below. Some tags or attributes are optional. For the hierarchy of the different XML elements, please refer to the example provided above.

- ``BarbequeRTRM`` The root tag, including general attributes.
- ``recipe_version`` Since the format of the recipes can change in future versions of the framework, a first validation step requires specifying the reference version of the recipe.
- ``application`` Application/EXC properties.
- ``name`` [optional] A descriptive name of the application/EXC.
- ``priority`` Static priority assigned. Generally, this is taken into account at run-time by the resource allocation policy. It is mandatory to provide a value between 0 and N, where 0 denotes the highest priority level (critical application). The lowest possible level N can be specified in the BOSP build configuration.
- ``platform`` The target system. The recipe can contain more than one platform section.
- ``id`` The string identifier of the platform. The recipe must contain at least the platform section whose id matches that of the system platform (see the next section below).
- ``hw`` [optional] The target platform from the point of view of the hardware (e.g., an SoC). The current BarbequeRTRM version supports the following hardware identifiers: "exynos_5410", "exynos_5420", "omap_4470".
- ``awms`` The section listing the set of AWMs.
- ``awm`` Definition of a single AWM. A valid recipe must define at least one AWM.
- ``id`` Each AWM is identified by a number. It is strictly mandatory that the numbering starts from 0 and continues as a sequence of consecutive integer values.
- ``name`` A descriptive name for the AWM (e.g., "high-quality", "mid-quality", "low-quality").
- ``value`` A preference value associated with the AWM. If the *value* expresses a performance level, in most cases the higher the *value*, the greater the resource requirements. In the example provided, the *value* is directly proportional to the amount of resources.
- ``config-time`` [optional] The time spent to configure the application in the given Application Working Mode.
- ``resources`` The section listing the resource requirements of the AWM. All the child tags nested in this section are considered resource names. The nesting hierarchy and the IDs specified are used to build the "resource path", i.e., a namespace-style string identifying the specific resource. For instance, the recipe in the example would produce the following resource paths: ``cpu.pe`` and ``cpu.mem``.
- ``sys`` Used to group resources into a two-level hierarchical partitioning. Specifically, ``sys`` refers to a system, thought of as a single working machine or board.
  This tag is optional if the target platform is not a distributed computing system. In other words, on a common desktop or embedded board it can be omitted.
- ``cpu`` General-purpose processor, usually featuring a set of multiple cores sharing one or more cache memory levels.
- ``mem`` The amount of memory required. Please consider the hierarchical position to reference the correct level of memory.
- ``pe`` The processing requirement, expressed as a CPU time quota (percentage). For instance, in AWM 2 the recipe requires 200%, meaning the full usage of 2 CPU cores.
- ``units`` A qualifier for the attribute ``qty``. The values currently supported are *%* for the processors, and *KB*, *MB*, *GB* for the memories.
- ``qty`` The amount required.

The platform identification string
""""""""""""""""""""""""""""""""""

The ``id`` attribute of the ``platform`` tag specifies the target hardware configuration for which the section is intended. The value of the attribute is a string that hierarchically identifies the *host system* and the *acceleration platforms* (see *Configure BOSP*). Focusing on the host system part only, the current version of the BarbequeRTRM accepts one of the following options:

- *bq.linux*: Linux-based system with cgroup support
- *bq.android*: Android-based system
- *bq.test*: Host is emulated

If the application includes kernels to accelerate, the target hardware probably needs to include GPUs or HW accelerators. In such a case, we may want to specify that the given ``platform`` section is intended for a BarbequeRTRM configuration built with OpenCL or NVIDIA support. For example:

- *bq.linux.opencl*
- *bq.linux.nvidia*

One interesting aspect to consider is that the value of the attribute can include a regular expression pattern.
Therefore, the following options are perfectly acceptable:

- *bq.\*.opencl* : any host is fine, just match the OpenCL acceleration
- *bq.linux.\** : Linux-based host with any acceleration platform
- *bq.\** : any combination of host system and acceleration platform is fine

Resource awareness
^^^^^^^^^^^^^^^^^^

The **onConfigure()** execution is typically the right moment for checking the resources assigned by the BarbequeRTRM. Accordingly, the application can determine the right number of threads to spawn or set any other parameter that can be considered "resource-sensitive". To this aim, the RTLib provides the function ``GetAssignedResources()``. The following example shows a possible usage of the function. In particular, we imagine the application setting the number of threads equal to the number of CPU cores assigned by the resource manager.

.. code-block:: cpp

   RTLIB_ExitCode_t MyApp::onConfigure(int8_t awm_id) {
       int cpu_quota, nr_cpu_cores, nr_gpus, mem;

       // CPU time quota (percentage) and number of CPU cores assigned
       GetAssignedResources(PROC_ELEMENT, cpu_quota);
       GetAssignedResources(PROC_NR, nr_cpu_cores);
       // nr_threads is assumed to be a variable defined by the application
       nr_threads = nr_cpu_cores;
       ...
       GetAssignedResources(MEMORY, mem);
       ...
       GetAssignedResources(GPU, nr_gpus);
       if (nr_gpus > 0) {
           // Cool... we have some GPUs!
           ...
       }
       return RTLIB_OK;
   }

Run-time monitoring and negotiation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The execution model allows the application to monitor its own performance (throughput) at run-time. The throughput is measured as the number of processing cycles (``onRun()/onMonitor()`` executions) per second (CPS). The RTLib provides the ``GetCPS()`` function, which returns the mean cycles-per-second value, computed as an exponential mean over the last cycles. The ``onMonitor()`` function is, in this case, a reasonable place for the function call.

.. code-block:: cpp

   RTLIB_ExitCode_t MyApp::onMonitor() {
       float curr_cps = GetCPS();
       ...
       return RTLIB_OK;
   }

The performance monitoring can lead the application to react by adopting two possible approaches:

- Reconfiguring itself, for example, by tuning the amount of data to process during each ``onRun()`` execution.
- Making the resource manager aware of the current performance goal, so that the resource assignment can follow the application-specific requests.

The second option is implemented through the ``SetCPSGoal(cps_min, cps_max)`` function. Through this function, the application can specify a range of CPS that is considered the current performance goal.

It is worth noting that setting a performance goal has a two-fold objective. On one side, we may want to push the resource manager to let the application run as fast as it can. On the other side, the application may find itself in scenarios in which all it needs is the minimum amount of resources. This can be a way to explicitly contribute to reducing the power consumption of the system, which matters, for instance, on mobile devices.

In general, the ``SetCPSGoal`` calls can be reasonably placed in the ``onSetup()`` body, in order to set an initial goal, and in ``onMonitor()``, to redefine this goal according to application-specific conditions.

What is worth remarking about this function is that it enables a performance monitoring and resource assignment negotiation process, automatically driven by the RTLib threads, without requiring any further code in the application.

Example:

.. code-block:: cpp

   RTLIB_ExitCode_t MyApp::onSetup() {
       SetCPSGoal(2.5, 3.5);
       ...
       return RTLIB_OK;
   }

   ...

   RTLIB_ExitCode_t MyApp::onMonitor() {
       if (low_power_mode) {
           SetCPSGoal(0.75, 1.25);
       }
       ...
       return RTLIB_OK;
   }

.. include:: guide-app-prof.rst

Programming languages
^^^^^^^^^^^^^^^^^^^^^

The Adaptive Execution Model also provides wrapper solutions for the following languages/environments.
.. toctree::
   :maxdepth: 1

   guide-app-python
   guide-app-android

Heterogeneous programming
^^^^^^^^^^^^^^^^^^^^^^^^^

For heterogeneous programming applications integrated in the Adaptive Execution Model, we can proceed as follows.

First, we need to specify, as the fourth argument of the superclass constructor (``BbqueEXC``), the programming library or extension used for this purpose. This helps the resource manager select the correct set of resources, especially on systems featuring a mix of runtimes and related devices.

The currently supported options are the following:

- *RTLIB_LANG_TASKGRAPH*: For applications using programming libraries based on task-graph constructs (e.g., the **libmango**).
- *RTLIB_LANG_CUDA*: For applications coming with CUDA kernels targeting NVIDIA GPU devices.
- *RTLIB_LANG_OPENCL*: For applications using the OpenCL function calls to get access to the computing devices, set up memory buffers, perform data transfers and offload kernels onto heterogeneous devices, including GPUs and accelerators.

This is an example of object construction for the OpenCL case:

.. code-block:: cpp

   class MyApp : public BbqueEXC {
   public:
       MyApp(std::string const & name,
             std::string const & recipe,
             RTLIB_Services_t *rtlib) :
           BbqueEXC(name, recipe, rtlib, RTLIB_LANG_OPENCL) {
       }
       ...
   };

The OpenCL case
"""""""""""""""

Most of the setup code, from the platform and device selection to the setup of buffers and kernels, must be placed in the ``onConfigure()`` implementation. This allows the application to adapt itself to possible changes in the device assignment performed by the resource manager. The values returned by ``clGetPlatformIDs`` and ``clGetDeviceIDs`` are actually set by the resource manager and may include a subset of the devices installed in the system. In the OpenCL approach, a change of assigned devices requires redoing a sequence of initialization steps that are platform- and device-dependent.
Please note that, in this case, you don't need to call ``GetAssignedResources()`` to determine the type of assigned devices, since this comes with the array of devices returned by the ``clGetDeviceIDs`` function. However, ``GetAssignedResources()`` can still be useful to check the number of CPU cores (``PROC_NR``), in case no GPUs have been assigned and we want to configure the execution of the kernel instances (work items) accordingly.

.. code-block:: cpp

   RTLIB_ExitCode_t MyApp::onConfigure(int8_t awm_id) {
       // Platform
       cl_uint np;
       cl_platform_id* platforms;
       cl_platform_id platform;

       // First call: get the number of platforms; second call: get the IDs
       clGetPlatformIDs(0, 0, &np);
       platforms = (cl_platform_id *)malloc(np * sizeof(cl_platform_id));
       clGetPlatformIDs(np, platforms, 0);
       ...
       platform = platforms[0];

       cl_uint nd;
       cl_device_id* devices;
       cl_device_id dev;

       // Device selection
       clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, 0, &nd);
       devices = (cl_device_id *)malloc(nd * sizeof(cl_device_id));
       clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, nd, devices, 0);
       ...
       // Initialize buffers and kernels...

       return RTLIB_OK;
   }

Then, similarly to the other cases, the ``onRun()`` execution should be synchronized with the termination of the kernels, each run processing a subset of data from the overall input set.

Check out the already integrated OpenCL samples for more.
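The idea of processing one subset of the input per ``onRun()`` execution can be sketched in plain C++ as follows. This is only an illustrative sketch: ``ChunkedWorkload``, ``HasWork()``, ``RunOnce()`` and ``RunAllCycles()`` are hypothetical names that are not part of the RTLib, and a simple loop stands in for the kernel that a real OpenCL application would enqueue and wait for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the RTLib): splits an input set into
// fixed-size chunks, so that each onRun() call handles exactly one chunk.
class ChunkedWorkload {
public:
    ChunkedWorkload(std::vector<float> input, std::size_t chunk_size)
        : data_(std::move(input)), chunk_(chunk_size) {
        assert(chunk_ > 0);
    }

    // True while part of the input is still unprocessed.
    bool HasWork() const { return next_ < data_.size(); }

    // Process one chunk and return the number of elements handled.
    // A plain loop stands in for the kernel execution: an OpenCL
    // application would enqueue the kernel here and wait for it to end.
    std::size_t RunOnce() {
        std::size_t end = std::min(next_ + chunk_, data_.size());
        std::size_t handled = end - next_;
        for (std::size_t i = next_; i < end; ++i)
            data_[i] *= 2.0f;
        next_ = end;
        return handled;
    }

    std::vector<float> const & Data() const { return data_; }

private:
    std::vector<float> data_;
    std::size_t chunk_;
    std::size_t next_ = 0;
};

// Mimics the control thread calling onRun() repeatedly: one chunk per
// cycle, until the input is exhausted. Returns the number of cycles.
std::size_t RunAllCycles(ChunkedWorkload & workload) {
    std::size_t cycles = 0;
    while (workload.HasWork()) {
        workload.RunOnce();
        ++cycles;
    }
    return cycles;
}
```

In an AEM-integrated application, the body of ``onRun()`` would presumably call something like ``RunOnce()`` and return ``RTLIB_OK``, returning ``RTLIB_EXC_WORKLOAD_NONE`` once nothing is left to process. Choosing a chunk size that keeps each run within a few hundred milliseconds keeps the task interruptible, as recommended for ``onRun()``.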