Runtime Adaptive Hardware/Software Execution in Complex Heterogeneous Systems

Author: SURIANO, Leonardo

Title: Runtime Adaptive Hardware/Software Execution in Complex Heterogeneous Systems

Date: 2020

Subject: No subject defined

School: E.T.S. DE INGENIEROS INDUSTRIALES

Departments: AUTOMATICA, INGENIERIA ELECTRICA Y ELECTRONICA E INFORMATICA INDUSTRIAL

Electronic access: http://oa.upm.es/66169/

Supervisor 1: TORRE ARNANZ, Eduardo de la

Summary: Today, it is indisputable that society is in the era of the Internet of Things (IoT) and Industry 4.0. Everyone benefits from the use of electronic devices (for example, mobile phones, smartwatches, and intelligent video surveillance cameras). People's growing needs are driving the development of electronic devices to a point that was unimaginable years ago when, in 1970, the first microprocessor appeared. The trend is clear: more powerful electronic devices are required, since needs are increasingly demanding (for example, those of communication and monitoring). The new generation of embedded computer systems should be portable and offer the greatest computing and communication capability while using as little energy as possible. The market analysis of the new generation of electronic devices (the subject of Chapter 1) shows that today, in contrast to a few years ago, there are devices that offer high computational capability while being small and consuming little energy. Traditionally, this goal was achieved by increasing the number of transistors and the frequency of digital circuits. However, during the last 20 years, the same goal has been achieved by bringing together more heterogeneous processing elements in the same device (that is, each component is optimized for certain functions and offers performance different from that of the others). For this reason, Multi-Processor Systems-on-Chip (MPSoCs) are gaining importance in the market, since they heterogeneously combine software processing with programmable hardware acceleration, which is the context in which this thesis is developed. The trend shows a growing complexity of the hardware.
At the same time, an application running on any of these new platforms must be able to exploit the opportunities that the hardware offers. However, the use of these heterogeneous MPSoCs comes at the cost of reduced design productivity, generally due to the lack of hardware/software co-design methods and tools that exploit parallelism efficiently. Moreover, it should be noted that an embedded device is usually part of a larger system, generally defined as a Cyber-Physical System. As the name indicates, a cyber part (for computational purposes) coexists with, and is directly connected to, the physical world through sensors and actuators. Section 1.1.2, which analyzes the main characteristics of these complex systems, will highlight that self-adaptation is a required property whenever run-time dynamism is needed to react to changing external stimuli (for example, to cope with newly detected adverse environmental situations). The self-adaptation function in a cyber-physical system must guarantee the ability to adjust its own structure and behavior at run time. Therefore, adaptation can profoundly affect both the application (that is, the software) and the hardware infrastructure. This will motivate the proposal of this thesis and drive the development of a method that makes it possible to design self-adaptive systems for complex heterogeneous devices efficiently, including hardware reconfiguration. Although this is the ultimate aim of the thesis, achieving it will require many other tasks, all of which are set out in Section 1.3. A modern electronic system is always a symbiosis of hardware skillfully orchestrated by software. As such, both must be considered together from the first phase of the design.
Chapter 2 will analyze the state of the art of three crucial aspects of the thesis: models of computation, prototyping techniques for hardware/software co-design, and modern heterogeneous hardware architectures. Traditional design flows are usually based on explicit, user-defined parallelism in the application code (imperative languages), instead of relying on alternative models of computation in which parallelism is inherently present. Section 2.1 will provide the definition of models of computation and discuss their characteristics in depth. After a documented discussion of the literature on dataflow models, a model of computation will be chosen on the basis of three essential characteristics: expressiveness, analyzability, and run-time reconfiguration. In fact, reconfiguration is one of the most important keywords in the context of self-adaptation: it represents the possibility of dynamically changing and rearranging both the software and the hardware to meet new needs. Subsequently, Section 2.2 will study in depth the main methods, techniques, and tools for rapid prototyping. The aim will be to highlight the main characteristics that the proposals of this thesis must fulfill. The last section of this chapter will describe the advantages and drawbacks of the different hardware platforms on the market. One of the main criteria for choosing the architecture will be its flexibility, since it will ensure the hardware reconfiguration capability. This section seeks to demonstrate the important role of dynamic partial reconfiguration in achieving the objective of the thesis. A Field Programmable Gate Array (FPGA) is a reconfigurable architecture that guarantees a balance between performance and flexibility.
FPGAs offer the possibility of creating custom accelerators for specific computation purposes. Section 3.1 of Chapter 3 will review the design techniques and tools for creating hardware accelerators. To offload computation from Central Processing Units (CPUs) to accelerators on the FPGA, the operating system of the platform should be able to manage new custom hardware devices (when provided). For this reason, hardware abstraction and operating system services will also be discussed. Finally, the possibilities offered by the Software-Defined System-On-Chip (SDSoC) workflow (developed by Xilinx) will be examined. SDSoC is an Integrated Development Environment that integrates, in a single flow, the creation of the hardware system and of the operating system. Its advantages and drawbacks will be highlighted to justify its use in the main proposal of the chapter. Section 3.2 will examine the proposal of integrating the use of SDSoC and the dataflow model of computation in a single flow. The approach aims to offer a valid instrument for speeding up the design of multithreaded applications that use multiple hardware accelerators. The plan involves the use of the aforementioned SDSoC and the academic tool PREESM (developed at INSA Rennes). The method will be discussed step by step, and all the challenges addressed will be analyzed. Specifically, PREESM is a rapid prototyping program that deploys software applications from a high-level representation of architectures and a dataflow-based representation of applications. Thanks to its transformations of dataflow graphs, it makes it possible to generate code that is already mapped and scheduled for the target platform.
The proposal will make it possible to extend the use of PREESM to the creation of multi-hardware, multi-threaded heterogeneous systems. In addition, the workflow allows hardware/software design space exploration without the need to redefine the distribution of data among the processing elements of the architecture. The dataflow-based run-time manager called SPiDER, developed at INSA Rennes, will also be adopted. SPiDER makes it possible to vary, dynamically and at run time, the parameters that influence the parallelism of the application. The whole design space exploration flow, together with SPiDER, will be tested on an image processing application (Section 3.3). Both the mathematical details of the algorithm and the parallelization strategy applied to the use case will be discussed. After the design of the hardware accelerator, the proposed method will be applied and each step re-examined on the real application. The improvements in the results will be compared with state-of-the-art applications. In Section 3.4, the method will also be applied to perform a design space exploration of several hardware/software solutions for a new hardware-accelerated version of the 3D video game DOOM. To make the execution of the hardware-accelerated video game possible, a custom Linux-based operating system will also be developed, since the basic services offered by the operating system automatically generated by SDSoC do not cover all the needs of this complex application. Finally, the design space exploration will highlight the trade-offs between execution time and energy consumption. The conclusion of Chapter 3 will describe the benefits and limitations of the proposed method. The limitations discussed will, in fact, lay the foundations for further proposals presented in Chapter 4.
First, it will be observed that the architecture and the software layers automatically created by SDSoC must be used as a black box, which limits the designer's actions. Second, it will be highlighted that dynamic partial reconfiguration is not directly supported by SDSoC, which prevents changing the structure of the architecture at run time. These limitations will drive the adoption of a new architecture infrastructure. Section 4.1 will analyze the run-time reconfigurable processing architecture called ARTICo3 (developed at CEI-UPM). The flexibility of its hardware infrastructure is the natural consequence of dynamic partial reconfiguration, which allows time-division multiplexing of the logic resources. Use of the architecture is made easier by automated tools (which help the designer build the entire FPGA-based system). With the inclusion of a reconfigurable architecture, the PREESM workflow will be revisited in Section 4.2. On the one hand, the high-level description of the architecture (called S-LAM) will allow the specification of "reconfigurable slots". On the other hand, the mapping of dataflow actors within a reconfigurable hardware accelerator is proposed and its implications are analyzed. The PREESM code generator will also be modified to allow the correct management of the ARTICo3 accelerators and the creation of a special software thread that delegates and dispatches hardware tasks to the slots of the ARTICo3 architecture. Finally, the details of how to manage dynamic partial reconfiguration and the hardware processing elements at run time will be discussed. The goal will be achieved by combining the Synchronous Parameterized and Interfaced Dataflow Embedded Runtime (SPiDER) and the basic functions of ARTICo3.
This last proposal ensures both software and hardware reconfiguration of the whole system at run time. However, for a system to be self-adaptive, self-awareness must also be guaranteed. Section 4.3 will discuss the motivations for proposing a unified hardware and software monitoring method. The important role of the standard monitoring library Performance Application Programming Interface (PAPI) will be described. Its integration with PAPIFY (developed at CITSEM-UPM) and PREESM will lay the foundations for adopting this multi-layered software infrastructure as a run-time monitoring instrument for reconfigurable architectures. To make this integration possible, the modification of the ARTICo3 run-time environment and the creation of a reconfigurable PAPI component specific to the ARTICo3 architecture (inspired by the PAPIFY software monitoring strategies) are justified. The whole monitoring infrastructure will guarantee the self-awareness of the designed embedded system. As a proof of concept for the proposed method for designing run-time adaptive, hardware- and software-reconfigurable systems, a parallel version of the matrix multiplication algorithm will be used. After the presentation of the intuitive concepts underlying the divide-and-conquer algorithm, the dataflow version of matrix multiplication is designed and proposed. In the experimental results in Section 4.4, a design space exploration will be performed by acting only on the parameters of the application, demonstrating the robustness and consistency of the method.
As a use case for all the proposals of the thesis, Chapter 5 will be entirely devoted to the study of an old but still active problem: the inverse kinematics of a robotic arm manipulator, analyzed from a novel perspective and using the new design instruments presented throughout the thesis. To justify the novel approach to the problem, it will be observed that, to take full advantage of the opportunities of new technologies, traditional algorithms must also be revisited. As such, the solver will be formulated as an optimization problem in which two levels of algorithmic parallelism will be proposed: on the one hand, the Nelder-Mead method used as the optimization engine will be modified to allow the evaluation of the cost function at multiple vertices simultaneously; on the other, the trajectory will be divided into segments in which all the points will be solved simultaneously. Algorithmic parallelism will also be supported by a variable number of hardware accelerators, which accelerate the computation of the robot's forward kinematics equations needed while solving the inverse kinematics. The experimental results (Section 5.7) will show how a variable number of dynamically reconfigurable hardware accelerators, combined with the reconfiguration capability of the application parameters, provide run-time scalability in terms of trajectory accuracy, logic resources, dependability, and execution time. To verify the self-adaptivity features provided by the designed system, a basic manager for the whole self-adaptive system will be described in Section 5.8. It will be implemented by simulating inputs from the outside world using the connections of the development board.
The last chapter of the thesis will briefly summarize the whole path followed to carry out the thesis work and its main contributions. It will also analyze the impact of the thesis through publications in journals, conferences, and other dissemination channels. The most significant results of the thesis will also be available in open-source repositories so that the results can be reproduced and even improved by other academic research. The thesis ends with future lines of research intended to inspire and drive the development of future self-adaptive and autonomous heterogeneous systems.

----------ABSTRACT----------

Nowadays, it is indisputable that society is in the era of the Internet of Things (IoT) and Industry 4.0. Everyone's life benefits from the use of electronic devices (e.g., mobile phones, smartwatches, and intelligent video surveillance cameras). People's growing needs are pushing the development of electronic devices to a point that was unimaginable years ago when, in 1970, the first microprocessor appeared. The tendency is clear: to always carry as much portable electronic capability as possible (communication, sensors, et cetera). The new generation of embedded computer systems should be portable, wearable, and offer the highest computing power using the least energy possible. The market analysis of the new generation of electronic platforms (reported in Chapter 1) will show that greater computational capability in smaller and less power-hungry devices is now achievable. Traditionally, this goal was pursued by increasing the number of transistors and the frequency of digital circuits. However, during the last 20 years, the same objective has been attained by embedding more heterogeneous Processing Elements (PEs) on the same chip.
For this reason, Multi-Processor Systems-on-Chip (MPSoCs) that combine software processing cores with programmable hardware acceleration are currently gaining market share in the embedded device domain, which is the context of this thesis. The trend points to a growing complexity of the hardware. At the same time, an application running on any of these new platforms must be able to exploit the hardware capabilities offered. However, the use of these heterogeneous MPSoCs comes at the price of reduced productivity, usually imposed by the lack of hardware/software co-design methods and tools that exploit parallelism efficiently. On the other hand, an embedded device is usually part of a bigger system, generally defined as a Cyber-Physical System to stress the coexistence of a cyber part (for computational purposes) directly and tightly connected to the physical world by means of sensors and actuators. Section 1.1.2, which analyzes the main characteristics of these complex systems, will highlight that self-adaptation is a required property whenever run-time dynamism is necessary to react to changing external stimuli (for instance, to face newly detected adverse environmental situations). The self-adaptation feature in a Cyber-Physical System must ensure the capability of adjusting its own structure and behavior at run-time. Thus, the adaptation can profoundly affect both the application (i.e., the software) and the hardware infrastructure. This will motivate the proposal of this thesis and push the development of a method that makes it possible to design self-adaptive systems for complex heterogeneous devices efficiently, including hardware reconfiguration. This main task has several implications, which define the objectives of the Ph.D. in Section 1.3. A modern electronic system is always an extraordinary symbiosis of hardware shrewdly orchestrated by the software.
As such, both must be considered together from the very first phase of the design. Chapter 2 will analyze the state of the art of three crucial aspects of the thesis: Models of Computation (MoCs), prototyping techniques for hardware/software co-design, and modern heterogeneous hardware architectures. Traditional design flows often rely on explicit user-defined parallelism in the application code (imperative languages), instead of relying on alternative MoCs where parallelism is inherently present. New programming paradigms raise the level of abstraction and make parallelism explicit. In Section 2.1, MoCs will be formally defined and their features discussed in depth. After a documented discussion of the dataflow literature, a MoC will be chosen for its expressiveness and analyzability, together with an aspect that is crucial for the thesis: its run-time reconfiguration capabilities. In fact, reconfiguration is one of the most important keywords in the context of self-adaptation: it is the possibility of dynamically changing and rearranging the software as well as the hardware to fulfill new requirements. Section 2.2 will present a literature review of the main methods, techniques, and tools for rapid prototyping. The aim will be to highlight the main features and characteristics that the proposals of this thesis should achieve. Section 2.3, the last section of the state-of-the-art chapter, will describe the benefits and drawbacks of the hardware platforms available on the market. The flexibility needed to ensure hardware reconfiguration capability for the designed system will strongly influence the choice of architecture. Specifically, the benefits of the Dynamic Partial Reconfiguration (DPR) available on modern FPGAs are shown. The aim is to explain the important role of DPR within the thesis proposals. A Field Programmable Gate Array (FPGA) is a reconfigurable architecture that offers a trade-off between performance and flexibility.
FPGAs offer the possibility of creating custom accelerators specialized for specific computation purposes. Section 3.1 of Chapter 3 will review the techniques and design tools for the creation of hardware accelerators. Among these techniques, the High-Level Synthesis (HLS) workflow allows a designer to start from a hardware description based on high-level languages (such as C/C++) instead of relying on the traditional Hardware Description Language (HDL)-based flow. In order to offload computation from CPUs to accelerators on the FPGA, the Operating System (OS) of the platform should be capable of managing new custom hardware devices (when provided). For this reason, the hardware abstraction and the OS services will also be discussed. Finally, the possibilities offered by the Software-Defined System-On-Chip (SDSoC) workflow (developed by Xilinx) will be examined. SDSoC is an Integrated Development Environment (IDE) that integrates, in a single flow, the creation of the hardware system and of the OS with services to handle the accelerators properly. Its benefits and drawbacks will be highlighted to justify its use in the main proposal of the chapter. Section 3.2 will examine the proposal of integrating the use of SDSoC and the dataflow MoC in a single flow. The approach aims to offer a valid instrument to speed up the design of multithreaded applications that make use of multiple hardware accelerators. The idea involves the use of the already-mentioned SDSoC and the academic tool PREESM (developed at INSA Rennes). The method will be discussed step by step, and every challenge addressed will be analyzed. Specifically, PREESM is a rapid prototyping framework that deploys software applications starting from a high-level representation of architectures and a dataflow-based representation of applications. Thanks to its internal graph transformations and algorithms, it deploys the entire system by generating mapped and scheduled code for the target platform.
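To make the idea of a dataflow-based application representation more concrete, the following minimal Python sketch computes the repetition vector of a small static synchronous dataflow graph, that is, how many times each actor must fire per graph iteration so that every FIFO returns to its initial fill level. It does not use PREESM or its API, and it ignores the parameterized features of the model used in the thesis. The actor names (`Read`, `Filter`, `Write`), the token rates, and the function `repetition_vector` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import gcd


def repetition_vector(actors, edges):
    """Solve the balance equations of a connected static dataflow graph.

    edges is a list of (producer, tokens_produced, consumer, tokens_consumed).
    Returns the smallest positive integer firing count for each actor.
    """
    rate = {actors[0]: Fraction(1)}
    while len(rate) < len(actors):
        progressed = False
        for src, produced, dst, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                progressed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                progressed = True
        if not progressed:
            raise ValueError("graph is not connected")

    # Every edge must be balanced: tokens produced == tokens consumed.
    for src, produced, dst, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent rates: no bounded schedule exists")

    # Scale the rational solution to the smallest integer solution.
    common_den = 1
    for value in rate.values():
        common_den = common_den * value.denominator // gcd(common_den, value.denominator)
    counts = {actor: int(value * common_den) for actor, value in rate.items()}
    divisor = 0
    for value in counts.values():
        divisor = gcd(divisor, value)
    return {actor: value // divisor for actor, value in counts.items()}


# Hypothetical pipeline: Read emits 4 tokens per firing, Filter handles one
# token at a time, Write gathers 4 tokens per firing.
firings = repetition_vector(
    ["Read", "Filter", "Write"],
    [("Read", 4, "Filter", 1), ("Filter", 1, "Write", 4)],
)
```

In this toy graph, `Filter` fires four times per iteration. Those firings are independent of one another, which is the kind of data-level parallelism that a mapping and scheduling tool could distribute over several processing elements.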
The proposal will make it possible to extend the use of PREESM to the creation of multi-hardware and multi-threaded heterogeneous systems. Additionally, the workflow allows Design Space Exploration (DSE) of different hardware/software design possibilities with no need to rethink and redefine the distribution of data among the PEs of the architecture. Also, the run-time manager for dataflow-based applications called SPiDER (also developed at INSA Rennes) will be adopted to vary, dynamically at run-time, the parameters that influence and modify the data-level parallelism of dataflow applications. The entire DSE flow and the adopted run-time manager will be tested on an image processing application (Section 3.3). The mathematical details of the algorithm will be discussed, as well as the parallelization strategy applied to the use case. After the design of the ad hoc hardware accelerator, the method is applied, and every proposed step is re-examined on the real application. The improvements in the results will then be compared with the state-of-the-art performance of the hardware-accelerated application. In Section 3.4, the method will also be applied to perform a DSE of several hardware/software solutions for a new hardware-accelerated version of the 3D video game DOOM. To make the execution of the hardware-accelerated video game possible, a custom Linux-based OS will also be developed, since the basic services offered by the OS automatically generated by SDSoC do not cover all the needs of this complex application. Finally, the DSE will highlight the design trade-offs among execution time, power requirements, and energy consumption. Additionally, it will be observed that the cache misses caused by the data starvation of several accelerators working in parallel can affect the overall performance of the entire system. The conclusion of Chapter 3 reports the benefits and limitations of the proposed method.
The limitations discussed will, in fact, lay the foundation for further proposals presented in Chapter 4. First, the hardware architecture and the software layers automatically created by SDSoC must be used as a black box, which limits the designer's hardware/software actions. Second, DPR is not directly supported by SDSoC, which prevents changing the structure of the architecture at run-time. These limitations will push the adoption of a new architecture infrastructure. In Section 4.1, the open-source run-time reconfigurable processing architecture ARTICo3 (developed at CEI-UPM) will be analyzed. The flexibility of its hardware infrastructure is the natural consequence of DPR, which allows time-division multiplexing of the logic resources. Use of the architecture is made easy by the automated toolchain (which helps the designer build the entire FPGA-based system) and by a run-time execution environment (which transparently manages the reconfigurable accelerators). With the inclusion of a reconfigurable architecture, the PREESM workflow will be revisited in Section 4.2. On the one hand, the high-level description of the architecture (namely S-LAM) will allow the specification of reconfigurable slots. On the other hand, the mapping of dataflow actors within a custom reconfigurable hardware accelerator is proposed, and its implications are analyzed. The code generator of PREESM will also be modified to allow the correct management of the ARTICo3 accelerators and the creation of a special software thread that delegates and dispatches hardware tasks to the slots of the ARTICo3 architecture. Finally, the details of how to manage DPR and hardware PEs at run-time will be discussed. The goal will be achieved by combining SPiDER and the collection of run-time Application Programming Interfaces (APIs) of ARTICo3. This last proposal ensures software and hardware reconfiguration of the whole system at run-time.
However, for a system to be self-adaptive, self-awareness must also be guaranteed. Section 4.3 will discuss the motivations for a unified hardware and software monitoring method. The important role of the standard monitoring library Performance Application Programming Interface (PAPI) will be described. Its integration with PAPIFY (developed at CITSEM-UPM) and PREESM will lay the foundation for adopting this multi-layered software infrastructure as a run-time monitoring instrument for reconfigurable architectures. To make this integration possible, the modification of the ARTICo3 run-time execution environment and the creation of a reconfigurable PAPI component specific to the ARTICo3 architecture (inspired by PAPIFY software monitoring strategies) are reported and justified. The entire monitoring infrastructure will thus ensure self-awareness of the designed embedded system. As a proof of concept for the newly proposed method for designing run-time adaptive hardware- and software-reconfigurable systems, a parallel version of the matrix multiplication algorithm is used. After the presentation of the intuitive concepts underlying the divide-and-conquer algorithm, the dataflow version of matrix multiplication is designed and proposed. In the experimental results in Section 4.4, DSE is performed by acting only on the parameters of the application, proving the strength and consistency of the method. As a use case for the proposals of the entire thesis, Chapter 5 will be entirely dedicated to the study of an old but still active problem: the Inverse Kinematics (IK) of a robotic arm manipulator, attacked from a novel multi-level parallel perspective and using the new design instruments presented throughout the thesis. To justify the novel approach to the problem, it will be observed that, in order to take full advantage of the opportunities of new technology, basic and widely used algorithms should also be revisited.
As such, the solver will be formulated as an optimization problem, in which two levels of algorithmic parallelism will be proposed: the Nelder-Mead derivative-free method used as the optimization engine will be modified to allow the evaluation of the cost function at multiple vertices simultaneously, and the trajectory path will be divided into non-overlapping segments, in which all the points will be solved concurrently. Algorithmic parallelism will also be supported by a variable number of parallel instances of a custom hardware accelerator, which speeds up the computation of the Forward Kinematics (FK) equations of the robot required while solving the IK. The experimental results (Section 5.7) will show how a variable number of dynamically reconfigurable hardware accelerators, combined with the reconfiguration capability of the application parameters, will provide run-time scalability in terms of trajectory accuracy, logic resources, dependability, and execution time. In order to demonstrate the self-adaptivity opportunities provided by the designed system, a basic manager for the whole self-adaptive system will be described in Section 5.8. It will be implemented by simulating input from the outside world through the hardware connections of the development board. The last chapter of the thesis will briefly summarize the whole path followed to carry out the thesis work and its main contributions. It will also analyze the impact of the thesis through journal and conference publications and other dissemination channels. The most significant results of the thesis will also be published in open-source repositories so that the results can be reproduced and even improved by other academic research. The thesis ends with ideas for future lines of research intended to inspire and push the development of future autonomous self-adaptive heterogeneous systems.
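Two ideas from the IK formulation described in this abstract, inverse kinematics posed as the minimization of a forward-kinematics error and a simplex search whose candidate vertices are evaluated together, can be sketched in Python as follows. This is a simplified illustration and not the solver of the thesis: the planar two-link arm, its link lengths, all function names, and the greedy rule that keeps the best of four candidate vertices per iteration are assumptions made here. The batch evaluation runs sequentially in this sketch; it only marks the point where independent cost evaluations could be dispatched to parallel workers or hardware accelerators.

```python
import math

LINK_1, LINK_2 = 1.0, 0.8  # hypothetical link lengths of a planar two-joint arm


def forward_kinematics(q):
    """Return the end-effector position (x, y) for joint angles q = [q1, q2]."""
    q1, q2 = q
    x = LINK_1 * math.cos(q1) + LINK_2 * math.cos(q1 + q2)
    y = LINK_1 * math.sin(q1) + LINK_2 * math.sin(q1 + q2)
    return x, y


def make_cost(target):
    """Build the cost: squared distance between FK(q) and the target point."""
    def cost(q):
        x, y = forward_kinematics(q)
        return (x - target[0]) ** 2 + (y - target[1]) ** 2
    return cost


def evaluate_batch(cost, points):
    """Evaluate independent candidates; each call could run on its own worker."""
    return [cost(p) for p in points]


def solve_ik(target, start=(0.3, 0.3), step=0.5, iterations=300):
    """Simplex search that evaluates four candidate vertices per iteration."""
    cost = make_cost(target)
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    values = evaluate_batch(cost, simplex)

    for _ in range(iterations):
        # Sort vertices from best (lowest cost) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        worst = simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        # Reflection, expansion, outside and inside contraction, as one batch.
        candidates = [
            [c + t * (c - w) for c, w in zip(centroid, worst)]
            for t in (1.0, 2.0, 0.5, -0.5)
        ]
        cand_values = evaluate_batch(cost, candidates)
        k = min(range(len(candidates)), key=lambda i: cand_values[i])

        if cand_values[k] < values[-1]:
            # Replace the worst vertex with the best candidate.
            simplex[-1], values[-1] = candidates[k], cand_values[k]
        else:
            # No candidate improves on the worst vertex: shrink toward the best.
            best = simplex[0]
            simplex = [best] + [
                [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
            ]
            values = [values[0]] + evaluate_batch(cost, simplex[1:])

    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]


target_point = (1.2, 0.6)  # hypothetical reachable target for the end effector
joint_angles, residual = solve_ik(target_point)
```

The same `solve_ik` call could be issued independently for every point of a trajectory segment, which corresponds to the second level of parallelism mentioned above. Solving the points independently is a choice of this sketch, which does not reuse the previous solution as a starting guess.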