HASLab - Indexed Articles in Conferences
Recent Submissions (1 - 5 of 428)
-
Machine Learning Regression-Based Prediction for Improving Performance and Energy Consumption in HPC Platforms (2025)
High-performance computing is pivotal for processing large datasets and executing complex simulations, ensuring faster and more accurate results. Improving the performance of software and scientific workflows in such environments requires careful analysis of their computational behavior and energy consumption. Therefore, maximizing computational throughput in these environments, through adequate software configuration and resource allocation, is essential. The work presented in this paper leverages regression-based machine learning and decision trees to analyze and optimize resource allocation in high-performance computing environments based on an application's performance and energy metrics. Applied to a bioinformatics case study, these models enable informed decision-making by selecting the appropriate computing resources to enhance the performance of phylogenomics software. Our contribution is to better explore and understand efficient resource management in supercomputers, namely Santos Dumont. We show that the predictions of the application's execution time obtained with the proposed method are accurate for various numbers of computing nodes, while energy consumption predictions are less precise. The application parameters most relevant to this work are identified, and the relative importance of each parameter to the accuracy of the prediction is analysed.
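A minimal sketch of the kind of regression-based prediction the abstract describes, using a decision-tree regressor to predict execution time from resource-allocation parameters and to report each parameter's relative importance. The feature names (nodes, threads, input size) and the synthetic data are illustrative assumptions, not the paper's dataset or parameter set.

# Illustrative sketch: decision-tree regression of execution time from
# resource-allocation parameters (synthetic data, hypothetical features).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical samples: [number of nodes, threads per node, input size (MB)]
X = rng.integers(low=[1, 1, 100], high=[64, 48, 5000], size=(200, 3))
# Synthetic execution time, roughly inverse to the number of allocated cores
y = X[:, 2] / (X[:, 0] * X[:, 1]) + rng.normal(0, 0.05, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_train, y_train)

print("R^2 on held-out runs:", model.score(X_test, y_test))
# Relative importance of each parameter to the prediction accuracy
print("feature importances:", model.feature_importances_)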
-
Towards Adaptive Transactional Consistency for Georeplicated Datastores (2025)
Developers of data-intensive georeplicated applications face a difficult decision when selecting a database system. As captured by the CAP theorem, CP systems such as Spanner provide strong consistency that greatly simplifies application development. AP systems such as AntidoteDB, which provide Transactional Causal Consistency (TCC), ensure availability in the face of network partitions and isolate performance from wide-area round-trip times, but avoid lost-update anomalies only when values can be merged. Ideally, an application should be able to adapt to current data and network conditions by selecting which transactional consistency to use for each transaction. In this paper, we test the hypothesis that a georeplicated database system can be built providing only TCC at its core, hence being AP, while allowing an application to execute some transactions under Snapshot Isolation (SI), hence CP. Our main result shows that this can be achieved even when all interaction happens through the TCC database system, without additional communication channels between the participants. A preliminary experimental evaluation with a proof-of-concept implementation using AntidoteDB shows that this approach is feasible.
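A minimal sketch of the per-transaction consistency selection the abstract argues for: the application picks TCC for mergeable updates and SI for updates that must avoid lost-update anomalies. The store interface, consistency labels, and commit behaviour shown here are hypothetical illustrations, not AntidoteDB's API or the paper's protocol.

# Illustrative sketch only: an application-facing wrapper that selects a
# consistency level per transaction over a hypothetical store API.
from enum import Enum

class Consistency(Enum):
    TCC = "transactional_causal_consistency"  # AP: always available, values must merge
    SI = "snapshot_isolation"                 # CP: strong, may block under partition

class AdaptiveClient:
    def __init__(self, store):
        self.store = store  # assumed to expose begin/read/write/commit/abort

    def transfer(self, src, dst, amount, consistency=Consistency.SI):
        # A non-mergeable update (a balance transfer) is requested under SI;
        # a mergeable update (e.g. adding to a set CRDT) could use TCC instead.
        txn = self.store.begin(consistency.value)
        balance = txn.read(src)
        if balance < amount:
            txn.abort()
            return False
        txn.write(src, balance - amount)
        txn.write(dst, txn.read(dst) + amount)
        return txn.commit()  # under SI, commit fails on write-write conflicts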
-
Alloy Repair Hint Generation Based on Historical Data (2025)
Platforms that support novices learning to program are often accompanied by automated next-step hints that guide them towards correct solutions. Many of those approaches are data-driven, building on historical data to generate higher-quality hints. Formal specifications are increasingly relevant in software engineering activities, but very little support exists to help novices while learning. Alloy is a formal specification language often used in courses on formal software development methods, and a platform, Alloy4Fun, has been proposed to support autonomous learning. While non-data-driven specification repair techniques that could be leveraged to generate next-step hints have been proposed for Alloy, no data-driven hint generation approach has been proposed so far. This paper presents the first data-driven hint generation technique for Alloy and its implementation as an extension to Alloy4Fun, based on the data collected by that platform. This historical data is processed into graphs that capture past students' progress while solving specification challenges. Hint generation can be customized with policies that take into consideration diverse factors, such as the popularity of paths in those graphs successfully traversed by previous students. Our evaluation shows that the performance of this new technique is competitive with non-data-driven repair techniques. To assess the quality of the hints, and to help select the most appropriate hint generation policy, we conducted a survey with experienced Alloy instructors.
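A minimal sketch of a popularity-based next-step hint policy over a graph of past students' progress, in the spirit of the approach described above. The graph representation, the policy, and the state names are illustrative assumptions, not the paper's algorithm or Alloy4Fun's data format.

# Illustrative sketch: suggest a next step for a student's current submission
# from a graph of transitions taken by past students who reached a solution.
from collections import defaultdict

class HintGraph:
    def __init__(self):
        # edge (state -> next_state) -> number of students who took it and
        # eventually reached a correct solution
        self.success_counts = defaultdict(int)

    def record_successful_path(self, states):
        # states: sequence of submission states ending in a correct one
        for current, nxt in zip(states, states[1:]):
            self.success_counts[(current, nxt)] += 1

    def next_step_hint(self, state):
        # Popularity policy: suggest the outgoing edge most often followed by
        # students who went on to solve the challenge.
        candidates = [(count, nxt) for (src, nxt), count in
                      self.success_counts.items() if src == state]
        if not candidates:
            return None  # no historical data for this state
        _, best_next = max(candidates)
        return best_next

# Usage: states here are stand-ins for (normalized) Alloy submissions.
graph = HintGraph()
graph.record_successful_path(["attempt_A", "attempt_B", "correct"])
graph.record_successful_path(["attempt_A", "correct"])
graph.record_successful_path(["attempt_A", "correct"])
print(graph.next_step_hint("attempt_A"))  # -> "correct"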
-
Multi-Partner Project: Green.Dat.AI: A Data Spaces Architecture for Enhancing Green AI Services (2025)
The concept of data spaces has emerged as a structured, scalable solution to streamline and harmonize data sharing across established ecosystems. Simultaneously, the rise of AI services enhances the extraction of predictive insights, operational efficiency, and decision-making. Despite the potential of combining these two advancements, integration remains challenging: data spaces technology is still developing, and AI services require further refinement in areas like ML workflow orchestration and energy-efficient ML algorithms. In this paper, we introduce an integrated architectural framework, developed under the Green.Dat.AI project, that unifies the strengths of data spaces and AI to enable efficient, collaborative data sharing across sectors. A practical application is illustrated through a smart farming use case, showcasing how AI services within a data space can advance sustainable agricultural innovation. Integrating data spaces with AI services thus maximizes the value of decentralized data while enhancing efficiency through a powerful combination of data and AI capabilities.
-
Extending C2 Traffic Detection Methodologies: From TLS 1.2 to TLS 1.3-enabled Malware (2024)
As the Internet evolves from TLS 1.2 to TLS 1.3, it offers enhanced security against network eavesdropping for online communications. However, this advancement also enables malicious command and control (C2) traffic to more effectively evade malware detectors and intrusion detection systems. Among other capabilities, TLS 1.3 introduces encryption for most handshake messages and conceals the actual TLS record content type, complicating the task for state-of-the-art C2 traffic classifiers that were initially developed for TLS 1.2 traffic. Given the pressing need to accurately detect malicious C2 communications, this paper examines to what extent existing C2 classifiers for TLS 1.2 become less effective when applied to TLS 1.3 traffic, posing a central research question: is it possible to adapt TLS 1.2 detection methodologies for C2 traffic to work with TLS 1.3 flows? We answer this question affirmatively by introducing new methods for inferring certificate size and filtering handshake/protocol-related records in TLS 1.3 flows. These techniques enable the extraction of key features for enhancing traffic detection and can be used to pre-process data flows before applying C2 classifiers. We demonstrate that this approach allows existing TLS 1.2 C2 classifiers to be used with high efficacy, enabling the passive classification of encrypted network traffic. In our tests, we inferred certificate sizes with an average error of 1.0%, and achieved detection rates of 100% when classifying traffic based on certificate size, and over 93% when classifying TLS 1.3 traffic behavior after training solely on TLS 1.2 traffic. To our knowledge, these are the first findings to showcase specialized TLS 1.3 C2 traffic classification.
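A minimal sketch of pre-processing TLS 1.3 flows in the spirit described above: splitting a reassembled server-to-client byte stream into TLS records and summing the sizes of the early encrypted records, which in TLS 1.3 typically carry the certificate, as a crude certificate-size estimate. The heuristic and its parameters are illustrative assumptions, not the paper's actual inference method.

# Illustrative sketch: parse TLS record headers from a reassembled TCP stream
# and estimate certificate size from early encrypted records (TLS 1.3 wraps
# post-ServerHello handshake messages in records with outer content type 23).
import struct

def parse_tls_records(stream: bytes):
    """Yield (content_type, version, payload) for each TLS record."""
    offset = 0
    while offset + 5 <= len(stream):
        content_type, version, length = struct.unpack_from("!BHH", stream, offset)
        payload = stream[offset + 5 : offset + 5 + length]
        yield content_type, version, payload
        offset += 5 + length

def estimate_certificate_size(server_stream: bytes, max_records: int = 4):
    """Hypothetical heuristic: sum the first few server 'application_data'
    records after ServerHello, which usually carry EncryptedExtensions,
    Certificate and CertificateVerify in TLS 1.3."""
    total, seen = 0, 0
    for content_type, _version, payload in parse_tls_records(server_stream):
        if content_type == 22:   # plaintext handshake (ServerHello is visible)
            continue
        if content_type == 23:   # encrypted handshake / application data
            total += len(payload)
            seen += 1
            if seen >= max_records:
                break
    return total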