HASLab - Indexed Articles in Journals
Permanent URI for this collection
Browse
Browsing HASLab - Indexed Articles in Journals by Author "5635"
Results Per Page
Sort Options
-
ItemDetailed Black-Box Monitoring of Distributed Systems( 2021) Ricardo Pereira Vilaça ; 5602Modern containerized distributed systems, such as big data storage and processing stacks or micro-service based applications, are inherently hard to monitor and optimize, as resource usage does not directly match hardware resources due to multiple virtualization layers. For instance, inter-application traffic is an important factor in as it directly indicates how components interact, it has not been possible to accurately monitor it in an application independent way and without severe overhead, thus putting it out of reach of cloud platforms. In this paper we present an efficient black-box monitoring approach for gathering detailed structural information of collaborating processes in a distributed system that can be queried for various purposes, as it includes both information about processes, containers, and hosts, as well as resource usage and amount of data exchanged. The key to achieving high detail and low overhead without custom application instrumentation is to use a kernel-aided event driven strategy. We validate a prototype implementation by applying it to multi-platform microservice deployments, evaluate its performance with micro-benchmarks, and demonstrate its usefulness for container placement in a distributed data storage and processing stack (i.e., Cassandra and Spark).
-
ItemScalable transcriptomics analysis with Dask: applications in data science and machine learning( 2022) Moreno,M ; Ricardo Pereira Vilaça ; Pedro Gabriel Ferreira ; 5635 ; 7497Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures is made available at https:// github. com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.
-
ItemTiQuE: Improving the Transactional Performance of Analytical Systems for True HybridWorkloads( 2023) Ana Nunes Alonso ; Ricardo Pereira Vilaça ; Nuno Filipe Faria ; José Orlando Pereira ; 5692 ; 5635 ; 8253 ; 5602Transactions have been a key issue in database management for a long time and there are a plethora of architectures and algorithms to support and implement them. The current state-of-the-art is focused on storage management and is tightly coupled with its design, leading, for instance, to the need for completely new engines to support new features such as Hybrid Transactional Analytical Processing (HTAP). We address this challenge with a proposal to implement transactional logic in a query language such as SQL. This means that our approach can be layered on existing analytical systems but that the retrieval of a transactional snapshot and the validation of update transactions runs in the server and can take advantage of advanced query execution capabilities of an optimizing query engine. We demonstrate our proposal, TiQuE, on MonetDB and obtain an average 500x improvement in transactional throughput while retaining good performance on analytical queries, making it competitive with the state-of-the-art HTAP systems.