HASLab - Indexed Articles in Journals
Scalable transcriptomics analysis with Dask: applications in data science and machine learning (2022)
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science, including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask in boosting data science applications in different case studies. Detailed documentation and code for these procedures are available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.
Interactive VPL-based global illumination on the GPU using fuzzy clustering (2022)
Physically-based synthesis of high quality imagery, including global illumination light transport phenomena, results in a significant workload, which makes interactive rendering a very challenging task. We propose a VPL-based ray tracing approach that runs entirely on the GPU and achieves interactive frame rates while handling global illumination light transport phenomena. This approach is based on clustering both shading points and VPLs and computing visibility only among the clusters' representatives. A new massively parallel K-means clustering algorithm enables efficient execution on the GPU. Rendering artifacts that could result from the piecewise constant approximation of the VPLs/shading points visibility function introduced by the clustering are smoothed away by an innovative approach based on fuzzy clustering and weighted interpolation of the visibility function. The effectiveness of the proposed approach is experimentally verified for a collection of scenes, with frame rates above 3 fps and up to 25 fps being demonstrated. © 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Ensemble Metropolis Light Transport (2022)
This article proposes a Markov Chain Monte Carlo (MCMC) rendering algorithm based on a family of guided transition kernels. The kernels exploit properties of ensembles of light transport paths, which are distributed according to the lighting in the scene, and utilize this information to make informed decisions for guiding local path sampling. Critically, our approach does not require caching distributions in world space, saving time and memory, yet it is able to make guided sampling decisions based on whole paths. We show how this can be implemented efficiently by organizing the paths in each ensemble and designing transition kernels for MCMC rendering based on a carefully chosen subset of paths from the ensemble. This algorithm is easy to parallelize and leads to improvements in variance when rendering a variety of scenes.
A formal treatment of the role of verified compilers in secure computation (2022)
Secure multiparty computation (SMC) allows for complex computations over encrypted data. Privacy concerns for cloud applications make this a highly desired technology, and recent performance improvements show that it is practical. To make SMC accessible to non-experts and empower its use in varied applications, many domain-specific compilers are being proposed. We review the role of these compilers and provide a formal treatment of the core steps that they perform to bridge the abstraction gap between high-level ideal specifications and efficient SMC protocols. Our abstract framework bridges this secure compilation problem across two dimensions: 1) language-based source- to target-level semantic and efficiency gaps, and 2) cryptographic ideal- to real-world security gaps. We link the former to the setting of certified compilation, paving the way to leverage long-run efforts such as CompCert in future SMC compilers. Security is framed in the standard cryptographic sense. Our results are supported by a machine-checked formalisation carried out in EasyCrypt. © 2021 Elsevier Inc.
The CoronaSurveys System for COVID-19 Incidence Data Collection and Processing (2021)
CoronaSurveys is an ongoing interdisciplinary project developing a system to infer the incidence of COVID-19 around the world using anonymous open surveys. The surveys have been translated into 60 languages and are continuously collecting participant responses from any country in the world. The responses collected are pre-processed, organized, and stored in a version-controlled repository, which is publicly available to the scientific community. In addition, the CoronaSurveys team has devised several estimates computed on the basis of survey responses and other data, and makes them available on the project's website in the form of tables, as well as interactive plots and maps. In this paper, we describe the computational system developed for the CoronaSurveys project. The system comprises multiple components and processes: the web survey, the mobile apps, the cleaning and aggregation of the survey responses, the storage and publication of the data, the processing of the data and the computation of estimates, and the visualization of the results. We also describe the system architecture and the major challenges we faced in designing and deploying it.