Scalable transcriptomics analysis with Dask: applications in data science and machine learning

dc.contributor.author Moreno, M en
dc.contributor.author Ricardo Pereira Vilaça en
dc.contributor.author Pedro Gabriel Ferreira en
dc.contributor.other 5635 en
dc.contributor.other 7497 en
dc.date.accessioned 2023-05-08T08:54:00Z
dc.date.available 2023-05-08T08:54:00Z
dc.date.issued 2022 en
dc.description.abstract Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures is made available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures. en
dc.identifier P-00X-QNH en
dc.identifier.uri http://dx.doi.org/10.1186/s12859-022-05065-3 en
dc.identifier.uri https://repositorio.inesctec.pt/handle/123456789/13936
dc.language eng en
dc.rights info:eu-repo/semantics/openAccess en
dc.title Scalable transcriptomics analysis with Dask: applications in data science and machine learning en
dc.type Publication en
Files
Original bundle
Name: P-00X-QNH.pdf
Size: 2.48 MB
Format: Adobe Portable Document Format