Linux shell scripting for high-throughput biological data processing on supercomputers

16/01/2018 - 17/01/2018 All day

Course Description

An unprecedented amount of biomedical data has been produced and stored in recent years. Managing such biological big data is often not feasible without the high-performance computing architectures needed to analyse and process large-scale datasets. Running high-throughput (HTP) bioinformatics data pipelines on supercomputing machines requires advanced Linux shell command-line and scripting skills. Many scientists working with such data lack these skills, or have acquired them by self-learning without becoming fully independent and fluent. This may have repercussions on the quality, reproducibility, and reliability of the analyses.

In this two-day course, we will introduce the Linux shell. On day one, we will show how to navigate and work with files and directories, how to combine commands to do new things, how to perform the same actions on many different files, how to filter and selectively extract data from tables, and how to find objects in files. Moreover, we will show how to connect to a remote supercomputer and how to use a supercomputing environment to analyse large amounts of biological data and to run simple shell scripts and bioinformatics pipelines.

Day two will be wholly practical. Participants are invited to let us know in advance which file formats they typically deal with (e.g. fastq, table, etc.), which processes they typically need to perform on them (e.g. filtering, ordering, etc.) and which programs they typically need to run (e.g. bwa, hisat2, etc.) so that we can prepare tailored practicals. Participants are also welcome to come to the course with one or more files they wish to work with, provided they do not exceed a given size.
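As an illustration of the day-one topics named in the course description (performing the same action on many files, combining commands, filtering tables, and finding items in files), here is a minimal sketch. It is not taken from the course material: the file names, gene names and counts are hypothetical sample data created by the script itself.

```shell
#!/usr/bin/env bash
set -eu

# Work in a throwaway directory so nothing outside it is touched.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create two tiny tab-separated tables (hypothetical sample data).
printf 'gene\tcount\ngeneA\t12\ngeneB\t3\n' > sample1.tsv
printf 'gene\tcount\ngeneA\t7\ngeneC\t25\n' > sample2.tsv

# Same action on many files: count the data rows (header skipped) in each table.
for f in sample*.tsv; do
    rows=$(tail -n +2 "$f" | wc -l)
    echo "$f: $((rows)) data rows"
done

# Filter a table and combine commands with a pipe:
# keep gene names whose count is above 5, then sort and de-duplicate them.
high=$(awk -F'\t' 'FNR > 1 && $2 > 5 { print $1 }' sample*.tsv | sort -u)
echo "Genes with count > 5:"
echo "$high"

# Find things in files: list the files that mention geneC.
matches=$(grep -l 'geneC' sample*.tsv)
echo "Files mentioning geneC: $matches"
```

The same loop-and-pipe pattern would apply to real inputs such as fastq files or larger tables, with the commands swapped for whatever tool the analysis needs.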

Target audience

This course is aimed at scientists at any stage of their career who work with big data files and/or large numbers of files and need to process and analyse their data on local or remote supercomputing machines, but lack the Linux shell command-line and scripting skills necessary to perform such tasks.


CINECA, Via dei Tizii 6b, 00185 Roma, Italy.



other training

Elixir-IIB/NETTAB Tutorial on Biological Networks: data analysis, visualization and medical application

18/10/2017 - 19/10/2017
14:00 - 17:30

Palermo, Italy | Oct 18–19, 2017 | Participants will be introduced to the fields of protein-protein interactions, biochemical reactions and causal interactions. During the course, they will be exposed to the principles and methods of literature curation. In addition, there will be a session dedicated to the common standards and ontologies adopted to describe data retrieved from the …


Best practices for RNA-Seq data analysis

27/09/2017 - 29/09/2017
All day

ELIXIR-IIB, in collaboration with the University of Salerno, Italy, is pleased to announce the upcoming training course on “Best practices for RNA-Seq data analysis”. Course date: 27-29 September 2017. Deadline for applications: 11 September 2017. Selection will start on 20 July, and applicants with an adequate profile will be accepted immediately, until we reach 25 participants. …

