Peter van Heusden (pvh@sanbi.ac.za) and Eugene de Beste, SANBI
Reminder: High Performance Computing means optimising the whole computing system.
Remember Amdahl's Law: the theoretical speedup of a task is always limited by the part of the task that cannot benefit from the improvement.
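Amdahl's Law can be written as below. The symbols are introduced here for illustration: p is the fraction of the run time that benefits from the improvement, s is the speedup of that fraction, and S is the overall speedup.

```latex
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad
\lim_{s \to \infty} S(s) = \frac{1}{1 - p}
```

For example, if 95% of a job can be parallelised (p = 0.95), 16 workers give S = 1/(0.05 + 0.95/16) ≈ 9.1. No number of workers can push the speedup past 1/0.05 = 20.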
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." - Donald Knuth, 1974 (often attributed to Tony Hoare)
Solve it in a way that is portable and that a handful of developers can adapt as the software changes.
# Excerpt: further inputs (tmpdir, output_RefDictionaryFile) and the outputs section are not shown
class: Workflow
cwlVersion: v1.0
inputs:
  reference:
    type: File
    doc: reference human genome file
steps:
  create-dict:
    run: ../../tools/picard-CreateSequenceDictionary.cwl
    in:
      reference: reference
      outputFileName: output_RefDictionaryFile
      tmpdir: tmpdir
    out: [ createDict_output ]
From CWL GATK workflow
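A CWL workflow is run by handing a runner such as cwltool the workflow file together with a job file that supplies its inputs. The sketch below assumes hypothetical file names (gatk-workflow.cwl for the workflow saved to disk, hg38.fa for the reference) and assumes that tmpdir and output_RefDictionaryFile are plain string inputs, since their declarations are not shown in the excerpt.

```shell
# Hypothetical names: gatk-workflow.cwl (the workflow above) and hg38.fa.
# Assumption: tmpdir and output_RefDictionaryFile are string inputs.
cat > job.yml <<'EOF'
reference:
  class: File
  path: hg38.fa
output_RefDictionaryFile: hg38.dict
tmpdir: /tmp
EOF

# cwltool is the CWL reference runner; only invoke it if it is installed.
if command -v cwltool >/dev/null 2>&1; then
  cwltool gatk-workflow.cwl job.yml
fi
```

Keeping inputs in a separate job file means the same workflow definition can be rerun on a different reference without editing it.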
pvh@gabber:~$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
c04b14da8d14: Pull complete
Digest: sha256:0256e8a36e2070f7bf2d0b0763dbabdd67798512411de4cdcf9431a1feb60fd9
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
Things to explore:
FROM quay.io/refgenomics/docker-ubuntu:14.04
MAINTAINER Nik Krumm <nkrumm@gmail.com>
RUN git clone https://github.com/lh3/bwa && \
    cd bwa && \
    git checkout 0.7.10 && \
    make && cp bwa /usr/local/bin/bwa
# Refresh the package index first so the install does not fail on a stale cache
RUN apt-get update && apt-get install -y samtools
# Convenience commands
COPY align.py /usr/local/bin/align.py
RUN chmod +x /usr/local/bin/align.py
RUN ln -s /usr/local/bin/align.py /usr/local/bin/align
CMD ["/usr/local/bin/align"]
From: onecodex/docker-bwa
Docker images are built with the docker build command, given the path to the build context (the directory containing the Dockerfile). Images can optionally be pushed to Docker Hub with the docker push command.
$ docker build -t pvanheus/aligntool:latest .
$ docker push pvanheus/aligntool:latest
Docker Hub is run by Docker, Inc., but alternative registries exist, most notably quay.io. Quay offers an extensive API, which makes it straightforward to integrate into automated workflows.
It also supports GitHub integration, for example building an image automatically when changes are pushed to a repository.
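Pushing to quay.io instead of Docker Hub is a matter of tagging the image with the registry host in front of its name. The helpers below are a hypothetical sketch: the function names are invented for illustration, and they assume the quay.io namespace matches the Docker Hub one (pvanheus), which is not guaranteed.

```shell
# Hypothetical helper: prefix a local image reference with the quay.io host.
quay_ref() {
  printf 'quay.io/%s\n' "$1"
}

# Hypothetical helper: re-tag a local image for quay.io and push it.
# Assumes `docker login quay.io` has already been run.
push_to_quay() {
  local target
  target="$(quay_ref "$1")"
  docker tag "$1" "$target" && docker push "$target"
}

# Usage (not executed here):
#   push_to_quay pvanheus/aligntool:latest
```

Alternatively, the registry-qualified name can be given directly at build time, e.g. docker build -t quay.io/pvanheus/aligntool:latest .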