Container image
NDIP uses containers for all tools deployed on the platform. To deploy a new tool, the first step is to create a container image and its associated Dockerfile. If the tool is already containerized, it may be possible to use the existing image, depending on how the tool obtains its configuration and input files and where its output files are written.
Containerizing the code
Starting with the source code, which should be available in a repository on GitHub or ORNL's GitLab, determine whether a Dockerfile already exists for the code. If not, a Dockerfile must be added to the source code and the CI configured to build the image automatically. It is preferable to work in the original repository if possible, but this requires write access. If the existing repository cannot be modified, create a project with a fork of the source in https://code.ornl.gov/ndip/tool-sources and work there.
Certain requirements must be met when containerizing the code; otherwise, the containerized tool will not work correctly. These include:
- Ensuring that there are no hard-coded paths or references to absolute filesystem paths
- Making sure the code does not make assumptions about the environment in which it runs
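One way to check the first requirement before building the image is to scan the source for string literals that look like absolute paths. The sketch below is a hypothetical helper (`find_absolute_paths` and the script itself are not part of NDIP). It uses Python's `ast` module, so it only inspects Python sources, and it is a heuristic: paths assembled at runtime are not detected, and some literals that merely start with "/" may be flagged.

```python
import ast
import sys
from pathlib import Path


def find_absolute_paths(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line, literal) pairs for string literals that look like absolute paths."""
    hits = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        # Only string literals are inspected; paths built at runtime are not detected.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            value = node.value
            # Heuristic: starts with "/" and is more than just the root "/".
            if value.startswith("/") and len(value) > 1 and "\n" not in value:
                hits.append((node.lineno, value))
    return sorted(hits)


def main(names: list[str]) -> int:
    """Scan files or directories (recursively, *.py only); return 1 if anything was found."""
    found = False
    for name in names:
        root = Path(name)
        files = root.rglob("*.py") if root.is_dir() else [root]
        for path in files:
            for lineno, literal in find_absolute_paths(path.read_text(), str(path)):
                print(f"{path}:{lineno}: hard-coded path {literal!r}")
                found = True
    return 1 if found else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:] or ["."]))
```

Running it against the source tree (for example, as a CI step) and failing the job on a nonzero exit code would catch new hard-coded paths early.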
Compatibility with Galaxy tools
Galaxy assumes that a tool takes zero or more input files and generates some number of output files. Tools that do not generate output files are possible, but they require special handling and should be avoided. The tool itself should be executed with a simple command, although there is some flexibility in what can be handled. Ideally, inputs should be specified as command-line arguments or as a fixed set of input files; if that is not possible, they should be available in mountable input folders. Outputs should be written to stdout/stderr or to a fixed set of files with known names and types in a known location; if that is not possible, they should be placed in a mountable folder. If the output is to be managed by Galaxy, there should be no complex hierarchy of output subfolders. Avoid specifying the location and number of output files in an input configuration file, as this makes it difficult for Galaxy to know what has been generated.
If the code does not satisfy these conditions, it may need to be refactored; usually, a couple of simple changes are enough. See the next section for how to specify input and output files in tool.xml so that Galaxy can interact with the tool.
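As an illustration of such a refactoring, the sketch below shows a hypothetical command-line entry point that follows these conventions: input files are passed as positional arguments, the single output file's path is given with `--output`, and diagnostics go to stderr. The `--output` option name and the line-counting "analysis" are placeholders, not an NDIP or Galaxy requirement.

```python
import argparse
import sys
from pathlib import Path


def run(inputs: list[Path], output: Path) -> None:
    """Placeholder analysis: write one 'file name<TAB>line count' row per input."""
    rows = []
    for path in inputs:
        with path.open() as fh:
            count = sum(1 for _ in fh)
        rows.append(f"{path.name}\t{count}")
        # Diagnostics go to stderr so they never mix with data written to stdout.
        print(f"processed {path}", file=sys.stderr)
    output.write_text("\n".join(rows) + "\n")


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Example of a Galaxy-friendly command line")
    parser.add_argument("inputs", nargs="+", type=Path, help="input files, passed explicitly")
    parser.add_argument("--output", type=Path, required=True, help="path of the single output file")
    args = parser.parse_args(argv)
    run(args.inputs, args.output)


if __name__ == "__main__":
    main()
```

With this shape, the tool command only needs to substitute the input and output file paths into one command line, and Galaxy knows exactly which file was produced.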
Preparing the Dockerfile
After obtaining access to the source code (either the original or a forked repository), you can add a Dockerfile. The required dependencies can usually be determined from the code's installation documentation and added to the Dockerfile.
Here are some examples:
- maxent: a Python code with a couple of standard dependencies
- single peak fitting: a Python code using the Mantid framework
- DCA++: a C++ code with MPI, CUDA, and ADIOS2
See the Dockerfile reference for more information.
Your image should not rely on ENTRYPOINT, because Galaxy ignores it. You can still define it in the Dockerfile (e.g., if the image is also used elsewhere), but when you create the tool file, you should repeat the ENTRYPOINT command in the tool command.
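Because the ENTRYPOINT is ignored, its command has to be written out explicitly. For Dockerfiles that use the exec (JSON array) form, the hypothetical helper below extracts ENTRYPOINT and CMD and joins them into a shell-quoted command line with `shlex.join`, following Docker's rule that CMD supplies default arguments to ENTRYPOINT. It is a sketch: it handles only single-line exec-form instructions, not the shell form or line continuations.

```python
import json
import shlex


def command_from_dockerfile(text: str) -> str:
    """Build a command line from exec-form ENTRYPOINT/CMD instructions (last one of each wins)."""
    entrypoint: list[str] = []
    cmd: list[str] = []
    for raw in text.splitlines():
        keyword, _, rest = raw.strip().partition(" ")
        # json.loads raises ValueError (JSONDecodeError) for shell-form instructions.
        if keyword.upper() == "ENTRYPOINT":
            entrypoint = json.loads(rest)
        elif keyword.upper() == "CMD":
            cmd = json.loads(rest)
    # With an ENTRYPOINT, CMD provides its default arguments; without one, CMD is the command.
    return shlex.join(entrypoint + cmd)
```

For a Dockerfile containing `ENTRYPOINT ["python", "/app/run.py"]` and `CMD ["--help"]`, this yields `python /app/run.py --help`, which can be copied into the tool command and adjusted with the tool's real arguments.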
The same image can be used for multiple tools, e.g., when the source code contains multiple executables or the same executable can be used for different purposes.
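When one image serves several tools, a single entry script with subcommands is one way to keep each tool's command short. The sketch below uses `argparse` subparsers; the program name `mytools`, the subcommands `fit` and `plot`, and their options are hypothetical, and the handlers are placeholders that only describe what they would do.

```python
import argparse


def fit(args: argparse.Namespace) -> str:
    # Placeholder for a real fitting routine.
    return f"fit {args.input} with model {args.model}"


def plot(args: argparse.Namespace) -> str:
    # Placeholder for a real plotting routine.
    return f"plot {args.input}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mytools")
    sub = parser.add_subparsers(dest="command", required=True)
    p_fit = sub.add_parser("fit", help="fit a model to the input data")
    p_fit.add_argument("input")
    p_fit.add_argument("--model", default="gaussian")
    p_fit.set_defaults(func=fit)
    p_plot = sub.add_parser("plot", help="plot the input data")
    p_plot.add_argument("input")
    p_plot.set_defaults(func=plot)
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    result = args.func(args)  # dispatch to the handler registered for the subcommand
    print(result)
    return result


if __name__ == "__main__":
    main()
```

Each Galaxy tool built on this image would then call the same script with a different subcommand.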
An existing image that already has some dependencies installed (e.g., a CUDA or Python image) can be used as the base for a new one.
Building the image
Although it is possible to build an image manually and push it to a registry, deployed NDIP tools require an automated build process. See an example of how to do this in GitLab. After a successful build, the image is available in ORNL's public registry under the ndip project.
Verifying the image
Before adding the tool to Galaxy, make sure that the image works as expected. The easiest way to do this is to run it locally and check that it accepts inputs and produces the expected outputs. Note that inputs and outputs should be external to the image, as this will be required when the tool runs in Galaxy.
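This local check can be scripted. The sketch below assembles a `docker run --rm` invocation that bind-mounts a host input directory read-only and a host output directory writable, so nothing the tool needs lives inside the image, and then reports which expected output files are missing. The image name, the container paths `/inputs` and `/outputs`, and the expected file names are assumptions for illustration; running `verify` requires Docker on the local machine.

```python
import subprocess
from pathlib import Path


def docker_run_args(image: str, command: list[str], input_dir: Path, output_dir: Path) -> list[str]:
    """Build a docker run argument list with host directories mounted into the container."""
    return [
        "docker", "run", "--rm",
        "-v", f"{input_dir.resolve()}:/inputs:ro",  # inputs: read-only bind mount
        "-v", f"{output_dir.resolve()}:/outputs",   # outputs: writable bind mount
        image, *command,
    ]


def verify(image: str, command: list[str], input_dir: Path, output_dir: Path,
           expected: list[str]) -> list[str]:
    """Run the container and return the names of expected outputs that were not produced."""
    output_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(docker_run_args(image, command, input_dir, output_dir), check=True)
    return [name for name in expected if not (output_dir / name).exists()]
```

A call such as `verify("mytool:latest", ["python", "/app/run.py", "/inputs/data.txt", "--output", "/outputs/result.tsv"], Path("in"), Path("out"), ["result.tsv"])` should return an empty list if the tool behaves as expected; a nonzero exit from the container raises `CalledProcessError`.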
Some tools require direct access to SNS/HFIR storage, so these file systems are automatically mounted in all NDIP tool containers and are accessible at the standard /SNS and /HFIR paths within the container. Note that direct access is not recommended: Galaxy cannot manage data that is accessed directly, and it creates an infrastructure dependency in the tool. In some cases, however, it is unavoidable. Currently, only read-only access is available this way.
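Since /SNS and /HFIR are mounted read-only, a tool that reads raw data from them has to write its results elsewhere, typically to the output location the tool is given. The hypothetical helpers below detect whether a path lies on one of these mounts and derive an output path in a separate writable directory; the function names and the `.out` suffix are illustrative only.

```python
from pathlib import PurePosixPath

# Facility mounts that are available read-only inside NDIP tool containers.
FACILITY_ROOTS = (PurePosixPath("/SNS"), PurePosixPath("/HFIR"))


def is_facility_path(path: str) -> bool:
    """True if path is one of the facility mounts or lies underneath one."""
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in FACILITY_ROOTS)


def output_path_for(input_path: str, output_dir: str, suffix: str = ".out") -> str:
    """Place derived output in output_dir, never next to a read-only facility input."""
    if is_facility_path(output_dir):
        raise ValueError(f"{output_dir} is on read-only facility storage")
    return str(PurePosixPath(output_dir) / (PurePosixPath(input_path).stem + suffix))
```

Checking the output location up front gives a clear error message instead of a permission failure partway through a run.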