The Visual Integrated Bioinformatics Environment
(VIBE) is a visual programming interface for creating
workflows, or pipelines, from analysis modules and data
sources available on the web, on a grid, and through VIBE
servers, which provide access to many common bioinformatics algorithms. The
graphical interface offers an extensible drag-and-drop
environment for the creation of sequence analysis pipelines
and visualization of results.
VIBE makes extensive use of XML for configuration, data
exchange, data storage, and communications. The VIBE system,
both client and server, is largely configured by XML files
and provides interfaces that can be used during execution to
refine the configuration on the fly. All data exchange
between modules and between the client and the server is in
XML format. Modules and services available through the system
manage their parameters and communications through XML files;
many algorithms can be integrated into the system simply by
adding an XML file. Pipelines and templates are also
stored as XML.
In what language is VIBE written?
The VIBE client is written entirely in Java, so it is portable to
any platform with a suitable Java runtime. The user-friendly interface for creating and
monitoring workflows and the interactive visualization tools
for exploring results can be used by virtually any user. The
VIBE client runs on Java 1.4+ and takes advantage of the
Apache Jakarta project’s logging functionality (log4j)
for debugging purposes.
The VIBE server is a servlet-based web application, so it
can be installed on nearly any platform to work with nearly
any algorithm. The VIBE server also runs on Java 1.4+ and
uses log4j. The recommended application server for VIBE is
Tomcat 4.1.27+.
What are the main areas of the client interface?
The VIBE main window (shown below) comprises five
functional areas: the menu and operation toolbar, the Module Toolbar,
the optional Resources area, the Workspace, and the Details pane.
The VIBE interface, showing a pipeline during
execution.
The menu and operation toolbar provide access to many actions
available in the system, such as saving and loading
pipelines, cut-copy-paste, execution controls, and
application configuration.
The Module Toolbar contains all of the modules that
are available in the installation and organizes them on
tabs in user-configurable groups.
The Resources area (on the left) contains tabs for resources that may optionally
be displayed, such as listings of Favorite Pipelines (shown) or a Launchpad
for quick launching of local programs or favorite websites. Additionally,
the Module Toolbar may be viewed in a hierarchical format as a resource in
this area.
The Workspace area (center) is the primary area of interaction
between a user and the system. It is the canvas on which
a pipeline is created by dragging modules from the Module
Toolbar and dropping them here. Multiple workspaces can
be open simultaneously, and are accessed by clicking the
appropriate tab at the bottom.
The Details pane (on the right) is the primary source
of information in the program. It provides information
about the pipeline or specific modules on the Details tab
(shown), access to a module’s parameters on the
Parameters tab, and a place for user annotations, for either an
individual module or the whole pipeline, on the Notes
tab. The images below show how the three tabs might be
used for a single module.
The details page for BLASTX, its parameters, and
notes that a user has entered regarding this instance of
the module.
What are modules?
Modules are the units of activity in VIBE; each represents a
step or set of steps in a process. Modules are connected to
form pipelines, which represent a full analysis process.
Modules can represent input sources, such as a GenBank
sequence retrieval module; a processing or algorithmic
service, such as BLAST; a utility, such as a filter or email
notification; a visualization tool, such as a dendrogram
viewer; or just about any other category of algorithm or
tool. Despite their many differences, all modules share
some fundamental characteristics, since they all ultimately
inherit from the same source.
All modules have an XML representation. A user can select a
module on the Workspace, copy it, and paste it into a text
editor where its XML form will be rendered. Similarly, that
XML can be copied and pasted into VIBE where it will manifest
as a graphical module. This XML can be emailed to colleagues
for incorporation into their VIBE workspace.
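As a rough illustration of what handling a pasted module might involve, the sketch below parses a module's XML text with the standard Java DOM parser and reads one attribute. The element name "module" and the attribute "name" are assumptions made for this example; the actual VIBE module schema is not described here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical sketch: reads the name of a module from its pasted XML form. */
class ModuleXmlSketch {
    /**
     * Parses the XML text and returns the root element's "name" attribute.
     * The attribute name is an assumption, not the documented VIBE schema.
     */
    static String moduleName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return root.getAttribute("name");
        } catch (Exception e) {
            // Covers parser configuration, I/O, and malformed XML errors.
            throw new IllegalArgumentException("Could not parse module XML", e);
        }
    }

    public static void main(String[] args) {
        // The XML below is an invented example, not real VIBE output.
        System.out.println(moduleName("<module name=\"BLASTX\"/>")); // prints BLASTX
    }
}
```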
All modules have an associated default parameters XML file.
Some of these files do not specify parameters, such as for an
MSAViewer. Some specify many parameters, such as for BLASTX.
But all modules have this file, and any defined parameters
will be presented to the user in the Parameter table on the
Parameters tab in the Details pane. This parameters file will
also define the mappings from VIBE parameters to command-line
arguments if it is associated with a command-line
program.
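The mapping from parameters to command-line arguments can be pictured with a small sketch. It assumes the parameters file boils down to a table from parameter name to command-line flag; the class name, the parameter names, and the flag strings are all placeholders chosen for illustration, not taken from VIBE.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turns parameter values into command-line arguments. */
class ParameterMapper {
    /**
     * @param flagsByParameter parameter name to command-line flag (assumed layout)
     * @param values           parameter name to the value chosen by the user
     * @return flag/value pairs in the order the flags were declared
     */
    static List<String> toArguments(Map<String, String> flagsByParameter,
                                    Map<String, String> values) {
        List<String> args = new ArrayList<String>();
        for (Map.Entry<String, String> entry : flagsByParameter.entrySet()) {
            String value = values.get(entry.getKey());
            if (value == null || value.isEmpty()) {
                continue; // unset parameters are left to the program's own defaults
            }
            args.add(entry.getValue());
            args.add(value);
        }
        return args;
    }

    public static void main(String[] argv) {
        // Placeholder parameter names and flags, for illustration only.
        Map<String, String> flags = new LinkedHashMap<String, String>();
        flags.put("expect", "-e");
        flags.put("matrix", "-M");
        Map<String, String> values = new LinkedHashMap<String, String>();
        values.put("expect", "0.001");
        System.out.println(toArguments(flags, values)); // prints [-e, 0.001]
    }
}
```

Keeping the mapping in data rather than in code is what would let a new command-line program be described by a file alone.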
All modules define the input types they can accept and the
output types they generate in the parameters file. These types
govern the connections that VIBE will allow to be made. Input
modules must have empty input types; that is, they are always
"source" modules of a workflow and cannot accept input.
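A type-governed connection check of this kind might look like the following sketch. It assumes types are plain strings and that two modules can be connected when the upstream module produces at least one type the downstream module accepts; the class and the type names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a module's declared input and output types. */
class ModuleTypes {
    final String name;
    final Set<String> inputTypes;
    final Set<String> outputTypes;

    ModuleTypes(String name, String[] inputTypes, String[] outputTypes) {
        this.name = name;
        this.inputTypes = new HashSet<String>(Arrays.asList(inputTypes));
        this.outputTypes = new HashSet<String>(Arrays.asList(outputTypes));
    }

    /** A module with no input types can only start a workflow. */
    boolean isSource() {
        return inputTypes.isEmpty();
    }

    /** True if some output type of upstream is accepted as input by downstream. */
    static boolean canConnect(ModuleTypes upstream, ModuleTypes downstream) {
        for (String type : upstream.outputTypes) {
            if (downstream.inputTypes.contains(type)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this rule a source module can never be the downstream end of a connection, because its set of input types is empty.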
All modules can be executed, interrupted, reset, and
restarted. After successful execution, the data they
generated can be viewed as XML (the standard results format)
or as a text report, if implemented.
All modules have a context-sensitive menu accessible by
right-clicking on the module with the mouse. This menu
provides shortcuts to execution control, saving and loading
module-specific data, retrieving and viewing data, and
more. Some modules, such as most visualization modules, also
allow their data to be printed or exported as an image.
What are pipelines, and how do I make one?
VIBE pipelines are workflows that guide data through
multiple analyses, filters and visualizations. Pipelines are
created by dragging modules from the Module Toolbar and
dropping them on the Workspace. When the application is in
auto-connect mode, VIBE will attempt to connect the module
currently being dragged to the module on the workspace that
has focus (signified by a yellow border) by comparing the
input types of the dragged module with the output types of
the focused module. If the types match, the
data can be transferred between them and VIBE will
automatically make a connection.
All pipelines start with an Input (or Query) module, such as one
that supplies a sequence from a database or a chromatogram file. Pipelines
generally end with a visualization module for viewing the
outcome of the workflow; many visualization modules can also be
used as intermediate steps and will allow the user to manually
select data items to suppress from further analysis. Analysis, transformation, and
utility modules form the path from start to finish. The
parameters for each module along the path can be adjusted to
meet the needs of the researcher.
Each module can accept at most one input pipe from another
module, and can send as many as ten output pipes to other
modules. All ten pipes will carry the same information. The
data flowing through a pipe can be filtered using the
Conditional module, which allows the user to set Boolean
criteria and filter the results accordingly down a true or
false route.
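The true/false routing performed by a conditional step can be sketched as follows. This is only an illustration of the idea: the class name is hypothetical, the criterion is modeled as a standard Java Predicate, and nothing here reflects how the Conditional module is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: splits items into a true route and a false route. */
class ConditionalSplit<T> {
    final List<T> trueRoute = new ArrayList<T>();
    final List<T> falseRoute = new ArrayList<T>();

    /** Evaluates the Boolean criterion once per item and files the item accordingly. */
    ConditionalSplit(List<T> items, Predicate<T> criterion) {
        for (T item : items) {
            if (criterion.test(item)) {
                trueRoute.add(item);
            } else {
                falseRoute.add(item);
            }
        }
    }
}
```

For example, splitting a list of sequence lengths with the criterion "length is at least 100" sends the long sequences down the true route and the rest down the false route, with the original order preserved on each route.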
Can I reuse a pipeline?
Once a pipeline is built and parameterized, the user can save it
as a template for future use. A template includes all modules
and settings, but no data. This is useful for processing
multiple sets of data according to the same protocol. The
template is saved as XML.
Can I come back to my pipeline later?
Pipelines that have data can be saved as well. Since genomic
data sets can be quite large, pipelines with data are saved
as .zva files, or Zipped VIBE Archives. An archive contains
all associated data files, from the initial input through
all intermediate results to the final output, along with the
template, which is populated with pointers to the data files.
ZVAs can be reopened at any time for further processing or
inspection.
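The idea of bundling a template with its data files can be sketched with the standard java.util.zip classes. The sketch assumes a ZVA is an ordinary ZIP file; the entry names used in the example are invented, and the real internal layout of a .zva file is not documented here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: packs a template and its data files into one zip archive. */
class ArchiveSketch {
    /** Writes each named text entry into an in-memory zip archive. */
    static byte[] pack(Map<String, String> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> entry : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(entry.getKey()));
                zip.write(entry.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Lists the entry names of an archive in stored order. */
    static List<String> listEntries(byte[] archive) {
        List<String> names = new ArrayList<String>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Entry names are invented for illustration.
        Map<String, String> entries = new LinkedHashMap<String, String>();
        entries.put("template.xml", "<pipeline/>");
        entries.put("data/step1-results.xml", "<results/>");
        System.out.println(listEntries(pack(entries)));
    }
}
```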
What happens during pipeline execution?
Execution of a pipeline begins with precompilation. During
this step, VIBE checks all input and output connections for
validity, verifies that necessary parameters for each module
have been assigned, and checks to see that input data has
been supplied.
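The three checks described for precompilation can be sketched as a single pass that collects problems instead of stopping at the first one. The Step class and its fields are a hypothetical simplification of a module, and the messages are invented; the sketch only mirrors the kinds of checks listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a pre-execution check over the modules of a pipeline. */
class PrecompileSketch {
    /** Minimal, assumed view of one module as seen by the check. */
    static class Step {
        final String name;
        final boolean isSource;        // true for Input modules
        final boolean hasIncomingPipe; // true if another module feeds this one
        final boolean hasInputData;    // only meaningful for Input modules
        final List<String> requiredParameters;
        final Map<String, String> assignedParameters;

        Step(String name, boolean isSource, boolean hasIncomingPipe, boolean hasInputData,
             List<String> requiredParameters, Map<String, String> assignedParameters) {
            this.name = name;
            this.isSource = isSource;
            this.hasIncomingPipe = hasIncomingPipe;
            this.hasInputData = hasInputData;
            this.requiredParameters = requiredParameters;
            this.assignedParameters = assignedParameters;
        }
    }

    /** Returns one message per problem found; an empty list means the pipeline may run. */
    static List<String> check(List<Step> steps) {
        List<String> problems = new ArrayList<String>();
        for (Step step : steps) {
            if (step.isSource && step.hasIncomingPipe) {
                problems.add(step.name + ": input modules cannot accept input");
            }
            if (!step.isSource && !step.hasIncomingPipe) {
                problems.add(step.name + ": no input connection");
            }
            if (step.isSource && !step.hasInputData) {
                problems.add(step.name + ": no input data supplied");
            }
            for (String parameter : step.requiredParameters) {
                String value = step.assignedParameters.get(parameter);
                if (value == null || value.isEmpty()) {
                    problems.add(step.name + ": parameter '" + parameter + "' is not set");
                }
            }
        }
        return problems;
    }
}
```

Collecting every problem in one pass lets a user fix all of them before trying to execute again.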
Once the pipeline passes precompilation, each module is
processed in turn. Note that parallel branches in a workflow
will be executed in parallel. During execution, the border of
the module will fade from green to white and back to green,
and the cube in the upper right-hand corner of the window
will spin.
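Running parallel branches concurrently can be sketched with the standard java.util.concurrent executor API. Modeling each branch as a Callable and using one worker thread per branch are assumptions made for this example; how VIBE actually schedules modules is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: runs independent workflow branches at the same time. */
class ParallelBranches {
    /** Runs every branch on its own worker thread; results come back in branch order. */
    static List<String> runAll(List<Callable<String>> branches) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, branches.size()));
        try {
            List<String> results = new ArrayList<String>();
            // invokeAll blocks until every branch has finished or failed.
            for (Future<String> future : pool.invokeAll(branches)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("Interrupted while running branches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A branch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch one failing branch causes the whole call to fail; a real workflow engine could instead mark only that branch as failed and let the others complete.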
As a module finishes its execution successfully (indicated
by a solid-green border), its results are placed into the
local cache. This cache is emptied periodically as specified
by the configuration file, so results worth keeping should be
saved individually or the entire pipeline should be
archived.
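A cache that is emptied on a configured interval can be sketched as below. The class name, the use of milliseconds, and the decision to clear everything at once are assumptions for illustration; the actual configuration option and eviction policy are not specified here. The current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a results cache that is emptied once per interval. */
class ResultCache {
    private final Map<String, String> results = new HashMap<String, String>();
    private final long intervalMillis; // assumed to come from the configuration file
    private long lastEmptiedAt;

    ResultCache(long intervalMillis, long nowMillis) {
        this.intervalMillis = intervalMillis;
        this.lastEmptiedAt = nowMillis;
    }

    void put(String moduleId, String resultXml) {
        results.put(moduleId, resultXml);
    }

    /** Returns the cached result, or null if it was never stored or has been emptied. */
    String get(String moduleId) {
        return results.get(moduleId);
    }

    /** Empties the cache if a full interval has passed; returns true when it did. */
    boolean emptyIfDue(long nowMillis) {
        if (nowMillis - lastEmptiedAt < intervalMillis) {
            return false;
        }
        results.clear();
        lastEmptiedAt = nowMillis;
        return true;
    }
}
```

The sketch shows why unsaved results are at risk: once the interval elapses, a lookup for a module's results returns nothing.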
Modules that do not complete successfully will receive a red
border. Any error information generated by the module or its
algorithm will be contained in the results file.