Overview

Automated Data Ingress is an NDIP capability that processes live raw or reduced experimental data as it is written to the analysis cluster filesystem. Several services work together to ingest the data:

  • Webmon: A service provided by the Neutrons team, which must be configured to trigger data ingress in NDIP.
  • Ingress Proxy: A service that receives messages from Webmon by way of a postprocessing agent, which runs on an analysis node and listens for Webmon events on ActiveMQ. These postprocessing agents are managed by the SNS Linux support team and are not part of NDIP. The Ingress Proxy forwards the messages to Apache Kafka.
  • Apache Kafka: A message broker that receives messages from the Ingress Proxy and delivers them to the NDIP ingress tool.
  • Ingress Tool: A Galaxy tool containing a Kafka consumer that receives messages, ingests data into Galaxy, and optionally executes a Galaxy workflow with the ingested data.
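
The message flow described above can be illustrated with a minimal, self-contained sketch. It is not the NDIP implementation: an in-memory queue stands in for Apache Kafka, and every name in it (`InMemoryBroker`, `ingress_proxy`, `ingress_tool`, the topic name, and the `instrument` and `data_file` message fields) is a hypothetical placeholder chosen for illustration only.

```python
import json
import queue
from dataclasses import dataclass, field


@dataclass
class InMemoryBroker:
    """Stand-in for Apache Kafka: one FIFO queue per topic (hypothetical)."""
    topics: dict = field(default_factory=dict)

    def publish(self, topic: str, payload: bytes) -> None:
        self.topics.setdefault(topic, queue.Queue()).put(payload)

    def poll(self, topic: str):
        """Return the next payload for the topic, or None if it is empty."""
        q = self.topics.get(topic)
        if q is None or q.empty():
            return None
        return q.get()


def ingress_proxy(broker: InMemoryBroker, topic: str, webmon_event: dict) -> None:
    """Forward an event received from a postprocessing agent to the broker."""
    broker.publish(topic, json.dumps(webmon_event).encode("utf-8"))


def ingress_tool(broker: InMemoryBroker, topic: str, workflow=None) -> list:
    """Drain the topic, 'ingest' each file, and optionally run a workflow."""
    results = []
    while (payload := broker.poll(topic)) is not None:
        event = json.loads(payload.decode("utf-8"))
        ingested = {"dataset": event["data_file"], "workflow_result": None}
        if workflow is not None:
            # The workflow step is optional, mirroring the Ingress Tool description.
            ingested["workflow_result"] = workflow(event["data_file"])
        results.append(ingested)
    return results


broker = InMemoryBroker()
TOPIC = "ingress-demo"  # hypothetical topic name

# Two events, as a postprocessing agent might relay them from Webmon.
ingress_proxy(broker, TOPIC, {"instrument": "DEMO", "data_file": "/tmp/run_1.nxs"})
ingress_proxy(broker, TOPIC, {"instrument": "DEMO", "data_file": "/tmp/run_2.nxs"})

results = ingress_tool(broker, TOPIC, workflow=lambda path: f"processed {path}")
print(results)
```

The sketch keeps the proxy and the consumer decoupled through the broker, which reflects the role Kafka plays between the Ingress Proxy and the Ingress Tool: the proxy only publishes, and the tool only consumes.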

The table below provides a quick reference to each service, its host, its code, and its relevant configuration:

| Service | Host | Code | Relevant Configuration |
|---|---|---|---|
| Webmon | https://monitor.sns.gov | Neutrons team | https://github.com/neutrons/post_processing_agent |
| Ingress Proxy | ingress proxy VM | https://code.ornl.gov/ndip/ingress-proxy | Ansible playbook |
| Apache Kafka | ingress proxy VM | 3rd party | Ansible playbook |
| Ingress Tool | Galaxy | https://code.ornl.gov/ndip/tool-sources/generic/ingress | Galaxy tool in Ansible playbook |

Further details about each service are provided in the corresponding subsections.