Overview
Automated Data Ingress is an NDIP capability that processes live raw or reduced experimental data as it is written to the analysis cluster filesystem. It relies on several services working together:
- Webmon: A service provided by the Neutrons team, which must be configured to trigger data ingress in NDIP.
- Ingress Proxy: A service that receives messages originating from Webmon and forwards them to Apache Kafka. The messages arrive via a postprocessing agent that runs on an analysis node and listens for Webmon events on ActiveMQ. These postprocessing agents are managed by the SNS Linux support team and are not part of NDIP.
- Apache Kafka: A message broker that receives messages from the Ingress Proxy and delivers them to the NDIP ingress tool.
- Ingress Tool: A Galaxy tool containing a Kafka consumer that receives messages, ingests data into Galaxy, and optionally runs a Galaxy workflow on the ingested data.
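The chain of services above can be illustrated with a small in-memory sketch. It is not the NDIP implementation: a `queue.Queue` stands in for the Kafka topic, the classes `IngressProxy` and `IngressTool` are hypothetical simplifications, and the message fields (`run_number`, `file_path`) and file paths are assumed placeholders rather than the real Webmon or postprocessing agent schema. It only shows the order in which a message moves from the proxy, through the broker, to the consumer that ingests data and optionally triggers a workflow.

```python
"""In-memory sketch of the NDIP ingress message flow (hypothetical, not the real code).

Assumptions: the message fields and class names are illustrative only,
and a queue.Queue stands in for the Apache Kafka topic.
"""
from dataclasses import dataclass
from queue import Empty, Queue
from typing import Callable, List, Optional


@dataclass
class IngressMessage:
    # Hypothetical payload; the real message schema may differ.
    run_number: int
    file_path: str


class IngressProxy:
    """Receives messages from the postprocessing agent and forwards them to the broker."""

    def __init__(self, broker: Queue):
        self.broker = broker

    def receive(self, message: IngressMessage) -> None:
        # In NDIP the proxy publishes to Apache Kafka; here we use a local queue.
        self.broker.put(message)


class IngressTool:
    """Consumes broker messages, records the 'ingested' file, optionally runs a workflow."""

    def __init__(self, broker: Queue, workflow: Optional[Callable[[str], None]] = None):
        self.broker = broker
        self.workflow = workflow
        self.ingested: List[str] = []

    def poll_once(self, timeout: float = 0.1) -> bool:
        # Return False when no message arrives within the timeout.
        try:
            message = self.broker.get(timeout=timeout)
        except Empty:
            return False
        # Stand-in for ingesting the data into Galaxy.
        self.ingested.append(message.file_path)
        if self.workflow is not None:
            # Stand-in for launching a Galaxy workflow on the ingested data.
            self.workflow(message.file_path)
        return True


broker: Queue = Queue()
proxy = IngressProxy(broker)
workflow_runs: List[str] = []
tool = IngressTool(broker, workflow=workflow_runs.append)

# Simulate the postprocessing agent reacting to two Webmon events.
proxy.receive(IngressMessage(run_number=1001, file_path="/path/to/run_1001.nxs.h5"))
proxy.receive(IngressMessage(run_number=1002, file_path="/path/to/run_1002.nxs.h5"))

# Drain the broker, as the consumer in the Ingress Tool would.
while tool.poll_once():
    pass

print(tool.ingested)
print(workflow_runs)
```

Decoupling the proxy from the consumer through a broker means the producer side does not need to know whether the Galaxy tool is currently running; messages wait in the topic until they are consumed.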
Related Services
The table below provides a quick reference to each service, its host, its code, and its relevant configuration:
Service | Host | Code | Relevant Configuration |
---|---|---|---|
Webmon | https://monitor.sns.gov | Neutrons team | https://github.com/neutrons/post_processing_agent |
Ingress Proxy | ingress proxy VM | https://code.ornl.gov/ndip/ingress-proxy | Ansible playbook |
Apache Kafka | ingress proxy VM | 3rd party | Ansible playbook |
Ingress Tool | Galaxy | https://code.ornl.gov/ndip/tool-sources/generic/ingress | Galaxy tool in Ansible playbook |
Further details about each service are provided in the corresponding subsections.