Introduction
Background—Smart manufacturing
Yield management
Product re-engineering
Predictive maintenance
Challenges to data analysis systems in manufacturing
Complex standards landscape
Complex technical architecture
Safety
Regulations and legislation
Process data analysis
Parameter | Sub-parameter | Requirement |
---|---|---|
Functionality | Security | Compliant with legislative and regulatory requirements; compliant with enterprise security policies |
Functionality | Extensibility | Capable of integrating new interfaces, data types, connectors, and components |
Functionality | Reusability | System functions should, at minimum, handle structured time-series data. The system should also have sufficient connectors to allow its reuse for new compositions of data and functions |
Usability | Aesthetics | |
Usability | Documentation | Well-documented to help reduce system ambiguity and entropy, and to allow for system extensibility, component replacement, user training, etc. |
Usability | Responsiveness | Limits on stream analysis response times depend on the use case and can range from milliseconds to seconds (alarms and eventing), to daily and weekly reports (process optimization) |
Reliability | Accuracy | Intolerant of data and event loss |
Reliability | Availability | Data acquisition, storage systems, and event processing and reporting should have the highest guarantees for availability |
Reliability | Recoverability | Recovery of persisted data (raw and processed) is necessary. Speed of system recovery from faults and the resumption of functions is important |
Performance | Throughput | Typically on the order of 10s of gigabytes (GB) per day |
Performance | Scalability | The system should scale to accommodate geographically dispersed sources/sinks |
Supportability | | The components should be well-maintained, stable, active, and well-documented, with a strong, supportive, and responsive user and developer community. They should also be compatible with well-established monitoring solutions |
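
To give a feel for the throughput figure in the table above, the following minimal sketch converts a daily volume into an average sustained rate. It assumes decimal units (1 GB = 10^9 bytes) and a perfectly even load over 24 hours; the function name and the sample volumes are illustrative choices, not values from the source. Real plants will see bursts well above this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_kb_per_s(gb_per_day: float) -> float:
    """Average sustained rate in kB/s for a daily volume given in GB.

    Assumes decimal units (1 GB = 1e9 bytes, 1 kB = 1e3 bytes).
    """
    bytes_per_day = gb_per_day * 1e9
    return bytes_per_day / SECONDS_PER_DAY / 1e3


# Illustrative volumes within the "10s of GB per day" range.
for volume in (10, 50, 90):
    rate = average_rate_kb_per_s(volume)
    print(f"{volume} GB/day is about {rate:.1f} kB/s on average")
```

The average alone is small by networking standards, which suggests that the harder requirements are the number of sources and sinks and the guarantees on loss and availability rather than raw bandwidth.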
Pipeline stage | Requirement |
---|---|
Ingestion | I1: Native support of a large number of technology connectors |
Ingestion | I2: Can ingest a large variety of formats |
Ingestion | I3: Supports custom processors and connectors |
Ingestion | I4: Scales to support a large number of sources and sinks (1000s to 100,000s) |
Ingestion | I5: Native processors for data validation, transformation, filtration, compression, noise reduction, identification, and integration |
Ingestion | I6: Supports active (real-time) ingestion |
Ingestion | I7: Supports passive (batch) ingestion |
Communication | C1: Scalable. It should be able to support a large number of sources (ms poll rate) and sinks; the combined number can range from 1000s to 100,000s |
Communication | C2: Secures data in transit |
Communication | C3: Exactly-once message delivery semantics |
Communication | C4: Publish-subscribe communication |
Communication | C5: Efficient bandwidth utilization |
Communication | C6: Supports both real-time data streams and bulk data transfer |
Communication | C7: Pull-based data consumption |
Storage | S1: Scalable up to 10s of GB/day |
Storage | S2: Read/write speed independent of the volume of stored data |
Storage | S3: Large variety of formats and types (structured, semi-structured, and unstructured) |
Storage | S4: Compression features for cost-efficient long-term storage (years) |
Storage | S5: Intolerant of data loss |
Storage | S6: Secures stored data |
Storage | S7: Exports data to relational databases |
Analysis [16] | A1: Scalable up to 100,000 variables |
Analysis [16] | A2: Heterogeneous data types |
Analysis [16] | A3: Imperfect data |
Analysis [16] | A4: Real-time and batch processing required |
Analysis [16] | A5: Supports time-series analysis, data mining, and machine learning |
Visualization | V1: Scalable |
Visualization | V2: Visualization methods for large data volumes, variety, and velocity |
Visualization | V3: Dynamic and static visualization |
Visualization | V4: Interactive |
Visualization | V5: Extensible interfaces |
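
The communication requirements C3, C4, and C7 in the table above can be illustrated with a small in-memory sketch. It is not the design of any tool surveyed here: `Broker`, `Consumer`, the topic name, and the message layout are hypothetical. It also does not implement true exactly-once delivery, which in practice needs transactional support from the messaging layer; it only shows a common approximation in which the consumer discards messages whose id it has already processed, so that a producer retry has no visible effect.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical in-memory broker: each topic is an append-only log (C4)."""
    logs: dict = field(default_factory=lambda: defaultdict(list))

    def publish(self, topic: str, message: dict) -> int:
        self.logs[topic].append(message)
        return len(self.logs[topic]) - 1  # offset of the new message

    def pull(self, topic: str, offset: int, max_messages: int = 10) -> list:
        # Consumers ask for data at their own pace (C7); nothing is pushed.
        return self.logs[topic][offset:offset + max_messages]


@dataclass
class Consumer:
    """Tracks its own offset and the ids it has already processed."""
    broker: Broker
    topic: str
    offset: int = 0
    seen_ids: set = field(default_factory=set)
    results: list = field(default_factory=list)

    def poll(self) -> int:
        batch = self.broker.pull(self.topic, self.offset)
        for message in batch:
            # Skip duplicates so each message takes effect once (approximates C3).
            if message["id"] not in self.seen_ids:
                self.seen_ids.add(message["id"])
                self.results.append(message["value"])
        self.offset += len(batch)
        return len(batch)


broker = Broker()
broker.publish("line1/temperature", {"id": "m1", "value": 71.5})
broker.publish("line1/temperature", {"id": "m1", "value": 71.5})  # producer retry
broker.publish("line1/temperature", {"id": "m2", "value": 72.0})

consumer = Consumer(broker, "line1/temperature")
while consumer.poll():  # keep pulling until the log is drained
    pass
print(consumer.results)
```

Because the consumer owns its offset, a slow sink cannot be overwhelmed by a fast source, which is one reason pull-based consumption suits a pipeline with many heterogeneous sinks.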
RQ1: What are the requirements for a big data analysis pipeline for manufacturing process data?
Data ingestion
Communication
Storage
Analysis
Visualization
RQ2: What are the available big data analysis pipelines for process data in academic literature?
Paper | Year | Type | Industry | Use case |
---|---|---|---|---|
[44] | 2014 | C | Agnostic | Model discovery and analysis |
[45] | 2014 | C | Agnostic | Knowledge management |
[46] | 2014 | C | Agnostic | Cloud manufacturing |
[47] | 2014 | C | Agnostic | Anomaly detection |
[48] | 2014 | C | SCM | Predictive maintenance |
[49] | 2015 | C | Agnostic | Air quality |
[50] | 2015 | A | Polymer | Yield optimization |
[51] | 2015 | A | Cement | Performance monitoring |
[52] | 2015 | A | Chemical agricultural recycling | Anomaly detection |
[53] | 2015 | C | SCM | APC |
[54] | 2016 | A | Agnostic | Agnostic |
[55] | 2016 | C | Agnostic | Risk management |
[56] | 2016 | C | Agnostic | Agnostic |
[57] | 2016 | C | SCM | Yield improvement |
[58] | 2016 | C | Agriculture | Quality control |
[59] | 2016 | C | Printing | Anomaly detection |
[60] | 2016 | C | Tire | Quality control |
[61] | 2017 | A | Polymer | Quality control |
[62] | 2017 | A | Die casting | Quality control |
[63] | 2017 | A | Agnostic | Quality control |
[64] | 2017 | A | SCM | APC |
[65] | 2017 | C | SCM | Process monitoring |
[66] | 2017 | C | Oil and gas | Predictive maintenance |
[67] | 2017 | C | Agnostic | Prognostics |
[68] | 2017 | C | Hydroelectric | Semantic integration |
[69] | 2017 | C | Weichai Power Co., Ltd. | Quality management |
[70] | 2017 | C | Machining | Energy use, tool use and wear |
[71] | 2017 | C | Agnostic | Energy use |
[72] | 2017 | C | Polymer | Prognostics |
[73] | 2017 | C | Automotive | Quality management |
[74] | 2018 | A | SCM | Production planning |
[75] | 2018 | A | Metal casting | Quality management |
[76] | 2018 | A | Machining | Kanban |
[77] | 2018 | A | Food | Event processing |
[78] | 2018 | A | Agnostic | Supply chain management |
[79] | 2018 | A | Agnostic | Agnostic |
[80] | 2018 | C | Agnostic | OEE |
[81] | 2018 | C | Agnostic | Agnostic |
Paper | Ingestion | Communication | Storage | Analysis | Visualization |
---|---|---|---|---|---|
[44] | Custom | − | HDFS, HBase, MongoDB, Infinispan | Hadoop, Hive, Pig, Elasticsearch | Custom |
[45] | Custom | − | HDFS, MySQL | Hadoop | − (\(\sim \)) |
[46] | WSO2 BAM | WSO2 ESB | HDFS, RDB (\(\sim \)), Cassandra (\(\sim \)) | Hadoop, WSO2 CEP | Custom (WSO2 UES) |
[47] | − | Kafka | HDFS | Hadoop, Storm | − |
[48] | − | − | HDFS, HBase, MongoDB, Cassandra | Hadoop, Hive | − |
[49] | − | − | HDFS | Hadoop, Mahout, Jena Elephas | − |
[50] | − | − | MySQL | Matlab, QuickCog | − |
[51] | Custom | − | Microsoft SQL 2012 | Custom | Custom |
[52] | Custom | Kafka | HDFS, HBase | Hadoop, Storm, Hive, Radoop, Rapidminer | − (\(\sim \)) |
[53] | Sqoop | − | HDFS, HBase | Hadoop, Hive, Impala | − |
[54] | Sqoop | Flume | HDFS, HBase, MySQL | Hadoop, Hive | Custom |
[55] | Custom | Custom | MongoDB | Custom | Custom |
[56] | Custom | − | MongoDB, PostgreSQL | RStudio, Watson Analytics, Qliksense | Custom |
[57] | Flume (\(\sim \)), Sqoop (\(\sim \)) | Custom | HDFS, HBase | Hadoop, Hive, Impala, Spark, Pig | Custom |
[58] | Custom | Custom | Cassandra | Spark | Zeppelin (\(\sim \)) |
[59] | Kafka | Kafka | Cassandra, OntoQUAD | Spark | Custom, Jupyter, Ontos Eiger |
[60] | Custom | − | HDFS | Hadoop, Hive, Spark | Custom |
[61] | Storm | Kafka | MongoDB | Storm | Custom |
[62] | Pig, Hive | Custom | HDFS | Hadoop, Hive, Pig | Flamingo, Custom |
[63] | ODI, Talend, Sqoop | Kafka | HDFS, HBase | Hadoop, Spark, IPython | Tableau, Microsoft BI |
[64] | Sqoop, Custom | Custom | HDFS, RDB (\(\sim \)) | Hadoop, Hive, Impala, Spark, Matlab | − |
[65] | Custom | − | − | Custom (\(\sim \)) | Custom |
[66] | Custom | Kafka, RabbitMQ | HDFS, HBase, Cassandra, PostgreSQL | Hadoop, Spark, Storm | Custom |
[67] | Custom | Custom | Microsoft SQL 2008R2 | Custom | Custom |
[68] | Custom | − | Cassandra | Spark | − |
[69] | Sqoop | − | HDFS | Spark | Custom |
[70] | Custom, Storm | Kafka | CouchDB | − | − |
[71] | Sqoop | − | HDFS, HBase | Hadoop, Hive, Pig | Custom |
[72] | Custom | − | MongoDB | Custom | − |
[73] | WSO2 ESB | WSO2 ESB | Alfresco CMS, Neo4j | Apache UIMA, WEKA | Custom |
[74] | − (\(\sim \)) | − | HDFS, HBase | Hadoop, Hive | − |
[75] | Spark | − | HDFS, HBase | R, Drools | Custom |
[76] | Custom | − | MySQL | Custom | Custom |
[77] | Custom | − | Microsoft SQL 2008R2 | Custom | Custom |
[78] | Flume, Sqoop | − | HDFS | Hadoop, Hive, Solr, RServe, Mahout | Custom |
[79] | Flume | Kafka | HDFS, HBase, MySQL | Hadoop, Hive, Storm | Custom |
[80] | − | Kafka | Cassandra | Spark | Custom |
[81] | Custom | RabbitMQ | HDFS | Hadoop | − |
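
One way to read the table above is to tally how often each technology appears in a given column. The sketch below does this for the Storage column using five rows copied from the table as an excerpt ([47], [53], [58], [61], [80]); the function and variable names are illustrative, and the counts it prints describe only this excerpt, not the full survey.

```python
from collections import Counter

# Storage cells copied from five rows of the table above (excerpt only).
storage_by_paper = {
    "[47]": "HDFS",
    "[53]": "HDFS, HBase",
    "[58]": "Cassandra",
    "[61]": "MongoDB",
    "[80]": "Cassandra",
}


def tally(cells: dict) -> Counter:
    """Count how many papers mention each technology in a table column."""
    counts = Counter()
    for cell in cells.values():
        for name in cell.split(","):
            name = name.strip()
            # The table uses a minus sign for "not reported"; skip it.
            if name and name != "−":
                counts[name] += 1
    return counts


storage_counts = tally(storage_by_paper)
print(storage_counts.most_common())
```

Extending the dictionary to all rows, and repeating the tally per column, would give a frequency profile of the tools used at each pipeline stage.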