The amount of data generated by millions of connected IoT sensors and devices is growing exponentially. The need to extract relevant information from this data in modern and future generation computing system, necessitates efficient data handling and processing platforms that can migrate such big data from one location to other locations seamlessly and securely, and can provide a way to preprocess and analyze that data before migrating to the final destination. Various data pipeline architectures have been proposed allowing the data administrator/user to handle the data migration operation efficiently. However, the modern data pipeline architectures do not offer built-in functionalities for ensuring data veracity, which includes data accuracy, trustworthiness and security. Furthermore, allowing the intermediate data to be processed, especially in the serverless computing environment, is becoming a cumbersome task. In order to fill this research gap, this paper introduces an efficient and novel data pipeline architecture, named as CCoDaMiC (Coherent Coordination of Data Migration and Computation), which brings both the data migration operation and its computation together into one place. This also ensures that the data delivered to the next destination/pipeline block is accurate and secure. The proposed framework is implemented in private OpenStack environment and Apache Nifi.
Future Generation Computer Systems, Impact factor: 7.187