April 3, 2017
By Jack Vaughan
To reduce complexity while building a data pipeline for analytics, Salgado and McKesson opted for a hybrid database from Splice Machine for some projects. Splice Machine is called a hybrid database because it supports both transactional and advanced analytics jobs, and it ships with prebuilt connections to various big data ecosystem components.
“We realized the ecosystem for big data is not as mature as traditional data management,” Salgado said. “We were dealing with a lot of components, and we looked for a way to make it easier.”
Splice Machine, in effect, does much of the necessary integration for customers: its architecture directly connects a SQL relational database to an HBase NoSQL database for transaction processing, and to Spark for analytics, distributing work across Hadoop clusters. Along the way, it handles both analytical and operational data functions and provides a single management console.
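The hybrid pattern described here can be sketched in plain SQL. The table and column names below are hypothetical, but the idea, per the article, is that one engine serves both the row-level transactional write (backed by HBase) and the large analytical scan (dispatched to Spark), with no export or ETL step between them:

```sql
-- Hypothetical orders table; names are illustrative, not from the article.
CREATE TABLE orders (
    order_id    BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount      DECIMAL(12,2),
    order_ts    TIMESTAMP
);

-- Transactional path: a short, row-level write served by the
-- HBase-backed storage layer.
INSERT INTO orders VALUES (1001, 42, 259.99, CURRENT_TIMESTAMP);

-- Analytical path: a full-table aggregation the engine can hand off
-- to Spark -- run against the very same table, so no data movement.
SELECT customer_id, SUM(amount) AS total_spend
FROM orders
GROUP BY customer_id
ORDER BY total_spend DESC;
```

The design choice this illustrates is that routing between the transactional and analytical execution paths happens inside the database, rather than through a separate pipeline the team would otherwise have to build and manage.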
“In relational databases, such as Oracle and SQL Server, the database takes care of the details of the data management tasks. But that is difficult with Hadoop running by itself. It is just a file system,” Salgado said. “At the end of the day, you have to manage those files.”
Salgado said he wanted to keep analysts and developers from spending too much time managing the complexity of highly scaled data processing, and that Splice Machine helped on that front, simplifying data management while reducing data movement.
“We are able to get the data into Splice Machine and do modeling and machine learning there,” he said. “We can call up Spark or [Google] TensorFlow machine learning libraries and not have to move data around.”
The result is that analytics and modeling occur in the same place, “as opposed to a lot of data round trips,” Salgado said.
Read the full article on TechTarget’s SearchDataManagement here.