McKesson uses Splice Machine

April 3, 2017

PUBLISHED BY Jack Vaughan

SOURCE TechTarget

“The important thing is to get the data in a format in which the machine can work on it,” said Manuel Salgado, senior data and analytics manager at healthcare giant McKesson Corp.

To reduce complexity while building a data pipeline for analytics, Salgado and McKesson opted for a hybrid database from Splice Machine for some projects. Splice Machine is called a hybrid database because it supports both transactional and advanced analytics jobs. It comes prebuilt with connections to various components of the big data ecosystem.

“We realized the ecosystem for big data is not as mature as traditional data management,” Salgado said. “We were dealing with a lot of components, and we looked for a way to make it easier.”

The objective in using the hybrid approach was to eliminate data silos, reduce data movement and cut down on the number of moving parts, according to Salgado.

Splice Machine, in effect, does a lot of the necessary integration for customers, as its architecture directly connects a SQL relational database to an HBase NoSQL database for transaction processing, as well as Spark for analytics, distributing work across multiple Hadoop clusters. Along the way, it handles both analytics and operational data functions, and provides a single management console.

“In relational databases, such as Oracle and SQL Server, the database takes care of the details of the data management tasks. But that is difficult with Hadoop running by itself. It is just a file system,” Salgado said. “At the end of the day, you have to manage those files.”

He said he wanted to ensure that analysts and developers were not spending too much time managing the complexity of highly scaled data processing, and that Splice Machine helped in this regard. Salgado said the approach helped simplify data management, while reducing data movement.

“We are able to get the data into Splice Machine and do modeling and machine learning there,” he said. “We can call up Spark or [Google] TensorFlow machine learning libraries and not have to move data around.”

The result is that analytics and modeling occur in the same place, “as opposed to a lot of data round trips,” Salgado said.

Read the full article on TechTarget’s SearchDataManagement here.

