Data Virtualization in SQL Server 2019

The latest version of SQL Server, currently in CTP (Community Technology Preview), comes with many improvements. SQL Server 2019 introduces data virtualization through Big Data Clusters: SQL Server can now read data natively from HDFS and other big data systems. Beyond HDFS, the built-in PolyBase can read data from RDBMS, NoSQL, and ODBC sources. This is a big step forward for SQL Server. With the scale-out option, we can set up a couple of SQL Server instances as worker nodes and designate one instance as the head node, which allows parallel processing and distributed computing.
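As a rough sketch of the scale-out setup described above, a worker instance (PolyBase calls these compute nodes) can be joined to a head node with the `sp_polybase_join_group` system procedure. The machine name `HEADNODE01` and the default instance name `MSSQLSERVER` are placeholders assumed for illustration, and 16450 is the default port of the PolyBase data movement service control channel; adjust all three to your environment.

```sql
-- Run on each instance that should join the group as a worker (compute) node.
-- Arguments: head node machine name (placeholder), head node DMS control
-- channel port (16450 is the default), head node instance name (placeholder).
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';

-- After restarting the PolyBase Engine and Data Movement services on the
-- worker, run this on the head node to list the nodes in the group.
SELECT * FROM sys.dm_exec_compute_nodes;
```

PolyBase must be installed on every instance involved before joining the group.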

PolyBase in SQL Server 2019 is more powerful than in earlier versions: it can query data from a variety of sources, for example Oracle, Hadoop, NoSQL databases, and ODBC sources.
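To make this concrete, here is a minimal sketch of querying an Oracle table through PolyBase. Every object name, the host `oracle-host.example.com`, the column list, and the `XE.SALES.CUSTOMERS` location are hypothetical and chosen only for illustration; the exact location format and data type mappings should be checked against the PolyBase documentation for your source.

```sql
-- Enable the PolyBase feature on the instance (it must already be installed).
EXEC sp_configure 'polybase enabled', 1;
RECONFIGURE;

-- A database master key is required before a scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Credential that SQL Server uses to log in to Oracle (placeholder values).
CREATE DATABASE SCOPED CREDENTIAL OracleCredential
WITH IDENTITY = '<oracle user>', SECRET = '<oracle password>';

-- External data source pointing at the Oracle server (hypothetical host).
CREATE EXTERNAL DATA SOURCE OracleSales
WITH (
    LOCATION = 'oracle://oracle-host.example.com:1521',
    CREDENTIAL = OracleCredential
);

-- External table: only the schema lives in SQL Server, the data stays in Oracle.
CREATE EXTERNAL TABLE dbo.OracleCustomers (
    CustomerId   INT           NOT NULL,
    CustomerName NVARCHAR(100) NULL
)
WITH (
    LOCATION = 'XE.SALES.CUSTOMERS',  -- hypothetical database.schema.table
    DATA_SOURCE = OracleSales
);

-- Query the remote data with ordinary T-SQL.
SELECT TOP (10) c.CustomerId, c.CustomerName
FROM dbo.OracleCustomers AS c;
```

Once defined, the external table can be joined with local tables like any other table, which is what lets SQL Server act as the single query interface.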

According to the SQL Server Program Manager:

Data virtualization enables unified data services to support multiple applications and users. The virtual data layer—sometimes referred to as a data hub—allows users to query data from many sources through a single, unified interface. Access to sensitive data sets can be controlled from a single location. The delays inherent to ETL need not apply; data can always be up to date. Storage costs and data governance complexity are minimized.

SQL Server 2019 big data clusters with enhancements to PolyBase act as a data hub to integrate structured and unstructured data from across the entire data estate – SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Azure Cosmos DB, MySQL, PostgreSQL, MongoDB, Oracle, Teradata, HDFS, and more – using familiar programming frameworks and data analysis tools.