Hadoop and Data Warehouses

I see a lot of confusion when it comes to Hadoop and its role in a data warehouse solution.  Hadoop should not be a replacement for a data warehouse, but rather should augment and complement one.  Hadoop and a data warehouse will often work together in a single information supply chain: Hadoop excels at handling raw, unstructured, and complex data with vast programming flexibility, while data warehouses manage structured data, integrating subject areas and providing interactive performance through BI tools.

There are three main use cases for Hadoop with a data warehouse:

1. Archiving data warehouse data to Hadoop (move)
Hadoop as cold storage / long-term raw data archiving:
– Avoids having to buy a bigger PDW (Parallel Data Warehouse) appliance, SAN, or tape library
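The "move" semantics of this first use case can be sketched in a few lines of plain Python, with lists standing in for the warehouse table and the Hadoop archive (the table name, cutoff date, and fields are hypothetical, not from any real PDW schema):

```python
from datetime import date

# Hypothetical fact rows; in practice these would live in a PDW fact table.
fact_sales = [
    {"order_id": 1, "order_date": date(2010, 3, 1), "amount": 120.0},
    {"order_id": 2, "order_date": date(2013, 6, 9), "amount": 75.5},
    {"order_id": 3, "order_date": date(2009, 11, 20), "amount": 310.0},
]

ARCHIVE_CUTOFF = date(2012, 1, 1)  # rows older than this move to Hadoop

# "Move" semantics: archived rows leave the warehouse entirely,
# freeing expensive storage while the data stays queryable online.
hadoop_archive = [r for r in fact_sales if r["order_date"] < ARCHIVE_CUTOFF]
fact_sales = [r for r in fact_sales if r["order_date"] >= ARCHIVE_CUTOFF]

print(len(hadoop_archive))  # 2 rows archived
print(len(fact_sales))      # 1 row remains in the warehouse
```

The key point is that the rows are removed from the warehouse, not copied, which is what distinguishes use case 1 from the two that follow.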

2. Exporting relational data to Hadoop (copy)
Hadoop as backup/DR, analysis, and cloud use:
– Export conformed dimensions to compare incoming raw data with what is already in PDW
– Dimensions can be used against older fact tables
– Send validated relational data to Hadoop
– Copy data to WASB (Azure Blob Storage) so it can be used by other tools/products (e.g., Cloud ML Studio)
– Incremental Hadoop loads / reporting
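The conformed-dimension check in this second use case can be sketched as follows, with a Python set standing in for the exported dimension and a list standing in for raw data landed in Hadoop (the key and field names are hypothetical):

```python
# Hypothetical conformed customer dimension exported from PDW to Hadoop:
# just the surrogate keys the warehouse already knows about.
dim_customer_keys = {101, 102, 103}

# Incoming raw events landed in Hadoop, not yet loaded anywhere.
raw_events = [
    {"customer_key": 101, "event": "click"},
    {"customer_key": 999, "event": "click"},     # key unknown to PDW
    {"customer_key": 103, "event": "purchase"},
]

# Compare incoming raw data with what is already in PDW: flag rows
# whose dimension key has no match in the exported dimension.
unknown = [e for e in raw_events if e["customer_key"] not in dim_customer_keys]
print([e["customer_key"] for e in unknown])  # [999]
```

Rows flagged this way would typically be routed to a late-arriving-dimension process rather than loaded blindly into the fact table.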

3. Importing Hadoop data into data warehouse (copy)
Hadoop as staging area:
– Great for real-time data, social network data, sensor data, log data, automated data, RFID data (ambient data)
– Capture all the data, then pass only the relevant data on to PDW
– Processing can be done on the data as it sits in Hadoop (clean it, aggregate it, transform it)
– Some processing is better done in Hadoop than in SSIS
– A way to keep staging data around
– Long-term raw data archiving on cheap storage that is online all the time (instead of tape) – great if you need to keep the data for legal reasons
– Others can analyze the data and later pull it into the data warehouse if they find something useful
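The clean-aggregate-load pattern of this staging use case can be sketched in plain Python; in practice the same logic would run as a Hive or Spark job over data sitting in Hadoop. The sensor values, sentinel code, and device IDs below are hypothetical:

```python
from collections import defaultdict

# Hypothetical raw sensor readings staged in Hadoop: (device_id, value).
raw = [
    ("dev-1", 21.5), ("dev-1", None), ("dev-2", 19.0),
    ("dev-2", 19.4), ("dev-1", 22.1), ("dev-3", -999.0),
]

# Clean: drop nulls and sentinel error values so they never reach PDW.
clean = [(d, v) for d, v in raw if v is not None and v > -100.0]

# Aggregate: average reading per device -- only this small summary
# (the "relevant data") is passed on to the warehouse.
sums, counts = defaultdict(float), defaultdict(int)
for device, value in clean:
    sums[device] += value
    counts[device] += 1
avg_per_device = {d: sums[d] / counts[d] for d in sums}

print(avg_per_device)
```

The full raw feed stays behind in Hadoop as cheap online archive, so the detail is still there if someone later finds a use for it.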

Thanks for reading this article. If you have any opinions, please leave a comment below or send us a message.

New Approaches to Analytics to Revolutionize Logistics

Three stages are commonly used to categorize an organization's maturity in its use of business intelligence and analytics technologies:

  1. Descriptive: What happened in the past?
  2. Predictive: What will (probably) happen in the future?
  3. Prescriptive: What should we do to change the future?

Descriptive analytics typically means good old-fashioned business intelligence (BI) – reports and dashboards.  But there is a newish technology in the Descriptive category – one that I might argue is worthy of a category in its own right.  That technology is visual data discovery.  The visual data discovery approach has a rapidly growing fan base for many reasons, but one stands out: it increases the probability that business managers will find the information they need in time to influence their decisions.

Visual data discovery tools typically provide:

  1. Unrestricted navigation through, and exploration of, data.
  2. Rich data visualization so that information can be comprehended rapidly.
  3. The ability to introduce new data sources into an analysis to expand it further.

By helping to answer a different class of question – the unanticipated one – visual data discovery tools increase the probability that managers will find the information they need in time to influence their decisions.  And that, after all, is the only valid reason for investing in business intelligence solutions.

If you have any opinions, you are welcome to leave a comment or send us a message.

2013 in review: Big data, bigger expectations?

In the parlance of the industry, big data’s feat was a result of the successful convergence of the “three Vs”:

Volume: A large amount of data

Variety: A wide range of data types and sources

Velocity: The speed at which data moves from its sources into the hands of those who need it

Although other Vs have since been contemplated, such as Veracity and Value, the original three attributes promised big data could go far beyond the boundaries of traditional databases, which require data to be stored in rigid rows and columns.

However, over the past year, reality began to sink in: People came to realize what big data could and could not do. Unfortunately, performing large-scale analytics in real time proved to be more daunting than originally thought. Although Hadoop continues to be the world’s most popular big data processing platform, it was designed for batch processing and is far too slow for real-time use.

Reference: 2013 in review: Big data, bigger expectations?