Comprehensive Software Reviews to make better IT decisions
The End of Hadoop and Cloudera?
Hadoop was designed as a NoSQL and non-proprietary technology to persist large volumes of structured and unstructured data with minimal risk of loss using inexpensive hardware. I believe it has fulfilled this goal pretty well. So, where’s the problem?
I think the root cause of all “failed” Hadoop implementations lies in the “enterprise data warehouse” mentality, architecture, and methodology used to build enterprise data lakes. Trying to squeeze all phases of a data lifecycle – creation, accumulation, integration, augmentation, packaging for BI and analytics – into one monolithic process, implemented in a technology that was never designed for such a use case, is doomed to failure.
Instead of acknowledging the methodological mistake, the implementers started blaming the technology – which was never designed for their use case in the first place. This blame resulted in an economic downturn for the main vendor – Cloudera – which apparently positioned the technology to match “enterprise data warehouse” expectations rather than what it is best suited for. This strategy led to a sharp drop in the stock price and the departure of the CEO in June 2019. However, the market is still betting on the technology and the company – they may not be done yet.
Source: SoftwareReviews Big Data Data Quadrant, Accessed August 21, 2019.
Hadoop may still be a good choice for structured and unstructured data accumulation and “as is” storage. Its technology may still be too rudimentary for data augmentation and is an absolute misfit for data packaging for BI and analytics. Hadoop is in the trough of disillusionment – and this is good. I hope that Hadoop failure stories will be regarded as lessons learned against the magic-technology-first approach to creating data management solutions. There is no “magic technology” – even AI is not magic. Every good solution requires a combination of business, information, and technology architectures. Data creation, accumulation and persistence, data integration and augmentation, and data packaging for BI and analytics are all distinctly different phases of data progression, and each requires its own architecture, governance, and technologies.
Want to Know More?
Google is acquiring Fitbit, a leading wearables brand, for $2.1 billion. Google says the motivation is to “help more people with wearables,” expand its vision of “ambient computing,” and, of course, grab a share of the fast-growing $8-billion digital health market.
Databricks, a data processing and analytics platform with a strong focus on artificial intelligence (AI) and machine learning (ML), is investing 100 million euros (US$111 million) in its European Development Center to take advantage of the European pool of talent and cutting-edge research.
Databricks also recently raised $400 million in a Series F funding round, valuing the company at $6.2 billion. Databricks plans to use the money to hire more engineers to accelerate R&D.
Cambridge Semantics enhanced its Anzo platform to enable data management and analytics over both structured and unstructured data, the firm announced in an August 22 press release.
Snowflake has announced a new data exchange that lets businesses share their data and generate new revenue from it.
The two major Hadoop developers – Hortonworks and Cloudera – merged into one company at the dawn of 2019.
Tableau 2019.1 includes a great new feature that enables users to type “standard” English to query their data. It leverages a specific subdiscipline of AI called natural language processing.