The End of Hadoop and Cloudera?
Hadoop was designed as a NoSQL and non-proprietary technology to persist large volumes of structured and unstructured data with minimal risk of loss using inexpensive hardware. I believe it has fulfilled this goal pretty well. So, where’s the problem?
I think the root cause of all “failed” Hadoop implementations lies in the “enterprise data warehouse” mentality, architecture, and methodology used to build enterprise data lakes. Trying to squeeze all phases of the data lifecycle – creation, accumulation, integration, augmentation, and packaging for BI and analytics – into one monolithic process, implemented in a technology that was never designed for such a use case, is doomed to failure.
Instead of acknowledging the methodological mistake, the implementers started blaming the technology – which was never designed for their use case in the first place. This blame resulted in an economic downturn for the main vendor – Cloudera – which apparently positioned the technology to match “enterprise data warehouse” expectations rather than what it is best suited for. This strategy resulted in a sharp drop in the stock price and the departure of the CEO in June 2019. However, the market is still betting on the technology and the company – neither may be gone just yet.
[Chart] Source: SoftwareReviews Big Data Data Quadrant, accessed August 21, 2019.
Hadoop may still be a good choice for accumulating structured and unstructured data and storing it “as is.” Its technology may still be too rudimentary for data augmentation, and it is an outright misfit for packaging data for BI and analytics. Hadoop is in the trough of disillusionment – and this is good. I hope that Hadoop failure stories will be regarded as lessons learned about the magic-technology-first approach to building data management solutions. There is no “magic technology” – even AI is not magic. Every good solution requires a combination of business, information, and technology architectures. Data creation, accumulation and persistence, data integration and augmentation, and data packaging for BI and analytics are distinctly different phases of data progression, and each requires different architectures, governance, and technologies.