AWS Lake Formation Takes Pain Out of Data Lakes
AWS Lake Formation makes it easier for users to set up and manage data lakes. But organizations will face challenges in determining how to derive value from their data lakes.
AWS Lake Formation does a lot of the heavy lifting in setting up data lakes for AWS users.
A data lake is a single repository of an organization’s data, including both the raw data in its original form and restructured and transformed data prepared for analysis. The purpose of a data lake is to break down data silos and make it easier for organizations to derive insights.
Establishing data lakes has traditionally been fraught with technical challenges. IT professionals have to identify the appropriate data repositories and bring together the various sources, categorize data, deduplicate data, and cross-link records, all while providing for appropriate security and access permissions and sometimes having to transform or restructure data to make it usable. Best practice would also require regularly auditing access to ensure that policies are being adhered to.
As such, building and managing a data lake from scratch can be expensive and time-consuming, not to mention difficult.
AWS Lake Formation is designed to take on much of the heavy lifting, making it easier to set up, configure, and manage data lakes. Lake Formation reduces the user’s work in setting up the data lake to “defining data sources and what data access and security policies [they] want to apply.” Then Lake Formation “helps [them] collect and catalog data from databases and object storage, move the data into ... [an] S3 data lake, clean and classify [the] data using machine learning algorithms, and secure access to [their] sensitive data.” At the end of the day, “users can access a centralized data catalog which describes available data sets and their appropriate usage,” and can then work with the data using various AWS analytics services.
Importantly, Lake Formation simplifies the establishment of security and access policies, because administrators can define the policies within Lake Formation itself, rather than having to set up the policies for each service using identity and access management (IAM) roles through the AWS console or AWS CloudFormation. The user simply has to define policies in one place, and AWS will manage the enforcement of those policies across the entire platform, which greatly simplifies auditing and compliance.
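To make the "define policies in one place" idea concrete, here is a minimal sketch of what a single table-level grant might look like when expressed as a request to Lake Formation's GrantPermissions API. The role ARN, database name, and table name are hypothetical and used only for illustration. The sketch only builds the request, so it runs without AWS credentials; the commented lines show how it could be submitted with boto3.

```python
from typing import Any, Dict, List


def build_grant_request(principal_arn: str, database: str, table: str,
                        permissions: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for a Lake Formation GrantPermissions call."""
    return {
        # Who receives access (an IAM user or role ARN).
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # What is being shared: one table in the data catalog.
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        # Which actions are allowed on that table.
        "Permissions": permissions,
    }


# Hypothetical principal, database, and table names (assumptions).
request = build_grant_request(
    principal_arn="arn:aws:iam::123456789012:role/AnalystRole",
    database="sales_lake",
    table="orders",
    permissions=["SELECT"],
)
print(request)

# With AWS credentials configured, the grant could be submitted like this:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**request)
```

A grant like this is recorded once in Lake Formation rather than being repeated as a separate policy for each analytics service, which is the simplification the article describes.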
Image: AWS Lake Formation’s process. Source: AWS.
AWS Lake Formation assists users in solving many significant technical and operational challenges in setting up and managing data lakes.
However, organizations will still have their work cut out for them to get business value from those data lakes. AWS Lake Formation gives users a data lake, but the platform doesn’t ensure that they know how to use it to derive the most value.
While the data lake is a powerful tool, users will need a working knowledge of data and analytics, along with a firm grasp of their organization’s business context, to ask the right questions and perform the truly insightful analyses that lead to breakthroughs and better business decisions.
In this sense, AWS Lake Formation is just one further example of the broader trend in cloud services and in technology: back-end and non-business-facing IT functions are becoming more and more commoditized as point-and-click service offerings.
At the end of the day, this trend is transforming demand for skill sets across the industry, but it isn’t making IT any easier overall. Technology professionals who can master the new tools and combine technical skills with a deep understanding of their organizational context will thrive in the years to come.