Drill to Detail Ep.88 'Superset, Preset and the Future of Business Intelligence' with Special Guest Maxime Beauchemin
Drill to Detail
Maxime Beauchemin returns to the Drill to Detail Podcast and joins Mark Rittman to talk about what's new with Apache Airflow 2.0, the origin story for Apache Superset and now Preset.io, why the future of business intelligence is open source, and news on Marquez, a reference implementation of the OpenLineage open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata sponsored by WeWork.
The Rise of the Data Engineer
Drill to Detail Ep.26 'Airflow, Superset & The Rise of the Data Engineer' with Special Guest Maxime Beauchemin
Apache Airflow 2.0 is here!
Apache Superset is a modern data exploration and visualization platform
The Future of Business Intelligence is Open Source
Powerful, easy to use data exploration and visualization platform, powered by Apache Superset™
Amundsen: Open source data discovery and metadata engine
OpenLineage
Marquez: Collect, aggregate, and visualize a data ecosystem's metadata
Apache Airflow with Maxime Beauchemin, Vikram Koka, and Ash Berlin-Taylor
Data – Software Engineering Daily
Apache Airflow was released in 2015, introducing the first popular open source solution for data pipeline orchestration. Since that time, Airflow has been widely adopted for dependency-based data workflows. A developer might orchestrate a pipeline with hundreds of tasks, with dependencies between jobs in Spark, Hadoop, and Snowflake. Since Airflow’s creation, it has powered the data infrastructure at companies like Airbnb, Netflix, and Lyft. It has also been at the center of Astronomer, a startup that helps enterprises build infrastructure around Airflow. Airflow is used to construct DAGs (directed acyclic graphs) for managing data workflows. Maxime Beauchemin is the creator of Airflow. Vikram Koka and Ash Berlin-Taylor work at Astronomer. They join the show to talk about the state of Airflow: the purpose of the project, its use cases, and its open source ecosystem.
Data engineering touches every area of an organization. Engineers need a data platform to build search indexes and microservices. Data scientists need data pipelines to build machine learning models. Business analysts need flexible dashboards to understand the trends and customer usage for a product. Max Beauchemin is a data engineer who has worked at Airbnb, Lyft, and Facebook. He’s the creator of two successful open source projects: Apache Airflow and Apache Superset. In a previous show, Max discussed data engineering at Airbnb, and the usage of Airflow. In today’s show, Max discusses the engineering of Apache Superset. Superset is an open source business intelligence web application. Superset allows users to create visualizations, slice and dice their data, and query it. Superset integrates with Druid, a database that supports exploratory, OLAP-style workloads. One reason Superset is distinctive is that it is a full open source application. Many open source projects are tools like databases, command line tools, and web frameworks. Superset is an open source application that can be used by individuals who are not developers, so the audience is wider than the typical open source tool built for engineers. Max joins the show to talk about his experience as a data engineer at Airbnb and Lyft, and the open source projects he has started. The post Apache Superset with Maxime Beauchemin appeared first on Software Engineering Daily.
Drill to Detail Ep.26 'Airflow, Superset & The Rise of the Data Engineer' with Special Guest Maxime Beauchemin
Drill to Detail
Mark Rittman is joined by Maxime Beauchemin to talk about analytics and data integration at Airbnb, the Apache Airflow and Superset open-source projects he helped launch and now works with day-to-day at Airbnb, and his recent Medium article on "The Rise of the Data Engineer".
"The Rise of the Data Engineer" blog by Maxime Beauchemin
Apache Airflow
Airbnb Superset
"Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department" blog by Jeff Magnusson
Defining Data Engineering with Maxime Beauchemin - Episode 3
Data Engineering Podcast
Summary
What exactly is data engineering? How has it evolved in recent years and where is it going? How do you get started in the field? In this episode, Maxime Beauchemin joins me to discuss these questions and more. Transcript provided by CastSource.

Preamble
Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
You can help support the show by checking out the Patreon page, which is linked from the site.
To help other people find the show, you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers.
Your host is Tobias Macey, and today I’m interviewing Maxime Beauchemin.

Questions
Introduction
How did you get involved in the field of data engineering?
How do you define data engineering and how has that changed in recent years?
Do you think that the DevOps movement over the past few years has had any impact on the discipline of data engineering? If so, what kinds of cross-over have you seen?
For someone who wants to get started in the field of data engineering, what are some of the necessary skills?
What do you see as the biggest challenges facing data engineers currently?
At what scale does it become necessary to differentiate between someone who does data engineering vs data infrastructure, and what are the differences in terms of skill set and problem domain?
How much analytical knowledge is necessary for a typical data engineer?
What are some of the most important considerations when establishing new data sources to ensure that the resulting information is of sufficient quality?
You have commented on the fact that data engineering borrows a number of elements from software engineering. Where does the concept of unit testing fit in data management, and what are some of the most effective patterns for implementing that practice?
How has the work done by data engineers and managers of data infrastructure bled back into mainstream software and systems engineering in terms of tools and best practices?
How do you see the role of data engineers evolving in the next few years?

Keep In Touch
@mistercrunch on Twitter
mistercrunch on GitHub
Medium

Links
Datadog
Airflow
The Rise of the Data Engineer
Druid.io
Luigi
Apache Beam
Samza
Hive
Data Modeling

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA