Data Engineering with dbt

Author: Roberto Zagni
Publisher: Packt Publishing Ltd
Total Pages: 578
Release: 2023-06-30
Genre: Computers
ISBN: 9781803241883

Use easy-to-apply patterns in SQL and Python to adopt modern analytics engineering to build agile platforms with dbt that are well-tested and simple to extend and run.

Purchase of the print or Kindle book includes a free PDF eBook.

Key Features:
Build a solid dbt base and learn data modeling and the modern data stack to become an analytics engineer
Build automated and reliable pipelines to deploy, test, run, and monitor ELTs with dbt Cloud
Guided dbt + Snowflake project to build a pattern-based architecture that delivers reliable datasets

Book Description:
dbt Cloud helps professional analytics engineers automate the application of powerful and proven patterns to transform data from ingestion to delivery, enabling real DataOps. This book begins by introducing you to dbt and its role in the data stack, along with how it uses simple SQL to build your data platform, helping you and your team work better together. You’ll find out how to leverage data modeling, data quality, master data management, and more to build a simple-to-understand and future-proof solution. As you advance, you’ll explore the modern data stack, understand how data-related careers are changing, and see how dbt enables this transition into the emerging role of an analytics engineer. The chapters help you build a sample project using the free version of dbt Cloud, Snowflake, and GitHub to create a professional DevOps setup with continuous integration, automated deployment, ELT run, scheduling, and monitoring, solving practical cases you encounter in your daily work. By the end of this dbt book, you’ll be able to build an end-to-end pragmatic data platform by ingesting data exported from your source systems, coding the needed transformations, including master data and the desired business rules, and building well-formed dimensional models or wide tables that’ll enable you to build reports with the BI tool of your choice.

What you will learn:
Create a dbt Cloud account and understand the ELT workflow
Combine Snowflake and dbt for building modern data engineering pipelines
Use SQL to transform raw data into usable data, and test its accuracy
Write dbt macros and use Jinja to apply software engineering principles
Test data and transformations to ensure reliability and data quality
Build a lightweight pragmatic data platform using proven patterns
Write easy-to-maintain idempotent code using dbt materialization

Who this book is for:
This book is for data engineers, analytics engineers, BI professionals, and data analysts who want to learn how to build simple, futureproof, and maintainable data platforms in an agile way. Project managers, data team managers, and decision makers looking to understand the importance of building a data platform and foster a culture of high-performing data teams will also find this book useful. Basic knowledge of SQL and data modeling will help you get the most out of the many layers of this book. The book also includes primers on many data-related subjects to help juniors get started.

Analytics Engineering with SQL and Dbt

Author: Rui Pedro Machado, Helder Russa
Publisher: O'Reilly Media, Inc.
Total Pages: 324
Release: 2023-12-08
Genre: Computers
ISBN: 9781098142353

With the shift from data warehouses to data lakes, data now lands in repositories before it's been transformed, enabling engineers to model raw data into clean, well-defined datasets. dbt (data build tool) helps you take data further. This practical book shows data analysts, data engineers, BI developers, and data scientists how to create a true self-service transformation platform through the use of dynamic SQL. Authors Rui Machado from Monstarlab and Hélder Russa from Jumia show you how to quickly deliver new data products by focusing more on value delivery and less on architectural and engineering aspects. If you know your business well and have the technical skills to model raw data into clean, well-defined datasets, you'll learn how to design and deliver data models without any technical influence.

With this book, you'll learn:
What dbt is and how a dbt project is structured
How dbt fits into the data engineering and analytics worlds
How to collaborate on building data models
The main tools and architectures for building useful, functional data models
How to fit dbt into data warehousing and laking architecture
How to build tests for data transformations

Data Pipelines Pocket Reference

Author: James Densmore
Publisher: O'Reilly Media
Total Pages: 277
Release: 2021-02-10
Genre: Computers
ISBN: 9781492087809

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions.

You'll learn:
What a data pipeline is and how it works
How data is moved and processed on modern data infrastructure, including cloud platforms
Common tools and products used by data engineers to build pipelines
How pipelines support analytics and reporting needs
Considerations for pipeline maintenance, testing, and alerting

Unlocking dbt

Author: Cameron Cyr, Dustin Dorsey
Publisher: Apress
Total Pages: 0
Release: 2023-09-30
Genre: Computers
ISBN: 1484296990

This book shows how dbt is used to build data transformation pipelines that enable dependency management and allow for version control and automated testing. It explains how dbt is revolutionizing data transformation and the advantages that a command-line tool like dbt provides over and above the use of database stored procedures and other ETL and ELT tools that handle data transformations. You’ll see how to create custom-written transformations through simple SQL SELECT statements, eliminating the need for boilerplate code and making it easy to incorporate dbt as the transformation layer in your data warehouse pipelines. Additionally, you will learn how dbt enables data teams to incorporate software engineering best practices such as code reusability, version control, and automated testing into the data transformation process.

Unlocking dbt walks you through using dbt to establish a project, build and modularize SQL models, and execute jobs in a way that is easy to maintain and scale as your data ecosystem matures. You’ll begin by establishing and configuring a project, a process covered using both dbt Cloud and dbt Core, so that you can confidently stand up a project using either platform. From there, you’ll move into building transformations with peace of mind that your project will scale appropriately as you continue to develop it. After learning the basics needed to get started, you’ll continue to build on that foundation by looking at the unique ways in which dbt combines SQL with Jinja to take your code beyond what is possible in normal SQL. You will learn about advanced materializations, building lineage in your data flows, the unlimited potential of macros, and so much more. This book also explores supported file types and the building of Python models. Rounding things out, you will learn features of dbt that will assist you in making your transformation layer production ready. These include how to implement automated testing, using dbt to generate documentation, and running CI/CD pipelines.

What You Will Learn:
Understand what dbt is and how it is used in the modern data stack
Set up a project using both dbt Cloud and dbt Core
Connect a dbt project to a cloud data warehouse
Build SQL and Python models that are scalable and maintainable
Configure development, testing, and production environments
Capture reusable logic in the form of Jinja macros
Incorporate version control with your data transformation code

Who This Book Is For:
Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline’s transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.

Data Engineering with Python

Author: Paul Crickard
Publisher: Packt Publishing Ltd
Total Pages: 357
Release: 2020-10-23
Genre: Computers
ISBN: 9781839212307

Build, monitor, and manage real-time data pipelines to create data engineering infrastructure efficiently using open-source Apache projects.

Key Features:
Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples
Design data models and learn how to extract, transform, and load (ETL) data using Python
Schedule, automate, and monitor complex data pipelines in production

Book Description:
Data engineering provides the foundation for data science and analytics, and forms an important part of all businesses. This book will help you to explore various tools and methods that are used for understanding the data engineering process using Python. The book will show you how to tackle challenges commonly faced in different aspects of data engineering. You’ll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines to work with large datasets. You’ll learn how to transform and clean data and perform analytics to get the most out of your data. As you advance, you'll discover how to work with big data of varying complexity and production databases, and build data pipelines. Using real-world examples, you’ll build architectures on which you’ll learn how to deploy data pipelines. By the end of this Python book, you’ll have gained a clear understanding of data modeling techniques, and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production.

What you will learn:
Understand how data engineering supports data science workflows
Discover how to extract data from files and databases and then clean, transform, and enrich it
Configure processors for handling different file formats as well as both relational and NoSQL databases
Find out how to implement a data pipeline and dashboard to visualize results
Use staging and validation to check data before landing in the warehouse
Build real-time pipelines with staging areas that perform validation and handle failures
Get to grips with deploying pipelines in the production environment

Who this book is for:
This book is for data analysts, ETL developers, and anyone looking to get started with or transition to the field of data engineering or refresh their knowledge of data engineering using Python. This book will also be useful for students planning to build a career in data engineering or IT professionals preparing for a transition. No previous knowledge of data engineering is required.

97 Things Every Data Engineer Should Know

Author: Tobias Macey
Publisher: O'Reilly Media, Inc.
Total Pages: 243
Release: 2021-06-11
Genre: Computers
ISBN: 9781492062363

Take advantage of today's sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges. Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents 97 concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers.

Topics include:
The Importance of Data Lineage - Julien Le Dem
Data Security for Data Engineers - Katharine Jarmul
The Two Types of Data Engineering and Data Engineers - Jesse Anderson
Six Dimensions for Picking an Analytical Data Warehouse - Gleb Mezhanskiy
The End of ETL as We Know It - Paul Singman
Building a Career as a Data Engineer - Vijay Kiran
Modern Metadata for the Modern Data Stack - Prukalpa Sankar
Your Data Tests Failed! Now What? - Sam Bail

Database Design for Mere Mortals

Author: Michael James Hernandez
Publisher: Addison-Wesley Professional
Total Pages: 668
Release: 2003
Genre: Computers
ISBN: 0201752840

"This book takes the somewhat daunting process of database design and breaks it into completely manageable and understandable components. Mike's approach whilst simple is completely professional, and I can recommend this book to any novice database designer." --Sandra Barker, Lecturer, University of South Australia, Australia

"Databases are a critical infrastructure technology for information systems and today's business. Mike Hernandez has written a literate explanation of database technology--a topic that is intricate and often obscure. If you design databases yourself, this book will educate you about pitfalls and show you what to do. If you purchase products that use a database, the book explains the technology so that you can understand what the vendor is doing and assess their products better." --Michael Blaha, consultant and trainer, author of A Manager's Guide to Database Technology

"If you told me that Mike Hernandez could improve on the first edition of Database Design for Mere Mortals I wouldn't have believed you, but he did! The second edition is packed with more real-world examples, detailed explanations, and even includes database-design tools on the CD-ROM! This is a must-read for anyone who is even remotely interested in relational database design, from the individual who is called upon occasionally to create a useful tool at work, to the seasoned professional who wants to brush up on the fundamentals. Simply put, if you want to do it right, read this book!" --Matt Greer, Process Control Development, The Dow Chemical Company

"Mike's approach to database design is totally common-sense based, yet he's adhered to all the rules of good relational database design. I use Mike's books in my starter database-design class, and I recommend his books to anyone who's interested in learning how to design databases or how to write SQL queries." --Michelle Poolet, President, MVDS, Inc.

"Slapping together sophisticated applications with poorly designed data will hurt you just as much now as when Mike wrote his first edition, perhaps even more. Whether you're just getting started developing with data or are a seasoned pro; whether you've read Mike's previous book or this is your first; whether you're happier letting someone else design your data or you love doing it yourself--this is the book for you. Mike's ability to explain these concepts in a way that's not only clear, but fun, continues to amaze me." --From the Foreword by Ken Getz, MCW Technologies, coauthor ASP.NET Developer's JumpStart

"The first edition of Mike Hernandez's book Database Design for Mere Mortals was one of the few books that survived the cut when I moved my office to smaller quarters. The second edition expands and improves on the original in so many ways. It is not only a good, clear read, but contains a remarkable quantity of clear, concise thinking on a very complex subject. It's a must for anyone interested in the subject of database design." --Malcolm C. Rubel, Performance Dynamics Associates

"Mike's excellent guide to relational database design deserves a second edition. His book is an essential tool for fledgling Microsoft Access and other desktop database developers, as well as for client/server pros. I recommend it highly to all my readers." --Roger Jennings, author of Special Edition Using Access 2002

"There are no silver bullets! Database technology has advanced dramatically, the newest crop of database servers perform operations faster than anyone could have imagined six years ago, but none of these technological advances will help fix a bad database design, or capture data that you forgot to include! Database Design for Mere Mortals(TM), Second Edition, helps you design your database right in the first place!" --Matt Nunn, Product Manager, SQL Server, Microsoft Corporation

"When my brother started his professional career as a developer, I gave him Mike's book to help him understand database concepts and make real-world application of database technology. When I need a refresher on the finer points of database design, this is the book I pick up. I do not think that there is a better testimony to the value of a book than that it gets used. For this reason I have wholeheartedly recommended to my peers and students that they utilize this book in their day-to-day development tasks." --Chris Kunicki, Senior Consultant, OfficeZealot.com

"Mike has always had an incredible knack for taking the most complex topics, breaking them down, and explaining them so that anyone can 'get it.' He has honed and polished his first very, very good edition and made it even better. If you're just starting out building database applications, this book is a must-read cover to cover. Expert designers will find Mike's approach fresh and enlightening and a source of great material for training others." --John Viescas, President, Viescas Consulting, Inc., author of Running Microsoft Access 2000 and coauthor of SQL Queries for Mere Mortals

"Whether you need to learn about relational database design in general, design a relational database, understand relational database terminology, or learn best practices for implementing a relational database, Database Design for Mere Mortals(TM), Second Edition, is an indispensable book that you'll refer to often. With his many years of real-world experience designing relational databases, Michael shows you how to analyze and improve existing databases, implement keys, define table relationships and business rules, and create data views, resulting in data integrity, uniform access to data, and reduced data-entry errors." --Paul Cornell, Site Editor, MSDN Office Developer Center

Sound database design can save hours of development time and ensure functionality and reliability. Database Design for Mere Mortals(TM), Second Edition, is a straightforward, platform-independent tutorial on the basic principles of relational database design. It provides a commonsense design methodology for developing databases that work. Database design expert Michael J. Hernandez has expanded his best-selling first edition, maintaining its hands-on approach and accessibility while updating its coverage and including even more examples and illustrations. This edition features a CD-ROM that includes diagrams of sample databases, as well as design guidelines, documentation forms, and examples of the database design process. This book will give you the knowledge and tools you need to create efficient and effective relational databases.

Streaming Systems

Author: Tyler Akidau, Slava Chernyak, Reuven Lax
Publisher: O'Reilly Media, Inc.
Total Pages: 391
Release: 2018-07-16
Genre: Computers
ISBN: 9781491983829

Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way. Expanded from Tyler Akidau’s popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You’ll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.

You’ll explore:
How streaming and batch data processing patterns compare
The core principles and concepts behind robust out-of-order data processing
How watermarks track progress and completeness in infinite datasets
How exactly-once data processing techniques ensure correctness
How the concepts of streams and tables form the foundations of both batch and streaming data processing
The practical motivations behind a powerful persistent state mechanism, driven by a real-world example
How time-varying relations provide a link between stream processing and the world of SQL and relational algebra