Lam Tran

db-schemachange

July 13, 2025 · 18 min read

Data Engineer

db-schemachange is a simple, lightweight python based tool to manage database objects for Databricks, Snowflake, MySQL, Postgres, SQL Server, and Oracle. It follows an Imperative-style approach to Database Change Management (DCM) and was inspired by the Flyway database migration tool. When combined with a version control system and a CI/CD tool, database changes can be approved and deployed through a pipeline using modern software delivery practices. As such schemachange plays a critical role in enabling Database (or Data) DevOps.

banner image

Cloud Native Data Platform

February 20, 2025 · 4 min read

Lam Tran

Data Engineer

This platform leverages cloud-native technologies to build a flexible and efficient data pipeline. It supports various data ingestion, processing, and storage needs, enabling real-time and batch data analytics. The architecture is designed to handle structured, semi-structured, and unstructured data from diverse external sources.

banner image

Understanding Snowflake micro-partitions

December 22, 2024 · 7 min read

Lam Tran

Data Engineer

Snowflake is one of the most popular data warehouse solutions nowaday because of the various features that it can provide you to build a complete data platform. Understanding the Snowflake storage layer not only helps us to have a deep dive into how it organizes the data under the hood but it is also crucial for performance optimization of your queries.

banner image

Differences between Spark RDD, Dataframe and Dataset

April 17, 2024 · 7 min read

Lam Tran

Data Engineer

I have participated in fews technical interviews and have discussed with people topics around data engineering and things they have done in the past. Most of them are familiar with Apache Spark, obviously, one of the most adopted frameworks for big data processing. What I have been asked and what I often ask them is simple concepts around RDD, Dataframe, and Dataset and the differences between them. It sounds quite fundamental, right? Not really. If we have more closer look at them, there are lots of interesting things that can help us understand and choose which is the best suited for our project.

banner image

How Is Memory Managed In Spark?

July 7, 2023 · 8 min read

Lam Tran

Data Engineer

Spark is an in-memory data processing framework that can quickly perform processing tasks on very large data sets, and can also distribute tasks across multiple computers. Spark applications are memory heavy, hence, it is obvious that memory management plays a very important role in the whole system.

👋 I'm Lam, a data engineer.