👋 I'm Lam, a data engineer.

I write about data engineering, web development, and other technology stuff...

7 min read
Lam Tran

I have taken part in a few technical interviews and discussed data engineering topics and past projects with the people involved. Most of them are familiar with Apache Spark, one of the most widely adopted frameworks for big data processing. What I have been asked, and what I often ask in return, concerns the basic concepts of RDD, DataFrame, and Dataset and the differences between them. It sounds quite fundamental, right? Not really. If we take a closer look at them, there are lots of interesting details that can help us understand them and choose the one best suited to our project.

8 min read
Lam Tran

Spark is an in-memory data processing framework that can quickly process very large data sets and distribute the work across multiple computers. Spark applications are memory-heavy, so memory management plays a very important role in the whole system.

11 min read
Lam Tran

State is a crucial part of React applications: it lets React components update their information and change the UI accordingly, which makes the application interactive for users. In this article, I will walk you through the usage of state, its characteristics, and how to use it efficiently.

5 min read
Lam Tran

Spark and Ranger are widely used by many enterprises because of their powerful features. Spark is an in-memory data processing framework, and Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. Ranger can therefore be used to handle authorization for Spark SQL, and this blog will walk you through the integration of the two frameworks. This is the first part of the series, where we install the Ranger framework on our machine, along with Apache Solr for auditing.
