Sunday, May 29, 2022

Where have I been?

Data has suddenly become a very big deal in the tech space. Today, data processing and analytics platforms are more valuable than the code that powers them, in the sense that monetization happens on the platforms and not on the code. The code is of course the foundation, without which you cannot build solid platforms. So despite all the talk about "no-code", the demand for solid programming skills is not going away. Most of the code that powers these modern data stack platforms is open source, backed by a very healthy and collaborative community of very smart people.

When I joined the Cloudera innovation accelerator via this February (2022), I had absolutely no idea about the modern data stack. I had experience building ML models for medical domains, but the datasets I dealt with were generally a few thousand rows rather than millions. Since then it has been quite a wonderful journey of learning, and of finding and connecting with new people.

During this time I have primarily dabbled in building adapters for a transformation tool called dbt from dbtLabs. One of these adapters, for Impala, is now open source. I also made my first upstream contribution to the impyla project, which dbt-impala uses to connect to an Impala warehouse. It is cool to discover how the patch process actually works in an open-source project. If you came here looking to use dbt-impala, do check out the tutorial written by Alasdair Brown.

The most fun part of working here has been closely interacting with people not only at Cloudera but also dbtLabs, something I have hardly done in my previous assignments. 

Last but not least, the whole journey was possible thanks to the platform provided by and the wonderful staff they have. I would strongly recommend that you try them out if you are looking for meaningful remote assignments.