Data Engineer

Experienced Data Engineer proficient in AWS infrastructure (Glue, EMR, Redshift), Azure (Databricks, ADF, Microsoft Fabric), Spark, PySpark, Data Lake, Airflow, and PL/SQL. Skilled in Python scripting, query optimization, automation, and pipeline development. Recognized for optimizing Redshift report generation and awarded for data governance and debut initiatives.

Experience

Data Engineer

ADF Data Science Pvt. Ltd.
JUNE 2021

Centralized Reporting System with RAG LLM Integration

- Created data pipelines with various activities and data flows across multiple sources and destinations to satisfy given business needs

- Extracted data from various third-party sources using a modern tech stack

- Scheduled and monitored tasks via Jenkins (CI/CD)

- Implemented an open-source data catalog tool

- ETL: Pentaho, Python, PySpark

Education

Degree/Grade | Institution | Duration | Performance
MCA | SRM Institute of Science and Technology, Chennai | JUNE 2019 - MAY 2021 | 82%
BCA | Aadhiparasakthi College of Arts and Science, Vellore | JUNE 2016 - MAY 2019 | 76%
12th Grade | Anderson Hr. Sec. School, Kanchipuram | Year of Passing - 2016 | 69%
10th Grade | Anderson Hr. Sec. School, Kanchipuram | Year of Passing - 2014 | 85%

Skills

Projects

Azure Data Factory Promise Table Migration with PySpark Processing - Led the migration of promise tables into Azure Data Lake Storage Gen2 (ADLS Gen2) through Azure Data Factory, ensuring seamless data transfer and storage. Leveraging PySpark on Azure Databricks, orchestrated efficient data processing pipelines for in-depth analysis. By optimizing ETL orchestration and implementing data quality checks, ensured the reliability and accuracy of promise table data. The project focused on scalability and performance, enabling the processing of large datasets with ease. Through monitoring and logging mechanisms, ensured the continuous improvement of data pipelines, facilitating informed decision-making.
Seamless Third-Party Data Integration - Orchestrated the loading of third-party data into Redshift through API integration, web scraping, and Graph API for Outlook, ensuring a continuous influx of relevant data for analysis.
Creating Data Mart Tables Using CDC - CDC files from DMS (S3) were extracted into Redshift with a proper naming convention. PySpark was used for transformation and variable mapping. Daily jobs fetched the most recent historical CDC files and wrote them to the target table in Redshift. Five reports were generated using the EDW tables as the data source.
Streamlined Machine Learning QA Processes - Implemented a novel methodology for QA of MLE model tables, slashing manual QA time by 70% and accelerating the process by 65% compared to previous methods.
Implementation of Data Governance Tool Using Apache Atlas - Apache Atlas was deployed using Docker. RDS metadata was imported into Atlas via the REST API, and an automated script was created to ingest RDS entities into Atlas. To ensure secure data governance, role-based policies were implemented and assigned to the relevant teams. The data dictionary is uploaded via Jenkins for each release, and Atlas automatically populates it with the necessary data.
Nush Shopping - Fashion e-commerce website built with Bootstrap, HTML5, and CSS.

Certifications

Google BigQuery & PostgreSQL: BigQuery for Data Analysis
Data Engineering Essentials: SQL, Python, and Spark
MongoDB Basics (M001)
Core Java Certification

Achievement

Received the Promising Debut Award for the year 2022