Professional Information
Certifications
| Badge | Name and Information | Date Earned | Date Expired | Certificates |
|---|---|---|---|---|
| | Microsoft Certified: Azure Data Engineer Associate Official Page | July 1, 2024 | July 2, 2025 | PDF file; Digital credentials |
| | AWS Certified Cloud Practitioner Official Page | May 9, 2024 | May 9, 2027 | PDF file; Digital credentials |
| | Microsoft Certified: Power BI Data Analyst Associate Official Page | March 14, 2024 | March 14, 2025 | PDF file; Digital credentials |
| | Microsoft Office Specialist: Excel Associate (Office 2019) Official Page | March 7, 2024 | Does not expire | PDF file; Digital credentials |
| | Deep Learning Specialization Official Page | July 11, 2023 | Does not expire | PDF file; Digital credentials |
| | Google Advanced Data Analytics Certificate Official Page | June 28, 2023 | Does not expire | PDF file; Digital credentials |
| | Google Data Analytics Certificate Official Page | June 14, 2023 | Does not expire | PDF file; Digital credentials |
| | Web Applications for Everybody Official Page | August 10, 2023 | Does not expire | PDF file |
| | Web Design for Everybody: Basics of Web Development & Coding Official Page | July 30, 2023 | Does not expire | PDF file |
| | IELTS Result: 7.0/9.0 Official Page | October 20, 2021 | October 19, 2023 | Test Report Form (TRF): 21HK006743 LIC027A |
Work Experiences
**Remark:** The contents displayed here have been approved by the employers, and/or sensitive information has been modified.
Data Scientist
Synergistic IT, Fremont, CA (Remote)
June 2023 - Present
Predictive Sales Analytics Platform
- Period: Jun 2024 - Present
-
Project Description:
Built a machine learning model to predict total sales for each product and shop in a retail chain, using 2.9 million daily historical sales records covering 22,169 items and 59 shops. Combined NLP and time series techniques to support inventory optimization and data-driven decision-making.
-
My responsibility:
- Imported and merged tables into pandas DataFrames, removed duplicates, and imputed missing values.
- Implemented TF-IDF vectorizer to extract features from item and shop names, creating text-based features.
- Engineered lagged features and trend-based features for time series analysis.
- Conducted Exploratory Data Analysis (EDA), including visualization of target distribution and time trends. Used multivariate heatmaps to analyze numerical and categorical pairings.
- Applied mean encoding on categorical features and matrix factorization for text features.
- Constructed and trained pipelines with the above transformations and Ridge, XGBoost, and LightGBM regressors.
- Performed feature selection using Recursive Feature Elimination with Cross-Validation (RFECV) and optimized hyperparameters using Bayesian optimization.
- Evaluated models through cross-validation, analyzing Root Mean Square Error (RMSE).
- Predicted future outcomes and compiled comprehensive reports.
-
Technology Used:
Python, scikit-learn, Machine Learning Pipeline, NLP, TF-IDF, mean encoding, matrix factorization, Ridge Regressor, LightGBM Regressor, XGBoost Regressor, feature selection, hyperparameter optimization.
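The lagged features and mean encoding mentioned above can be illustrated with a minimal, dependency-free sketch. The toy records, column layout, and the function names `add_lag_feature` and `mean_encode` are hypothetical and chosen only for illustration; the project itself used pandas and scikit-learn, and its actual code is not shown here.

```python
from collections import defaultdict

# Hypothetical toy records: (month_index, shop_id, item_id, units_sold).
sales = [
    (0, 1, 10, 5.0),
    (1, 1, 10, 7.0),
    (2, 1, 10, 6.0),
    (0, 2, 10, 1.0),
    (1, 2, 10, 3.0),
]


def add_lag_feature(rows, lag=1):
    """Append the target from `lag` months earlier for the same shop/item pair."""
    history = {(month, shop, item): sold for month, shop, item, sold in rows}
    return [
        # None marks rows that have no observation `lag` months back.
        (month, shop, item, sold, history.get((month - lag, shop, item)))
        for month, shop, item, sold in rows
    ]


def mean_encode(rows, key_index, target_index=3):
    """Map each category value to the mean target observed for that value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[target_index]
        counts[row[key_index]] += 1
    return {key: totals[key] / counts[key] for key in totals}


lagged = add_lag_feature(sales, lag=1)
shop_means = mean_encode(sales, key_index=1)
```

In practice, mean encoding is usually fitted on training folds only, so that target values from the validation period do not leak into the features.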
Social Media Sentiment Analysis and Reporting
- Period: Mar - May, 2024
-
Project Description:
Developed an Azure-based system with a data scientist partner for sentiment analysis of 16,000 post-sale social media reviews of an online clothing store. Ingested data with a Data Factory pipeline, cleaned it using SQL in Synapse Analytics, and applied NLP techniques with a BERT-based model in Azure Databricks. Found a satisfaction rate of 96%, trending upward over time.
-
My responsibility:
- Configured an Azure Data Factory pipeline to ingest data from an on-premises database to Azure Data Lake Storage.
- Cleaned and filtered data using SQL scripts in Azure Synapse Analytics.
- Designed a notebook in Azure Databricks for data processing, analysis, and visualization.
- Implemented NLP preprocessing (text cleaning, lemmatization, and stop-word removal) and performed sentiment intensity analysis with a BERT-based model from Hugging Face Transformers.
- Trained and evaluated a random forest classifier model to predict the sentiment of the apparel reviews.
- Visualized results by time and item and created comprehensive reports.
- Updated the dataset with a scheduled trigger in Azure Data Factory on a weekly basis.
-
Technology Used:
Python, Database management, Azure Data Factory, Synapse Analytics, Databricks, NLP, Hugging Face, BERT.
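As a rough illustration of the text-cleaning step and of how a satisfaction rate can be derived from predicted labels, here is a small sketch using only the Python standard library. The stop-word list, the label names, and the functions `clean_review` and `satisfaction_rate` are assumptions made for this example; lemmatization and the BERT-based model are deliberately left out because they require external libraries.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "was", "i"}


def clean_review(text):
    """Lowercase the text, strip URLs and punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # keep letters, digits, whitespace
    return [token for token in text.split() if token not in STOP_WORDS]


def satisfaction_rate(labels):
    """Share of reviews labeled 'positive' (label names are an assumption)."""
    labels = list(labels)
    return sum(1 for label in labels if label == "positive") / len(labels)


tokens = clean_review("I LOVED this dress!!! See https://example.com/x")
rate = satisfaction_rate(["positive", "positive", "negative", "positive"])
```

Grouping the labels by week before calling `satisfaction_rate` would give the kind of time trend described in the project summary.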
Hybrid Cloud Site-to-Site VPN Deployment for Secure Connectivity
- Period: Dec 2023 - Feb 2024
-
Project Description:
Implemented a hybrid cloud architecture that connects an on-premises data center to an AWS VPC using a Site-to-Site VPN and several supporting AWS services. This addressed the challenge of seamless, secure, and scalable integration between the two environments, improving operational efficiency and data protection.
-
My responsibility:
- Set up VPCs in two regions with appropriate subnets, route tables, and internet gateways.
- Launched Amazon Linux 2 instances in both regions with security groups allowing SSH and ICMP traffic.
- Established a Site-to-Site VPN connection using a virtual private gateway and an Openswan configuration.
- Utilized Amazon S3 for critical data backup, enabling versioning and server-side encryption.
- Implemented IAM roles and policies for secure access management, ensuring least privilege principles.
- Monitored instance performance, network traffic and VPN connection using AWS CloudWatch.
- Configured Amazon SNS to send alerts for critical events.
- Automated infrastructure deployment with AWS CloudFormation for consistent, repeatable setups.
-
Technology Used:
AWS Instances, security groups, VPC, Site-to-Site VPN, IAM, CloudWatch, SNS, CloudFormation.
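One practical check when planning this kind of setup is that the address ranges on each side of the VPN do not overlap, since overlapping ranges make routing between the two networks ambiguous. The sketch below uses Python's standard `ipaddress` module; the network names and CIDR blocks are hypothetical and are not taken from the project.

```python
import ipaddress


def overlapping_pairs(cidrs):
    """Return name pairs (alphabetical order) whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(block) for name, block in cidrs.items()}
    names = sorted(networks)
    return [
        (first, second)
        for index, first in enumerate(names)
        for second in names[index + 1:]
        if networks[first].overlaps(networks[second])
    ]


# Hypothetical address plans for the two sides of the tunnel.
good_plan = {"on_prem": "10.0.0.0/16", "aws_vpc": "10.1.0.0/16"}
bad_plan = {"on_prem": "10.0.0.0/16", "aws_vpc": "10.0.128.0/20"}

good_conflicts = overlapping_pairs(good_plan)
bad_conflicts = overlapping_pairs(bad_plan)
```

An empty result means the plan has no conflicting ranges; any returned pair should be re-addressed before the tunnel is configured.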
Sport Corporation Sales Analysis
- Period: Sep - Nov, 2023
-
Project Description:
Developed an advanced Power BI dashboard for an international sports corporation to analyze sales data, providing real-time insights into sales performance, discount analysis, and regional success. The dashboard was designed to facilitate data-driven decision-making by presenting key metrics in an interactive and user-friendly format.
-
My responsibility:
- Prepared and filtered data using SQL on the on-premises dataset, modeling it as a star schema with the Sales table at the center for optimized performance.
- Cleaned and transformed data using Power Query, ensuring data accuracy and consistency.
- Implemented advanced DAX formulas to create columns and measures for in-depth analysis, including update time display, discount calculations and fiscal year-specific insights.
- Designed a one-page interactive dashboard with key metrics, including total sales, customer counts, product sales, and discount analysis.
- Incorporated various visualizations for comprehensive sales insights, making the data easily interpretable.
- Enabled scheduled refresh to keep the dashboard up to date with the latest data.
- Published the dashboard to the Power BI service for broad access and distribution within the organization.
- Collaborated with stakeholders to understand business requirements and tailor the dashboard to meet their needs.
- Conducted user training sessions to ensure effective use and interpretation of the dashboard.
-
Technology Used:
Microsoft SQL Server, Power BI services, Power Query, DAX.
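The fiscal-year and discount measures were written in DAX, but the underlying arithmetic can be sketched in Python for readers unfamiliar with DAX. The July fiscal-year start, the convention of naming a fiscal year after the calendar year in which it ends, and the function names are all assumptions made for this example, not details from the project.

```python
from datetime import date

FISCAL_YEAR_START_MONTH = 7  # assumption: the fiscal year begins in July


def fiscal_year(day, start_month=FISCAL_YEAR_START_MONTH):
    """Return the fiscal year, named after the calendar year in which it ends."""
    if start_month == 1:
        return day.year  # fiscal year coincides with the calendar year
    return day.year + 1 if day.month >= start_month else day.year


def discount_rate(list_price, sale_price):
    """Fraction of the list price given away as a discount."""
    return (list_price - sale_price) / list_price


fy_september = fiscal_year(date(2023, 9, 15))
fy_june = fiscal_year(date(2023, 6, 30))
rate = discount_rate(80.0, 60.0)
```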
Retail Chain Transaction Analysis
- Period: Jun - Aug, 2023
-
Project Description:
Developed a comprehensive Power BI dashboard to analyze retail chain transactions, providing actionable insights into product sales, customer behavior, seasonal trends, and the effectiveness of promotions. The dashboard facilitated data-driven decision-making with detailed visualizations and interactive features.
-
My responsibility:
- Utilized Power Query to clean and transform data, including splitting and unpivoting the Product column.
- Developed advanced DAX formulas to create calculated columns and measures for in-depth analysis.
- Created detailed Product Analysis and Customer Analysis pages featuring key metrics, slicers, and interactive visuals such as line charts, treemaps, and ribbon charts.
- Ensured alignment and consistency across all dashboard pages for a cohesive user experience.
- Integrated product and customer analysis visuals into a combined Retail Analysis Page with interactive buttons for toggling between views using bookmarks.
- Designed and presented slides summarizing key insights and findings to stakeholders, facilitating informed decision-making.
- Conducted stakeholder meetings to gather requirements and incorporate feedback into the dashboard design.
- Provided training and support to users for effective utilization and interpretation of the dashboard.
-
Technology Used:
Power BI services, Power Query, DAX, Bookmarks, Slides.
Statistical Consultant
University of Chicago, Chicago, IL
September - November 2023
-
Description:
Managed the project execution of three client-facing consulting projects within two months, working with a team of five consultants to analyze data issues, communicate insights, and provide tailored statistical recommendations through client meetings and comprehensive reports.
- Details can be found in the Statistical Consulting section of the Research page.