Professional Certificates and Experience

Certifications

Here are my professional certifications, which are valid at the latest update of this page.

Professional Certificates
Badge Name and Information Date Earned Date Expired Certificates
Microsoft Certified: Azure Data Engineer Associate
Official Page
July 1, 2024 July 2, 2026 PDF file;
Digital credentials
AWS Certified Cloud Practitioner
Official Page
May 9, 2024 May 9, 2027 PDF file;
Digital credentials
Microsoft Certified: Power BI Data Analyst Associate
Official Page
March 14, 2024 March 14, 2026 PDF file;
Digital credentials
Microsoft Office Specialist: Excel Associate (Office 2019)
Official Page
March 7, 2024 Does not expire PDF file;
Digital credentials
Deep Learning Specialization
Official Page
July 11, 2023 Does not expire PDF file;
Digital credentials
Google Advanced Data Analytics Certificate
Official Page
June 28, 2023 Does not expire PDF file;
Digital credentials
SAS Advanced Programmer
Official Page
Dec 14, 2024 Does not expire PDF file;
Digital credentials
Google Data Analytics Certificate
Official Page
June 14, 2023 Does not expire PDF file
Digital credentials
Web Applications for Everybody
Official Page
August 10, 2023 Does not expire PDF file
Web Design for Everybody: Basics of Web Development & Coding
Official Page
July 30, 2023 Does not expire PDF file
IELTS
Result: 7.0/9.0
Official Page
October 20, 2021 October 19, 2023 Test Report Form (TRF):
21HK006743
LIC027A

To show the weak-related and expired certificates, click this .

Work Experiences

**Remark: The contents displayed here have been approved by the employers and/or the sensitive information has been modified.




Data Scientist

Synergistic IT, Fremont, CA (Remote)
June, 2023 - Present


Cloud-Based Clinical Workload Monitoring System

  • Project Description:

    Designed and implemented a cloud-based data pipeline and analytics platform for a hospital to consolidate clinical, HR, and scheduling data for daily reporting on provider workload, documentation timing, and staffing efficiency.
    The system integrated AWS-based ETL pipelines, a Snowflake data warehouse, and Power BI dashboards to deliver near real-time insights for HR and clinical operations, enabling proactive workload balancing and reducing reporting latency by over 80%, presenting key metrics in an interactive and user-friendly format.

  • My responsibility:
    • Designed and maintained a multi-stage ETL pipeline leveraging AWS DMS to capture daily CDC updates from multiple systems in Amazon RDS into a partitioned raw S3 zone, then using AWS Glue (PySpark) to transform and standardize datasets into a curated processed S3 zone for efficient loading into Snowflake.
    • Implemented data standardization, null handling, duplicate resolution, and timestamp alignment to ensure cross-system consistency.
    • Performed SQL-based data quality checks and remodeled the data into a star schema with SCD Type 2 dimensions in Snowflake to support business-critical reporting.
    • Applied column-level masking in Snowflake and aligned all data access controls with HIPAA compliance.
    • Automated pipeline orchestration using AWS Managed Workflows for Apache Airflow (MWAA), monitored workflows using AWS CloudWatch, and configured SNS alerts for pipeline failures or data anomalies.
    • Collaborated with BI analysts to define clinical workload metrics, built curated Snowflake views, and implemented row-level security for department-specific access.
    • Developed Power BI dashboards for HR and operations teams, visualizing appointment volumes, documentation delays, off-shift activity, and workload distribution by clinic and provider role.
    • Enabled HR to proactively identify staff overloads, reduce burnout risk, and improve resource allocation; cut reporting time from 12 hours to under 3 hours, supporting data-driven staffing decisions across multiple clinics.
    • Trained users on pipeline and dashboard functionality and interpretation, facilitating adoption for the HR team.
  • Technology Used:

    AWS (Glue, S3, RDS, DMS, CloudWatch, SNS, IAM, KMS, CloudFormation, MWAA/Airflow), Snowflake, SQL, PySpark, Power BI, Python




Permission granted to publish this figure.

Social Media Sentiment Analysis and Reporting

  • Project Description:

    Partnered with a fellow data scientist to design and deploy an Azure-based NLP system for sentiment analysis on 16,000 post-sale reviews from an online clothing store. Developed an Azure Data Factory ingestion pipeline and implemented a PySpark-based Azure Databricks workflow to clean, process, and classify review text using a BERT-based model. Visualized sentiment trends and generated automated weekly reports to inform product feedback analysis and guide marketing strategies.

  • My responsibility:
    • Configured an Azure Data Factory pipeline to ingest data from an on-premise database into Azure Data Lake Storage, scheduled for weekly updates.
    • Developed an Azure Databricks notebook using PySpark to perform data cleaning, deduplication, and null handling.
    • Implemented NLP workflows for text cleaning, lemmatization, and stop-word removal; performed sentiment intensity analysis using a BERT-based model from Hugging Face Transformers.
    • Trained and fine-tuned a Random Forest classifier to predict sentiment labels, optimizing preprocessing parameters to improve classification accuracy.
    • Evaluated model performance and produced comprehensive sentiment trend reports by product and time period, achieving an overall satisfactory rate of 96% with an upward trend.
    • Automated the end-to-end workflow—from ingestion to reporting—using ADF triggers and Databricks job scheduling.
  • Technology Used:

    Python, PySpark, Azure Data Factory, Azure Databricks, Azure Data Lake Storage, Hugging Face Transformers, BERT, Random Forest, NLP


Permission granted to publish this figure.


Predictive Sales Analytics Platform

  • Project Description:

    Performed a machine learning model to predict total sales for each product and shop for a retail chain using 2.9 million daily historical sales data from 22,169 kinds of items and 59 shops using NLP and time series techniques, enhancing inventory optimization and driving data-driven decision-making.

  • My responsibility:
    • Imported and merged table into pandas DataFrames, removed duplicates and imputed missing values.
    • Implemented TF-IDF vectorizer to extract features from item and shop names, creating text-based features.
    • Engineered lagged features and trend-based features for time series analysis.
    • Conducted Exploratory Data Analysis (EDA), including visualization of target distribution and time trends. Used multivariate heatmaps to analyze numerical and categorical pairings.
    • Applied mean encoding on categorical features and matrix factorization for text features.
    • Constructed and trained pipelines with the above transformations and Ridge, XGBoost, and LightGBM regressors.
    • Performed feature selection using Recursive Feature Elimination with Cross-Validation (RFECV) and optimized hyperparameters using Bayesian optimization.
    • Evaluated models through cross-validation, analyzing Root Mean Square Error (RMSE).
    • Predicted future outcomes and compiled comprehensive reports.
  • Technology Used:

    Python, Scikit Learn, Machine Learning Pipeline, NLP, TF-IDF, mean encoding, matrix factorization, Ridge Regressor, LightGBM Regressor, XGBoost Regressor, feature selection, hyperparameter optimization.




Permission granted and sensitive information has been modified.

Sport Corporation Sales Analysis

  • Project Description:

    Developed an advanced Power BI dashboard for an international sports corporation to analyze sales data, providing real-time insights into sales performance, discount analysis, and regional success. The dashboard was designed to facilitate data-driven decision-making by presenting key metrics in an interactive and user-friendly format.

  • My responsibility:
    • Created and filtered data using SQL on the on-premise dataset, leveraging a star schema data model with the Sales table at the center for optimized performance.
    • Cleaned and transformed data using Power Query, ensuring data accuracy and consistency.
    • Implemented advanced DAX formulas to create columns and measures for in-depth analysis, including update time display, discount calculations and fiscal year-specific insights.
    • Designed a one-page interactive dashboard with key metrics, including total sales, customer counts, product sales, and discount analysis.
    • Incorporated various visualizations for comprehensive sales insights, making the data easily interpretable.
    • Enabled scheduled refresh to ensure real-time data updates, keeping the dashboard always up-to-date with the latest data.
    • Published the dashboard to the Power BI service for broad access and distribution within the organization.
    • Collaborated with stakeholders to understand business requirements and tailor the dashboard to meet their needs.
    • Conducted user training sessions to ensure effective use and interpretation of the dashboard.
  • Technology Used:

    Microsoft SQL Server, Power BI services, Power Query, DAX.


Permission granted and sensitive information has been modified.


Permission granted and sensitive information has been modified.

Retail Chain Transaction Analysis

  • Project Description:

    Developed a comprehensive Power BI dashboard to analyze retail chain transactions, providing actionable insights into product sales, customer behavior, seasonal trends, and the effectiveness of promotions. The dashboard facilitated data-driven decision-making with detailed visualizations and interactive features.

  • My responsibility:
    • Utilized Power Query to clean and transform data, including splitting and unpivoting the Product column.
    • Developed advanced DAX formulas to create calculated columns and measures for in-depth analysis.
    • Created detailed Product Analysis and Customer Analysis pages featuring key metrics, slicers, and interactive visuals such as line charts, treemaps, and ribbon charts.
    • Ensured alignment and consistency across all dashboard pages for a cohesive user experience.
    • Integrated product and customer analysis visuals into a combined Retail Analysis Page with interactive buttons for toggling between views using bookmarks.
    • Designed and presented slides summarizing key insights and findings to stakeholders, facilitating informed decision-making.
    • Conducted stakeholder meetings to gather requirements and incorporate feedback into the dashboard design.
    • Provided training and support to users for effective utilization and interpretation of the dashboard.
  • Technology Used:

    Power BI services, Power Query, DAX, Bookmarks, Slides.


Permission granted and sensitive information has been modified.


Statistical Consultant

University of Chicago, Chicago, IL
September - November 2023

  • Description:

    Managed the project execution of three client-facing consulting projects within two months, working with a team of five consultants to analyze data issues, communicate insights, and provide tailored statistical recommendations through client meetings and comprehensive reports.

  • Details could be checked at Statistical Consulting section in the Research page.