Professional Information

Certifications

The professional certifications below were valid as of the latest update of this page.

Professional Certificates
| Badge Name and Information | Date Earned | Expiration Date | Certificates |
| --- | --- | --- | --- |
| Microsoft Certified: Azure Data Engineer Associate (Official Page) | July 1, 2024 | July 2, 2025 | PDF file; Digital credentials |
| AWS Certified Cloud Practitioner (Official Page) | May 9, 2024 | May 9, 2027 | PDF file; Digital credentials |
| Microsoft Certified: Power BI Data Analyst Associate (Official Page) | March 14, 2024 | March 14, 2025 | PDF file; Digital credentials |
| Microsoft Office Specialist: Excel Associate (Office 2019) (Official Page) | March 7, 2024 | Does not expire | PDF file; Digital credentials |
| Deep Learning Specialization (Official Page) | July 11, 2023 | Does not expire | PDF file; Digital credentials |
| Google Advanced Data Analytics Certificate (Official Page) | June 28, 2023 | Does not expire | PDF file; Digital credentials |
| Google Data Analytics Certificate (Official Page) | June 14, 2023 | Does not expire | PDF file; Digital credentials |
| Web Applications for Everybody (Official Page) | August 10, 2023 | Does not expire | PDF file |
| Web Design for Everybody: Basics of Web Development & Coding (Official Page) | July 30, 2023 | Does not expire | PDF file |
| IELTS, Result: 7.0/9.0 (Official Page) | October 20, 2021 | October 19, 2023 | Test Report Form (TRF): 21HK006743 LIC027A |


Work Experiences

Remark: The content displayed here has been approved by the employers, and/or sensitive information has been modified.




Data Scientist

Synergistic IT, Fremont, CA (Remote)
June 2023 - Present


Predictive Sales Analytics Platform

  • Period: Jun 2024 - Present
  • Project Description:

    Built a machine learning model to predict total sales for each product and shop of a retail chain, using 2.9 million daily historical sales records covering 22,169 items and 59 shops. Combined NLP and time series techniques to support inventory optimization and data-driven decision-making.

  • My responsibilities:
    • Imported and merged tables into pandas DataFrames, removed duplicates, and imputed missing values.
    • Applied a TF-IDF vectorizer to item and shop names to create text-based features.
    • Engineered lagged features and trend-based features for time series analysis.
    • Conducted Exploratory Data Analysis (EDA), including visualization of target distribution and time trends. Used multivariate heatmaps to analyze numerical and categorical pairings.
    • Applied mean encoding on categorical features and matrix factorization for text features.
    • Constructed and trained pipelines with the above transformations and Ridge, XGBoost, and LightGBM regressors.
    • Performed feature selection using Recursive Feature Elimination with Cross-Validation (RFECV) and optimized hyperparameters using Bayesian optimization.
    • Evaluated models through cross-validation, using Root Mean Square Error (RMSE) as the metric.
    • Predicted future outcomes and compiled comprehensive reports.
  • Technology Used:

    Python, scikit-learn, machine learning pipelines, NLP, TF-IDF, mean encoding, matrix factorization, Ridge regressor, LightGBM regressor, XGBoost regressor, feature selection, hyperparameter optimization.
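
The lagged features, mean encoding, and RMSE evaluation listed above can be illustrated with a minimal sketch. It is written in plain Python on made-up toy data so that it runs without dependencies; the project itself used pandas and scikit-learn. The names `add_lag_features`, `mean_encode`, and `rmse` are hypothetical and do not come from the project code.

```python
from collections import defaultdict

# Toy records: (month_index, shop_id, item_id, units_sold).
# All values are invented for illustration only.
sales = [
    (0, "shop_a", "item_1", 5),
    (1, "shop_a", "item_1", 7),
    (2, "shop_a", "item_1", 6),
    (0, "shop_b", "item_1", 2),
    (1, "shop_b", "item_1", 4),
    (2, "shop_b", "item_1", 3),
]


def add_lag_features(records, lags=(1, 2)):
    """Attach the target seen `lag` periods earlier for the same shop/item pair."""
    history = {(m, s, i): y for m, s, i, y in records}
    rows = []
    for m, s, i, y in records:
        row = {"month": m, "shop": s, "item": i, "target": y}
        for lag in lags:
            # None marks periods for which no history exists.
            row[f"lag_{lag}"] = history.get((m - lag, s, i))
        rows.append(row)
    return rows


def mean_encode(rows, key, target="target"):
    """Add a column holding the mean target observed for each category value.

    Computed on all rows here for brevity; in practice the means would be
    fitted on training folds only, to avoid target leakage.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        totals[row[key]][0] += row[target]
        totals[row[key]][1] += 1
    means = {k: total / count for k, (total, count) in totals.items()}
    for row in rows:
        row[f"{key}_mean_enc"] = means[row[key]]
    return rows


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return (sum(squared) / len(squared)) ** 0.5


features = mean_encode(add_lag_features(sales), key="shop")
```

Each resulting row carries `lag_1`, `lag_2`, and `shop_mean_enc` columns that a regressor could consume after the missing lags are imputed or dropped.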




Permission granted to publish this figure.

Social Media Sentiment Analysis and Reporting

  • Period: Mar - May 2024
  • Project Description:

    Developed an Azure-based system with a data scientist partner to run sentiment analysis on 16,000 post-sale social media reviews of an online clothing store. Ingested data with an Azure Data Factory pipeline, cleaned it using SQL in Azure Synapse Analytics, and applied NLP techniques with a BERT-based model in Azure Databricks. The analysis found a 96% satisfaction rate with an upward trend over time.

  • My responsibilities:
    • Configured an Azure Data Factory pipeline to ingest data from an on-premises database into Azure Data Lake Storage.
    • Cleaned and filtered data using SQL scripts in Azure Synapse Analytics.
    • Designed a notebook in Azure Databricks for data processing, analysis, and visualization.
    • Implemented NLP steps for text cleaning, lemmatization, and stop-word removal, and performed sentiment intensity analysis with a BERT-based model from Hugging Face Transformers.
    • Trained and evaluated a random forest classifier model to predict the sentiment of the apparel reviews.
    • Visualized results by time and item and created comprehensive reports.
    • Updated the dataset with a scheduled trigger in Azure Data Factory on a weekly basis.
  • Technology Used:

    Python, Database management, Azure Data Factory, Synapse Analytics, Databricks, NLP, Hugging Face, BERT.


Permission granted to publish this figure.


Hybrid Cloud Site-to-Site VPN Deployment for Secure Connectivity

  • Period: Dec 2023 - Feb 2024
  • Project Description:

    Implemented a hybrid cloud architecture that connects an on-premises data center to an AWS VPC through a Site-to-Site VPN, configuring multiple supporting AWS services. The setup provides seamless, secure, and scalable integration between the two environments, improving operational efficiency and data protection.

  • My responsibilities:
    • Set up VPCs in two regions with appropriate subnets, route tables, and internet gateways.
    • Launched Amazon Linux 2 instances in both regions with security groups allowing SSH and ICMP traffic.
    • Established a Site-to-Site VPN connection using a virtual private gateway and an Openswan configuration.
    • Utilized Amazon S3 for critical data backup, enabling versioning and server-side encryption.
    • Implemented IAM roles and policies for secure access management, following the principle of least privilege.
    • Monitored instance performance, network traffic, and the VPN connection using Amazon CloudWatch.
    • Configured Amazon SNS to send alerts for critical events.
    • Automated infrastructure deployment with AWS CloudFormation for consistent, repeatable setups.
  • Technology Used:

    AWS Instances, security groups, VPC, Site-to-Site VPN, IAM, CloudWatch, SNS, CloudFormation.




Permission granted and sensitive information has been modified.

Sport Corporation Sales Analysis

  • Period: Sep - Nov 2023
  • Project Description:

    Developed an advanced Power BI dashboard for an international sports corporation to analyze sales data, providing real-time insights into sales performance, discount analysis, and regional success. The dashboard was designed to facilitate data-driven decision-making by presenting key metrics in an interactive and user-friendly format.

  • My responsibilities:
    • Extracted and filtered data from the on-premises dataset using SQL, organizing it in a star schema data model with the Sales table at the center for optimized performance.
    • Cleaned and transformed data using Power Query, ensuring data accuracy and consistency.
    • Implemented advanced DAX formulas to create calculated columns and measures for in-depth analysis, including a last-updated time display, discount calculations, and fiscal-year-specific insights.
    • Designed a one-page interactive dashboard with key metrics, including total sales, customer counts, product sales, and discount analysis.
    • Incorporated various visualizations for comprehensive sales insights, making the data easily interpretable.
    • Enabled scheduled refresh to keep the dashboard up to date with the latest data.
    • Published the dashboard to the Power BI service for broad access and distribution within the organization.
    • Collaborated with stakeholders to understand business requirements and tailor the dashboard to meet their needs.
    • Conducted user training sessions to ensure effective use and interpretation of the dashboard.
  • Technology Used:

    Microsoft SQL Server, Power BI service, Power Query, DAX.


Permission granted and sensitive information has been modified.


Permission granted and sensitive information has been modified.

Retail Chain Transaction Analysis

  • Period: Jun - Aug 2023
  • Project Description:

    Developed a comprehensive Power BI dashboard to analyze retail chain transactions, providing actionable insights into product sales, customer behavior, seasonal trends, and the effectiveness of promotions. The dashboard facilitated data-driven decision-making with detailed visualizations and interactive features.

  • My responsibilities:
    • Utilized Power Query to clean and transform data, including splitting and unpivoting the Product column.
    • Developed advanced DAX formulas to create calculated columns and measures for in-depth analysis.
    • Created detailed Product Analysis and Customer Analysis pages featuring key metrics, slicers, and interactive visuals such as line charts, treemaps, and ribbon charts.
    • Ensured alignment and consistency across all dashboard pages for a cohesive user experience.
    • Integrated product and customer analysis visuals into a combined Retail Analysis Page with interactive buttons for toggling between views using bookmarks.
    • Designed and presented slides summarizing key insights and findings to stakeholders, facilitating informed decision-making.
    • Conducted stakeholder meetings to gather requirements and incorporate feedback into the dashboard design.
    • Provided training and support to users for effective utilization and interpretation of the dashboard.
  • Technology Used:

    Power BI services, Power Query, DAX, Bookmarks, Slides.


Permission granted and sensitive information has been modified.


Statistical Consultant

University of Chicago, Chicago, IL
September - November 2023

  • Description:

    Managed the execution of three client-facing consulting projects within two months, working with a team of five consultants to analyze data issues, communicate insights, and provide tailored statistical recommendations through client meetings and comprehensive reports.

  • Details are available in the Statistical Consulting section of the Research page.