Ruby Qianru
AI Live Meeting





Techstack


Responsibility
  • Spearheaded real-time HTTPS communication for web and mobile, enabling live video transmission and AI prediction.
  • Implemented neural networks leveraging TensorFlow hand-pose recognition, achieving 80%+ classification accuracy.
  • Integrated an automated data collection system, enhancing the user experience and boosting operational efficiency by 50%.

Design Thinking

What problem am I trying to address?
I noticed how difficult it is to interact quickly with peers during multi-user livestream videos (e.g., Zoom, Google Meet). For example, in an online class scenario, if a user wants to raise a hand to ask a question, the user has to click the emoji button -> select the emoji -> deselect the emoji (three steps) to complete the flow of interacting with the professor.
How can AI help to solve this problem?
An AI algorithm, potentially a computer-vision model, that classifies users' hand postures and emits the corresponding signal directly to peers.

What data is needed to create an AI to help address the issue?
A series of input samples that precisely capture human hand postures.


Data Collection
The prototype is based on Daniel Shiffman's The Coding Train. I reduced the data-collection wait time and extended the collection duration, so the system can automatically record more samples at a time. This redesign improved the user experience of data collection.
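The timing change above can be sketched as a pure function over a timestamped pose stream: frames arriving during the initial countdown are skipped, and every frame inside the (now longer) collection window is recorded. This is an illustrative sketch, not the prototype's actual p5.js/ml5 code; all names are hypothetical.

```python
def collect_samples(pose_stream, wait_time=1.0, collect_time=10.0):
    """pose_stream: iterable of (timestamp_seconds, landmarks).

    Skip frames during the initial wait (so the user can get into position),
    then record every frame inside the collection window. A shorter wait and
    a longer window yield more samples per session.
    """
    samples = []
    for t, pose in pose_stream:
        if t < wait_time:
            continue                          # still counting down
        if t >= wait_time + collect_time:
            break                             # collection window is over
        samples.append(pose)
    return samples
```

With `wait_time=1.0` and `collect_time=10.0`, only frames with timestamps in [1.0, 11.0) are kept.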


Model Training I

  • Deep learning model trained with Jupyter Notebook: Link
  • util.py: Python functions to load data (load JSON data into NumPy arrays, shuffle data), preprocess data (slice X_train, y_train into training and validation sets), build the model (define the neural network), and test the model.
  • main.ipynb: the main workflow to train the machine learning model step by step.
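The load/shuffle/split helpers described for util.py might look like the following. This is a minimal sketch under assumptions about the JSON layout (a list of records with "xs" landmark lists and "label" fields, both hypothetical names), not the project's exact code.

```python
import json
import numpy as np

def load_data(path):
    """Load collected hand-pose samples from a JSON file into NumPy arrays.
    Assumes each record stores flat landmark coordinates under "xs" and a
    class label under "label" (field names are assumptions)."""
    with open(path) as f:
        records = json.load(f)
    X = np.array([r["xs"] for r in records], dtype=np.float32)
    y = np.array([r["label"] for r in records])
    return X, y

def shuffle_data(X, y, seed=42):
    """Shuffle samples and labels with the same permutation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    return X[idx], y[idx]

def split_data(X, y, val_ratio=0.2):
    """Slice the shuffled arrays into training and validation sets."""
    n_val = int(len(X) * val_ratio)
    return X[n_val:], y[n_val:], X[:n_val], y[:n_val]
```

Shuffling with a single shared permutation keeps each sample paired with its label, which is the step that is easy to get wrong when shuffling the two arrays independently.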


Model Summary (accuracy: ~0.90)

Frontend Usage

  • Full code examples: Link
  • Frontend code examples: Link

Realtime Communication: WebRTC

  • Example video-chat applications: Zoom, Microsoft Teams, Google Meet.
  • Technology: WebRTC provides APIs for capturing audio and video streams from the user's camera and microphone. These streams can be transmitted in real-time between peers, enabling video and audio calls directly in the browser without the need for third-party plugins.
  • Experience: Participants can join meetings via web browsers or dedicated applications on various devices.  
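Before any media flows, WebRTC peers must exchange an SDP offer and answer through some signaling channel (the server only relays these messages; the media itself then travels peer-to-peer). The handshake can be sketched as plain message passing. This is a schematic sketch in Python with placeholder SDP strings, not real WebRTC API calls; all names are illustrative.

```python
def make_offer(peer_id):
    """Caller describes the media it wants to send (stand-in for createOffer)."""
    return {"type": "offer", "from": peer_id,
            "sdp": f"(media description from {peer_id})"}

def make_answer(peer_id, offer):
    """Callee accepts the offer and replies with its own description."""
    assert offer["type"] == "offer"
    return {"type": "answer", "from": peer_id, "to": offer["from"],
            "sdp": f"(media description from {peer_id})"}

def signaling_exchange(caller, callee):
    """One round trip through the signaling server: offer out, answer back."""
    offer = make_offer(caller)
    answer = make_answer(callee, offer)
    return offer, answer

offer, answer = signaling_exchange("alice", "bob")
```

In the browser, the equivalent steps are `createOffer`/`setLocalDescription` on the caller and `setRemoteDescription`/`createAnswer` on the callee.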

Use DigitalOcean to host a Linux OS in the cloud


Setting up the virtual environment
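On a fresh Ubuntu droplet, one plausible setup sequence looks like the following. This is a sketch under assumptions (Ubuntu with apt, a requirements.txt in the project root), not the exact commands used.

```shell
# Create and activate an isolated Python environment for the project
sudo apt update && sudo apt install -y python3-venv   # Ubuntu ships the venv module separately
python3 -m venv venv              # create the environment in ./venv
source venv/bin/activate          # activate it for this shell session
pip install --upgrade pip         # keep pip current
pip install -r requirements.txt   # project dependencies (file name assumed)
```

Keeping dependencies inside the venv means the droplet's system Python stays untouched and the app can be redeployed reproducibly.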


Live chatbox created using the GSAP library and the DOM
Link

Live video prototype using WebSocket

I tested the web application with the webcams of my two laptops. This live video prototype is based on the HTML <canvas> and <img> elements. The WebSocket server receives canvas data and emits it to all other clients; each client then updates the src of its <img>.
Link
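The relay behavior described above (receive one client's canvas frame, rebroadcast it to everyone else, each receiver sets it as its <img> src) can be sketched independently of any WebSocket library. Pure-Python sketch with lists standing in for sockets; all names are hypothetical.

```python
class FrameRelay:
    """Broadcasts each incoming canvas frame (a data-URL string) to all other clients."""

    def __init__(self):
        self.clients = {}  # client_id -> outbox (a list standing in for a socket)

    def connect(self, client_id):
        self.clients[client_id] = []

    def on_frame(self, sender_id, frame_data_url):
        # Emit to every client except the sender; each receiving client
        # would set this data URL as the src of its <img> element.
        for cid, outbox in self.clients.items():
            if cid != sender_id:
                outbox.append(frame_data_url)

relay = FrameRelay()
for cid in ("laptop-a", "laptop-b"):
    relay.connect(cid)
relay.on_frame("laptop-a", "data:image/png;base64,...")
```

Excluding the sender avoids echoing a client's own video back to it, which is the usual broadcast pattern in WebSocket chat and video demos.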

May 3rd: Phase II Project Planning

To-dos:
  1. User testing: Based on the prediction results for each hand pose, adjust the emission threshold for each emoji.
  2. ML model improvement: Based on user tests, improve the model architecture, and add or remove classes if needed.
  3. Animation improvement: Add more animation effects to each emoji emission for a more engaging user experience.
  4. More functionality: Add a feature to indicate whom each emoji is from.
  5. Modularization: If I have more time after the midterm, I may implement React for the frontend and FastAPI for the backend.
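The per-emoji threshold idea in to-do 1 can be sketched as: the classifier outputs a confidence per hand-pose class, and an emoji is emitted only when its class's confidence clears that class's own tuned threshold. Class names and threshold values below are illustrative, not the project's actual tuning.

```python
# Per-class emission thresholds, tuned during user testing (values illustrative)
THRESHOLDS = {"raise_hand": 0.85, "thumbs_up": 0.90, "wave": 0.80}

def emoji_to_emit(confidences, thresholds=THRESHOLDS):
    """Return the highest-confidence pose that clears its own threshold, else None.

    confidences: dict mapping pose name -> classifier confidence in [0, 1].
    Unknown poses default to a threshold of 1.0, i.e. never emitted.
    """
    best = None
    for pose, conf in confidences.items():
        if conf >= thresholds.get(pose, 1.0):
            if best is None or conf > confidences[best]:
                best = pose
    return best
```

Separate thresholds per class let a pose that the model confuses easily (e.g. a borderline wave) demand higher confidence than a pose it recognizes reliably.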

User Testing I


User testing in a 16-student class on 3/11/2024

Model Training II: TensorFlow Handpose & MediaPipe V2

TensorFlow Handpose V2 offers higher performance and lower latency, which helps improve the performance of our transfer learning model.
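A common preprocessing step for transfer learning on hand landmarks, regardless of Handpose version, is to normalize the 21 keypoints so the classifier is invariant to where the hand sits in the frame and how large it appears. The sketch below is one standard approach (wrist-relative translation plus max-distance scaling), not necessarily the project's exact preprocessing.

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Translate landmarks so the wrist (index 0) is the origin, then scale so
    the farthest keypoint lies at distance 1. Input: (21, 2) or (21, 3) array.
    Returns a flat feature vector suitable for a dense classifier."""
    pts = np.asarray(landmarks, dtype=np.float32)
    pts = pts - pts[0]                      # wrist-relative coordinates
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts = pts / scale                   # scale-invariant
    return pts.flatten()
```

With this normalization, the same "raised hand" pose produces nearly identical feature vectors whether the user sits close to or far from the camera.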

User Testing II


User Testing with 8 people on May 3rd.


User Testing with 11 people on May 6th.

More Brainstorming ...

When you hear the words "Artificial Intelligence", what are the first four things that come to your mind? 

- Automation: Task automation
- GPT: ChatGPT
- Diffusion: text-to-image generation
- Computer vision: classification, recognition, etc...


DEVICE OR DIGITAL SERVICE WITH AI
AI FUNCTIONS

Email Inbox: 
AI helps to classify promotions, spams, and important emails.

Check Depositing: 
AI helps to recognize and verify the depositing amount, bank account, and routing number, etc.

Texting and Mobile Keyboards: 
AI helps to guess the next possible word within the context.

Netflix:
AI helps to algorithmically recommend movies / TV series that match your preferences.

Google Search:
AI helps to guess what you possibly intend to search.

Social Media Platforms:
AI helps to algorithmically push feeds that draw your attention.

Automated Message Systems:
AI helps to automatically fill in customers' names and send messages in bulk to subscribers at an ideal time.

What do we gain by having AI in our everyday lives?

AI has made processing lots of tasks easier, more efficient, and more effective. 
What do we lose by having AI in our everyday lives? 

AI has diverted much of our attention from our actual lives toward excessive screen time.

Use the prompts below to help design an AI system.

WHAT PROBLEM ARE YOU TRYING TO ADDRESS ?
Difficulty in filtering ideal job positions that meet certain criteria (e.g., for international students, master's students... )
HOW CAN AI HELP SOLVE THIS ISSUE?
An AI algorithm that reads through job descriptions and automatically classifies position requirements.

WHAT ROLE WILL HUMANS HAVE IN ADDRESSING THIS ISSUE?
Humans will code a program that automatically takes job descriptions as input and filters job positions against certain criteria.
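Such a filter can be sketched as simple keyword matching over the description text. This is a toy sketch to make the idea concrete; a real system might use a trained classifier instead, and the job records and keywords below are invented for illustration.

```python
def filter_jobs(jobs, required_keywords, excluded_keywords=()):
    """Keep job postings whose description mentions every required keyword
    and none of the excluded ones (case-insensitive substring matching)."""
    def matches(description):
        text = description.lower()
        return (all(k.lower() in text for k in required_keywords)
                and not any(k.lower() in text for k in excluded_keywords))
    return [job for job in jobs if matches(job["description"])]

# Invented example postings, purely for illustration
jobs = [
    {"title": "ML Engineer", "description": "Visa sponsorship available. Master's preferred."},
    {"title": "Analyst", "description": "US citizens only."},
]
eligible = filter_jobs(jobs, required_keywords=["sponsorship"],
                       excluded_keywords=["citizens only"])
```

Keyword matching is brittle (it misses paraphrases like "we sponsor visas"), which is exactly where an AI model reading the full description would improve on the hand-coded filter.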

WHAT DATA DO YOU NEED TO CREATE AN AI TO HELP YOU ADDRESS YOUR ISSUE?
Possibly text data focusing on job descriptions.

CAN YOU COLLECT THIS DATA IN A WAY THAT RESPECTS INDIVIDUALS' PRIVACY AND CONSENT?
Yes.


Teachable Machine

I captured classes of Arduino Nano and Arduino Uno, and fed them into Teachable Machine for training. A preview of this model is shown below.

Successful detection of Arduino Uno.
Successful detection of Arduino Nano.