React + TensorFlow.js Machine Learning Image Classifier + Google Translate + Google Text-To-Speech
Deployed at https://tfjs-what-is-this.herokuapp.com, this is a web app that lets you use your phone or laptop, together with the machine learning library TensorFlow.js, as an image classifier, and even train your own KNN classifier with transfer learning. It is further enhanced by APIs from the Google Cloud Console, allowing you to both see and hear translations in the 100 or so languages supported by Google Translate and Text-to-Speech.
The TensorFlow.js tutorial uses script tags as the entry point for loading files. Instead, I install via npm, use Create React App, and modularize where possible. I added Material UI with an eye towards developing a progressive web application. I used transfer learning to extend the MobileNet model with the KNN classifier included with TensorFlow.js models. I referred to and refactored the code from the machine learning image classifier tutorial on TensorFlow.js found here:
https://codelabs.developers.google.com/codelabs/tensorflowjs-teachablemachine-codelab/index.html#0
The goal of this solution is to build a “teachable machine”: in this case, a custom image classifier trained in the browser.
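The core of that approach, roughly as in the codelab (a minimal sketch; variable and function names here are illustrative, not the exact ones in this repo):

```js
import * as mobilenet from "@tensorflow-models/mobilenet";
import * as knnClassifier from "@tensorflow-models/knn-classifier";

const classifier = knnClassifier.create();
let net;

async function setup() {
  net = await mobilenet.load();
}

// Training: grab MobileNet's intermediate activation for a webcam frame
// and add it to the KNN classifier under a user-chosen class label.
function addExample(imgElement, classId) {
  const activation = net.infer(imgElement, true); // true => return the embedding
  classifier.addExample(activation, classId);
}

// Prediction: classify a new frame against the examples collected so far.
async function predict(imgElement) {
  const activation = net.infer(imgElement, true);
  return classifier.predictClass(activation); // { classIndex, label, confidences }
}
```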
Some possible next improvements:
uuid as a random unique ID generator for component key mapping
To run locally, clone the repo:
git clone https://github.com/alexanderywang/tfjs-transfer-learning-image-classifier
Navigate to the project folder and install dependencies:
cd tfjs-transfer-learning-image-classifier
npm install
Add a .env file with your Google Cloud API key (more in key learning points below…):
REACT_APP_GOOGLE_API_KEY=123456
Then run
npm run start-dev
to start the app on http://localhost:3000/
Some debugging issues with React and TensorFlow.js: index.html needed these scripts to run:
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
I fixed this by calling tf.ready() prior to loading the model. tf.ready() returns a promise that resolves when the currently selected backend (or the highest-priority one) has initialized. Await this promise when you are using a backend that has async initialization.
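A minimal sketch of the fix, assuming the npm packages @tensorflow/tfjs and @tensorflow-models/mobilenet:

```js
import * as tf from "@tensorflow/tfjs";
import * as mobilenet from "@tensorflow-models/mobilenet";

async function loadModel() {
  // Wait for the selected backend (e.g., WebGL) to finish async initialization
  // before touching the model.
  await tf.ready();
  return mobilenet.load();
}
```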
Initially, I had a prediction table displayed with 5 rows. Each row has a button that opens a TranslationModal. The modal has a select dropdown of 100 languages. If a language is selected, a useEffect is triggered to make an API call to Google Translate.
When the table is initially rendered, the useEffect in the modal is triggered 5 times. Is there a way to prevent this?
First solution: I was rendering the modal inside the table rows, with the button to open/close it inside the modal, so the useEffect ran as soon as the table rendered. That makes sense. I moved the button into the table rows and only opened the modal when the button is clicked. The useEffect still runs once before a language is selected from the dropdown and once when the language is selected. Not perfect, but better!
Better solution: useCallback to memoize the function; see useGoogleTranslateAPI.js.
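A hedged sketch of that pattern (the real hook lives in useGoogleTranslateAPI.js; the hook signature and request shape here are assumptions based on the public Translate v2 REST API):

```js
import { useState, useEffect, useCallback } from "react";

// Hypothetical hook: memoize the fetch so the modal's useEffect only
// re-runs when the text or target language actually changes.
export function useGoogleTranslateAPI(text, targetLanguage) {
  const [translation, setTranslation] = useState("");

  const translate = useCallback(async () => {
    if (!targetLanguage) return; // skip the render before a language is picked
    const url = `https://translation.googleapis.com/language/translate/v2?key=${process.env.REACT_APP_GOOGLE_API_KEY}`;
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ q: text, target: targetLanguage }),
    });
    const data = await res.json();
    setTranslation(data.data.translations[0].translatedText);
  }, [text, targetLanguage]);

  useEffect(() => {
    translate();
  }, [translate]);

  return translation;
}
```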
There is a commented-out TextTSpeech component that provides a form for translating phrases into supported languages and the ability to hear them spoken.
In a React app, environment variables are fetched from .env files. If you set a variable in the .env file and it comes back undefined, check the items below.
Assumption: You have used Create React App (CRA) to bootstrap your application
helpful link:
https://betterprogramming.pub/how-to-hide-your-api-keys-c2b952bc07e6
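For reference, CRA only exposes variables prefixed with REACT_APP_ and reads them at build time, so the dev server has to be restarted after editing .env:

```js
// .env (project root):
// REACT_APP_GOOGLE_API_KEY=123456

// Anywhere in the React app:
const apiKey = process.env.REACT_APP_GOOGLE_API_KEY;
if (!apiKey) {
  console.warn("REACT_APP_GOOGLE_API_KEY is undefined; check .env and restart the dev server");
}
```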
very helpful article re: useEffect linting errors/warnings https://www.benmvp.com/blog/helper-functions-react-useeffect-hook/
Since we’re emphasizing user privacy by keeping everything client/browser-side, I thought I’d try sessionStorage in place of localStorage. Every time the user closes the browser, the cache empties again, but refreshing the browser does not clear it. This saves API calls when taking similar pictures without keeping anything in the cache once the app is closed. IndexedDB is another possibility, as is localStorage for some use cases.
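A minimal sketch of that cache idea (the key format and helper names are illustrative, not the exact ones in this repo):

```js
// Cache translation results per (text, language) for the life of the tab.
// sessionStorage survives refreshes but is cleared when the tab/browser closes.
function getCachedTranslation(text, lang) {
  return sessionStorage.getItem(`translate:${lang}:${text}`);
}

function cacheTranslation(text, lang, translation) {
  try {
    sessionStorage.setItem(`translate:${lang}:${text}`, translation);
  } catch (err) {
    // Quota exceeded: fall back to calling the API every time.
    console.warn("sessionStorage full, skipping cache", err);
  }
}
```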
JavaScript in machine learning is relatively new, and it’s important that users can use your models and ideas interactively in the browser without having to install anything.
CLIENT SIDE BENEFITS THAT ARE HARDER TO ACHIEVE SERVER SIDE:
It worked! MobileNet’s model seems semi-accurate about 50-75% of the time, depending on the picture quality, lighting, and item.
If you wanted to access a user’s webcam, that has to happen through the browser, which requires WebRTC or similar APIs. That means serving a web page that contains client-side JavaScript (or other) code which accesses the user’s webcam through the browser and then sends a video feed (or single pictures) back to the server (see the getUserMedia sketch after this list).
From the server side you can only directly access server resources, not client-side resources.
If you wanted to do eye tracking browser-side, there’s OpenCV.js, which runs completely in the browser. I don’t know if the required procedures for eye tracking have been ported to OpenCV.js, but it’s worth a look. You could do the analysis client-side and just send back heatmaps or lists of coordinates.
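As an example of the webcam point above, requesting the camera through the browser uses the standard getUserMedia API (a minimal sketch, not the exact code in this repo):

```js
// Ask the browser for webcam access and pipe the stream into a <video> element.
async function startWebcam(videoElement) {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" }, // prefer the rear camera on phones
    audio: false,
  });
  videoElement.srcObject = stream;
  await videoElement.play();
}
```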
Saving TensorFlow.js models was tricky to accomplish but is clearly useful when a user wants to train a model and have that model persist. Browser storage APIs were very helpful in accomplishing that goal.
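A minimal sketch of persisting the KNN classifier between sessions, assuming the @tensorflow-models/knn-classifier API (the storage key and helper names are illustrative):

```js
import * as tf from "@tensorflow/tfjs";
import * as knnClassifier from "@tensorflow-models/knn-classifier";

// Serialize each class's Tensor2D of activations to plain arrays so it can be stored.
function saveClassifier(classifier) {
  const dataset = classifier.getClassifierDataset();
  const serialized = Object.entries(dataset).map(([label, tensor]) => [
    label,
    Array.from(tensor.dataSync()),
    tensor.shape,
  ]);
  localStorage.setItem("knn-dataset", JSON.stringify(serialized));
}

// Rebuild the tensors and hand them back to a fresh classifier.
function loadClassifier() {
  const classifier = knnClassifier.create();
  const serialized = JSON.parse(localStorage.getItem("knn-dataset") || "[]");
  if (serialized.length) {
    const dataset = Object.fromEntries(
      serialized.map(([label, values, shape]) => [label, tf.tensor2d(values, shape)])
    );
    classifier.setClassifierDataset(dataset);
  }
  return classifier;
}
```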
Audio data is binary data. You can read the binary data directly from a gRPC response; however, JSON is used when responding to a REST request. Because JSON is a text format that does not directly support binary data, Text-to-Speech returns a response string encoded in Base64. You must convert the base64-encoded text data from the response to binary before you can play it on a device.
JSON responses from the Text-to-Speech API include base64-encoded audio content in the audioContent field. For example:
{
"audioContent": "//NExAARqoIIAAhEuWAAAGNmBGMY4EBcxvABAXBPmPIAF//yAuh9Tn5CEap3/o..."
}
The full response body has the form:
{
"audioContent": string,
"timepoints": [
{
object (Timepoint)
}
],
"audioConfig": {
object (AudioConfig)
}
}
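A hedged sketch of calling the REST endpoint and pulling audioContent out of that response (the endpoint and request shape follow the public Text-to-Speech v1 API; the code in this repo may differ):

```js
async function synthesize(text, languageCode) {
  const url = `https://texttospeech.googleapis.com/v1/text:synthesize?key=${process.env.REACT_APP_GOOGLE_API_KEY}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      input: { text },
      voice: { languageCode },
      audioConfig: { audioEncoding: "MP3" },
    }),
  });
  const { audioContent } = await res.json(); // base64-encoded audio bytes
  return audioContent;
}
```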
So the API returns base64-encoded audio, and we need to convert it into a playable format (e.g., a WAV file or data URI) before we can use it with the HTMLAudioElement (Audio) API:
https://developer.mozilla.org/en-US/docs/Web/API/HTMLAudioElement/Audio
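One way to play it, shown as a minimal sketch that assumes MP3 encoding was requested, reuses the hypothetical synthesize() helper above, and applies the sessionStorage caching described below:

```js
// Play base64-encoded audio from the Text-to-Speech API, caching the encoded
// string in sessionStorage so repeat clicks skip the API call.
async function speak(text, languageCode) {
  const cacheKey = `tts:${languageCode}:${text}`;
  let audioContent = sessionStorage.getItem(cacheKey);
  if (!audioContent) {
    audioContent = await synthesize(text, languageCode);
    sessionStorage.setItem(cacheKey, audioContent);
  }
  const audio = new Audio(`data:audio/mp3;base64,${audioContent}`);
  await audio.play();
}
```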
The request body also offers a lot of customizable options, and Google supports over 100 voices…
I wanted to prevent spamming of the voice button and used a timeout as a workaround, but that created memory leaks in the useEffect for translation. So I used sessionStorage to memoize the base64 encoding for repeat button clicks. IndexedDB can store audio, but I think this solution is adequate for now with short audio clips. sessionStorage can store about 4 million characters and gets cleared on browser/tab exit. As of now, I’ll use sessionStorage as a cache for both translations and for base64-encoded audio strings.