Automatically detect web compatibility issues
The aim of this project is to create a tool that automatically detects web compatibility issues without human intervention.
The project uses Selenium to collect web page screenshots automatically on Firefox and Chrome.
The crawler loads web pages from the URLs on the webcompat.com tracker and tries to reproduce the reported issues by interacting with the elements of the page. As soon as the page is loaded and after every interaction with the elements, the crawler takes a screenshot.
The crawler repeats the same steps in Firefox and Chrome, generating a set of comparable screenshots.
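The crawl loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual crawler: the function names, the CSS selector used to pick elements, the `max_clicks` limit and the `<issue>_<step>_<browser>.png` naming scheme are all assumptions made for the example. It only relies on the standard WebDriver calls `get`, `find_elements` and `save_screenshot`.

```python
from pathlib import Path


def make_driver(browser):
    """Start a local browser session (needs Selenium and the matching driver)."""
    # Imported lazily so the helper below can be used without Selenium installed.
    from selenium import webdriver
    return webdriver.Firefox() if browser == "firefox" else webdriver.Chrome()


def capture_sequence(driver, url, issue_id, browser, out_dir="data", max_clicks=3):
    """Load `url`, then take a screenshot after loading and after each click."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []

    def snap(step):
        # Hypothetical naming scheme: <issue>_<step>_<browser>.png
        path = out / f"{issue_id}_{step}_{browser}.png"
        driver.save_screenshot(str(path))
        saved.append(path.name)

    driver.get(url)
    snap("load")

    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", "a, button, input")
    for index, element in enumerate(elements[:max_clicks]):
        try:
            element.click()
        except Exception:
            continue  # hidden, stale or otherwise non-interactable element
        snap(f"click{index}")
    return saved
```

Calling `capture_sequence` once with `make_driver("firefox")` and once with `make_driver("chrome")` produces two sets of files that differ only in the browser suffix, which is what makes them comparable. A real crawler would also have to make sure that the same elements are exercised in the same order in both browsers, which this sketch does not guarantee.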
The data/ directory contains the screenshots generated by the crawler (N.B.: this directory is not present in the repository itself, but it will be created automatically after you set up the project as described in the Setup paragraph).
Now that we have a dataset with labels, we can train a neural network to automatically detect screenshots that are incompatible. We are currently using a Siamese architecture with different Convolutional Neural Networks, but are open to testing other ideas.
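To make the Siamese idea concrete, here is a deliberately tiny, framework-free sketch. The defining property is that both screenshots pass through the same branch with the same weights, and the network is trained on the distance between the two resulting embeddings. Everything specific here is an assumption for illustration: the one-layer `embed` function stands in for a real Convolutional Neural Network, the four-value "images" stand in for screenshots, and the contrastive loss is one common choice for Siamese training, not necessarily the one used by this project.

```python
import math


def embed(pixels, weights):
    """Shared branch: one linear layer followed by ReLU.

    Both screenshots are embedded with the SAME weights, which is what
    makes the architecture Siamese.
    """
    return [max(0.0, sum(w * p for w, p in zip(row, pixels))) for row in weights]


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(d, same, margin=1.0):
    """Contrastive loss: pull matching pairs together, push others apart.

    `same` is 1 when the pair should match (compatible) and 0 otherwise.
    """
    return same * d ** 2 + (1 - same) * max(margin - d, 0.0) ** 2


# Toy weights and 4-pixel "screenshots" (purely illustrative values).
weights = [[0.5, -0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
firefox = [1.0, 0.0, 1.0, 1.0]
chrome_same = [1.0, 0.0, 1.0, 1.0]
chrome_diff = [0.0, 1.0, 0.0, 0.0]

d_same = distance(embed(firefox, weights), embed(chrome_same, weights))
d_diff = distance(embed(firefox, weights), embed(chrome_diff, weights))
```

In practice the branch would be a CNN built with a deep learning framework, and the weights would be learned by minimizing the loss over many labeled screenshot pairs.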
We plan to employ three training methodologies:
For the unsupervised training, we are using a related problem for which we already have labels (detecting screenshots belonging to the same website). The pre-training can be helpful because we have plenty of data (as we don’t need to manually label them) and we can fine-tune the network we pre-train for our problem of interest.
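As an illustration of why the pre-training task needs no manual labels, the sketch below builds labeled pairs directly from file names: a Firefox and a Chrome screenshot of the same site form a positive pair, and a Firefox screenshot paired with a Chrome screenshot of a different site forms a negative pair. The function name, the `<site>_<browser>.png` naming scheme and the one-negative-per-site sampling are assumptions made for this example, not the project's actual data pipeline.

```python
import random


def make_pretraining_pairs(site_ids, seed=0):
    """Return (firefox_file, chrome_file, label) triples.

    label is 1 when both screenshots belong to the same website, else 0.
    The labels come for free from the file names, so no human labeling
    is needed.
    """
    rng = random.Random(seed)
    pairs = []
    for site in site_ids:
        # Positive pair: same site in both browsers.
        pairs.append((f"{site}_firefox.png", f"{site}_chrome.png", 1))
        # Negative pair: Chrome screenshot taken from a different site.
        others = [s for s in site_ids if s != site]
        if others:
            other = rng.choice(others)
            pairs.append((f"{site}_firefox.png", f"{other}_chrome.png", 0))
    rng.shuffle(pairs)
    return pairs


pairs = make_pretraining_pairs(["100", "101", "102"])
```

A network pre-trained on such pairs can then be fine-tuned on the smaller, manually labeled compatibility dataset.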
Python 3 is required.
Install Git LFS, for example through the git-lfs package if available on your system (in case of using PackageCloud).
Clone the repository with its submodules: git lfs clone --recurse-submodules REPO_URL
Install the dependencies: pip install pipenv && pipenv install --dev && pipenv shell
The pretrain.py or train.py script can be run to train the neural network, with the following options:
-network To select which network architecture to use
-optimizer To select the optimizer to use
-classification_type Either Y vs N + D or Y + N vs D
--early_stopping (Optional) To stop training when validation accuracy has stopped improving
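The early stopping option can be understood through the following minimal sketch. It is not the project's implementation: the `EarlyStopper` class, the `patience` parameter (how many epochs without improvement to tolerate) and `min_delta` (the smallest change that counts as an improvement) are assumptions chosen for illustration. Deep learning frameworks typically ship an equivalent callback.

```python
class EarlyStopper:
    """Signal a stop once validation accuracy has stopped improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs without improvement to tolerate
        self.min_delta = min_delta    # minimum gain that counts as improvement
        self.best = float("-inf")
        self.stale_epochs = 0

    def should_stop(self, val_accuracy):
        if val_accuracy > self.best + self.min_delta:
            self.best = val_accuracy
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def epochs_run(history, patience=3):
    """Return how many epochs would run for a given accuracy history."""
    stopper = EarlyStopper(patience=patience)
    for epoch, accuracy in enumerate(history, start=1):
        if stopper.should_stop(accuracy):
            return epoch
    return len(history)
```

With a validation accuracy history of 0.60, 0.70, 0.72, 0.71, 0.72, 0.70, 0.90 and a patience of 3, training would stop after the sixth epoch, because epochs four to six do not beat the best value of 0.72. The later 0.90 is never reached, which shows the trade-off of choosing a small patience.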
Real-time communication for this project happens on Mozilla’s IRC network, irc.mozilla.org, in the #webcompat channel.