ArenD100

# Final_Project

## English Alphabet Machine Learning Model

This project uses the EMNIST-letters image database, an extension of the MNIST database of handwritten digits. EMNIST datasets can be found at: https://www.nist.gov/node/1298471/emnist-dataset. MNIST datasets can be found at: http://yann.lecun.com/exdb/mnist/.

The data was imported by pip installing the emnist (and mnist) libraries.

Note: the "data/python-mnist" folder is from sorki/python-mnist on GitHub. That repository can also be used for EMNIST testing, but this project did not use the sorki folder.

#### Notes on EMNIST 'Letters' Dataset

"The EMNIST Balanced dataset contains a set of characters with an equal number of samples per class. The EMNIST Letters dataset merges a balanced set of the uppercase and lowercase letters into a single 26-class task. The EMNIST Digits and EMNIST MNIST dataset provide balanced handwritten digit datasets directly compatible with the original MNIST dataset. The EMNIST Letters dataset seeks to further reduce the errors occurring from case confusion by merging all the uppercase and lowercase classes to form a balanced 26-class classification task. In a similar vein, the EMNIST Digits class contains a balanced subset of the digits dataset containing 28,000 samples of each digit."

## About Model

This model uses the EMNIST 'Letters' dataset. Refer to 'EMNIST.ipynb' for a detailed look at the model. The model reached about 95% accuracy on the training set and about 91% accuracy on the test set.
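Before a model can be trained on the letters data, the images and labels have to be in a form the network accepts. The sketch below shows one common way to do that and is not necessarily the preprocessing used in 'EMNIST.ipynb'. It assumes the images arrive as 28x28 8-bit arrays and the labels are numbered 1 to 26; the name `prepare_letters` is hypothetical. Flattening assumes a fully connected network, and a convolutional model would keep the 28x28 shape instead.

```python
import numpy as np

NUM_CLASSES = 26  # one class per letter, with upper and lower case merged


def prepare_letters(images, labels):
    """Flatten and scale images; one-hot encode labels numbered 1..26.

    Hypothetical helper: the shapes and label numbering are assumptions.
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    # Scale 8-bit pixel values into [0, 1] and flatten each image into a vector.
    X = images.reshape(len(images), -1).astype("float32") / 255.0
    # Shift labels from 1..26 to 0..25 so they can select rows of an identity matrix.
    y = np.eye(NUM_CLASSES, dtype="float32")[labels - 1]
    return X, y


# Stand-in data with the assumed shapes; real data would come from the emnist library.
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
demo_labels = np.array([1, 2, 13, 26])
X_demo, y_demo = prepare_letters(demo_images, demo_labels)
print(X_demo.shape, y_demo.shape)  # (4, 784) (4, 26)
```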
```
model.fit(X_train, y_train, batch_size=128, epochs=10, shuffle=True, verbose=2)
```

```
Epoch 10/10
1248000/1248000 - 19s - loss: 0.1345 - acc: 0.9512
```

```
model_loss, model_accuracy = model.evaluate(X_test, y_test, verbose=2)
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")
```

```
20800/20800 - 2s - loss: 0.4131 - acc: 0.9130
Loss: 0.41310153188356846, Accuracy: 0.9129807949066162
```

## Predicting the Model

The labels map to letters as follows:

A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8, I: 9, J: 10, K: 11, L: 12, M: 13, N: 14, O: 15, P: 16, Q: 17, R: 18, S: 19, T: 20, U: 21, V: 22, W: 23, X: 24, Y: 25, Z: 26

A couple of letters from the test set were run through the model, and both were predicted correctly.

The model was also tested on images collected from the internet. These images came from https://graphemica.com and are in the 'images' folder. The first image was predicted incorrectly, but the same letter in a different font was predicted correctly. The second imported image appears to be more similar to the EMNIST dataset.

Finally, the model was tested on newly handwritten data (written and imported by me) to compare the EMNIST model against actual handwriting. The results were not as expected. Each letter of the alphabet (capitals only) was photographed and uploaded in four sets: Pencil, Pen, Sharpie, and Marker. The goal was to determine which writing utensil would test most accurately against the EMNIST set. However, the imported pictures were all sideways, and rotating them made the images much lighter and illegible, so the prediction failed for every image tested (only one image per utensil was predicted in EMNIST.ipynb).

A self-created model trained on the 26 images uploaded for each utensil reached an accuracy of 0. More photos would need to be taken, and better pixel conversion is needed to get from 4D down to 2D.

## Conclusions

The EMNIST model itself runs extremely well. The model run on the API images needs more testing, although it did predict correct results. The handwritten data needs to be redone to get better predictions. With the self-created model, this is a case of bad data going in and bad data coming out. Better-quality pictures need to be taken and uploaded so that rotating them with the 'pillow' library is not needed.
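As a starting point for redoing the handwritten data, the sketch below shows one possible way to turn a sideways colour photo into a 28x28 array that resembles the EMNIST input. It is an illustration under stated assumptions, not the conversion used in 'EMNIST.ipynb': the photo is assumed to be already loaded as a height x width x channels 8-bit array (RGB or RGBA) with dark ink on light paper, and the names `to_emnist_like` and `label_to_letter` are hypothetical. It uses NumPy's quarter-turn rotation and block averaging in place of the 'pillow' rotation, since a quarter turn only rearranges pixels and leaves their values unchanged.

```python
import numpy as np


def to_emnist_like(photo, quarter_turns=1, size=28):
    """Convert an H x W x C uint8 photo into a size x size array in [0, 1].

    Hypothetical helper: assumes dark ink on light paper.
    """
    rgb = np.asarray(photo)[..., :3].astype(np.float32)  # drop alpha if present
    # Luminance-weighted grayscale collapses the channel axis: (H, W, C) -> (H, W).
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # Rotate counterclockwise in 90-degree steps; pixel values are not altered.
    gray = np.rot90(gray, k=quarter_turns)
    # Crop to the largest centred square whose side is a multiple of `size`.
    side = (min(gray.shape) // size) * size
    if side == 0:
        raise ValueError("photo is smaller than the target size")
    top = (gray.shape[0] - side) // 2
    left = (gray.shape[1] - side) // 2
    square = gray[top:top + side, left:left + side]
    # Average each block x block tile down to a single pixel.
    block = side // size
    small = square.reshape(size, block, size, block).mean(axis=(1, 3))
    # Invert and scale so ink becomes bright strokes on a dark background.
    return np.clip(1.0 - small / 255.0, 0.0, 1.0)


def label_to_letter(label):
    """Map a label in 1..26 to its capital letter (A: 1 ... Z: 26)."""
    if not 1 <= label <= 26:
        raise ValueError("label must be between 1 and 26")
    return chr(ord("A") + label - 1)


# Synthetic stand-in for a photo: white RGBA page with one dark stroke.
page = np.full((60, 90, 4), 255, dtype=np.uint8)
page[10:50, 40:50, :3] = 0
print(to_emnist_like(page).shape)  # (28, 28)
print(label_to_letter(1), label_to_letter(26))  # A Z
```

Whether the photos need one quarter turn or three depends on how the camera stored them, so `quarter_turns` would need checking against a few real images.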