ML Model Accuracy/Results

Handwritten Alphanumeric model

Iteration 1:

Model details | Dataset | Accuracy achieved
Model trained and evaluated on | Existing dataset | 99.80%
Model inference on | NIST dataset | 62.70%
Average across all datasets | - | 81.25%

Reasoning - Because the model is trained only on the existing dataset, it does not perform well on the NIST dataset.
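
The "Average across all datasets" figure is consistent with a plain unweighted mean of the per-dataset accuracies. The sketch below reproduces it under that assumption; the function name `mean_accuracy` and the dictionary layout are illustrative and not taken from the project code.

```python
def mean_accuracy(accuracies):
    """Unweighted mean of per-dataset accuracies, in percent."""
    return sum(accuracies) / len(accuracies)


# Alphanumeric model, iteration 1 (values copied from the table above).
alnum_iter1 = {"Existing dataset": 99.80, "NIST dataset": 62.70}

average = mean_accuracy(list(alnum_iter1.values()))
print(f"Average across all datasets: {average:.2f}%")
```

Note that an unweighted mean gives every dataset the same influence regardless of how many samples it contains.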

Iteration 2:

Model details | Dataset | Accuracy achieved
Model trained on | Existing dataset + NIST misclassifications | 83.30%

Reasoning - Training the model on the NIST misclassifications improved accuracy.
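
Training on misclassifications can be sketched as: run the current model over the NIST samples, keep the ones it gets wrong, and append them to the training set before retraining. The code below is a minimal illustration of that loop only. The helper names (`collect_misclassified`, `augment_training_set`), the toy predictor, and the sample data are all hypothetical and stand in for the real model and datasets.

```python
def collect_misclassified(samples, labels, predict):
    """Return the (sample, label) pairs that `predict` gets wrong."""
    return [(x, y) for x, y in zip(samples, labels) if predict(x) != y]


def augment_training_set(train_set, misclassified):
    """Return a new training set with the misclassified pairs appended."""
    return list(train_set) + list(misclassified)


# Hypothetical stand-in for the trained model: it only knows "0" and "1".
def toy_predict(sample):
    return "0" if sample.startswith("round") else "1"


# Hypothetical NIST-style samples and their true labels.
nist_samples = ["round_a", "tall_b", "round_c"]
nist_labels = ["0", "1", "O"]

hard_examples = collect_misclassified(nist_samples, nist_labels, toy_predict)
existing_train = [("round_x", "0")]
new_train = augment_training_set(existing_train, hard_examples)
print(len(hard_examples), len(new_train))
```

In a real run the retraining step would follow, and accuracy would be re-measured on data held out from the misclassified set.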

Iteration 3:

Model details | Dataset | Accuracy achieved
Model trained on | Existing dataset + NIST misclassifications | 93.90%

Reasoning - Training the model on the NIST misclassifications from the previous checkpoint improved accuracy further.

Iteration 4:

Model details | Dataset | Accuracy achieved
Model trained on | Existing dataset + manually collected dataset from sheets (~1K samples) | 93.90%

Reasoning - Continuing training from the previous checkpoint with the manually collected data added; improvement in accuracy.

Handwritten Digits model

Iteration 1:

Model details | Dataset | Accuracy achieved
Model trained and evaluated on | Existing dataset | 99.90%
Model inference on | NIST dataset | 60.00%
Model inference on | Obtained production data (~50 samples) | 97.70%
Average across all datasets | - | 85.80%

Reasoning - Because the model is trained only on the existing dataset, it does not perform well on the NIST dataset.

Iteration 2:

Model details | Dataset | Accuracy achieved
Model trained on | Existing dataset + NIST misclassifications | 96.40%

Reasoning - Training the model on the NIST misclassifications improved accuracy.

Iteration 3:

Model details | Dataset | Accuracy achieved
Model trained on | Existing dataset + NIST misclassifications + production dataset (~50 samples) | 99.70%

Reasoning: Training the model on the NIST misclassifications and the production dataset improved accuracy.

Iteration 4:

Model details | Dataset | Accuracy achieved
Model trained on | Existing dataset + NIST misclassifications + manually collected dataset from sheets (~8.6k samples) | 98.30%

Reasoning: Because accuracy is averaged over a large production dataset, it dips slightly compared to iteration 3.

Sample dataset images

Existing dataset

Handwritten alphanumeric

0a56f9eb1545428f8d9aea0665b7c460 0b0de241b7e64e918fada2f896efdf24 00b6ef24818b4323b4ada81ad433af67 0ac132e3693c47ccb2fe40fb465ba060 0aeaa24048ff4c5d8312d37671c6e08e 0b7f1f188faf453b83704a069791f28f 0bd15fe615db458781bce7dc32a926b5 0adb585eae814b62a56695650cac0666 0b0ffa9c3dcf497c8da259d391f5f159 0b2c4c540a7247629e1dd7a9abd59962

Handwritten digits

0b1a2b29b0904989a3f6df3a13b93d89 0abb9b222a1142bdaabd30b6742ca8b2 0afd4ac792f14b8185d54cb9d0937b3e 0ac2ad40-7af8-44b4-9829-284a2b33416c_printed 00af4fd32ddc42efbfb1222725599ecf 0aa35cc5-4472-455d-a575-7167c15df849_generated 0a8986b29e804002a3136ddf110f3638 0a1d909f2f2847b795ac40cefa452c3e 0__0040c46a-eae7-4219-bffb-7dd418ad9ffb_up_govt 00c9de07f7b640d083680a5c21661a8b

NIST dataset

Handwritten alphanumeric

hsf_0_00017 hsf_0_00043 hsf_0_00017 hsf_0_00020 hsf_0_00010 hsf_0_00017 hsf_0_00020 hsf_0_00008 hsf_0_00025 hsf_0_00010

Handwritten digits

hsf_0_00015 hsf_0_00010 hsf_0_00017 hsf_0_00009 hsf_0_00117 hsf_0_00009 hsf_0_00012 hsf_0_00019 hsf_0_00009 hsf_0_00013

Manually collected

Handwritten alphanumeric

img014-032 img015-044 img016-045 img023-039 img026-001 img029-045 img013-040 img032-053 img034-055 img011-033

Handwritten digits

33665 33679 33758 33745 33650 33753 31372 33362 31075 31065

Some unhandled misclassifications

18103 28926 607 30658 402 32358 24141 30838 24158 32225

Reasoning: These generally occur when the digits are written in the corners of the cell.
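
One way to flag such samples is to measure how far the ink's centroid sits from the centre of the cell. The sketch below assumes a binarized cell image represented as a 2D list of 0/1 values (1 = ink). The function names and the 0.5 threshold are hypothetical choices for illustration, not part of the existing pipeline.

```python
def ink_centre_offset(cell):
    """Offset of the ink centroid from the cell centre.

    `cell` is a 2D list of 0/1 values (1 = ink). Returns (dy, dx), each
    scaled to [-1, 1] where 0 is the centre and +/-1 is an edge, or None
    if the cell contains no ink.
    """
    height, width = len(cell), len(cell[0])
    sum_y = sum_x = count = 0
    for y, row in enumerate(cell):
        for x, value in enumerate(row):
            if value:
                sum_y += y
                sum_x += x
                count += 1
    if count == 0:
        return None
    centre_y, centre_x = (height - 1) / 2, (width - 1) / 2
    dy = (sum_y / count - centre_y) / centre_y if centre_y else 0.0
    dx = (sum_x / count - centre_x) / centre_x if centre_x else 0.0
    return dy, dx


def is_off_centre(cell, threshold=0.5):
    """True if the ink centroid is further than `threshold` from centre."""
    offset = ink_centre_offset(cell)
    if offset is None:
        return False
    return max(abs(offset[0]), abs(offset[1])) > threshold


# A 5x5 cell with ink packed into the top-left corner.
corner_cell = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(ink_centre_offset(corner_cell), is_off_centre(corner_cell))
```

Samples flagged this way could be reviewed manually or re-centred before inference. Whether that actually resolves these misclassifications would need to be verified on the data.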
