PCI Archaeology

395

Title *

Creating an Additional Class Layer with Machine Learning to counter Overfitting in an Unbalanced Ancient Coin Dataset

Authors *

Sebastian Gampe, Karsten Tolle

Year *

2024

Picture *

Abstract *

<p>We have implemented an approach based on Convolutional Neural Networks (CNN) for mint recognition for our Corpus Nummorum (CN) coin dataset as an alternative to coin type recognition, since we had too few instances for most of the types (classes). However, this shift exacerbated an existing problem with our dataset: the extremely unbalanced number of instances per class. While some of our classes consist of only 20 instances, others consist of several hundred. After training our VGG16 model we unsurprisingly observed overfitting to these “big” classes in the confusion matrix. To reduce this problem, we split the classes with the most images into several smaller ones, which we call additional class layers. We used three different machine learning (ML) approaches to perform this breakdown. One is an unsupervised clustering method that requires no additional manual work. The other two are supervised approaches that take into account the motifs of the coins themselves: a) an object detection model that predicts trained entities, and b) a Natural Language Processing (NLP) method that finds entities in the textual descriptions of the coins. The new additional class layers were defined based on the combination of obverse and reverse results from these two approaches. After retraining our mint recognition model with these new classes, we evaluated the results based on the confusion matrix. In our case, the best results were obtained by forming additional class layers based on the NLP method.</p>

Indicate the full web address (DOI or URL) giving public access to these data (if you have any problems with the deposit of your data, please contact contact@archaeo.peercommunityin.org). In case all raw data are included in the preprint, indicate the DOI or URL of the preprint. *

https://doi.org/10.5281/zenodo.8298077

Indicate the full web address (DOI or URL) giving public access to these scripts (if you have any problems with the deposit of your scripts, please contact contact@archaeo.peercommunityin.org). In case all raw scripts are included in the preprint, indicate the DOI or URL of the preprint. *

Indicate the full web address (DOI, SWHID or URL) giving public access to these codes (if you have any problems with the deposit of your codes, please contact contact@archaeo.peercommunityin.org). In case all raw codes are included in the preprint, indicate the DOI or URL of the preprint. *

https://doi.org/10.5281/zenodo.8298077

Keywords (optional)

Machine Learning, Image Recognition, Convolutional Neural Networks, Unbalanced Dataset, Ancient Coins

Methods that require specific expertise (optional)

None

Thematic fields *

Computational archaeology

Suggested reviewers - Suggest up to 10 reviewers (provide names and Email addresses). (Optional)

Simon Carrignon suggested: sce@utk.edu

Opposed reviewers - Suggest up to 5 people not to invite as reviewers. (Optional)

Submission date

2023-08-29 16:26:41

Recommender

Alex Brandsen

Reviewers
