
Recommendation

A significant contribution to the problem of unbalanced data in machine learning research in archaeology

Alex Brandsen based on reviews by Simon Carrignon, Joel Santos and 1 anonymous reviewer

A recommendation of:

Creating an Additional Class Layer with Machine Learning to counter Overfitting in an Unbalanced Ancient Coin Dataset

Sebastian Gampe, Karsten Tolle (2024), Zenodo, ver.4, peer-reviewed and recommended by PCI Archaeology https://doi.org/10.5281/zenodo.8298077


Abstract



We have implemented an approach based on Convolutional Neural Networks (CNN) for mint recognition for our Corpus Nummorum (CN) coin dataset as an alternative to coin type recognition, since we had too few instances for most of the types (classes). However, this shift increased an existing problem with our dataset: the extremely unbalanced number of instances per class. While some of our classes consist of only 20 instances, others consist of several hundred. After training our VGG16 model we unsurprisingly observed an overfitting of these “big” classes within the confusion matrix. To reduce this problem, we tried to split the classes with the most images into several smaller ones and called them additional class layers. We use three different machine learning (ML) approaches to perform this breakdown. One is an unsupervised clustering method without additional manual work. The other two are supervised approaches taking into account the motifs of the coins themselves: a) an object detection model that predicts trained entities, and b) a Natural Language Processing (NLP) method to find entities in the textual descriptions of the coins. Based on the combination of obverse and reverse results from these two approaches, the new additional class layers were defined. After retraining our mint recognition model with these new classes, we evaluated the results based on the confusion matrix. In our case, the best results could be observed by forming additional class layers based on the NLP method.

Machine Learning, Image Recognition, Convolutional Neural Networks, Unbalanced Dataset, Ancient Coins


Submission: posted 29 August 2023, validated 29 August 2023
Recommendation: posted 29 March 2024, validated 16 April 2024

Cite this recommendation as:
Brandsen, A. (2024) A significant contribution to the problem of unbalanced data in machine learning research in archaeology. Peer Community in Archaeology, 100395. https://doi.org/10.24072/pci.archaeo.100395

Recommendation

This paper [1] presents an innovative approach to address the prevalent challenge of unbalanced datasets in coin type recognition, shifting the focus from coin class type recognition to coin mint recognition. Despite this shift, the issue of unbalanced data persists. To mitigate this, the authors introduce a method to split larger classes into smaller ones, integrating them into an 'additional class layer'.

Three distinct machine learning (ML) methodologies were employed to identify new possible classes, with one approach utilising unsupervised clustering alongside manual intervention, while the others leverage object detection and Natural Language Processing (NLP) techniques. However, despite these efforts, overfitting remained a persistent issue, prompting the authors to explore alternative methods such as dataset improvement and Generative Adversarial Networks (GANs).

The paper contributes significantly to the intersection of ML techniques and archaeology, particularly in addressing overfitting challenges. Furthermore, the authors' candid acknowledgment of the limitations of their approaches serves as a valuable resource for researchers encountering similar obstacles.

This study stems from the D4N4 project, aimed at developing a machine learning-based coin recognition model for the extensive "Corpus Nummorum" dataset, comprising over 19,600 coin types and 49,000 coins from various ancient landscapes. Despite encountering challenges with overfitting due to the dataset's imbalance, the authors' exploration of multiple methodologies and transparent documentation of their limitations enriches the academic discourse and provides a foundation for future research in this field.

Reference

[1] Gampe, S. and Tolle, K. (2024). Creating an Additional Class Layer with Machine Learning to counter Overfitting in an Unbalanced Ancient Coin Dataset. Zenodo, 8298077, ver. 4 peer-reviewed and recommended by Peer Community in Archaeology. https://doi.org/10.5281/zenodo.8298077


Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
The D4N4 project is funded by the Deutsche Forschungsgemeinschaft (DFG) in the program “e-research-Technologien”.

Reviews

Evaluation round #2

DOI or URL of the preprint: https://doi.org/10.5281/zenodo.10424274

Version of the preprint: 2

Author's Reply, 28 Mar 2024

Dear Alex,

thank you for your feedback. We have now included a related work section in our paper. We think that it has improved the paper even further.

With kind regards

Sebastian Gampe and Karsten Tolle

https://doi.org/10.24072/pci.archaeo.100395.ar2

Decision by Alex Brandsen, posted 10 Jan 2024, validated 10 Jan 2024

Dear Authors,

thank you for making these revisions. They have significantly improved the paper, and the updated structure makes a lot more sense.

I understand you were/are under time constraints, but unfortunately I believe the paper should include a related work / literature review section listing relevant work related to coin classification and/or countering overfitting. Discussing this related work would enhance the understanding of why the presented methods were chosen.

It's also a requirement from PCI, see https://archaeo.peercommunityin.org/help/guide_for_authors#h_8540551119281613314275613, section 3.2.2.: "The introduction should build on relevant recent and past research performed in the field."

Other than that, I think you addressed all the reviewers' comments, and it should be ready to publish once you add the related work section in the introduction.

I believe the PCI system will automatically set a deadline for these edits, but let me know if you need more time, or if you have any other questions.

Kind regards,

Alex

https://doi.org/10.24072/pci.archaeo.100395.d2

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.5281/zenodo.8298078

Version of the preprint: 1

Author's Reply, 22 Dec 2023

Dear reviewers,

thank you for your helpful feedback. We have tried to incorporate as many of your suggestions as possible in the new version. Unfortunately, we were not able to implement everything due to time constraints. Nevertheless, we believe that the paper has been significantly improved thanks to your help.

With kind regards

Sebastian Gampe and Karsten Tolle

https://doi.org/10.24072/pci.archaeo.100395.ar1

Decision by Alex Brandsen, posted 24 Oct 2023, validated 24 Oct 2023

Dear Authors,

thank you again for submitting to PCI. I read your paper with great interest, and would like to see it published in the CAA proceedings.

However, the 3 reviewers all recommend revisions before accepting the paper. Please check the individual reviews and make revisions where needed, or address the comments of the reviewers. One common theme across the reviews is the structure/clarity of the paper; I would recommend paying specific attention to this particular issue.

If you have any questions, please do not hesitate to contact me.

Kind regards,

Alex Brandsen.

https://doi.org/10.24072/pci.archaeo.100395.d1

Reviewed by Joel Santos, 15 Oct 2023

This paper comes from a project, D4N4 (Data quality for Numismatics based on Natural language processing and Neural Networks), which aims to develop a machine learning-based coin type recognition model covering as many coin types as possible from the "Corpus Nummorum" (CN) dataset. This dataset comprises approximately 19,600 coin types and over 49,000 coins from four ancient landscapes (Thrace, Moesia Inferior, Troad, and Mysia).

This paper addresses the challenges they encountered while dealing with a highly unbalanced dataset, where some coin classes had very few images while others had hundreds of photos. Their main focus was not on improving machine learning algorithms but on the setup to overcome overfitting situations.

The authors aimed to enhance their mint recognition model by subdividing larger classes that impact the overfitting into smaller ones. They employed three different methods for this purpose:

Deep Clustering: An unsupervised clustering method. Although they created new classes based on clusters, this approach resulted in inhomogeneous clusters and lower accuracy.

Object Detection: Based on a Regional Convolutional Neural Network (R-CNN) to predict objects on coins and build new classes based on frequent combinations of subjects. However, this approach did not reduce overfitting effectively.

Natural Language Processing (NLP): This method showed the most promise in reducing overfitting by utilizing an NLP pipeline and creating new classes based on the entities found in textual descriptions of coins. Still, the confusion between new and old classes remained an issue.
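The idea shared by the object detection and NLP approaches, building new classes from frequent combinations of entities on the obverse and reverse, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_subclasses`, the dictionary keys, the `min_count` threshold, the "other" fallback class, and the example entities are all assumptions made for illustration.

```python
from collections import Counter


def build_subclasses(coins, min_count=2):
    """Derive a sub-class label per coin from its obverse/reverse entities.

    `coins` is a list of dicts with the hypothetical keys 'id', 'obverse'
    and 'reverse'; the entity fields are collections of strings.
    Combinations seen fewer than `min_count` times share the label 'other'.
    """
    # Use order-independent keys so that ['serpent', 'staff'] and
    # ['staff', 'serpent'] count as the same combination.
    keys = {
        coin["id"]: (frozenset(coin["obverse"]), frozenset(coin["reverse"]))
        for coin in coins
    }
    counts = Counter(keys.values())

    labels = {}
    for coin_id, key in keys.items():
        if counts[key] >= min_count:
            obverse, reverse = key
            labels[coin_id] = "+".join(sorted(obverse)) + "|" + "+".join(sorted(reverse))
        else:
            labels[coin_id] = "other"
    return labels


# Invented toy data, not taken from the Corpus Nummorum dataset.
coins = [
    {"id": 1, "obverse": ["Athena"], "reverse": ["owl"]},
    {"id": 2, "obverse": ["Athena"], "reverse": ["owl"]},
    {"id": 3, "obverse": ["Asclepius"], "reverse": ["serpent", "staff"]},
    {"id": 4, "obverse": ["Asclepius"], "reverse": ["staff", "serpent"]},
    {"id": 5, "obverse": ["Zeus"], "reverse": ["eagle"]},
]
labels = build_subclasses(coins)
for coin_id, label in labels.items():
    print(coin_id, label)
```

In a sketch like this, the threshold controls the trade-off between many small sub-classes and one large residual class, which is itself a source of imbalance.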

The authors explored several approaches to mitigate overfitting in their coin recognition model, but none provided a perfect solution. They continue to investigate other methods to address the problem, such as improving the dataset, using Generative Adversarial Networks (GANs) to create virtual coin images, and eventually making their dataset available for other researchers to apply their machine learning methods.

One of the most admirable things in the text is its admission that the three hypotheses to solve the overfitting problem did not work. We should have more academic texts unafraid of showing their difficulties in achieving the intended results. This could be very helpful to projects following similar steps, avoiding previously attempted dead-ends.

While the research is commendable, several areas in the text require criticism or improvement:

1. Clarity and Structure: The text is somewhat convoluted, making it challenging for readers, especially less technical ones, to follow the main points. It would benefit from a clearer and more structured presentation. For example, right at the beginning of the text (lines 45-46), it presents the paper's goal before stating the issue they are trying to tackle. A minor issue is the English, which should be proofread before publication (e.g., line 41, “an very unbalanced” or line 46, “For this pupose”).

2. Literature Review and Relevance of Methods: The text lacks some literature review. What has been done so far in this or other fields using the chosen methods? Discussing related work would enhance understanding of why the presented methods were chosen. The rationale for selecting each method should be clarified.

3. Data Description: The text mentions the dataset, but it could be more informative about its origins, sources, and potential biases. This information is crucial for assessing the dataset's quality and comparing similar works trying to achieve the same goals.

4. Evaluation Metrics: While the text presents a table with results, it doesn't elaborate on the specific evaluation metrics used. A brief explanation of the metrics (e.g., Top-1 Accuracy) would help readers interpret the results. However, this depends on the target readers of the journal that will publish this article.

5. Overfitting measurement and class choice: The overfitting measurement is done visually (at least in the text). Since this is a technical paper, it would benefit from a more measurable approach. Why were Pergamon and Perinthos chosen? The justification given (the size of the sample for Pergamon, and the fact that the Perinthos collection is very different from the Pergamon one) falls short. The reasons for this choice should be connected with the initial problem, the overfitting situation. Are there other classes with fewer samples but with high overfitting problems? Are there classes with a high number of samples but with low overfitting problems?

6. Discussion of Results: The text provides results but lacks an in-depth discussion. A discussion of what the results imply and their significance would be valuable. The text mentions that the three methods were deemed unsuitable. Was that decision only based on the final accuracy? Was it based on the visual overfitting check regarding the sub-classes of Pergamon and Perinthos? Does the overall overfitting of those two classes increase or diminish (being only big inside those same two classes)? If it decreases, how does the initial hypothesis of reducing the overfitting effect to increase accuracy hold up? A deeper discussion would be appreciated, with more measurable results in the presentation of the three hypotheses.

7. Future Work: The text mentions future steps but could provide a more precise outline of what is planned next. This would give readers a sense of the research's ongoing and prospective significance. Is the approach taken in this paper (reducing the size of certain classes) abandoned?
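As an illustration of points 4 and 5, the following sketch shows one possible way to report Top-1 accuracy together with per-class precision and recall, so that a bias towards the large classes becomes a number and not only a visual impression from the confusion matrix. It is not taken from the paper: the function names, the class "SmallMint" and all labels and predictions are invented for the example.

```python
def top1_accuracy(y_true, y_pred):
    """Share of samples whose single best prediction equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_report(y_true, y_pred):
    """Precision, recall, number of predictions and support for each class."""
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        report[c] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "predicted": tp + fp,  # how often the model chose this class
            "support": tp + fn,    # how often the class really occurs
        }
    return report


# Invented toy predictions: the large class absorbs samples of the others.
y_true = ["Pergamon"] * 4 + ["Perinthos"] * 3 + ["SmallMint"] * 3
y_pred = (
    ["Pergamon"] * 4
    + ["Perinthos", "Perinthos", "Pergamon"]
    + ["SmallMint", "Pergamon", "Pergamon"]
)

accuracy = top1_accuracy(y_true, y_pred)
report = per_class_report(y_true, y_pred)
print(f"Top-1 accuracy: {accuracy:.2f}")
for name, row in report.items():
    print(name, row)
```

In this toy case the large class is predicted 7 times although it occurs only 4 times, and its precision is low while the recall of the smaller classes drops. Comparing "predicted" with "support" per class before and after splitting would be one measurable way to answer the questions raised above.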

In summary, the text discusses a study on image recognition and machine learning. However, there is potential to enhance the content's clarity, structure, justifications, and context. By offering more detailed information and explaining the methodology, results, and implications, the text could become more accessible and informative for readers, particularly those who are not experts in the field. Addressing these points would make this article ready for publication. My advice is to revise and resubmit.

https://doi.org/10.24072/pci.archaeo.100395.rev11

Reviewed by Simon Carrignon, 22 Oct 2023

The paper provides an excellent example of the overfitting problem often encountered in machine learning and discusses in detail the challenges in resolving this issue. I think it's a valuable contribution to the literature on applying Machine Learning techniques in Archaeology and would be an ideal publication for PCI Archaeology. The online tool provided with the paper, along with the detailed explanation of their limitations presented in the paper, stands as a significant resource in itself, one that many other researchers will find useful.

However, I have a few comments on the paper's structure that merit consideration before publication.

While I recognize that emphasizing the 'unsuccessful' nature of the outcomes may not seem the most gratifying approach, I really hope researchers will become more comfortable with this transparency in the near future. This case study serves as a good instance of such reporting, where various methods are explored but ultimately fall short of resolving a particular issue. However, I think the paper's title and abstract could still mislead readers into expecting that the authors will provide a method that successfully addresses the challenges of overfitting. Clarifying that this isn't the case (particularly with the use of 'creating' in the title, which I find somewhat misleading) would be beneficial. The paper's most important value, in my opinion, is its detailed account of the rigorous attempts to combat overfitting through three ML classification methods, none of which fully succeeded due to uneven sampling in the dataset. This insight could prevent others from spending time and effort on similar approaches. I think this point should be explicitly stated in the abstract and introduction. As it reads for now, it is still not clear to me whether or not one of the methods will solve the problem.

Still in the introduction, the initial sentences are a bit confusing; it feels like the real introduction doesn't start until line 55. Everything before that (about the goal and focus, from lines 40-45) becomes much clearer after the subject and the D4N4 project are properly presented. A minor reorganization of the introduction, including the aspects I mentioned earlier, could significantly clarify the paper's goals and interests.

Finally, I think that a few words could be added about the general limitations this paper exposes in the use of Machine Learning in archaeology. Classification is a huge part of archaeology and isn't much discussed in the paper at all, while an interesting aspect of this research is its illustration of how some archaeological problems are unlikely to be solved by machine learning, due to the nature of the archaeological record and how machine learning algorithms work. No matter how sophisticated the neural networks are, the archaeological record will always be heavily biased, uneven and uncertain, and other statistical methods need to be used to assess this uncertainty. The paper could make a more general and interesting point by addressing some of these issues.

Regarding the online app, the code on Google Colab appears functional, though I didn't have the opportunity to extensively test it by uploading different images of coins myself.

I would be very happy to read a revised version of the paper.

https://doi.org/10.24072/pci.archaeo.100395.rev12

Reviewed by anonymous reviewer 1, 24 Sep 2023

Summary of the content

This paper presents a method to counter a problem which is well known in the case of coin type recognition: to have an unbalanced dataset for which models will tend to classify the most represented class in the dataset. The authors tried to tackle this problem, by shifting the problem from coin class type recognition to coin mint recognition. This led to more samples per class, though the problem of an unbalanced dataset is still present. They decided to split the biggest classes into smaller ones to obtain a balanced dataset. These newly introduced classes have been incorporated in an ‘additional class layer’. They used three different ML approaches to find new possible classes for the two mint classes with the majority of samples. The first approach is based on an unsupervised clustering method with additional manual work, the other approaches take into account the motifs of the coins themselves. The first relies on an object detection model that predicts trained entities and the second is based on Natural Language Processing (NLP) to find entities in textual descriptions of the coins. Based on the combination of obverse and reverse results the new additional class layer has been defined for each of these two approaches.
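To make the notion of an 'additional class layer' concrete, the sketch below shows one way such a layer could be represented: a mapping from each new sub-class back to its parent mint, so that predictions can be scored both at the sub-class level and at the original mint level. This is an assumption about how the layer might be used, not a description of the authors' code; the sub-class names, the class "SmallMint" and the predictions are hypothetical.

```python
# Hypothetical sub-class names; the real class names are not given here.
SUBCLASS_TO_MINT = {
    "Pergamon_a": "Pergamon",
    "Pergamon_b": "Pergamon",
    "Perinthos_a": "Perinthos",
    "Perinthos_b": "Perinthos",
}


def to_mint(label):
    """Map a sub-class to its parent mint; unsplit classes map to themselves."""
    return SUBCLASS_TO_MINT.get(label, label)


def accuracy(y_true, y_pred, collapse=False):
    """Accuracy at sub-class level, or at mint level when collapse=True."""
    if collapse:
        y_true = [to_mint(t) for t in y_true]
        y_pred = [to_mint(p) for p in y_pred]
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy predictions from a model retrained on the split classes.
y_true = ["Pergamon_a", "Pergamon_b", "Perinthos_a", "SmallMint"]
y_pred = ["Pergamon_b", "Pergamon_b", "Perinthos_a", "Pergamon_a"]

subclass_acc = accuracy(y_true, y_pred)
mint_acc = accuracy(y_true, y_pred, collapse=True)
print(subclass_acc, mint_acc)
```

Confusion between two sub-classes of the same mint disappears after collapsing, whereas a small class misclassified as any sub-class of a large mint still counts as an error. The mint-level score is therefore the one that shows whether the split reduced the bias towards the large classes.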

Considerations of the work

The motivation of the work is well explained, and properly tackling the problem of unbalanced datasets is fundamental to defining robust models through ML approaches.

However, sometimes the work is difficult to follow and methods and procedures have to be described in a clearer manner. Moreover, the paper presents different ways to refer to figures (e.g., fig. or Figure) and when the authors introduce an acronym they should use it consistently throughout the paper.

In detail

Starting from the abstract, the authors write “One is an unsupervised clustering method without additional manual work. The other two are supervised approaches taking into account the motifs of the coins themselves:”. We suggest that the authors rewrite it as “One is an unsupervised clustering method without additional manual work. The other two are supervised approaches which explicitly take into account the motifs of the coins themselves:”. This is because we do not know what the unsupervised method takes into account to cluster the samples; it may well be taking the motifs into account, too.

Moreover, we suggest rewriting “Based on the combination of obverse and reverse results from these two approaches the new additional class layer were defined.” to make clearer the fact that the creation of the new class layer has been defined independently for the method a) and for the method b).

We suggest that the authors refer to the Regional Convolutional Neural Network as a Region-based Convolutional Neural Network.

In lines 47 and 50 the authors use both ‘object detection’ and ‘Object Detection’; we suggest choosing one form and using it consistently throughout the paper.

In lines 53-54, ‘All of the above methods run on Juypter Notebook and Python programming language.’ should be rewritten as ‘All of the above methods run on Jupyter Notebook and are written in the Python programming language.’.

In lines 58-59 the authors use ‘to improve’ twice; we encourage them to use synonyms when the same phrase would otherwise appear in quick succession. For example, they could change “The goal is to use it to improve and verify the data quality of existing data, and also use it to improve the process of entering new coins.” to “The goal is to use it to improve and verify the data quality of existing data, and also use it to help the process of entering new coins.”.

In lines 63-64 the authors write ‘we merged the obverse and reverse images of a coin into a single image showing both (as can be seen below in Figure 3)’, and to see the image the reader has to scroll down the paper. It would be better to add a single image near this paragraph; we suggest placing it above.

We suggest the authors use a consistent notation: there are both Figure and fig.. Moreover, we suggest using a similar notation to refer to tables (if referring to figures as fig., let’s refer to tables as tab.).

To improve the readability of the figures, we suggest rotating the text below each figure that gives the names of the columns, for example by 45 degrees.

In line 82 the authors say “Most mint classes consist of several different coin types which differ more or less from each other.”. A figure with some examples of high and low variation within the same mint class would be appreciated.

In line 84, the authors say that in the case of mint recognition “we encounter another problem when training mint recognition: the unbalanced dataset.”. This problem is also present in the case of coin type classification, so we suggest making a connection to the previous case: “we still encounter the problem of having an unbalanced dataset when training mint recognition, though the number of instances per class has been increased to a manageable number to train DL models”.

In line 85, the authors say “to the different output of the mint”; we suggest explaining this sentence more clearly. What is the output? In the same line, we suggest saying “in the area of interest” rather than “in our area”.

In lines 97-98, the authors say that a confusion matrix has a deep red diagonal when the model is 100% correct. The colour depends on the plotting settings, so we suggest using more precise language, for example saying that the matrix would be non-zero only on the diagonal.
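To illustrate the wording we have in mind, here is a minimal Python sketch of a colour-independent check. It is not taken from the authors' code: the helper names `confusion_matrix` and `is_perfect`, the class list and the labels are all made up for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def is_perfect(matrix):
    """A model is 100% correct exactly when every off-diagonal entry is zero."""
    return all(
        value == 0
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j
    )


# Hypothetical toy labels, for illustration only.
classes = ["Pergamon", "Perinthos", "Other"]
true_labels = ["Pergamon", "Pergamon", "Perinthos", "Perinthos", "Other"]
flawed_predictions = ["Pergamon", "Pergamon", "Pergamon", "Perinthos", "Other"]

perfect = confusion_matrix(true_labels, true_labels, classes)
flawed = confusion_matrix(true_labels, flawed_predictions, classes)

print(is_perfect(perfect))  # all counts lie on the diagonal
print(is_perfect(flawed))   # one Perinthos coin was predicted as Pergamon
```

Stated this way, the criterion holds regardless of the colour map or normalisation chosen for the plot.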

In the case of the Deepclustering method, given that Pergamon and Perinthos have 3600 and 1800 images, respectively, could the authors consider 15 clusters for Pergamon and half that number, or fewer, for Perinthos? Is there a reason for using the same number of clusters for the two mints? Choosing a number of clusters proportional to the number of samples of each mint could lead to a more balanced result.
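As a rough sketch of what we mean by a proportional choice, the following Python snippet keeps 15 clusters for the larger mint and scales the other one by its image count. The function name `proportional_clusters` is hypothetical, and rounding down is our own assumption (it matches "half or a lower number"); the image counts are the ones mentioned above.

```python
def proportional_clusters(image_counts, reference_mint, reference_clusters):
    """Scale the number of clusters per mint by its share of images.

    The reference mint keeps `reference_clusters`; the other mints get a
    proportional number, rounded down, with at least one cluster each.
    """
    reference_count = image_counts[reference_mint]
    return {
        mint: max(1, count * reference_clusters // reference_count)
        for mint, count in image_counts.items()
    }


# Image counts as mentioned in this comment.
image_counts = {"Pergamon": 3600, "Perinthos": 1800}
clusters = proportional_clusters(image_counts, "Pergamon", 15)
print(clusters)  # {'Pergamon': 15, 'Perinthos': 7}
```

With these numbers each cluster would hold roughly 240 to 260 images on average for both mints, instead of roughly 240 versus 120 when 15 clusters are used for each.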

We suggest explaining the following sentence more clearly: “This means that our original classes do not live on as a remnant collection with images that could not be merged with others to form a new class.” It seems that the clustering model uses all the images from Pergamon and Perinthos together when creating the clusters. This would lead to clusters that comprise images from both mints. How do the authors derive the 15 clusters for Pergamon and Perinthos, respectively? This has to be explained better.

In lines 183-184 the authors say “which produces a set of bounding boxes to predict the region and the class of an object in an image”. We suggest writing “which produces a set of region proposals that are likely to contain objects, and uses a CNN to extract features from each region proposal to classify the objects within these regions.”.

The authors say “We trained the R-CNN model on frequently occurring subjects on the coins like “head” or “sitting person”.”. It would be interesting for the reader to know exactly which subjects occur frequently, even though the authors give a reference to a thesis.

We suggest adding the images of the coins whose description is in Fig. 6.

We suggest removing the citations in ‘Summary and Conclusions’.

Line 254 presents the sentence “Neither approach produced appropriate results for our problem”. Which approach is meant? We suppose it is the one based on R-CNN.

Errors

· Lines 29-30 ‘new additional class layer were defined’ to ‘new additional class layers were defined’

· Many instances of ‘n.d.’ appear in the paper and should be removed. There may be a problem with how the citations were inserted.

https://doi.org/10.24072/pci.archaeo.100395.rev13