The paper “Percolation Package – From Script Sharing to Package Publication” by Sophie C. Schmidt and Simon Maddison (2023) describes the development of an R package designed to apply Percolation Analysis to archaeological spatial data. In an earlier publication, Maddison and Schmidt (2020) describe Percolation Analysis and provide case studies that demonstrate its usefulness at different spatial scales. In the current paper, the authors use their experience collaborating to develop the R package as part of a broader argument for the importance of code sharing to the research process.
The paper begins by describing the development process of the R package, beginning with borrowing code from a geographer, refining it to fit archaeological case studies, and then collaborating to further refine and systematize the code into an R package that is more easily reusable by other researchers. As the review by Joe Roe noted, a strength of the paper is “presenting the development process as it actually happens rather than in an idealized form.” The authors also include a section about the lessons learned from their experience.
Moving on from the anecdotal data of their own experience, the authors also explore code sharing practices in archaeology by briefly examining two datasets. One dataset comes from “open-archaeo” (https://open-archaeo.info/), an on-line list of open-source archaeological software maintained by Zack Batist. The other dataset includes articles published between 2018 and 2023 in the Journal of Computer Applications in Archaeology. Schmidt and Maddison find that these two datasets provide contrasting views of code sharing in archaeology: many of the resources in the open-archaeo list are housed on Github, lack persistent object identifiers, and many are not easily findable (other than through the open-archaeo list). Research software attached to the published articles, on the other hand, is more easily findable either as a supplement to the published article, or in a repository with a DOI.
The examination of code sharing in archaeology through these two datasets is preliminary and incomplete, but it does show that further research into archaeologists’ code-writing and code-sharing practices could be useful. Archaeologists often create software tools to facilitate their research, but how often? How often is research software shared with published articles? How much attention is given to documentation or making the software usable for other researchers? What are best (or good) practices for sharing code to make it findable and usable? Schmidt and Maddison’s paper provides partial answers to these questions, but a more thorough study of code sharing in archaeology would be useful. Differences among journals in how often they publish articles with shared code, or the effects of age, gender, nationality, or context of employment on attitudes toward code sharing seem like obvious factors for a future study to consider.
Shared code that is easy to find and easy to use benefits the researchers who adopt code written by others, but code authors also have much to gain by sharing. Properly shared code becomes a citable research product, and the act of code sharing can lead to productive research collaborations, as Schmidt and Maddison describe from their own experience. The strength of this paper is the attention it brings to current code-sharing practices in archaeology. I hope the paper will also help improve code sharing in archaeology by inspiring more archaeologists to share their research code so other researchers can find and use (and cite) it.
References
Maddison, M.S. and Schmidt, S.C. (2020). Percolation Analysis – Archaeological Applications at Widely Different Spatial Scales. Journal of Computer Applications in Archaeology, 3(1), p.269–287. https://doi.org/10.5334/jcaa.54
Schmidt, S. C., and Maddison, M. S. (2023). Percolation Package - From script sharing to package publication, Zenodo, 7966497, ver. 3 peer-reviewed and recommended by Peer Community in Archaeology. https://doi.org/10.5281/zenodo.7966497
DOI or URL of the preprint: https://doi.org/10.5281/zenodo.7966498
Version of the preprint: 1
We have revised the paper taking on board all of the comments and feedback from the reviewers. These were really useful, and we believe that we have addressed all of them.
This article combines a case study of the development of an R package implementing Percolation Analysis with general observations about the importance of code sharing in archaeology. The case study details a process beginning with borrowing code from researchers outside archaeology, refining it to fit archaeological case studies, and collaboration beteween the authors to further refine and systematize the code into an R package that is more easily reusable by other researchers.
All three reviewers have generally positive views of the paper, but they also offer some suggestions for revision. I agree that both the case study and the general argument are important, but also agree with many of the suggestions for revision. The manuscript would particularly benefit from better integration between the case study and the general discussion of code sharing. As Rose's review asks, "what are the lessons learned from the case study..."? The article would benefit from making the connections between the sections more explicit.
I also agree with the suggestion that the authors should reconsider submitting the package to CRAN, although that would probably require considerable effort, and I would not see it as a necessary revision to make the paper publishable. The reviewers raise a number of other issues that I do not need to repeat, but most of the suggested revisions are minor and should be easy to accomodate.
I encourage the authors to consider the reviewers' comments and to revise the paper accordingly.
The topic of the paper is very interersting and up to date. In this article the authors deal with issues that are fundamental to share scripts, algorithms and packages that can be useful for different researches, and thus to make different research experiences comparable with each other and repeatable.
The analysis proposed and the observations developed on its basis are interesting and a useful starting point to begin a true organisation of sharing quantitative investigations tools of many types.
Maybe for a future contribution it would be interesting to have statistics about the developments of the tendency in sharing code through the years, in order to understand or hypothesise which policy is more fruitful for this. Sharing code can be one of the bases for a true dialogue among researchers.
Regarding the form, I feel that the first paragraphs ("Inception" and "Collaboration") could be made clearer and more fluent. In particular, this could be accomplished through the introduction of elements that tie together the sentences, which at the moment seem rather juxtaposed paratactically.
Lastly, one of the main issues for code sharing is properly stressed in the article: that is, the long term support of platforms for sharing. Github seems to show some important weaknesses, from this point of view, but it also seems to be the most used platform. A proposal (maybe in the "Conclusions" paragraph) to solve this issue would be useful and fit this paper.
This is an innovative paper describing the process of developing an R package from the inception of an idea to publication. It is refreshingly brief and readable, and I appreciate the authors openness in presenting the development process as it actually happens rather than in an idealised form. They neatly shows that the package format is a natural progression of more informal means of code-sharing—not something to be scared of—and I share the hope that this will encourage colleagues to set their code free.
There is just one point on which I have reservations – that the package is not on CRAN. I'm aware that there is a debate within the R community about whether CRAN's submission requirements processes are too strict, but the fact is that it remains the accepted standard repository for R packages; I don't think most people would call a package "published" until it's on CRAN. Both leading journals in the R development world—the Journal of Statistical Software (https://www.jstatsoft.org/authors) and the R Journal (https://journal.r-project.org/)—require that packages are on CRAN or Bioconductor before a description can be published. The submission requirements and automated quality checks, while sometimes annoying for the developer, are there so that users can count on packages being reliable, well-documented and not likely to stop working in future versions. I think that the authors would agree that this is very important for scientific software.
Similarly, I can't agree that "CRAN is not necessary for the easy use of a package". The existence of a single comprehensive repository (CRAN) with an easy user interface (install.packages()) is often cited as one of the main reasons R has become so widely used in academia,[1][2][3] especially when compared to alternatives like Python, whose dependency management is more oriented towards the needs of software engineers. In my experience of teaching R, not being able to find and install a package the "normal way" is a major obstacle for newer users, no matter how clearly you try to point to the existence of remotes::install_github(). It also makes it harder to use the package in an analysis formatted as a package-based compendium (an approach popularised in no small part thanks to one of the authors!), because it cannot be listed as a formal dependency. These issues seem especially relevant given that the authors themselves (rightly) emphasise that scientific software must be "findable and accessible" if it is to be useful. Like it or not, in the current R world, findable and accessible means CRAN.
Honestly, in my experience, the CRAN submission process is not nearly as bad as is often made out, especially if you use the scaffolding provided by the devtools and usethis packages. I would strongly encourage the authors to at least try and, if they think it could be useful, I'd be happy to review the package code for any potential issues beforehand. Otherwise, the authors might consider including their package in R-universe (https://r-universe.dev), a CRAN alternative that addresses some (but not all) of the above-mentioned problems.
Minor notes:
References:
[1] https://journal.r-project.org/archive/2020/RJ-2020-028/index.html
[2] https://www.nature.com/articles/517109a
[3] https://academic.oup.com/jrssig/article/15/4/14/7029433
This short manuscript provides a case study of how sharing of code via email within a collaboration eventually turned into the development and publication of an R package. This account is followed by a brief general discussion on how code is shared in archaeology with a strong focus on https://open-archaeo.info/ and a brief excursion to the FAIR principles. It concludes that publishing code might be time-consuming, especially regarding a proper documentation of the code, but overall speeds up the scientific progress because the code is easier to re-use and can be understood by users without the need to get in contact with the developer. Consequently, it recommends code publication over individual code sharing.
citation("package")
. For general guidance on how to cite software and the importance of proper software citations, see https://doi.org/10.7717/peerj-cs.86 and https://doi.org/10.12688/f1000research.26932.2