Celebrating Open Access Week: Researchers using our Public Data File

Gabriela Mejias

Openness is a key ORCID value. To follow that principle and celebrate Open Access Week, we release a public data file each year. The 2019 file, which is now available, contains a snapshot of all ORCID record data that researchers had marked public in the ORCID Registry when the file was created on October 1, 2019. Our public data file is published under a CC0 waiver and is free for everyone to use. At the time of writing, last year’s file had been viewed over 5,000 times and downloaded more than 3,200 times.

As 2019 is ORCID’s year of the researcher, this time we are happy to share two examples of researchers who are using our public data file in their research.

Dario Rodighiero (Postdoctoral Associate at MIT, Faculty of Comparative Media Studies/Writing)

The Worldwide Map of Research is a project that analyzes the research community in terms of relationships and individual trajectories. It relies on the ORCID public data file, a good example of how a non-profit organization can support making research open and accessible to everyone, and it is also my way of supporting the ORCID initiative. The project originates from my PhD thesis, which illustrates a visual method to represent a faculty of EPFL. Thanks to the support of the Swiss National Science Foundation, my research has now expanded in scale, moving from individual faculty members to the analysis of institutions and universities worldwide. My interdisciplinary approach allows me to explore the ORCID dataset from two perspectives. The first is purely visual and focuses on the way in which individuals and institutions can be properly and fairly represented using graphic design. The second is about the processing of data, using recent developments in Natural Language Processing and Artificial Intelligence to extract meaningful information. The intersection of these two perspectives enables a new way of doing research by reflecting on computation, visualization, and interpretation of data at the same time. This specific project focuses on three simple steps: 1) the study of the collaborations between institutions (see figure below), 2) the analysis of the individual trajectories of scholars through institutes over time, and 3) the creation of a recommendation system based on collected and generated data. I’m grateful to my supervisor Kurt Fendt, to MIT and my colleagues, to Ringgold for allowing me to use their database, to the Harvard MetaLab for their intellectual support, to Mauro Martino (IBM) and Paolo Ciuccarelli (Northeastern University) for their advice, and to Abram Turner (MIT) for the help provided during his internship.

Robert Eyre (PhD Candidate at the University of Bristol, Department of Engineering Mathematics)

Of all possible career paths, academic researchers have perhaps the most opportunity to travel and migrate internationally as they form new collaboration links and relationships. To study their migrations, academics’ research outputs can be examined to form a trajectory of affiliations over time. However, this can be difficult when researchers share the same name, a common problem in migration studies that use bibliometric data. To combat this, we can extract the CVs of millions of researchers from the ORCID public data file. This data set is over 300 times larger than the largest known email-based study on scientific migration, conducted by Franzoni et al. in 2012.
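As an illustration of the idea of a trajectory of affiliations, here is a minimal sketch in Python. It is not the code used in the study: the `Affiliation` fields, the function names, and the sample records (including the placeholder iDs) are all made up for the example, and it assumes affiliations have already been extracted from the data file into simple tuples.

```python
from collections import namedtuple

# Hypothetical, simplified affiliation record (not the ORCID XML schema).
Affiliation = namedtuple("Affiliation", ["orcid_id", "country", "start_year"])


def build_trajectories(affiliations):
    """Group affiliations by iD and sort each group by start year."""
    trajectories = {}
    for aff in affiliations:
        trajectories.setdefault(aff.orcid_id, []).append(aff)
    for steps in trajectories.values():
        steps.sort(key=lambda a: a.start_year)
    return trajectories


def extract_migrations(trajectory):
    """Return (year, from_country, to_country) for each change of country."""
    moves = []
    for prev, curr in zip(trajectory, trajectory[1:]):
        if prev.country != curr.country:
            moves.append((curr.start_year, prev.country, curr.country))
    return moves


# Made-up sample data; the iDs are placeholders, not real records.
records = [
    Affiliation("0000-0001-0000-0001", "IT", 2008),
    Affiliation("0000-0001-0000-0001", "US", 2018),
    Affiliation("0000-0001-0000-0001", "CH", 2012),
    Affiliation("0000-0001-0000-0002", "GB", 2015),
    Affiliation("0000-0001-0000-0002", "GB", 2019),
]

trajectories = build_trajectories(records)
migrations = {oid: extract_migrations(t) for oid, t in trajectories.items()}
```

Because records are keyed by iD rather than by name, two researchers who share a name never end up in the same trajectory, which is the disambiguation benefit described above.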

We plan to extend our use of the ORCID public data file to identify the effect that select events (such as Brexit or the Eurozone crisis) have had on migration in the research community. We are working on a method to avoid irregularities in the data, such as an over-representation of people who recently obtained their PhD and the over- and under-representation of individual countries. This will be achieved by creating randomized reference models for the observed data and by comparing these models to our observed temporal network, obtaining p-values for each possible migration decision in each year. These scores will let us identify the years in which an abnormal number of migrations occurred.
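To make the idea of comparing observed data against randomized reference models more concrete, here is a minimal Python sketch of one possible approach. It is an assumption made for illustration, not the method under development: the null model simply shuffles destination countries across all observed moves, and the function name, parameters, and toy data are hypothetical.

```python
import random
from collections import Counter


def empirical_p_values(moves, n_samples=1000, seed=0):
    """Estimate a p-value for each observed (year, origin, destination).

    Assumed null model: years and origins stay fixed while destinations
    are shuffled across all moves. A realistic reference model would
    need more constraints (for example, ruling out moves where origin
    and destination coincide).
    """
    rng = random.Random(seed)
    observed = Counter(moves)
    years = [m[0] for m in moves]
    origins = [m[1] for m in moves]
    destinations = [m[2] for m in moves]

    exceed = Counter()
    for _ in range(n_samples):
        shuffled = destinations[:]
        rng.shuffle(shuffled)
        null_counts = Counter(zip(years, origins, shuffled))
        for key, obs in observed.items():
            # Count samples at least as extreme as the observation.
            if null_counts[key] >= obs:
                exceed[key] += 1

    # The +1 correction keeps the estimate strictly above zero.
    return {key: (exceed[key] + 1) / (n_samples + 1) for key in observed}


# Made-up toy data: (year, origin country, destination country).
toy_moves = (
    [(2016, "GB", "DE")] * 8
    + [(2015, "GB", "US")] * 8
    + [(2016, "FR", "US")] * 2
    + [(2015, "FR", "DE")] * 2
)
p_values = empirical_p_values(toy_moves)
```

A small p-value for a given year, origin, and destination means that the randomized samples rarely produced as many such moves as were observed, which flags that combination as unusual under this particular null model.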

More about the ORCID public data file

If you are interested in using our public data file, you can download it from the ORCID repository. This year’s file is available in XML format and is divided into separate files for easier management. One file contains the full record summary for each record. The rest of the data is divided into 11 files, which contain the activities for each record, including full work data.

We release the public data file under a CC0 1.0 Public Domain Dedication, and use of the public data is in accordance with our Privacy Policy. We have created recommended community norms for using the file.

If you are planning to use, or are already using, the public data file for your research, please let us know. We’d love to hear from you!