Free for Everyone, Always: The ORCID Public API and Data File

By Alice Meadows

As part of our commitment to openness, we have a public API that is available for community use, and we also release an annual snapshot of publicly available data in the ORCID Registry. We’re always excited to learn about interesting ways these tools are being used by the community! Here are some that we know about; we’d love to learn about others! If you’re using the public API rather than the member one, please remember to still follow our best practices for authenticating and displaying iDs - this helps build a trusted PID infrastructure for everyone’s benefit!
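For anyone displaying iDs collected through the public API, here is a minimal sketch of two of the checks involved: confirming that an iD's final character matches its ISO 7064 MOD 11-2 checksum, and rendering the iD as a full https URI. The function names (check_digit, is_valid_orcid, display_uri) are our own illustrative choices, not part of any ORCID library, and this is not a substitute for authenticating the iD with its owner.

```python
import re

# An ORCID iD is 16 characters in four hyphenated groups; the last
# character is a check character that may be a digit or "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    return check_digit(digits[:15]) == digits[15]


def display_uri(orcid_id: str) -> str:
    """Render a validated iD as a full https URI."""
    if not is_valid_orcid(orcid_id):
        raise ValueError(f"not a valid ORCID iD: {orcid_id!r}")
    return f"https://orcid.org/{orcid_id}"
```

A checksum match only shows that an iD is well formed; it says nothing about who the iD belongs to, which is why authentication still matters.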

Project THOR

As you may know, ORCID was one of the partners in this EU-funded project, which aimed to “establish seamless integration between articles, data, and researchers across the research lifecycle.” One of the outputs of this project was a Study of ORCID Adoption Across Disciplines and Locations, based on the 2016 ORCID public data file. Among the study’s key findings were:

  • There’s a higher representation of ORCID iDs in the natural, health and applied sciences than in arts, humanities, economic and social sciences
  • However, the proportion of humanists with ORCID iDs is disproportionately high compared with the number of researchers in this field overall (9.6% versus 4.1%)
  • The proportion of humanities users more than doubled between 2012 and 2016, from 4.1% to 9.6%, but the share of works connected to their records grew by only about 50% during that period, from 3.8% to 5.5%
  • There are far more ORCID iD holders in Europe than in any other region

We’ll be updating this analysis as part of our Academia and Beyond project in 2019 - more on that soon!

OpenCitations

OpenCitations is a scholarly infrastructure organization, directed by David Shotton and Silvio Peroni, dedicated to open scholarship and to publishing open bibliographic and citation data using Semantic Web (Linked Data) technologies. The organization also advocates for semantic publishing and open citations. One of its main outputs is the OpenCitations Corpus (OCC), an open database of downloadable bibliographic and citation data that conforms to the OpenCitations Data Model. The OCC has been created and continuously expanded using a set of scripts, available in the OpenCitations GitHub repository, which gather metadata describing both the citing and the cited articles involved in a citation from external services -- including the ORCID Public API. The OCC routinely uses the ORCID Public API to try to retrieve ORCID iDs for all authors and editors named in the Crossref metadata for a given DOI. OpenCitations has also recently released BCite (sources available on GitHub), a web application that enables users such as journal editors to obtain 'clean', verified, and enriched bibliographic reference text strings for inclusion in the reference list of the citing article they have in hand. This helps ensure that accurate rather than erroneous references are published in the version of record; the references are also transformed into RDF data compliant with the OpenCitations Data Model, including ORCID iDs where available.
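To give a flavour of a DOI-based lookup like the one described above, here is a rough sketch that searches the public API for records listing a given DOI and pulls the iDs out of the JSON response. It is not OpenCitations' actual code. The API version in the URL (v3.0), the helper names, and the choice of the doi-self search field are our assumptions for illustration; a search of this kind returns every public record that lists the DOI, so matching each iD to a specific author or editor name would need further work.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: public API search endpoint, version 3.0.
SEARCH_ENDPOINT = "https://pub.orcid.org/v3.0/search/"


def build_doi_query(doi: str) -> str:
    """Build a search URL for records whose works include this DOI."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": f'doi-self:"{doi}"'})


def extract_ids(payload: dict) -> List[str]:
    """Pull the bare iDs out of a search response parsed from JSON."""
    results = payload.get("result") or []  # "result" may be null when nothing matches
    return [
        item["orcid-identifier"]["path"]
        for item in results
        if item.get("orcid-identifier")
    ]


def find_ids_for_doi(doi: str, timeout: float = 10.0) -> List[str]:
    """Query the public API (network call) and return the matching iDs."""
    request = Request(build_doi_query(doi), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return extract_ids(json.load(response))
```

Keeping the URL building and response parsing separate from the network call makes it easy to test the logic against saved responses without hitting the API.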

Cobaltmetrics

Cobaltmetrics is an altmetrics provider, powered by a knowledge graph that contains billions of identifiers linked by billions of properties. Many different sources are combined to build the graph, mostly in the form of linked metadata shared by publishers, trusted repositories, and identifier registries (see their documentation on URI transmutation). They aim to make privacy and web-scale data mining compatible by using ORCID identifiers as the main contributor identifiers in Cobaltmetrics, and (as of October 2018) have added a total of 4,725,354 identifiers to the knowledge graph using our Public Data File. Cobaltmetrics is now working on contributor-level altmetrics aggregation, with the goal of showcasing what they know about any contributor from all the sources that they monitor. In future, if there’s interest from the community, they will consider a deeper integration with ORCID’s API to pull fresh data into their knowledge graph as often as possible. For more information, please see this Cobaltmetrics blog post.
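As a rough illustration of harvesting identifiers from a data file (not Cobaltmetrics' actual pipeline), the sketch below streams through a tar archive and collects every iD that appears in a member file name. It assumes the archive holds one XML file per record, named after the iD (for example summaries/097/0000-0002-1825-0097.xml); that layout, and the function name ids_in_archive, are assumptions that should be checked against the release you download.

```python
import re
import tarfile
from typing import BinaryIO, Iterator

# Assumption: each record is stored as "<iD>.xml" somewhere in the archive.
ID_IN_NAME = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])\.xml$")


def ids_in_archive(fileobj: BinaryIO) -> Iterator[str]:
    """Yield each distinct iD found in the archive's member file names.

    The archive is read as a stream ("r|*"), so a large compressed file
    is never unpacked to disk or loaded into memory in one go.
    """
    seen = set()
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            match = ID_IN_NAME.search(member.name)
            if member.isfile() and match and match.group(1) not in seen:
                seen.add(match.group(1))
                yield match.group(1)
```

For example, opening the downloaded file with open(path, "rb") and passing it to sum(1 for _ in ids_in_archive(f)) would count the distinct identifiers it contains.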

Science article on migratory scientists

Science magazine journalist John Bohannon received the prestigious National Academies Communication Award for his analysis of scientists’ migration patterns using the ORCID public data file. Among his findings:

  • About one third of scientists who earned their PhD in the UK subsequently moved away, compared with only 15% of scientists in other EU countries
  • The annual influx of scientists to the US stagnated for several years after 2001 - possibly because of the World Trade Center attacks
  • Some researchers are “super-migrators”, moving countries frequently for the sake of their careers
  • Early career researchers (those who were more recently awarded their PhD) are overrepresented in the ORCID Registry, indicating that they’re signing up for an iD faster than older researchers

While John noted the constraints of using ORCID data for this type of analysis, he also believes that: “As ORCID grows into a more comprehensive sample, policymakers will likely use it to track the impact of their efforts to entice research talent. Meanwhile, the data offer a unique glimpse into the migratory lives of the world's knowledge producers.”

Taxonomists on ORCID

You may remember that earlier this year, David Shorthouse wrote about how he’s creating a compendium of taxonomists using a combination of Twitter plus our public API. At the time he had around 1,500 taxonomists on his list; that has now grown to a whopping 5,640 (at time of writing)! For those who don’t already have an iD he’s included a handy link on the home page, to make signing up - or signing in - super easy. As he commented in his original post: “Active campaigns like this engage communities of researchers with the ORCID ecosystem. Its well-constructed public API permits very rapid production of value-added products of benefit to those same communities. There’s potential here for other interesting ways to capitalize on positive feedback-driven network effects.” We agree!

Thanks to all of the above for sharing their use of our public API and data file -- we are proud to be open in name and practice!