How to use the API with "R"

DLUHC Open Data provides flexible programmatic access to its data through its SPARQL endpoint. This API can be used to automate report writing, publish data visualisations, or build interactive tools for exploring the data. This guide describes how to pull data from DLUHC Open Data into the R statistical programming environment.

This guide assumes a basic understanding of manipulating data in R, and uses the RStudio integrated development environment (IDE).

In this guide, we will use R to make an API call to DLUHC Open Data and load the results into a data frame. From there, we can take advantage of R’s many functions and libraries, and even create interactive tools with Shiny.

To do this, we will use the SPARQL query developed in the SPARQL user guide, which extracts the number of rough sleepers by local authority and time period:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
SELECT ?areaname ?periodlabel ?value 
WHERE { 	
	?obs <http://purl.org/linked-data/cube#dataSet> <http://opendatacommunities.org/data/homelessness/rough-sleeping/count> .	
	?obs <http://opendatacommunities.org/def/ontology/geography/refArea> ?areauri . 	
	?obs <http://opendatacommunities.org/def/ontology/time/refPeriod> ?perioduri . 	
	?obs <http://opendatacommunities.org/def/ontology/homelessness/roughSleepingObs> ?value . 	
	?areauri rdfs:label ?areaname . 	
	?perioduri rdfs:label ?periodlabel . 
}

We can paste this directly into the query box at the SPARQL endpoint and run it, which gives the following results:

[Image: Results of the SPARQL query in OpenDataCommunities]

To get this into R, we could simply download the results as a CSV file and read it into our project. Using the API instead means we always pull in the most up-to-date information, as the data on DLUHC Open Data is regularly updated.

To work with SPARQL endpoints like the one on DLUHC Open Data, we can use an R package called SPARQL.

First, install the package from CRAN by typing the following into the RStudio console:

install.packages("SPARQL")

Then load the package into your current R session:

library(SPARQL)

The SPARQL package is now available in the session. The SPARQL function needs two parameters: the web address of the endpoint, and the encoded SPARQL query itself. The clearest way to provide these is to store each one in a variable first.

The endpoint variable is straightforward, as it is the same for every call we make against DLUHC Open Data:

endpoint <- "http://opendatacommunities.org/sparql"

For the query string, we first need to URL-encode it, replacing spaces and special characters with percent-encoded equivalents. We can use an online tool such as this one: https://meyerweb.com/eric/tools/dencoder/

Paste the SPARQL query text into the box, press the ‘encode’ button, and copy the result into R.
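As an alternative to the online tool, the encoding can be done inside R with `URLencode()` from base R's utils package. The sketch below is a minimal example, assuming the query is kept as a plain multi-line string (the variable name `query_text` is our own). With `reserved = TRUE`, characters such as `:`, `/`, `<`, `>` and `#` are percent-encoded along with spaces and line breaks, so the result can be used in place of the string pasted from the online tool. Some versions of the SPARQL package may also encode the query themselves; if an encoded query is rejected, it is worth trying `query_text` directly.

```r
# Plain-text version of the rough-sleeping query (query_text is our own name)
query_text <- paste(
  "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
  "SELECT ?areaname ?periodlabel ?value",
  "WHERE {",
  "  ?obs <http://purl.org/linked-data/cube#dataSet> <http://opendatacommunities.org/data/homelessness/rough-sleeping/count> .",
  "  ?obs <http://opendatacommunities.org/def/ontology/geography/refArea> ?areauri .",
  "  ?obs <http://opendatacommunities.org/def/ontology/time/refPeriod> ?perioduri .",
  "  ?obs <http://opendatacommunities.org/def/ontology/homelessness/roughSleepingObs> ?value .",
  "  ?areauri rdfs:label ?areaname .",
  "  ?perioduri rdfs:label ?periodlabel .",
  "}",
  sep = "\n"
)

# Percent-encode spaces, newlines and reserved characters (':', '/', '<', '>', '#', ...)
query <- utils::URLencode(query_text, reserved = TRUE)

# Inspect the start of the encoded string
substr(query, 1, 40)
```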

We then load the encoded query string into the query variable:

query <- "PREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0A%0ASELECT%20%3Fareaname%20%3Fperiodlabel%20%3Fvalue%20%0AWHERE%20%7B%20%0A%3Fobs%20%3Chttp%3A%2F%2Fpurl.org%2Flinked-data%2Fcube%23dataSet%3E%20%3Chttp%3A%2F%2Fopendatacommunities.org%2Fdata%2Fhomelessness%2Frough-sleeping%2Fcount%3E%20.%0A%3Fobs%20%3Chttp%3A%2F%2Fopendatacommunities.org%2Fdef%2Fontology%2Fgeography%2FrefArea%3E%20%3Fareauri%20.%0A%3Fobs%20%3Chttp%3A%2F%2Fopendatacommunities.org%2Fdef%2Fontology%2Ftime%2FrefPeriod%3E%20%3Fperioduri%20.%0A%3Fobs%20%3Chttp%3A%2F%2Fopendatacommunities.org%2Fdef%2Fontology%2Fhomelessness%2FroughSleepingObs%3E%20%3Fvalue%20.%0A%3Fareauri%20rdfs%3Alabel%20%3Fareaname%20.%0A%3Fperioduri%20rdfs%3Alabel%20%3Fperiodlabel%20.%0A%7D%0A"

With the endpoint and query string stored in variables, we can use the SPARQL function to request the data:

odcqueryres <- SPARQL(endpoint, query)  # send the query to the endpoint and store the response

Running this can take a few seconds, depending on the complexity of the query. odcqueryres is a list containing two items: a data frame called results, and an empty object called namespaces. To work with the data, we need to isolate the data frame, which we can do with this line:

odcdf <- odcqueryres$results
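Because the endpoint stays the same for every call, the request and the extraction of `results` can be wrapped into one small helper. This is a minimal sketch: the function name `run_odc_query()` is hypothetical, and it assumes the SPARQL package is installed and that `SPARQL()` returns a list with a `results` element, as described above.

```r
# Hypothetical helper: run a query against the endpoint and return only the data frame
run_odc_query <- function(query,
                          endpoint = "http://opendatacommunities.org/sparql") {
  # Call SPARQL() through its namespace, so library(SPARQL) is not strictly required
  res <- SPARQL::SPARQL(endpoint, query)
  # The data frame of results is held in the list's 'results' element
  if (!is.list(res) || is.null(res$results)) {
    stop("Unexpected response from the SPARQL endpoint")
  }
  res$results
}

# Usage (requires a network connection):
# odcdf <- run_odc_query(query)
```

Keeping the endpoint as a default argument means a different endpoint can still be passed in if needed, without changing the helper.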

We can now use this data frame as we would any other in R: combining it with other datasets, mapping the data, or building an interactive data explorer.
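As a small example of working with the data frame, the sketch below pulls out the counts for a single area, ordered by period. It assumes `odcdf` has the `areaname`, `periodlabel` and `value` columns returned by the query; the function name `area_series()` is hypothetical, and the numeric conversion is a precaution in case `value` arrives as text or as a factor.

```r
# Hypothetical helper: rough-sleeping counts for one area, ordered by period
area_series <- function(df, area) {
  # which() drops rows where areaname is NA
  rows <- df[which(df$areaname == area), c("periodlabel", "value")]
  # Convert via character so factor columns give their values, not level codes
  rows$value <- as.numeric(as.character(rows$value))
  # Sort by period label (assumes labels sort chronologically as text, e.g. years)
  rows <- rows[order(rows$periodlabel), ]
  rownames(rows) <- NULL
  rows
}

# Usage, for example with the first area name in the results:
# area_series(odcdf, odcdf$areaname[1])
```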

To continue exploring our datasets, return to opendatacommunities.org.

