OpenRefine

What is OpenRefine?

OpenRefine is a standalone, open-source desktop application that enables the collection, manipulation, transformation and standardisation of incomplete or inconsistent data without affecting the data’s original structure. Although it looks like a spreadsheet, and works with standard spreadsheet file formats, it operates like a database and can be used to quickly sort, arrange and contextualise data.

Open-source software is developed by a community of people and made available in full for others to use, inspect, reuse and contribute to.

What is OpenRefine useful for?

OpenRefine offers a wide variety of functions to collect, normalise and transform data, including:

  • standardising inconsistencies such as date formats or misspelled values
  • resolving or removing duplicates within columns
  • splitting or combining cell values
  • fetching and combining data from multiple sources, including spreadsheets and web services
  • working with large datasets
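
OpenRefine performs these operations interactively, but it can help to see what they amount to. The following is a minimal Python sketch of the first three items in the list, not a description of how OpenRefine itself works. The sample rows, the two assumed date formats and the helper names (standardise_date, standardise_name) are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical sample rows with inconsistent names and date formats.
rows = [
    {"name": " Alice Smith ", "joined": "2021-03-05"},
    {"name": "alice smith", "joined": "05/03/2021"},
    {"name": "Bob Jones", "joined": "2020-11-30"},
]

# Assumed input formats: ISO (year-month-day) and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardise_date(text):
    """Return the date in ISO format, or the text unchanged if unrecognised."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return text


def standardise_name(text):
    """Collapse extra whitespace and use consistent capitalisation."""
    return " ".join(text.split()).title()


cleaned = []
seen = set()
for row in rows:
    name = standardise_name(row["name"])
    joined = standardise_date(row["joined"])
    key = (name, joined)
    if key in seen:
        continue  # skip rows that are duplicates once standardised
    seen.add(key)
    # Split one cell value (full name) into two columns.
    first, _, last = name.partition(" ")
    cleaned.append({"first": first, "last": last, "joined": joined})

for row in cleaned:
    print(row)
```

In this sketch the two "Alice Smith" rows become identical once their names and dates are standardised, so only one of them is kept.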

OpenRefine is also used to explore datasets, using facets and filters based on the textual, numeric or date information they contain. Transformed datasets can be exported, and the actions taken within one dataset can be saved and reapplied to other datasets.

Normalisation is the process of making small changes to the format of your data so that you can accurately analyse it, compare it or combine it with other data. A basic example of normalisation is changing measurements from inches to centimetres in one dataset so that the data can be combined with another set where all measurements are in centimetres.
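
The inches-to-centimetres example can be sketched in a few lines of Python. This is only an illustration of the idea, assuming two small made-up lists of measurements. The variable names and the rounding to two decimal places are assumptions, not part of OpenRefine.

```python
# One inch is defined as exactly 2.54 centimetres.
CM_PER_INCH = 2.54


def inches_to_cm(value_in):
    """Convert a measurement in inches to centimetres, rounded to 2 places."""
    return round(value_in * CM_PER_INCH, 2)


# Hypothetical datasets: one recorded in inches, one already in centimetres.
lengths_in = [10.0, 12.5, 60.0]
lengths_cm = [25.4, 100.0]

# Normalise the first dataset, then combine it with the second.
combined_cm = [inches_to_cm(v) for v in lengths_in] + lengths_cm
print(combined_cm)
```

Once every value is in the same unit, the combined list can be sorted, averaged or compared without mixing inches and centimetres.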

Where can I get it?

OpenRefine is available for free from http://openrefine.org/

OpenRefine workshops

OpenRefine events run regularly throughout the year and include the following workshops:

  • Introduction to OpenRefine

Registering for a workshop

Workshop registration is available through myDevelopment. This aligns with myPlan and training records for staff, and enables more streamlined processes for assigning credit towards the Monash Doctoral Program for Graduate Research students.

To search for upcoming Data Fluency workshops, log in to myDevelopment from your myMonash portal. Type "Data Fluency" or the software you are interested in (e.g. "python") into the search box to find a list of available workshops.

No access to myDevelopment?

If you do not have access to myDevelopment, please complete the relevant form:

Staff Development will validate and approve access. This process will take one business day to complete. An email will be sent to users confirming access details.

Where else to find training

You can find more online training materials for this tool via the Library. Visit LinkedIn Learning or Safari to access a range of videos, eBooks and online courses, or try using Library Search to find other resources to help you master this tool.

If you're still not sure where to start, use the details below to get in touch with the Data Fluency community.

Find out more

For further information or advice, come to the weekly drop-in session, join the community discussion on Slack, subscribe to our mailing list or email datafluency@monash.edu.

OpenRefine can be used for:
  • Collecting data
  • Preparing & cleaning data

View our workshop materials on GitHub