OpenRefine Open Source Big Data Tool

Free Big Data Solution For Handling Large Scale Complex Data

Powerful free big data platform for exploring, transforming, and reconciling large-scale messy data. Extend it with web services and external datasets.

Overview

OpenRefine (previously Google Refine) is an open source big data tool for working with complex datasets. It is a free platform for cleaning up messy data and converting it from one format to another. OpenRefine can also extend datasets with data from various web services. It exposes an API that can be used to integrate with third-party services and automate operations. Its functionality can be expanded through extensions, which users can download and install.
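
As a rough illustration of automating OpenRefine over HTTP, the sketch below asks a running instance for its list of projects. It assumes OpenRefine is running locally on its default address, that curl is installed, and that the /command/core/get-all-project-metadata endpoint behaves as it does in the 3.x releases. The names REFINE_BASE, refine_url, and list_projects are hypothetical helpers, not part of OpenRefine.

```shell
#!/bin/sh
# Base address of a locally running OpenRefine instance (assumed default).
REFINE_BASE="${REFINE_BASE:-http://localhost:3333}"

# Hypothetical helper: build the URL of a core command.
refine_url() {
    printf '%s/command/core/%s\n' "$REFINE_BASE" "$1"
}

# Hypothetical helper: fetch the metadata of all projects as JSON.
# -s keeps curl quiet, -f makes HTTP errors return a non-zero status.
list_projects() {
    curl -sf "$(refine_url get-all-project-metadata)"
}
```

With the server running, calling list_projects should print a JSON document describing the existing projects.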

OpenRefine offers features for exploring, faceting, transforming, reconciling, and exporting datasets, among others. Faceting lets users browse a column for patterns, trends, and variations in the data. Expressions let users clean and transform data. Wikidata is a free and open knowledge base that both humans and machines can read and edit, and OpenRefine can both fetch data from Wikidata and add data to it. OpenRefine also has a robust exporter that writes data to a variety of formats and external destinations; for example, users can upload data to Google Sheets.
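
To make the idea of a text facet concrete, the sketch below reproduces it outside OpenRefine with standard shell tools: it groups the values of one column and counts how often each occurs, which is roughly what a text facet displays. The sample data and the facet_column helper are made up for illustration, and this is not how OpenRefine implements facets.

```shell
#!/bin/sh
# Hypothetical helper: count the distinct values in column N of a
# comma-separated stream on stdin (header row skipped), most frequent first.
# Assumes simple CSV with no quoted commas.
facet_column() {
    col="$1"
    tail -n +2 | cut -d, -f"$col" | sort | uniq -c | sort -rn
}

# Made-up sample data with inconsistent spellings of the same city.
sample_data() {
    cat <<'EOF'
id,city
1,Berlin
2,berlin
3,Berlin
4,Paris
5,BERLIN
EOF
}

sample_data | facet_column 2
```

In this sample, Berlin is counted twice, while berlin and BERLIN show up as separate values. That kind of variation is what faceting makes visible before the data is cleaned.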

This open source big data tool is available for Windows, Mac, and Linux. Third-party client libraries for automating operations are available for programming languages such as PHP, Java, Python, Ruby, and many more. OpenRefine is written in Java and is released under the BSD 3-Clause license.

System Requirements

To install OpenRefine, you need the following software:

  • Java 8
  • Apache Maven
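
A quick way to confirm these prerequisites is to check whether the commands are on the PATH. The sketch below uses a hypothetical check_tool helper built on the POSIX command -v, and it assumes the executables carry their usual names, java and mvn.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Check both prerequisites and print a hint for anything that is absent.
for tool in java mvn; do
    check_tool "$tool" || echo "Install $tool before continuing"
done
```

Once both are found, java -version and mvn -version print the installed versions.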

Features

The key features of OpenRefine are:

  • Free and open source
  • Data cleaning and filtering
  • Import data from various formats
  • Data reconciliation and matching
  • Custom expression language (GREL) with Python support
  • Enrich data via APIs
  • Transformation of data
  • Linking data
  • Advanced data operations

Installation

Installing on Linux

Run the command below to download OpenRefine.

$ wget https://github.com/OpenRefine/OpenRefine/releases/download/3.4.1/openrefine-linux-3.4.1.tar.gz

Extract the downloaded file by running the command below.

$ tar -xzf openrefine-linux-3.4.1.tar.gz

Start OpenRefine from inside the extracted directory.

$ ./refine

OpenRefine will then open in your web browser. If it does not open automatically, go to http://localhost:3333 in your browser.
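
If the browser does not open, it can help to confirm from a terminal that the server is actually listening. The sketch below polls the address until it answers. wait_for_refine is a hypothetical helper, it assumes curl is installed, and its default URL is the address given above.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Usage: wait_for_refine [url] [attempts]
wait_for_refine() {
    url="${1:-http://localhost:3333}"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors as failure, -o: discard the body
        if curl -sf -o /dev/null --max-time 2 "$url" 2>/dev/null; then
            echo "OpenRefine is responding at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "No response from $url after $attempts attempt(s)" >&2
    return 1
}
```

Run wait_for_refine in a second terminal after starting ./refine. It returns 0 as soon as the server answers and 1 if it never does.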

Installing on Mac

Download the Mac kit from https://openrefine.org/download.html.

Open it and drag the icon into the Applications folder.

Double-click the icon and OpenRefine will open in your web browser.

Installing on Windows

Download the Windows kit from https://openrefine.org/download.html.

Unzip the downloaded file and double-click the refine.bat file.

OpenRefine will then open in your web browser. If it does not open automatically, go to http://localhost:3333 in your browser.
