IVE's big data apps create biz values from govt open data

Hong Kong IT Job Advertisement Data Mining Report

Hong Kong IVE cloud majors show big data analytics in action by creating business value out of government open data.

What are the most sought-after IT skills in Hong Kong today?

What is the most popular programming language among employers?

What additional programming languages ought a Java-proficient job candidate to know?

If someone is proficient in C++, in which district of Hong Kong is that person likely to find the most job matches?

More importantly, can you answer the above questions accurately without surveying all the job ads of the week? 

Students doing a cloud major at the Hong Kong Institute of Vocational Education (IVE) can. Indeed, any employer or IT job candidate can answer similar queries with the help of Data-HK.com, a cloud data applications platform created by an award-winning student team at IVE. 

"Our work [on Data-HK] is not a student assignment or project, but real production work that benefits the society!" 

-- Cyrus Wong, R&D coordinator, department of multimedia and internet technology, Hong Kong IVE

Data-HK is a big data analytics project that comprises five cloud-based applications, which provide open access to live public data for research and business purposes. The data feeds for the applications include: 1) Data.One -- public data sets released by the Hong Kong Government; 2) internet keyword analysis from web content; and 3) scientific data from the business sector. To support its huge computation needs, the IVE team uses Amazon Web Services (AWS) public cloud services, which give it virtually unlimited capacity for sharing public data sets.

The masterminds behind Data-HK are Cyrus Wong, research and development coordinator, department of multimedia and internet technology at Hong Kong IVE (Lee Wai Lee), and eight students who are currently taking a two-year full-time program called the Higher Diploma in Cloud and Data Centre Administration.

Big data analytics in action 

The primary source of data for Data-HK, Data.One, was launched by the Office of the Government Chief Information Officer (OGCIO) in March 2011. Data.One provides geo-referenced public facilities data and real-time traffic data for free download and value-added reuse by the public.

Since Data.One provides only real-time data in disparate file formats (XML, CSV, JSON, etc.), this gave rise to Data.Two, a "user friendly version" of Data.One produced by the Data-HK team. Unlike Data.One, Data.Two archives all Data.One datasets and converts them from inconsistent data formats to a unified RESTful API for easier data retrieval.
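
To illustrate the kind of format conversion described above, here is a minimal Python sketch that reads the same record from JSON, CSV and XML and returns it in one common shape. It is not the Data-HK team's code: the function names, the sample feeds and the XML layout are all hypothetical, and a real service would still need archiving and an HTTP layer on top.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return [dict(item) for item in json.loads(text)]


def parse_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_xml(text):
    """Parse <records><record><field>value</field></record></records>."""
    root = ET.fromstring(text)
    return [{child.tag: (child.text or "") for child in record} for record in root]


PARSERS = {"json": parse_json, "csv": parse_csv, "xml": parse_xml}


def normalize(text, fmt):
    """Return a list of dicts with string values, whatever the input format."""
    records = PARSERS[fmt](text)
    return [{key: str(value) for key, value in record.items()} for record in records]


# Hypothetical feeds describing the same record in three formats.
JSON_FEED = '[{"name": "Cafe A", "district": "Central"}]'
CSV_FEED = "name,district\nCafe A,Central\n"
XML_FEED = (
    "<records><record><name>Cafe A</name>"
    "<district>Central</district></record></records>"
)

unified = [
    normalize(JSON_FEED, "json"),
    normalize(CSV_FEED, "csv"),
    normalize(XML_FEED, "xml"),
]
print(json.dumps(unified[0]))
```

Once every feed is reduced to the same list-of-records shape, a single API endpoint can serve any of them as JSON without the caller needing to know the original format.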

On top of Data.Two, the Data-HK team has built five different big data analytics applications. The first three projects below use public data sets from Data.One, while the last two projects use other publicly available data:

- LicenCheck (an Android and web app that provides a map view of Hong Kong with markers locating all licensed restaurants);

- Missing HK (an Android app that lists all wanted persons in Hong Kong);

- HK Traffic Live (a web app that renders real-time traffic and weather information on a Hong Kong map);

- IT Jobs Analysis (a web app that uses data science to investigate and extract the keywords from 192,000 IT job ads to help make informed decisions about one's IT career development); and

- DSE English Learning (an Android app that uses text analytics and a word database to find the words most commonly used in past HKDSE English exams).
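
The questions at the top of this article (most popular language, companion skills for Java, best district for C++) come down to counting keywords across job ads. The Python sketch below shows one simple way to do that. The skill list and the four sample ads are invented for illustration, and the team's actual method for analyzing 192,000 ads is not described in this article.

```python
import re
from collections import Counter

# Hypothetical skill vocabulary and job ads, for illustration only.
SKILLS = ["java", "c++", "python", "sql", "javascript"]

ADS = [
    {"district": "Central", "text": "Senior Java developer with SQL and JavaScript"},
    {"district": "Kwun Tong", "text": "C++ engineer, Python scripting a plus"},
    {"district": "Central", "text": "Java and SQL analyst programmer"},
    {"district": "Kwun Tong", "text": "Embedded C++ developer"},
]


def skills_in(text):
    """Return the set of known skills mentioned in an ad.

    Matching whole tokens keeps "java" from matching inside "javascript".
    """
    tokens = set(re.findall(r"[a-z+#]+", text.lower()))
    return {skill for skill in SKILLS if skill in tokens}


def skill_counts(ads):
    """Count how many ads mention each skill."""
    counts = Counter()
    for ad in ads:
        counts.update(skills_in(ad["text"]))
    return counts


def co_occurring(ads, skill):
    """Count the other skills requested in ads that mention `skill`."""
    counts = Counter()
    for ad in ads:
        found = skills_in(ad["text"])
        if skill in found:
            counts.update(found - {skill})
    return counts


def districts_for(ads, skill):
    """Count ads mentioning `skill` per district."""
    return Counter(ad["district"] for ad in ads if skill in skills_in(ad["text"]))


print(skill_counts(ADS).most_common())
print(co_occurring(ADS, "java").most_common())
print(districts_for(ADS, "c++").most_common())
```

With real data, the same three counters would answer the popularity, companion-skill and district questions directly, with no need to read every ad by hand.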

Value creation takes priority 

"These big data analytics applications add value to the existing government datasets," Wong said. "Take LicenCheck for example, while the government provides the addresses of licensed restaurants, Data-HK converts the addresses into geo location searchable on a map." According to Wong, inspectors at the Food and Environmental Hygiene Department can inspect restaurant licenses efficiently by following the suggested routes on the map. 

"Data-HK aims to investigate the application potential of the PSI (Public Sector Information) deeper and wider as we realize the huge potential values of PSI. Through these applications, it will also encourage the government departments to disclose more their data to the public," Wong said. 

"Based on these open data, more and more applications will be developed and more people in the society will be benefited. We will keep a close cooperation with the government departments, corporations, and public media to develop Hong Kong's knowledge-based economy." 

Wong reiterated: "Our work [on Data-HK] is not a student assignment or project, but real production work that benefits the society!" 

100GB public data archive on public cloud

While the public data sets on Data.One are real-time data with no archive, IVE's Data.Two platform archives all the public data sets in Amazon Simple Storage Service (Amazon S3) in Singapore's data centers, where they are available for free download.

"It is very nice to use the versioning feature of Amazon S3 to handle our project. We are planning to share the bucket with the pay by requester way." 

"AWS does not limit the number of versions of our data. As we create a new version of our data in the cloud every two minutes, we now have about 200,000 versions [of Data.Two] on Amazon S3." 

At present, Data-HK does not charge a fee for using its service. In the future, it may charge users for the bandwidth they consume.

"There is no need to back up our data once we upload them to the cloud," Wong added. "Once S3 declares success [upload] it replicates multiple copies in the cloud within the same data center location."

Cloud storage cost: US$6 per month

"We don't have any real physical computer [to work with the Data-HK project]," said Wong. Indeed the team did not pay a dollar to put together the infrastructure for Data-HK. Instead, the Data-HK team obtained an AWS Education Grant, which takes the form of AWS free usage credits for lecturers and students to use any of the AWS services. 

Below is the full list of AWS services that the Data-HK team uses: 1) Amazon Elastic Compute Cloud (Amazon EC2); 2) Amazon Simple Storage Service (Amazon S3); 3) Amazon Relational Database Service (Amazon RDS); 4) Amazon Route 53 (a scalable Domain Name System (DNS) web service); 5) Elastic Load Balancing (ELB); 6) Amazon Elastic MapReduce (Amazon EMR) (a web service that processes vast amounts of data); 7) Amazon CloudFront (a web service for content delivery using the AWS global network of edge locations); 8) Auto Scaling; and 9) Amazon CloudWatch (a web service that provides monitoring for AWS cloud resources).

Wong declined to disclose the amount awarded under the AWS Education Grant, but said the team has consumed "a very minor portion of the grant" to date. "We have uploaded 100GB of data on AWS so far, and it costs us just US$6 per month," he said.
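
The storage figures Wong quotes imply a simple unit cost, which the Python sketch below derives and uses to project the bill as the archive grows. It reads the quoted volume as 100 gigabytes and assumes a flat per-gigabyte rate covering storage only; actual AWS pricing is tiered and also bills requests and data transfer, so treat the projection as a rough guide.

```python
STORED_GB = 100          # from the quote, read as gigabytes
MONTHLY_COST_USD = 6.0   # from the quote

# Implied flat storage rate in US dollars per GB per month.
rate_per_gb = MONTHLY_COST_USD / STORED_GB


def projected_monthly_cost(stored_gb, rate=rate_per_gb):
    """Estimate the monthly storage bill at a flat per-GB rate (an assumption)."""
    return stored_gb * rate


print(f"US${rate_per_gb:.2f} per GB-month")
print(f"US${projected_monthly_cost(STORED_GB) * 12:.0f} per year at the current size")
print(f"US${projected_monthly_cost(1000):.0f} per month at 1,000 GB")
```

At that rate, a year of storage at the current size comes to about US$72.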

Hong Kong ICT award

On April 7, the Data-HK team at IVE (Lee Wai Lee) was awarded the Golden Prize (Champion) at the Hong Kong ICT Awards 2014: Best Student Invention Award (Tertiary Institution category) for its cloud data application platform, Data-HK.

"Enabled by AWS cloud technology, IVE students are 100% focusing on project development, without spending any time on meaningless work such as installing window or addressing conflicting DNS! The students launched their work with small Amazon EC2 instances and handle the workloads with auto scaling which is nice," Wong concluded.