WEB Sensorizer: An Architecture for Regenerating Cyber Physical Data Streams from the Web
What is WEB Sensorizer?
WEB Sensorizer is a novel tool for acquiring real-world data in a simple yet extensible way. Its main idea is to put "virtual" sensors on web pages that contain meaningful values sensed from the real world. The figure on the right shows such web pages, which contain air quality information sensed in the corresponding cities. The numbers shown on these pages are updated periodically, and past values become inaccessible because they are stored deep in a database. Virtual sensors placed on these pages periodically scrape the numbers and transmit them. Because many web pages contain real-world data, and deploying a virtual sensor takes only a few steps of GUI manipulation, virtual sensing can generate a huge amount of data that helps us understand the real world. On the system side, our virtual sensing technique consists of the following components.
This client-side tool is a Chrome browser extension that lets users deploy virtual sensors on almost any element of a web page.
Probe is the server-side program that takes a virtual sensor definition as input, which includes the URL of a web page and the XPath of each target element, and periodically scrapes the element values from the page. It can also explore web pages with a similar structure and sensorize them automatically. Probe uses the Java version of the SOXFire API.
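The scraping loop described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual Probe implementation (which is written against the Java SOXFire API). The names VirtualSensor, scrape, and run_probe are hypothetical, the publish callback merely stands in for publishing to SOXFire, and the example page is a well-formed snippet so that the standard library's limited XPath support is enough. A real probe would need an HTML-tolerant parser and a full XPath engine.

```python
import time
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class VirtualSensor:
    """Hypothetical sensor definition: a page URL plus labeled XPaths."""
    name: str
    url: str
    xpaths: Dict[str, str]  # label -> XPath of the target element


def scrape(sensor: VirtualSensor,
           fetch: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Fetch the page once and read the text of each target element."""
    root = ET.fromstring(fetch(sensor.url))
    values: Dict[str, Optional[str]] = {}
    for label, xpath in sensor.xpaths.items():
        node = root.find(xpath)
        text = node.text if node is not None else None
        # A missing or empty element yields None instead of raising.
        values[label] = text.strip() if text else None
    return values


def run_probe(sensor: VirtualSensor,
              fetch: Callable[[str], str],
              publish: Callable[[str, Dict[str, Optional[str]]], None],
              interval_s: float,
              rounds: int) -> None:
    """Scrape the page `rounds` times, waiting `interval_s` between scrapes."""
    for i in range(rounds):
        publish(sensor.name, scrape(sensor, fetch))
        if i + 1 < rounds:
            time.sleep(interval_s)


# Assumed example page; a real page would be downloaded over HTTP.
PAGE = """<html><body>
  <div id="aqi"><span class="value"> 42 </span></div>
  <div id="pm25"><span class="value">12.5</span></div>
</body></html>"""

sensor = VirtualSensor(
    name="example-city-air",
    url="http://example.invalid/air",
    xpaths={
        "aqi": ".//div[@id='aqi']/span",
        "pm25": ".//div[@id='pm25']/span",
    },
)

readings = []
run_probe(
    sensor,
    fetch=lambda url: PAGE,  # stand-in for an HTTP GET
    publish=lambda name, values: readings.append((name, values)),
    interval_s=0.0,
    rounds=2,
)
```

Passing fetch and publish as callbacks keeps the sketch independent of any particular HTTP client or messaging library, so the same loop could be pointed at a real page and a real publisher without changing its structure.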
The data scraped from web pages is published to SOXFire and then transmitted to subscribers via the XMPP protocol. So far we have sensorized more than 400 thousand web pages, which generate a sensor data stream of more than 20 GB/day via SOXFire.
How can I use it?
Please see the manual. The following video is also a fun way to understand WEB Sensorizer.