GSOD Example
The Global Surface Summary of Day (GSOD) is a dataset produced by the National Climatic Data Center (NCDC) in Asheville NC, using daily summaries of weather conditions compiled by the USAF Climatology Center. Elements available from each station include min, mean, and max temperatures, dew point, pressures, visibility, wind speeds, and precipitation.
The entire GSOD dataset is quite large (more than 7GB at the time of writing), but a small sample covering 6 Texas cities for the year 1988 is provided for purposes of running this demo.
Prerequisites
Before attempting to start this demo, you should have a running Cassandra cluster, and a Newts REST endpoint. Setting up Cassandra is beyond the scope of this document, so have a look at the Cassandra Wiki if you need help with that. Directions on setting up a REST endpoint can be found in the README for that module.
Building
To build the GSOD example code, run::
mvn install
Importing Data
To import the included data, run::
java -cp target/newts-gsod-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
org.opennms.newts.gsod.ImportRunner -p 100 \
ftp.ncdc.noaa.gov/pub/data/gsod/1988
The importer accepts a single argument for the name of a directory that is searched recursively for GSOD data files. You can load additional data by changing this argument accordingly::
java -cp target/newts-gsod-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
org.opennms.newts.gsod.ImportRunner -p 100 \
/path/to/additional/data
The import process connects to Cassandra directly, if necessary you can override the Cassandra hostname, port, and keyspace name using system properties. For example::
java -Dcassandra.keyspace=newts -Dcassandra.host=localhost -Dcassandra.port=9042 \
-cp target/newts-gsod-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
org.opennms.newts.gsod.ImportRunner -p 100 \
ftp.ncdc.noaa.gov/pub/data/gsod/1988
Usage for Importer
java -cp target/newts-gsod-1.0.0-SNAPSHOT-jar-with-dependencies.jar org.opennms.newts.gsod.ImportRunner2 [options] sourceDir
sourceDir : the source directory that contains gsod data to import. These must be gzip’d files -n (–samples-per-batch) sample-count : the maxinum number of samples to include in each post to the repository (default: 1000) -p (–parallelism) thread-count : when using direct the size of the thread pool that posts the results. (defaults to 1 ie no parallelism) -q (–max-work-queue-size) batch-count : when using direct the max size of the work-queue (defaults to thread-count * 3) -u (–url) url : publish data via a Newts REST server at the given url (default: use direct access via Newts API)
Starting Demo Webserver
Issue the following to start the web server::
java -cp target/newts-gsod-1.0.0-SNAPSHOT-jar-with-dependencies.jar org.opennms.newts.gsod.Web
View Examples
You can either view individual graphs of the 6 Texas stations, or see a report of all 6 for the Summer of 1988.