LocationTech GeoWave leverages the scalability of distributed key-value stores for effective storage, retrieval, and analysis of massive geospatial datasets. It currently does so by providing plugins that connect GeoTools and PDAL to multiple key-value stores. The primary goal of GeoWave is to bridge the gap between popular geospatial projects, distributed key-value stores, and distributed processing frameworks. In many of these storage and compute systems, geospatial operations tend to be an afterthought or do not mesh well with the system's capabilities; through GeoWave, we intend to make them first-class citizens.
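Key-value stores such as Accumulo and HBase keep rows sorted by key. A common way to index geospatial data in such a store is to turn coordinates into a sortable key with a space-filling curve, so that points that are close in space tend to land in nearby rows. The following is a minimal sketch that assumes a simple Z-order (Morton) curve over longitude and latitude. The class `ZOrderKey` and its methods are hypothetical illustrations, not GeoWave's API, and GeoWave's own indexing is more elaborate than this.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical helper (not GeoWave's API): builds a sortable row key for a
 * (lon, lat) point by interleaving the bits of its normalized coordinates,
 * i.e. a Z-order / Morton curve.
 */
final class ZOrderKey {

    /** Maps a value in [min, max] to a cell index in [0, 2^bits - 1]. */
    static long normalize(double value, double min, double max, int bits) {
        if (!(value >= min && value <= max)) { // also rejects NaN
            throw new IllegalArgumentException("value out of range: " + value);
        }
        long cells = 1L << bits;                       // number of cells along this axis
        long cell = (long) ((value - min) / (max - min) * cells);
        return Math.min(cell, cells - 1);              // value == max goes into the last cell
    }

    /** Interleaves the low {@code bits} bits of x and y: x fills even bit positions, y odd ones. */
    static long interleave(long x, long y, int bits) {
        long z = 0;
        for (int i = 0; i < bits; i++) {
            z |= ((x >>> i) & 1L) << (2 * i);
            z |= ((y >>> i) & 1L) << (2 * i + 1);
        }
        return z;
    }

    /** Encodes a point as an 8-byte big-endian key; bits per axis must be in [1, 31]. */
    static byte[] encode(double lon, double lat, int bits) {
        if (bits < 1 || bits > 31) {
            throw new IllegalArgumentException("bits must be in [1, 31]");
        }
        long x = normalize(lon, -180.0, 180.0, bits);
        long y = normalize(lat, -90.0, 90.0, bits);
        return ByteBuffer.allocate(Long.BYTES).putLong(interleave(x, y, bits)).array();
    }

    public static void main(String[] args) {
        byte[] key = encode(-77.0365, 38.8977, 16);
        System.out.println(Arrays.toString(key));
    }
}
```

With at most 31 bits per axis, the interleaved value uses at most 62 bits and is never negative. Its big-endian bytes therefore sort lexicographically in the same order as the numbers themselves, which is the property a sorted key-value store relies on. Spatial proximity is only approximately preserved, so a bounding-box query generally decomposes into several key ranges rather than one.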
Explicitly in scope for this project are:
1. Providing bindings between geospatial toolkits (which don't natively support large, scalable data stores) and distributed key-value stores.
    - Apache Accumulo was the first implementation of this.
    - HBase, Cassandra, DynamoDB, and Bigtable support have now been added.
    - Other data stores will be evaluated on the following criteria:
        - Scalability / distributed nature
        - Lack of existing capability
        - Userbase size and support
        - New or novel features and capabilities
    - Data store integration should go beyond simple storage and retrieval of data and focus on the entire concept of interacting with these datasets in real time, taking advantage of dynamic server-side processing and large-scale pre-processing (e.g., spatial subsampling, distributed rendering, geometry simplification, vector tiles, raster pyramids, and statistics).
2. Providing bindings between geospatial toolkits and distributed compute frameworks.
    - MapReduce (under YARN) and Accumulo iterators were the first implementations of this.
    - Spark integration has been added for certain algorithms.
    - Tie-ins to GeoTrellis should be considered in the future.
    - Other frameworks will be evaluated on the following criteria:
        - Unique or different capabilities
        - User interactivity and presentation
        - Userbase size and support
    - The intent of these systems is to allow people to ask meaningful questions, interact with the data, and formulate custom high-level questions based on our low-level building blocks and components. For example, clustering, probability density estimation, and hotspot analysis might be low-level building blocks provided by GeoWave, which an end user or outside developer can combine to answer questions such as what will happen, where, and when.
3. Identifying the geospatial toolkits to provide bindings for in (1) and (2).
    - GeoTools / GeoServer is the current primary implementation.
    - PDAL support has recently been added, and Mapnik support is coming soon.
    - GeoGig support is currently on our backlog and is something we are very interested in.
    - Other toolkits will be evaluated on the following criteria:
        - Lack of support for distributed systems
        - Current userbase size and support
        - General applicability to all the supported instances in (1) and (2)
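The data store integration goals listed above include server-side spatial subsampling. One simple form of it returns at most one point per output pixel when results are drawn into a fixed-size image, since extra points in the same pixel would not change the rendered image. The following is a minimal sketch under that assumption, with a known bounding box and image size. `PixelSubsampler` and `Point` are hypothetical names, not GeoWave's API. The record syntax requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of pixel-based spatial subsampling (not GeoWave's API):
 * keeps only the first point that falls into each pixel of a width x height image.
 */
final class PixelSubsampler {

    /** A point in map coordinates. */
    record Point(double x, double y) {}

    private final double minX, minY, maxX, maxY;
    private final int width, height;

    PixelSubsampler(double minX, double minY, double maxX, double maxY, int width, int height) {
        if (maxX <= minX || maxY <= minY || width <= 0 || height <= 0) {
            throw new IllegalArgumentException("empty extent or image size");
        }
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
        this.width = width;
        this.height = height;
    }

    /** Returns the first point seen in each pixel, preserving input order. */
    List<Point> subsample(Iterable<Point> points) {
        Set<Long> occupied = new HashSet<>();
        List<Point> kept = new ArrayList<>();
        for (Point p : points) {
            // Skip points outside the rendered extent.
            if (p.x() < minX || p.x() > maxX || p.y() < minY || p.y() > maxY) {
                continue;
            }
            // Map the point to a pixel; points on the max edge go to the last column/row.
            int col = (int) Math.min(width - 1, (p.x() - minX) / (maxX - minX) * width);
            int row = (int) Math.min(height - 1, (p.y() - minY) / (maxY - minY) * height);
            long pixel = (long) row * width + col;
            // Set.add returns false if this pixel already holds a point.
            if (occupied.add(pixel)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        PixelSubsampler s = new PixelSubsampler(0, 0, 10, 10, 2, 2);
        List<Point> kept = s.subsample(List.of(new Point(1, 1), new Point(2, 2), new Point(9, 9)));
        System.out.println(kept.size() + " of 3 points kept");
    }
}
```

In a distributed deployment, the point of such a filter is to run it close to the data so that only the surviving points cross the network. This single-process sketch shows only the filtering logic, not that placement.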
Design goals for the scope items above include:
- Users of the geospatial systems integrated should be able to operate those systems in a natural manner with as little awareness of the distributed backend as possible.
- Users and developers should be able to opt in to the features and capabilities of the distributed system (such as cell-level security) if they want to, but doing so should not be required; sane defaults should be provided where needed.
The project should provide a flexible spatio-temporal analytics platform, leveraging third-party algorithms as much as possible and providing transparent interoperability (a common index, persistence, and data model, with GeoTools as the common layer where appropriate).
GeoTools will be used as much as possible as a common geospatial framework to tie the various components (storage, compute, systems) together.
The project is intended to integrate easily across language boundaries. Although it is written in Java, there are currently routines to generate C++ bindings.