OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS

OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS

BRADLEY W. JACKSON, JEFFREY D. SCARGLE, AND CHRIS CUSANZA, DAVID BARNES, DENNIS KANYGIN, RUSSELL SARMIENTO, SOWMYA SUBRAMANIAM, TZU-WANG CHUANG**

Abstract. Consider piece-wise constant approximations to a function of several parameters, and the problem of finding the best such approximation from measurements at a set of points in the parameter space. We find good approximate solutions to this problem in two steps: (1) partition the parameter space into cells, one for each of the N data points, and (2) collect these cells into blocks, such that within each block the function is constant to within measurement uncertainty. We describe a branch-and-bound algorithm for finding the optimal partition into connected blocks, as well as an O(N2) dynamic programming algorithm that finds the exact global optimum over this exponentially large search space, in a data space of any dimension. This second solution relaxes the connectivity constraint, and requires additivity and convexity conditions on the block fitness function, but in practice none of these items cause problems. From the wide variety of intelligent data understanding applications (including cluster analysis, classification, and anomaly detection) we demonstrate two: partitioning of the State of California (2D) and the Universe (3D).

Data and Resources

Additional Info

Field Value
Maintainer Elizabeth Foughty
Last Updated March 31, 2025, 15:27 (UTC)
Created March 31, 2025, 15:27 (UTC)
accessLevel public
accrualPeriodicity irregular
bureauCode {026:00}
catalog_@context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_@id https://data.nasa.gov/data.json
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
harvest_object_id 9748baf9-fe36-4b26-912f-9fe4f82c66fd
harvest_source_id 61638e72-b36c-4866-9d28-551a3062f158
harvest_source_title DNG Legacy Data
identifier DASHLINK_230
issued 2010-10-13
landingPage https://c3.nasa.gov/dashlink/resources/230/
modified 2020-01-29
programCode {026:029}
publisher Dashlink
resource-type Dataset
source_datajson_identifier true
source_hash e7052d9720ae91c9a58a1bfc6847d39826e57667351ec197aafba7b545c7c9da
source_schema_version 1.1