## Abstract

A k-decision tree t (or k-tree) is a recursive partition of a matrix (2D-signal) into k ≥ 1 block matrices (axis-parallel rectangles, leaves), where each rectangle is assigned a real label. Its regression or classification loss with respect to a given matrix D of N entries (labels) is the sum of squared differences between every label in D and the label assigned to it by t. Given an error parameter ε ∈ (0, 1), a (k, ε)-coreset C of D is a small summarization that provably approximates this loss for every such tree, up to a multiplicative factor of 1 ± ε. In particular, the optimal k-tree of C is a (1 + ε)-approximation to the optimal k-tree of D. We provide the first algorithm that outputs such a (k, ε)-coreset for every such matrix D. The size |C| of the coreset is polynomial in k log(N)/ε, and its construction takes O(Nk) time. This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry. Experimental results on sklearn and LightGBM show that applying our coresets to real-world data sets speeds up the computation of random forests and their parameter tuning by up to 10x, while keeping similar accuracy. Full open-source code is provided.
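To make the loss in the abstract concrete, here is a minimal sketch that evaluates the sum of squared differences between a matrix D and the labels of a given k-tree. It is not the authors' code: the name `tree_loss` and the representation of leaves as `(row_start, row_stop, col_start, col_stop, label)` tuples are assumptions made for illustration. The function only checks that the rectangles cover every entry exactly once; it does not verify that they arise from a recursive partition.

```python
import numpy as np


def tree_loss(D, leaves):
    """Sum of squared differences between D and the labels of a k-tree.

    `leaves` is a hypothetical representation: a list of
    (row_start, row_stop, col_start, col_stop, label) tuples with
    half-open index ranges, assumed to partition the matrix D.
    """
    D = np.asarray(D, dtype=float)
    covered = np.zeros(D.shape, dtype=int)
    loss = 0.0
    for r0, r1, c0, c1, label in leaves:
        block = D[r0:r1, c0:c1]
        # Each entry in the rectangle is compared with the leaf's label.
        loss += float(np.sum((block - label) ** 2))
        covered[r0:r1, c0:c1] += 1
    # Every entry must belong to exactly one leaf.
    if not np.all(covered == 1):
        raise ValueError("leaves must partition the matrix")
    return loss


D = np.array([[1.0, 1.0, 5.0],
              [1.0, 3.0, 5.0]])
# A 2-tree: one vertical split between columns 1 and 2,
# each leaf labeled with the mean of its block.
leaves = [(0, 2, 0, 2, 1.5), (0, 2, 2, 3, 5.0)]
print(tree_loss(D, leaves))  # 3.0
```

In these terms, a (k, ε)-coreset C would let one estimate this quantity for every k-tree to within a factor of 1 ± ε without touching all N entries of D; the construction of C itself is described in the paper and is not sketched here.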

Original language | English |
---|---|
Title of host publication | Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |
Editors | Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan |
Publisher | Neural information processing systems foundation |
Pages | 30352-30364 |
Number of pages | 13 |
ISBN (Electronic) | 9781713845393 |
State | Published - 7 Oct 2021 |
Event | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online; Duration: 6 Dec 2021 → 14 Dec 2021 |

### Publication series

Name | Advances in Neural Information Processing Systems |
---|---|
Volume | 36 |
ISSN (Print) | 1049-5258 |

### Conference

Conference | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |
---|---|
City | Virtual, Online |
Period | 6/12/21 → 14/12/21 |

### Bibliographical note

Funding Information: This research was supported by the Israel Science Foundation, grant number 379/21.

Publisher Copyright: © 2021 Neural information processing systems foundation. All rights reserved.

## Keywords

- cs.LG
- cs.DS

## ASJC Scopus subject areas

- Computer Networks and Communications
- Information Systems
- Signal Processing