Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 11811 LNCS, 2019, Pages 322-336. 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019; Segovia; Spain; 7 October 2019 through 9 October 2019; Code 232899

Linear Time Maximum Segmentation Problems in Column Stream Model (Conference Paper) (Open Access)

  • (a) Department of Computer Science, University of Helsinki, Helsinki, Finland
  • (b) Ural Federal University, Ekaterinburg, Russian Federation

Abstract

We study a lossy compression scheme linked to the biological problem of founder reconstruction: The goal in founder reconstruction is to replace a set of strings with a smaller set of founders such that the original connections are maintained as well as possible. A general formulation of this problem is NP-hard, but when limiting to reconstructions that form a segmentation of the input strings, polynomial time solutions exist. We proposed in our earlier work (WABI 2018) a linear time solution to a formulation where minimum segment length was bounded, but it was left open if the same running time can be obtained when the targeted compression level (number of founders) is bounded and lossiness is minimized. This optimization is captured by the Maximum Segmentation problem: Given a threshold M and a set of strings of the same length n, find a minimum cost partition P where for each segment, the compression level is bounded from above by M. We give linear time algorithms to solve the problem for two different (compression quality) measures on P: the average length of the intervals of the partition and the length of the minimal interval of the partition. These algorithms make use of the positional Burrows–Wheeler transform and the range maximum queue, an extension of range maximum queries to the case where the input string can be operated as a queue. For the latter, we present a new solution that may be of independent interest. The solutions work in a streaming model where one column of the input strings is introduced at a time. © 2019, Springer Nature Switzerland AG.
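To make the minimal-interval measure concrete, here is a small brute-force reference in Python. It assumes 0-based columns, a non-empty set of equal-length input strings, and that a segment's compression level is the number of distinct substrings the input strings have on that segment's columns. It is a roughly cubic dynamic program, not the paper's linear-time column-stream algorithm, and it does not use the positional Burrows–Wheeler transform. The `MaxQueue` class is the textbook monotonic-deque queue with amortized constant-time push, pop and maximum, shown only to illustrate the interface a range maximum queue offers; it is not the new construction the abstract refers to. All function and class names here are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


def distinct_count(strings: List[str], i: int, j: int) -> int:
    """Compression level of columns i..j (inclusive, 0-based).

    Assumption: the level is the number of distinct substrings s[i..j]."""
    return len({s[i:j + 1] for s in strings})


def max_min_segment_length(strings: List[str], M: int) -> Optional[int]:
    """Brute-force DP: over all partitions of the columns in which every
    segment has compression level <= M, maximize the shortest segment length.
    Returns None when no valid partition exists (e.g. one column alone > M)."""
    if not strings:
        raise ValueError("need at least one input string")
    n = len(strings[0])
    NEG = -1  # marks "this prefix has no valid segmentation"
    best = [NEG] * (n + 1)
    best[0] = n + 1  # stands in for +infinity: no segment is longer than n
    for j in range(1, n + 1):          # prefix of length j (columns 0..j-1)
        for i in range(1, j + 1):      # last segment covers columns i-1..j-1
            if best[i - 1] == NEG:
                continue
            if distinct_count(strings, i - 1, j - 1) > M:
                continue
            best[j] = max(best[j], min(best[i - 1], j - i + 1))
    return None if best[n] == NEG else best[n]


class MaxQueue:
    """FIFO queue reporting its maximum, via the standard monotonic deque.
    Textbook technique, not the paper's range maximum queue."""

    def __init__(self) -> None:
        self._items: Deque[int] = deque()  # all values in FIFO order
        self._maxes: Deque[int] = deque()  # non-increasing max candidates

    def push(self, x: int) -> None:
        self._items.append(x)
        # Smaller candidates can never be the maximum again once x is queued.
        while self._maxes and self._maxes[-1] < x:
            self._maxes.pop()
        self._maxes.append(x)

    def pop(self) -> int:
        x = self._items.popleft()
        # The front candidate is the oldest surviving one; drop it if it leaves.
        if x == self._maxes[0]:
            self._maxes.popleft()
        return x

    def maximum(self) -> int:
        return self._maxes[0]
```

For example, with the strings `ACGT`, `ACGA`, `TCGA` and M = 2, the full block has three distinct rows, so at least two segments are needed; splitting into columns 0 and 1, then 2 and 3, keeps each segment at two distinct substrings, and `max_min_segment_length` returns 2. Because shrinking a segment can only merge rows, the valid left endpoints for a fixed right endpoint form a contiguous range that only moves rightwards, which is the kind of sliding window a queue-with-maximum structure is suited to.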

Author keywords

Dynamic programming; Founder reconstruction; Pan-genome indexing; Positional Burrows–Wheeler transform; Range maximum queue

Indexed keywords

Engineering controlled terms: Clustering algorithms; Information retrieval; Polynomial approximation; Queueing theory
Engineering uncontrolled terms: Biological problems; Compression quality; Linear-time algorithms; Linear-time solutions; Lossy compressions; Polynomial-time; Range maximum query; Range maximum queue
Engineering main heading:Dynamic programming

Funding details

Funding sponsor: Academy of Finland
Funding number: 309048

  • 1. This work was partially supported by the Academy of Finland (grant 309048).

  • ISSN: 0302-9743
  • ISBN: 978-3-030-32685-2
  • Source Type: Book Series
  • Original language: English
  • DOI: 10.1007/978-3-030-32686-9_23
  • Document Type: Conference Paper
  • Volume Editors: Brisaboa N.R., Puglisi S.J.
  • Publisher: Springer

  Cazaux, B.; Department of Computer Science, University of Helsinki, Helsinki, Finland;
© Copyright 2019 Elsevier B.V., All rights reserved.

Cited by 6 documents

Rizzo, N., Equi, M., Norri, T.
Elastic founder graphs improved and enhanced
(2024) Theoretical Computer Science
Sahlin, K., Baudeau, T., Cazaux, B.
A survey of mapping algorithms in the long-reads era
(2023) Genome Biology
Equi, M., Norri, T., Alanko, J.
Algorithms and Complexity on Indexing Founder Graphs
(2023) Algorithmica
SciVal Topic Prominence

Topic: Suffix Array; Query; Succinct Data Structures
Prominence percentile: 91.121