
Re: delete duplicate "huge" lines

#1 delete duplicate "huge" lines

Posted by: maki | Date: 2014-12-29 01:31

How can I delete duplicate lines from this file?

File: UTF-8, 284 MB (~28 million lines)
None of the best 64-bit editors could handle the task: they were extremely slow and kept freezing.
I think PowerShell could do it, but I need tested code to remove the duplicates.

After removing the duplicates, the file should shrink to about 300k lines.

Does anyone have a good idea?

The lines to deduplicate look like "-number" or "number".

Edited 4 time(s). Last edit at 2014-12-29 01:35 by maki.
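For reference, one common way to do this outside an editor is to read the file once and keep a set of the lines already seen. The Python sketch below is an illustration, not code from this thread; the names `dedupe_lines` and `dedupe_file` are hypothetical. It assumes lines are compared as exact strings (so "-5" and "5" stay distinct), that line-ending differences should be ignored, and that the first occurrence of each line is the one to keep. Only the unique lines (about 300k here) are held in memory, not all 28 million.

```python
import sys
from typing import Iterable, Iterator


def dedupe_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance.

    Memory use grows with the number of *unique* lines, not total lines.
    """
    seen = set()
    for line in lines:
        key = line.rstrip("\r\n")  # treat "x\r\n" and "x\n" as the same line
        if key not in seen:
            seen.add(key)
            yield key


def dedupe_file(src_path: str, dst_path: str) -> int:
    """Stream src_path into dst_path without duplicate lines; return lines written."""
    written = 0
    # "utf-8-sig" reads plain UTF-8 and also strips a leading BOM if present.
    with open(src_path, encoding="utf-8-sig", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in dedupe_lines(src):
            dst.write(line + "\n")
            written += 1
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    count = dedupe_file(sys.argv[1], sys.argv[2])
    print(f"{count} unique lines written to {sys.argv[2]}")
```

Usage, assuming the script is saved under the hypothetical name dedupe.py: `python dedupe.py input.txt output.txt`. Because the file is read line by line, it never has to be loaded into an editor buffer.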



#2 Re: delete duplicate "huge" lines

Posted by: Freeman | Date: 2014-12-29 06:48

Remove the duplicate lines with a sed script. You can download sed for Windows from projects such as UnxUtils. It does not guarantee uniqueness in the strict RDBMS sense, but it is usable as a rough first pass. Afterwards, you can work on the shrunk file in PSPad.
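The script itself is not posted. A common sed idiom for this (described in the well-known sed one-liners list as emulating Unix `uniq`) only drops *consecutive* duplicate lines, which may be why it counts only as a rough process; sorting the file first makes all copies of a line adjacent, at the cost of losing the original order. The Python sketch below shows that uniq-like behavior with `itertools.groupby`. It is an assumption about what such a script does, not Freeman's script, and `collapse_adjacent` is a hypothetical name.

```python
from itertools import groupby
from typing import Iterable, Iterator


def collapse_adjacent(lines: Iterable[str]) -> Iterator[str]:
    """uniq-like: emit one line per run of identical consecutive lines."""
    for line, _run in groupby(lines):
        yield line


lines = ["1", "1", "-2", "1"]
print(list(collapse_adjacent(lines)))          # ['1', '-2', '1']: the later "1" survives
print(list(collapse_adjacent(sorted(lines))))  # ['-2', '1']: sorting makes every copy adjacent
```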



#3 Re: delete duplicate "huge" lines

Posted by: maki | Date: 2014-12-29 11:34

PSPad: the regex does not work (search error).

Notepad++: works!

Regex to remove duplicates:
^(.*?)$\s+?^(?=.*^\1$)

Edited 2 time(s). Last edit at 2014-12-29 11:34 by maki.
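To see what this pattern does, here it is run through Python's `re` module as a stand-in for Notepad++'s regex engine (the two may differ in edge cases). With re.S, which assumes Notepad++'s ". matches newline" option was enabled, the look-ahead scans the rest of the text and deletes a line whenever the same line appears again further down, so the last copy survives. Without re.S it only sees the next line, so only consecutive duplicates go. Each attempt rescans the remaining text, which suggests why this approach cannot scale to a 284 MB file. The function names are hypothetical; `dedupe_keep_last` is a linear-time equivalent of the re.S behavior.

```python
import re
from typing import List

# maki's Notepad++ pattern, unchanged.
PATTERN = r"^(.*?)$\s+?^(?=.*^\1$)"


def dedupe_regex(text: str, dot_matches_newline: bool = True) -> str:
    """Delete every line that occurs again further down (last copy survives).

    The look-ahead rescans the rest of the text for each line, so the work
    grows at least with (number of lines) x (file size).
    """
    flags = re.M | (re.S if dot_matches_newline else 0)
    return re.sub(PATTERN, "", text, flags=flags)


def dedupe_keep_last(lines: List[str]) -> List[str]:
    """Linear-time equivalent of the re.S version: keep each line's last copy."""
    last_index = {line: i for i, line in enumerate(lines)}  # later copies overwrite
    return [line for i, line in enumerate(lines) if last_index[line] == i]


sample = "1\n-2\n1\n3\n-2\n"
print(repr(dedupe_regex(sample)))             # '1\n3\n-2\n'
print(dedupe_keep_last(sample.splitlines()))  # ['1', '3', '-2']
print(repr(dedupe_regex("1\n1\n-2\n1\n", dot_matches_newline=False)))  # '1\n-2\n1\n'
```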



#4 Re: delete duplicate "huge" lines

Posted by: pspad | Date: 2014-12-29 12:04

Why do you remove duplicate lines with a regular expression?



#5 Re: delete duplicate "huge" lines

Posted by: maki | Date: 2014-12-29 12:36

The regex was just an idea I had a moment ago.

But I found another solution that works very fast (a few seconds) and effectively:

Polish/English software:
Duplicate Finder 1.4
onedrive.live.com



#6 Re: delete duplicate "huge" lines

Posted by: pspad | Date: 2014-12-29 12:38

Please distinguish between a general-purpose text editor and specialized single-purpose software.



#7 Re: delete duplicate "huge" lines

Posted by: maki | Date: 2014-12-29 12:41

PSPad cannot handle text with this many lines, for example:
284 MB / 28 million lines

I don't always use PSPad itself, but I do use your ideas (e.g. regex) ^^







Editor PSPad - freeware editor, © 2001 - 2024 Jan Fiala, Hosted by Webhosting TOJEONO.CZ, design by WebDesign PAY & SOFT, code Petr Dvořák, Privacy policy and GDPR