Re: delete duplicate "huge" lines
Posted by: maki | Date: 2014-12-29 01:31 | IP: IP Logged
How can I delete duplicate lines?
File: UTF-8
284 MB (~28 million lines)
None of the best 64-bit editors could complete the task. They were extremely slow and froze constantly.
I think PowerShell could do it, but I need tested code to remove the duplicates.
After removing the duplicates, the file should shrink to roughly 300k lines.
Do you have a good idea?
delete duplicate
"-number" or "number"
Edited 4 time(s). Last edit at 2014-12-29 01:35 by maki.
Posted by: Freeman | Date: 2014-12-29 06:48 | IP: IP Logged
Make the duplicate lines unique with a sed script. You can download sed for Windows from projects such as UnxUtils. This is not strict uniqueness in the RDBMS sense, but it is usable as a rough first pass. Afterwards, you can use PSPad on the shrunken file.
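For readers who would rather script this than install sed, here is a minimal Python sketch of the same streaming idea: read the file line by line and write each line only the first time it is seen. This is not the sed script mentioned above. The names `dedupe_lines` and `dedupe_file` are hypothetical, and the sketch assumes that the set of unique lines (around 300k according to the original post) fits comfortably in memory.

```python
import io


def dedupe_lines(src, dst):
    """Copy lines from src to dst, keeping only the first occurrence of each.

    src and dst are text file objects. Returns the number of lines written.
    """
    seen = set()
    kept = 0
    for line in src:
        # Compare lines without their terminator so that a final line
        # lacking a newline still matches earlier copies of itself.
        key = line.rstrip("\r\n")
        if key in seen:
            continue
        seen.add(key)
        dst.write(key + "\n")
        kept += 1
    return kept


def dedupe_file(in_path, out_path):
    """Hypothetical helper: run dedupe_lines on a UTF-8 file on disk."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        return dedupe_lines(src, dst)


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up sample data.
    demo_out = io.StringIO()
    kept = dedupe_lines(io.StringIO("a\nb\na\nc\nb\n"), demo_out)
    print(kept, repr(demo_out.getvalue()))
```

Because only the unique lines are held in memory, the 28 million input lines are never loaded at once. The original order of first occurrences is preserved.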
Posted by: maki | Date: 2014-12-29 11:34 | IP: IP Logged
PSPad: the regex does not work (error in the search).
Notepad++: it works!
Regex to remove duplicates (replace with nothing):
^(.*?)$\s+?^(?=.*^\1$)
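To illustrate what this pattern does, here is a small Python sketch that applies it to a short sample string. It assumes that Python's `re` module with the `MULTILINE` and `DOTALL` flags behaves closely enough to the Notepad++ search for demonstration purposes, which may not hold in every detail. The function name `remove_earlier_duplicates` and the sample text are made up for this example.

```python
import re

# The pattern from the post. MULTILINE lets ^ and $ match at line
# boundaries; DOTALL lets .* in the lookahead scan across later lines.
PATTERN = re.compile(r"^(.*?)$\s+?^(?=.*^\1$)", re.MULTILINE | re.DOTALL)


def remove_earlier_duplicates(text):
    """Delete every line that appears again, as a whole line, further down."""
    return PATTERN.sub("", text)


sample = "a\nb\na\nc\nb"
result = remove_earlier_duplicates(sample)
print(repr(result))
```

Note that a line is removed when an identical line exists later in the text, so the last copy of each duplicate survives rather than the first. The lookahead rescans the remainder of the text for every line, which is a likely reason this approach becomes slow on very large files.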
Edited 2 time(s). Last edit at 2014-12-29 11:34 by maki.
Posted by: pspad | Date: 2014-12-29 12:04 | IP: IP Logged
Why do you remove duplicate lines using a regular expression?
Posted by: maki | Date: 2014-12-29 12:36 | IP: IP Logged
I only came up with the regex idea a moment ago.
But I have found another solution that works very fast (a few seconds) and effectively.
Polish/English software:
Duplicate Finder 1.4
onedrive.live.com
Posted by: pspad | Date: 2014-12-29 12:38 | IP: IP Logged
Please distinguish between a universal text editor and specialized software with a single purpose.
Posted by: maki | Date: 2014-12-29 12:41 | IP: IP Logged
PSPad does not support text with a very large number of lines.
Example:
284 MB / 28 million lines
I don't always use PSPad for the job itself, but I do use it together with your ideas (e.g. regex, etc.) ^^