moose
Hi, I've just noticed that some words appear twice in the wordlist all.txt (http://www.bright-shadows.net/download/wordlists/all.txt), for example disney and cisco. If you checked it with a script, you would probably find many more. It would be good if this were corrected.
moose
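A check like the one moose suggests could look roughly like this. It is a minimal sketch. It assumes the file has one word per line and is Latin-1 encoded (the encoding is a guess). `find_duplicates` and `find_duplicates_in_file` are hypothetical helper names. It counts every word with `collections.Counter` and reports the ones seen more than once:

```python
from collections import Counter


def find_duplicates(lines):
    """Return {word: count} for every non-empty word that occurs more than once."""
    # Strip line endings so "disney\n" and a final "disney" without a newline match.
    counts = Counter(line.rstrip("\r\n") for line in lines)
    return {word: n for word, n in counts.items() if word and n > 1}


def find_duplicates_in_file(path, encoding="latin-1"):
    """Hypothetical helper: scan a one-word-per-line file (encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return find_duplicates(f)
```

Calling `find_duplicates_in_file("all.txt")` on a local copy of the list would return a dict of the repeated words and how often each occurs. An empty dict means no duplicates were found.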
28.08.2011 16:31:10

quangntenemy
I think that's normal. I remember once reducing the Argon wordlist from 2 GB to about 500 MB just by removing duplicates.
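For lists in the gigabyte range, loading every line into one in-memory set may not fit in RAM. The approach shown here is one common workaround. It is an illustration, not what was actually used for the Argon list. Lines are partitioned into temporary bucket files by a hash of their content, so identical words always land in the same bucket, and each bucket is then deduplicated on its own. The function name, the default bucket count, and skipping empty lines are all assumptions:

```python
import os
import tempfile
import zlib


def dedupe_large_file(src, dst, buckets=64):
    """Remove duplicate lines from src into dst; returns the number of unique words.

    Output is grouped by bucket and sorted within each bucket, so the
    original order is not preserved. Empty lines are dropped.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, "bucket%03d" % i) for i in range(buckets)]
        # Pass 1: identical words hash to the same bucket file.
        handles = [open(p, "wb") for p in paths]
        try:
            with open(src, "rb") as infile:  # bytes: no encoding guess needed
                for line in infile:
                    word = line.rstrip(b"\r\n")
                    if word:
                        handles[zlib.crc32(word) % buckets].write(word + b"\n")
        finally:
            for h in handles:
                h.close()
        # Pass 2: each bucket holds roughly 1/buckets of the data, small
        # enough to deduplicate with an in-memory set.
        unique = 0
        with open(dst, "wb") as outfile:
            for p in paths:
                with open(p, "rb") as bucket:
                    words = set(line.rstrip(b"\n") for line in bucket)
                for word in sorted(words):
                    outfile.write(word + b"\n")
                unique += len(words)
    return unique
```

Peak memory is then governed by the largest bucket rather than the whole list. Raising `buckets` trades more open files for smaller sets.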
26.09.2011 14:24:31

moose
Well, if the order isn't important, I could create a new wordlist and give it to the admins. (This task is very easy in Python.)

```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re


def natural_sort(lines):
    """Sort strings so embedded numbers compare numerically ('a2' before 'a10')."""
    def convert(text):
        return int(text) if text.isdigit() else text.lower()

    def alphanum_key(key):
        return [convert(c) for c in re.split('([0-9]+)', key)]

    return sorted(lines, key=alphanum_key)


print("Start reading file.")
with open('tbswordlist2.txt', 'r') as infile:
    # Strip line endings so the last line (which may lack a trailing
    # newline) still compares equal to its duplicates.
    words = [line.rstrip('\r\n') for line in infile]
print("Finished reading file.")
print("%i lines" % len(words))

# A set drops the duplicates; sorting afterwards makes it easy to
# check by eye that no word appears twice.
words = natural_sort(set(words))
print("Finished deduplication and sorting. %i lines." % len(words))

with open('wordlist.txt', 'w') as outfile:
    outfile.writelines(word + '\n' for word in words)
```

Results:
- tbswordlist1.rar: before: 1,450,251 words, 4.32 MB; after: 1,450,184 words, 3.8 MB as tar.gz
- tbswordlist2.rar: before: 1,301,376 words, 2.40 MB; after: 650,688 words, 1.7 MB as tar.gz
- all-word.txt: 53,091 words both before and after

If any admin wants to upload these, I can send them to you.
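The script above gives up the original order. If the order did matter, a small alternative could keep it. This is a sketch: the function names are hypothetical, and Latin-1 is an assumed encoding. It relies on `dict.fromkeys`, since dict keys are unique and keep insertion order in Python 3.7+, so the first occurrence of each word stays in place:

```python
def dedupe_keep_order(words):
    """Return words with duplicates removed, keeping each first occurrence in place."""
    # dict keys are unique and dicts preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(words))


def dedupe_file_keep_order(src, dst, encoding="latin-1"):
    """Hypothetical helper: dedupe a one-word-per-line file without reordering it."""
    with open(src, encoding=encoding) as infile:
        # Strip line endings so a final line without "\n" still matches its duplicates.
        words = [line.rstrip("\r\n") for line in infile]
    with open(dst, "w", encoding=encoding) as outfile:
        outfile.writelines(word + "\n" for word in dedupe_keep_order(words))
```

Like the set-based script, this holds the whole list in memory. It only changes which copy survives and where.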
Edited by moose on 26.09.2011 15:16:07
26.09.2011 15:11:25

Erik
Hello,
I removed the duplicates in all.txt and tbswordlist2.
Best wishes,
Erik
05.04.2017 01:14:25