Double items in wordlists – 4 Posts

  • 10/16/2024 13:00
    moose
    Hi,

    I've just noticed that some words appear twice in the wordlist all.txt (http://www.bright-shadows.net/download/wordlists/all.txt).
    Examples: disney, cisco
    I guess if you checked it with a script you might find many more. It would be good if this were corrected.

    moose
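The kind of script check suggested above could look roughly like this. It is only a minimal sketch: the helper name `find_duplicates` and the sample data are made up for illustration, and it assumes the wordlist holds one word per line.

```python
from collections import Counter

def find_duplicates(lines):
    """Return a dict mapping each word that occurs more than once to its count."""
    # Strip line endings so "cisco\n" and a final "cisco" count as the same word.
    counts = Counter(line.rstrip("\n") for line in lines)
    return {word: n for word, n in counts.items() if n > 1}

# Small in-memory sample standing in for the lines of all.txt (hypothetical data).
sample = ["disney\n", "cisco\n", "apple\n", "disney\n", "cisco"]
duplicates = find_duplicates(sample)
for word, n in sorted(duplicates.items()):
    print("%s appears %i times" % (word, n))
```

For a real file, an open file object can be passed in place of `sample`, since iterating over it yields the lines one by one.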
  • 10/16/2024 13:00
    quangntenemy
    I think that's normal. I still remember reducing the Argon word list from 2 GB to about 500 MB once just by removing duplicates :D
  • 10/16/2024 13:00
    moose
    Well, if the order isn't important I could create new wordlists and give them to the admins.
    (This task is so easy in Python :D)
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    import re

    def natural_sort(items):
        """Sort strings so that embedded numbers compare numerically (a2 before a10)."""
        def alphanum_key(key):
            # re.split with a capture group alternates text chunks (even index)
            # and digit chunks (odd index), so only odd chunks are converted.
            chunks = re.split('([0-9]+)', key)
            return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]
        return sorted(items, key=alphanum_key)

    print("Start reading file.")
    # Strip line endings so a last line without a newline is not kept as a separate word.
    with open('tbswordlist2.txt', 'r') as infile:
        words = [line.rstrip('\n') for line in infile]
    print("Finished reading file.")

    print("%i lines" % len(words))
    # The set drops duplicates; sorting just makes it easier to see that no word appears twice.
    words = natural_sort(set(words))
    print("Finished deduplication and sorting. %i lines." % len(words))

    with open('wordlist.txt', 'w') as outfile:
        outfile.writelines(word + '\n' for word in words)

    tbswordlist1.rar: before: 1,450,251 words and 4.32 MB; after: 1,450,184 words and 3.8 MB as tar.gz
    tbswordlist2.rar: before: 1,301,376 words and 2.40 MB; after: 650,688 words and 1.7 MB as tar.gz
    all-word.txt: 53,091 words both before and after
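To put the figures above in perspective, the differences can be worked out with a few lines of Python. This is just a sketch of the arithmetic: the word counts are copied from the post, while the helper name `removed` is made up for illustration.

```python
# Word counts before and after deduplication, copied from the figures above.
counts = {
    "tbswordlist1": (1450251, 1450184),
    "tbswordlist2": (1301376, 650688),
    "all-word.txt": (53091, 53091),
}

def removed(before, after):
    """Return (number of removed lines, removed share in percent)."""
    diff = before - after
    return diff, 100.0 * diff / before

for name, (before, after) in counts.items():
    diff, pct = removed(before, after)
    print("%s: %i removed (%.3f%%)" % (name, diff, pct))
```

So the first list lost only 67 entries, the second shrank to exactly half its size, and the third had no duplicates at all.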

    If any admin wants to upload these, I can send them to you.
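If the original order of a wordlist did matter, a variant that keeps the first occurrence of each word in place would also be possible. The following is only a sketch under that assumption, not the script used above: the name `dedupe_keep_order` is hypothetical, one word per line is assumed, and it relies on `dict` preserving insertion order, which Python guarantees from version 3.7 on.

```python
def dedupe_keep_order(lines):
    """Drop repeated words while keeping the first occurrence of each in place."""
    # dict.fromkeys keeps insertion order (guaranteed since Python 3.7),
    # so duplicates are removed without sorting.
    stripped = (line.rstrip("\n") for line in lines)
    return list(dict.fromkeys(stripped))

# Hypothetical sample data standing in for the lines of a wordlist.
sample = ["zebra\n", "disney\n", "cisco\n", "disney\n", "zebra"]
print(dedupe_keep_order(sample))
```

Like the set-based approach, this holds every distinct word in memory, so it is best suited to lists of roughly the sizes mentioned in this thread.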
  • 10/16/2024 13:00
    Erik
    Hello,

    I removed the duplicates in all.txt and tbswordlist2.

    Best wishes,
    Erik