Travco is a better wordsmith than I, so the majority of this post will be a copy/paste from this original link here. At the end, I want to get into the breakdown of the wordlists that lanix13 and I have.
We had five members on team Crevasse - zach.sanchez113, Travco, PinkieMcBluey, paymentrequired402, and lanix13. We had debated categorizing ourselves as a Pro team (we did have information security professionals on the team), but considering some of our shortcomings we had doubts about whether it made sense. Three of our members were participating in Crack Me If You Can for the first time, one of whom had never done password cracking before the competition, and another of whom would be on a plane for a chunk of the competition. We also had spent a limited amount of time prepping, we had no hash/crack coordination or submission system/automation three days before the competition, and most of our cracking hardware was decidedly not built for password cracking. After discussing it and most members being OK with categorizing either way, Travco tossed an ancient flash drive into the air, which landed logo-down, and we categorized ourselves as a Street team.
Technical Difficulties
For the competition we had a disorganized complement of mostly personal machines, all running Nvidia cards. We used a sum total of: 1x Nvidia A100, 1x 3090, 1x 3070, 1x 2070, 1x 2060, 1x 1080Ti, 1x 1080, 2x 1070, 6x 1060, 1x 980ti, 1x 970 spread across 12 machines.
One of the things we knew we needed was a means to coordinate and deduplicate cracked submissions: looking at past CMIYC competitions and the rules for the current one, there was a real chance of being ejected from the competition if duplicates were repeatedly sent. Travco took on both coordination and deduplication in the days before and during the competition, and became the de facto team captain.
As one might expect, a fair amount of time was spent/wasted by members of our team coordinating files and dictionaries between the 12 machines being used. Although Travco set up an instance of Hashtopolis [1] right before the event for tracking hashes, most of the functionality of Hashtopolis (managing agents/distributed hashing on disparate machines) ultimately wasn't used. None of us had used Hashtopolis before, and some of us were understandably a little wary of giving RCE to everyone else on machines that were mixed-purpose/personal machines - so it was used primarily for hash submission and dictionary sharing.
We briefly attempted, and then chose not to bother with, extracting hashes from the VM after both paymentrequired402 and lanix13 had issues getting the VM to function in VMware. It was made clear at the beginning of the competition that the VM file would not be needed by street teams, although it might give an advantage. Travco later discovered it worked just fine in VirtualBox. Although paymentrequired402 had experience with dumping out domain controllers and actually wrote a tool for quickly handling ntds.dit files (Kraken) [2], the history3 files were already close enough to being released that the time was probably better spent doing other things.
Submission Hiccup #0: "Um, I'm sorry is that an email? Yea, that'll be $50 to send those." Travco had early difficulty in testing submission automation prior to the competition, and discovered that apparently AT&T as an ISP blocks outbound SMTP. As a result, final submission of cracked plaintexts was done manually. In hindsight this was probably for the best, as the automation Travco hastily set up in the couple of days before Defcon was hardly fool-proof, and manual submission allowed us to find the other two issues.
Submission Hiccup #1: "Where do these carriage returns keep coming from?" A consistent portion of the plaintexts we were submitting to KoreLogic were being rejected. Travco, as the de facto person submitting cracks, took on figuring out why. Although the final submission and much of the hash cracking were done in Linux or Windows Subsystem for Linux, some of us were handling hashes at some point on a Windows machine (insert Spider-Man pointing meme here), or occasionally pasting newly cracked hashes into the web interface of Hashtopolis. This had the unfortunate consequence of carriage returns leaking in and invalidating the submission. Despite the presumed commonness of this issue, the carriage return is apparently not filtered out by Hashtopolis. Travco ended up editing his submission-tracking/deduplication script to filter these out, and resubmitted the affected plaintexts.
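For anyone hitting the same problem, the kind of filtering described above can be sketched in a few lines of Python. This is not Travco's actual script; the function name clean_plaintexts and its already_submitted parameter are hypothetical, and the sketch assumes one plaintext per line.

```python
def clean_plaintexts(lines, already_submitted=()):
    """Strip line endings and drop duplicates, keeping first-seen order.

    `already_submitted` is a hypothetical collection of plaintexts that were
    sent earlier, so they are not sent a second time.
    """
    seen = set(already_submitted)
    cleaned = []
    for line in lines:
        # Remove only trailing CR/LF; a trailing space may be part of the password.
        plain = line.rstrip("\r\n")
        if plain and plain not in seen:
            seen.add(plain)
            cleaned.append(plain)
    return cleaned


if __name__ == "__main__":
    sample = ["Password1\r\n", "Password1\n", "trailing space \r\n", "\r\n"]
    print(clean_plaintexts(sample))
```

Stripping only "\r" and "\n" (rather than all whitespace) is a deliberate choice here, since a password can legitimately end in a space.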
Submission Hiccup #2: "Uh, what do you mean it wasn't accepted? Did we seriously get a hash collision in NTLM?" A smaller number of the plaintexts we were submitting to KoreLogic, even after the fixes to carriage returns, were still being rejected. Travco also took on figuring this one out. As it turns out, some of the (unloved) machines being used had a version of hashcat so outdated that it unnecessarily converted some characters (e.g. accented Latin characters, UTF-8 characters) into $HEX[abcdef0123456789] format. Each of these was manually identified, decoded, and resubmitted (including a few in a scramble in the last 5 minutes of the competition).
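We did this decoding by hand, but it is easy to automate. Below is a minimal sketch, assuming the wrapper is always of the form $HEX[...] with an even number of hex digits inside. The function name decode_hex_plaintext is hypothetical, and the fallback from UTF-8 to Latin-1 is an assumption about the likely encodings, not something hashcat specifies.

```python
import re

# Matches a whole candidate of the form $HEX[<hex byte pairs>].
HEX_WRAPPER = re.compile(r"^\$HEX\[((?:[0-9a-fA-F]{2})*)\]$")


def decode_hex_plaintext(candidate):
    """Return the decoded text for a $HEX[...] candidate, else the input unchanged."""
    match = HEX_WRAPPER.match(candidate)
    if match is None:
        return candidate
    raw = bytes.fromhex(match.group(1))
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 are treated as Latin-1,
        # which maps every byte to a character and therefore never fails.
        return raw.decode("latin-1")


if __name__ == "__main__":
    for item in ["hunter2", "$HEX[636166c3a9]", "$HEX[636166e9]"]:
        print(item, "->", decode_hex_plaintext(item))
```

Whether a submission system wants the decoded text or the raw bytes depends on that system, so check a couple of decoded values against the hash before resubmitting in bulk.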
Techniques and Coordination
We kept in touch over Discord, mentioning what techniques we were using and how successful they were. Our team was small enough, and the plaintexts were simple enough, that a little bit of communication and each person taking on what they thought was the best technique to try was enough to prevent substantial overlap. Informal roles developed organically within the team, generally:
paymentrequired402 took on recycling/reprocessing newly cracked plaintexts with rules to produce additional cracks
both paymentrequired402 and lanix13 ran lengthy dictionary attacks of previously leaked/dumped passwords
PinkieMcBluey took on searching through the plaintexts for patterns to make targeted cracks
Travco took on running combinator/hybrid/prince/rephraser [3] attacks
zach.sanchez113 performed a mix of attacks, including writing a script to extract candidate strings directly from the Bible (after Pinkie identified a lot of Bible phrases in plaintexts, more on that later).
In general, because we were competing as a street team and could only submit hashes for history6 through history3, many of the plaintexts we cracked were simple enough to be directly obtainable by running dictionary attacks using passwords from historical dumps/leaks/hacks with a ruleset like dive. Despite our widely spread cracking hardware, we were still able to overtake most other Street teams with these attacks shortly after new hash files were released, thanks to paymentrequired402 and lanix13. What may have differentiated us from other teams in the long run was the extent/breadth of some of these dictionaries (some of the attacks with them only really concluded in the last couple of hours of the competition), and the team's ability to see a potential pattern, run a quick attack searching for more like that pattern, and then immediately start longer-running targeted attacks.
As an example: PinkieMcBluey (who was password cracking for the first time, extra credit is due) was able to identify various forms of media (song lyrics, artists, videogames, movie titles), Latin American locations, and Bible verses/chapters as some of the longer plaintexts that we probably weren't hitting completely early in the competition. She worked on pulling and compiling dictionaries of lyrics/artists/games/movies and convinced Travco to work on cracking Bible phrases. Travco directly downloaded the King James Bible, did some pre-formatting like removing page numbers and empty lines, fed it directly to rephraser [3], and began cracking them within minutes. Although this was fairly successful, rephraser was made to mimic human behavior and generated passphrases without the special characters, whitespace, and punctuation from the source. Some of the plaintexts (especially for the King James Bible) decidedly weren't something a human would have generated organically, because they were the exact source formatting including quirks of case, spacing, punctuation, and passage numbers. Thankfully zach.sanchez113 took it upon himself to write some Python to make a dictionary out of various clippings with the source formatting, and was able to submit a large chunk of the missing plaintexts from this category.
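The idea of clipping candidates out of a source text while keeping its exact formatting can be sketched roughly as follows. This is not zach.sanchez113's script; the function name clippings and the min_words/max_words window are hypothetical choices for illustration. Each candidate is sliced from the original line, so case, punctuation, spacing, and passage numbers survive untouched.

```python
import re


def clippings(line, min_words=2, max_words=5):
    """Yield runs of consecutive words from `line`, sliced from the original text.

    Slicing (rather than re-joining tokens) keeps the source's own spacing,
    punctuation, and case exactly as they appear.
    """
    spans = [m.span() for m in re.finditer(r"\S+", line)]
    for start in range(len(spans)):
        for length in range(min_words, max_words + 1):
            end = start + length
            if end > len(spans):
                break
            yield line[spans[start][0]:spans[end - 1][1]]


if __name__ == "__main__":
    verse = "1:1 In the beginning God created the heaven and the earth."
    for candidate in clippings(verse, min_words=3, max_words=4):
        print(candidate)
```

Run over every line of a source text and deduplicated, the output can be used directly as a wordlist. The window sizes trade coverage against wordlist size, so they are worth tuning against the lengths of plaintexts already cracked.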
From Lanix13: I started with just the Rockyou wordlist and an all-rules set based on best64/nsa65/hob064/my custom rules/dead0ne/d3adhob0/dive/kamaji34k/historical ALL rule, similar to OneRuleToRuleThemAll.rule. Basically a big combination of all the rules, but cleaned up for specific hash types, etc.; then I add my custom rules to it from previous observations and previous competitions. Next I progressed to my custom wordlist, leveraging the same rules including the temp-all-rule (combination of all the above). Next I progressed to the HIBP wordlist from the 2020 HIBP export with the above rules.
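For readers who want to build a similar combined rule file, here is a minimal sketch of the merging step only. It is not lanix13's process: the function name merge_rules is hypothetical, it removes exact duplicates only, and it does not attempt the per-hash-type cleanup mentioned above or detect rules that are written differently but behave the same.

```python
def merge_rules(*rule_sources):
    """Combine rule lines from several sources, dropping comments and exact duplicates.

    Each source is any iterable of lines, e.g. an open rule file.
    """
    seen = set()
    merged = []
    for source in rule_sources:
        for line in source:
            # Strip only the line ending: a rule such as "$ " (append a space)
            # ends in a significant space that must be preserved.
            rule = line.rstrip("\r\n")
            if not rule or rule.startswith("#"):
                continue
            if rule not in seen:
                seen.add(rule)
                merged.append(rule)
    return merged


if __name__ == "__main__":
    first = [":\n", "# a comment\n", "$1\n"]
    second = ["$1\r\n", "$ \n", "\n", "u\n"]
    print(merge_rules(first, second))
```

Keeping first-seen order means rules from the file listed first are tried first, which matters if that file is already sorted by hit rate.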
From paymentrequired402: At the end of the competition the final things I was doing (that weren't long-dictionary attacks) were:
shuf data/wordlists/cmiyc-recycle.txt | pp64 --pw-min=9 | hashcat -a 0 -m 1000 -w 4 -O --username --debug-mode=1 --debug-file=data/rules/random_rule_hits.txt cmiyc-2021-h3456_passwd -g 200000
and
hashcat -m 1000 -O -w 4 --username --potfile-path=data/potfiles/kraken.pot -r data/rules/all.rules cmiyc-2021-h3456_passwd data/wordlists/cmiyc-recycle.txt -d 2
which doesn't seem like much, but it was enough.
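Both commands feed on a wordlist of already-cracked plaintexts (cmiyc-recycle.txt). How that file was actually built isn't covered here, but one way to produce such a list is sketched below, assuming potfile lines of the form hash:plaintext where the hash itself contains no colon (true for NTLM). The function name plaintexts_from_potfile is hypothetical.

```python
def plaintexts_from_potfile(lines):
    """Yield each unique plaintext from potfile-style lines, in first-seen order.

    Assumes "hash:plaintext" with no colon inside the hash, so everything
    after the first colon is the plaintext (which may itself contain colons).
    """
    seen = set()
    for line in lines:
        entry = line.rstrip("\r\n")
        _hash, separator, plain = entry.partition(":")
        if not separator or not plain:
            continue  # skip malformed lines and empty plaintexts
        if plain not in seen:
            seen.add(plain)
            yield plain


if __name__ == "__main__":
    # Placeholder hashes, not real NTLM values.
    pot = [
        "0" * 32 + ":password\n",
        "1" * 32 + ":pass:word\n",
        "2" * 32 + ":password\n",
    ]
    for plain in plaintexts_from_potfile(pot):
        print(plain)
```

Re-running this after each round of cracking and feeding the result back through rules is the "recycling" loop in a nutshell: every new crack becomes a base word for the next attack.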
Conclusion
We really enjoyed CMIYC; thank you, KoreLogic, for putting it on. We think the choice to use only a single hash type (NTLM) removed a lot of potential tedium and opened the door to more time spent agonizing over potential patterns with mostly-blind attacks and what was/wasn't working. We did really wish some of the plaintexts thrown to the street teams were longer (e.g. 15+) and more reflective of what a human might actually have as a passphrase, but when hash cracking you have to find what you have regardless of how unlikely it is.
An awed congratulations to team Hashcat and CynoSure Prime for again sweeping the top of the Pro board, and thank you team Hashcat for the especially detailed writeup.
A special cheers to the street teams dropdeeztables and achondritic for setting the bar high for most of the competition and making the middle and end of the competition a scramble for us when we realized we might actually be able to win.
[1] https://github.com/hashtopolis/server
[2] https://github.com/ltdenard/kraken
[3] https://github.com/travco/rephraser
So what makes up these lists? Well, the standard things you can find on the internet, like rockyou.txt, SecLists, Collections 1-5, Anti-Public, etc. These are the items that most people would think to put in their list, but there are also items that have to be collected more frequently as time goes on. Items like the ones below:
twitter_hash_tags.txt
common_phrases.txt
imdb_names.txt
imdb_title_akas.txt
imdb_title_basics.txt
imdb-titles.txt
knowyourmemes-july-15-2019.txt
movie-lines-raw.txt
song-titles-uk.txt
top100artist-lyrics-raw.txt
urban_dictionary.txt
wikipedia_titles_category.txt
wiktionary-may-20-2019.txt