I had a problem where I needed to parse through nearly a gigabyte of log files, each line of which represented an operation, and interpret that data.
This wasn't a task I was given as much as it was a task that I gave myself because I was curious about a mismatch in the data we were seeing from it. An idle curiosity that I wanted to throw an hour at.
In short, each of these log files represented a run of the program. An operation could fail or succeed. Successes would be present in the next log file and errors would be reattempted.
The problem was that I noticed a very small number of these operations would show up in one log as an error, but never appear in the next log as a success. They would just disappear.
This happened to an incredibly small number of them: we predicted 67 out of 2.3 million operations.
So the plan for this utility is: Iterate through each line of each file. When you find an error, hold onto it. When you find a success, check whether it matches any of the errors you're holding. If it does, remove that error. At the end, any errors remaining are our orphans.
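For illustration, here is a minimal sketch of that plan in Python. The real log format isn't shown here, so this assumes a made-up one where every line starts with a status word (ERROR or SUCCESS) followed by an operation ID; the function names find_orphans and read_logs are likewise hypothetical.

```python
from typing import Iterable, Iterator


def read_logs(paths: Iterable[str]) -> Iterator[str]:
    """Yield every line of every log file, in the order the paths are given."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            yield from f


def find_orphans(lines: Iterable[str]) -> set[str]:
    """Return IDs of operations that errored and never succeeded afterwards.

    Assumed (hypothetical) line format: "<STATUS> <operation_id> ...".
    """
    pending_errors: set[str] = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        status, op_id = parts[0], parts[1]
        if status == "ERROR":
            pending_errors.add(op_id)      # hold onto the error
        elif status == "SUCCESS":
            pending_errors.discard(op_id)  # a later success clears it
    return pending_errors
```

Calling find_orphans(read_logs(paths)) with the files in run order would return the orphaned IDs. Holding the errors in a set means each success is checked with a hash lookup rather than a scan over everything held so far.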
I shit this together in 30 minutes, ran it, went off to lunch. After lunch, it wasn't finished, but whatever, it's a lot of data, this may take a while. I'll just let it run in the background while I do other shit.
The next day, it was still executing. I inspected the process - it was definitely doing work, but at the rate it was going, it would take roughly 60 hours to interpret everything. Whatever, it's nearly the weekend, I'll just let it go.
Until late this afternoon, when processes started hanging. Explorer.exe stopped and refused to start up again. I could not start Task Manager. I could not ctrl+alt+del. This process was consuming every resource it could get its hands on. With great regret, I killed the machine, losing over a day of execution time.
I rewrote the code giving a shit about performance. Same idea, just a slightly different implementation. Only took 10 or so minutes, maybe 20 lines of code. Started it up, then went to inspect the process's Read/Write data. By the time I did, it was done.
Rewriting the program brought execution time down from 60 hours to 20 seconds.