Now that AI has become the main tool developers use to write code, even in open source environments, it will be how feds slip backdoors into applications, because nobody is going to review the logic of 20,000 lines written by AI in a single commit.
Unless projects completely ban the use of AI and only allow small commits, this is going to be inevitable. I've been seeing so many applications merging AI slop into their code on GitHub already.
I think supply chain is probably more viable still. Though I think it's reasonable to assume huge unreviewed commits, as others mentioned, will inevitably introduce severe vulnerabilities that will be effectively backdoors even if the models aren't malicious (and I do assume they will increasingly be so).
As an example of how the two could work together, an LLM could preferentially use a particular library into which they have inserted a vulnerability. This attack may not be particularly long-lived but it's easier to hide than an unprotected API endpoint or similar. One corrupted library could be used by hundreds or thousands of targeted projects. Technically only one subversion even needs to be corrupted - the one they pin. Even easier if they make it a non-open component of the library, like a binary blob that isn't reproducible. Declare it a low level optimized library.
I would simply not merge a 20,000 line commit
AI is probably going to crash before that happens. It's too expensive and the public hates it.
I've seen devs going full bazinga brain with AI. People who've never written code are pushing vibe-coded PRs nowadays. It's a mess.
Wider AI bubble might collapse but vibe coding will stick around regardless since cheap models also exist.
It won't last forever, because those bazinga brain devs are the ones that were always shit at programming. In the short term the businesses will be creaming their pants because these assholes are like “look at me, I'm cranking out millions of lines of utter garbage”, but it will eventually hit a point where the business has a massive bloated application that nobody can understand and that is breaking in ways nobody can debug.
The core job of a software engineer is not to shit out code. That's actually a very bad idea, cos software is fundamentally a liability. The best thing a software engineer can do is delete code or not write any at all. It's not really any different to the CV-builder dev that wants to turn everything into RPC-connected microservices because they don't know the basics of writing modular code. I am eternally at war with these people and I hate them.
So, serious question then:
If code has a grammatically or syntactically correct form to use, how is AI-generated code told apart from human-written code? Wouldn't they both follow similar structuring?
I know sometimes a long line of code can be "abbreviated" (for lack of a better understanding or term) so that it saves a few bytes and eliminates possible steps or errors, but is AI writing out the long version of code with a lot of comments or commented-out instructions or verbiage?
I guess I don't understand it outside of AI being "this is a very long sentence that I don't know how to structurally shorten" vs human "I shortened it", something along those lines...
AI is incapable of making engineering decisions based on context unless you very specifically ask it to do something a certain way. For example, I was writing a CLI tool to help automate some processes, but knowing this tool is throwaway I didn't bother making it the most robust, opting for clarity and easier modification. Another colleague tried to write the same tool with GitHub Copilot because they can't handle the fact they're useless at everything, and it came out functionally the same, sure, but somehow many hundreds to thousands of lines more verbose.
Somehow, idk how this guy does it, but all the comments in the code are complete gibberish, and they're massively verbose where I would put in a single-line comment that's more meaningful. The best part is I still wrote mine faster manually, because he would hit errors and then ask the AI to fix them, and because the context of its prior code has so much nonsense in it, it further loses the plot and breaks it more. And then he ran out of credits and simply stopped doing anything. I don't run out of credits though.
Businesses only really care about the output, but having a simple application bloat by many thousands of lines is a debt you really don't want to pay back later. It's easy to spot a codebase that had a human make human decisions in it, down to the “TODO: Fix this later”.
Also I swear most AI companies tell the LLM to produce more to drive up token usage, cos of some of the things it comes up with. My attempts at using this myself usually devolve into frustration because it's like asking a junior dev to do something, and I end up getting annoyed and doing it myself faster. Just because syntax is correct and the code runs doesn't mean the application is good. Oh and another thing is it prioritises older standards all the time. I know Python inside out, upside down, and its vast library ecosystem. It generally trends towards using libraries and standards that are out of date or even unsupported, targeting something closer to 3.10 when I'm using 3.14. If you aren't interrogating your dependencies you may as well be asking for critical failure later, but nobody cares about the future anymore!
Oh and last one: as the codebase grows, its concept of consistency falls off massively and it starts treating new functionality as separate from the rest. As a result you get duplicated logic all over the place. Now I'm not the biggest believer in SOLID principles, since they can produce some over-engineered solutions, but repeating the same logic over and over is a problem you can't easily get yourself out of.
Just because syntax is correct and the code runs doesn't mean the application is good.
This is something that no one seems to get (at least no one who doesn't actually write code). Technically correct and actually correct are in two different universes.
I spent two months re-writing our entire codebase from scratch last year because it had built up ~5 years of tech debt and was using systems and APIs that are no longer supported. There were a few concepts worth keeping, but not a single file was allowed to survive until it was rewritten.
Now that Python's type hints are more mature, that was a big one. The old code was using either no hints or old hints, and if you've ever worked in a fully typed (as in literally every line including private code) codebase, it's hard to go back. AIs kinda suck at that though because the vast majority of Python code is unhinted.
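For anyone who hasn't seen what "fully typed" looks like in practice, here's a minimal sketch. `Reading` and `mean_value` are made-up names for illustration, not anything from our codebase. The point is just that every parameter, return value, and intermediate is annotated, so a checker such as mypy has something to verify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical record type, only here for illustration."""

    label: str
    value: float | None  # None means "no measurement"


def mean_value(readings: list[Reading]) -> float | None:
    """Average the non-missing values; None if there is nothing to average."""
    present: list[float] = [r.value for r in readings if r.value is not None]
    if not present:
        return None
    return sum(present) / len(present)
```

With hints like these, forgetting to handle the `None` case at a call site is the kind of thing a type checker can flag before the code ever runs.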
Mypy gives me a headache for some things but it's invaluable. It kinda just makes me wish for a strictly typed Python tbh. It's also one of those languages where you can fuck yourself into a corner real badly if you don't have some strict rules for it. One guy I worked with loved using language features for no actual good reason, until the codebase was practically its own domain-specific language that the IDE just highlighted in bulk as “idk wtf this is”. It would take days to understand enough to make a change, and all it was doing was running a regression model on a date and value lmao
Yeah, my entire codebase is ~50-75k lines and it runs most of the company. I have a sort of "DSL" that's more just a wrapper library for the APIs we need to use. Means I can implement the optimal way to do something and have the production code just be a function/method call. That's incredibly useful when you are writing design automations that need to be very clear to read and change as a project evolves.
I've had a couple of juniors over the years, and even without AI it was amazing just how much code they'd write. Like a tool script that kind of worked and is 2000+ lines, but after my review/rewrite it's more generalized, with error handling, and it's 300 lines.
Oh and another thing is it prioritises older standards all the time
LLMs are trained on existing data. There is going to be less data about new stuff than there is about older stuff.
Yes, an LLM fundamentally only looks backwards; it can't look forwards. It relies on nerds like me utilising new language features properly to produce relevant slop, but honestly I feel like that kind of person is a dying breed, and higher-level languages are being treated no differently to assembly, where it's just machine output that's never questioned or even looked at.
Been seeing a big comeback of basic exploits like SQL injection. It's kinda funny how many “engineers” I now work with don't know what that even is.
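For anyone who doesn't know what that is, here's a rough sketch using Python's built-in `sqlite3`. The `users` table and the `find_user_*` function names are made up for illustration. The unsafe version pastes user input straight into the SQL string; the safe version passes it as a bound parameter.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])


def find_user_unsafe(name: str) -> list[tuple]:
    # Vulnerable: the input becomes part of the SQL text itself.
    query = f"SELECT name, is_admin FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str) -> list[tuple]:
    # Parameterized: the input is only ever treated as a value.
    query = "SELECT name, is_admin FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


payload = "nobody' OR '1'='1"
leaked = find_user_unsafe(payload)   # the WHERE clause is now always true
blocked = find_user_safe(payload)    # no user is literally named that
```

The unsafe call hands back every row in the table, while the parameterized one returns nothing, which is the whole difference between the two.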
It's a lot easier for the LLM to lose the plot when they're writing. So when the sentence is like "I'm hungry, so I'm going to put some waffles in the toaster" and the goal is "expand this out so I can put anything in the toaster. But write it in a way that makes it so I can use any other appliance later." The output would be like "I'm hungry so I'm going to put some silently approved pancake into the microwave once it's also an approved appliance."
So it could be grammatically correct. It might even work, but you accumulate these little kinks in the codebase.
2 prompts in, it's already lost the plot on a really basic frontend application. I think at best I've had it create the first draft of a greenfield project for me, cos I don't really like doing that, and then I just take it from there.
Hah, so in a formatting or logical sense, it works, but it's obviously not written how humans talk and so appears robotic... Got it, thank you that was easier to understand than I thought lol.
I have unfortunately had to vibe code a few things for work since there's a lot going on, and I don't have uninterrupted time to work on stuff, so concentration is almost nonexistent. Using AI to patch or fix or fill has been helpful! But I can see how it's problematic for community projects!
Yes, the context of what you're doing certainly matters. I had to get into an archive file that I didn't know what to do with one time. I used AI to build me a little app to visualize its content. Was it perfect? I doubt it. Did it function and get me the info I was looking for? Yeah!
As a rule of thumb, would you trust Momo Yaoyorozu to build you a solution or do you need to be able to blame someone when the solution blows up regardless of having a human build it?
You don't need a single 20000-line commit for that. Backdoors are already in the wild and are introduced slowly, in multiple small steps.
In my opinion, AI will probably 1. help design these steps and/or 2. introduce them without the author even knowing.
You're right.
This seems doomerish. Can't they use AI to review the logic of commits?
Prompting OpenClaw with "find all back doors --- make no mistakes" is kind of funny, but seems viable.
Sure, but the models can be told to ignore certain backdoors. The models also lie all the time for any reason or no reason at all. Since AI coding is not really a trust- and predictability-based system, there's no way you can know for sure at any given moment that you don't have backdoors without a human examining the code line by line, or by building your own AI that you can trust.
The models also lie all the time
A coworker of mine was tasked with having an AI agent generate a security report of our latest effort. It returned about 10 things, only 2 were semi valid. The silliest one was a claim that our regex function for stripping out non-word characters was not adequate because "\w allows . characters" thereby enabling path traversal attacks. FYI, \w very explicitly does NOT allow . characters.
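This is easy to check yourself. A quick sketch in Python's `re` module, where `strip_non_word` is a stand-in I made up, not our actual function:

```python
import re


def strip_non_word(text: str) -> str:
    """Remove every character that \\w does not match (hypothetical helper)."""
    return re.sub(r"\W", "", text)


# \w covers letters, digits, and underscore. A dot is not among them.
dot_is_word_char = re.fullmatch(r"\w", ".") is not None
underscore_is_word_char = re.fullmatch(r"\w", "_") is not None

# The dots and slashes a path traversal depends on get stripped.
cleaned = strip_non_word("../../etc/passwd")
```

Here `dot_is_word_char` comes out `False` and `cleaned` is `"etcpasswd"`, so the path traversal the report described can't get through a filter like this.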
