AI code editors have been rapidly gaining traction among developers, whether through no-code editors such as base44, AI code editors like Cursor, or AI assistants like CoPilot. I’ve been working with GitHub CoPilot for the last year or so, and recently I have been coding more and more with Cursor - these are some of my thoughts on coding with AI, with an emphasis on Cursor.
For reference: our codebase is React (web), React Native with Expo (mobile), Python with Flask (API), the SQLAlchemy ORM and a PostgreSQL DB. We manage our code in multiple Bitbucket repositories.
Early on, CoPilot proved very efficient at automating code completion and generation, and lately, with Cursor and newer versions of CoPilot, this functionality has only improved. However, I have found that AI can be very helpful in other coding tasks as well.
One main difference I observed in Cursor’s code generation results lies in the characteristics of the codebase itself.
For new projects, Cursor really excels, ramping up new functionality quickly - but not necessarily efficiently. Code and component reuse, efficient loops and memory usage, React rendering and SQLAlchemy queries - all of these often seem to be missed. You may wonder whether we still ‘need’ these practices (more on that later), but to the extent that we do, in new projects Cursor tends to generate less efficient code. This can also get worse over further iterations: prompting Cursor to fix an already-implemented feature, or extend it, often results in new code being added while the option to refactor the existing code is ignored.
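To make ‘less efficient’ concrete, here is a minimal sketch of the kind of thing I mean on the SQLAlchemy side. The Hive/Sensor models and both functions are hypothetical, not our actual code: the first version is the loop-with-a-query-inside (N+1) pattern that AI-generated code often lands on, the second is the refactor a reviewer would normally expect.

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship, selectinload

Base = declarative_base()

# Hypothetical models, only here so the sketch is self-contained.
class Hive(Base):
    __tablename__ = "hives"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    sensors = relationship("Sensor", back_populates="hive")

class Sensor(Base):
    __tablename__ = "sensors"
    id = Column(Integer, primary_key=True)
    hive_id = Column(Integer, ForeignKey("hives.id"))
    hive = relationship("Hive", back_populates="sensors")

def hive_summaries_naive(session):
    # The shape AI-generated code often takes: one extra query per hive (N+1).
    summaries = []
    for hive in session.query(Hive).all():
        sensors = session.query(Sensor).filter(Sensor.hive_id == hive.id).all()
        summaries.append({"hive": hive.name, "sensor_count": len(sensors)})
    return summaries

def hive_summaries(session):
    # The refactor: eager-load the relationship, a fixed number of queries overall.
    hives = session.query(Hive).options(selectinload(Hive.sensors)).all()
    return [{"hive": h.name, "sensor_count": len(h.sensors)} for h in hives]
```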
For evolving or established codebases, Cursor seems pretty good at drawing from existing code to do ‘more of the same’ (which is frequently what we need, for example when adding a RESTful API for a new type of resource). Where things sometimes take a turn is with hallucinations and when doing something new. I’ve seen hallucinations mostly in the form of the AI inventing classes, models and functions that do not exist, just because they ‘match’ the codebase style. For example, we had a bulk_upsert_by_constraint function, and the AI created an API endpoint that used bulk_upsert_by_condition - a non-existent function. The danger here (specifically for interpreted languages like Python and JavaScript/TypeScript) is that the use of non-existent symbols will not be caught before the code is deployed, leading to mysterious failures in production. The other challenge is doing something different from the existing code - while Cursor can definitely be up to the task, it often generates code that is very different from what already exists and does not necessarily leverage the code infrastructure correctly, for example using native Postgres queries instead of the ORM, or fetch instead of the axios client in React.
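To show why this class of hallucination is nasty in Python, here is a small, self-contained sketch. Only bulk_upsert_by_constraint comes from the real incident; the endpoint, the stand-in db_utils object and the test are invented for illustration. The point is that the file imports cleanly, and the non-existent symbol only blows up when the route is actually called - which is exactly why a cheap smoke test per endpoint in CI is worth having.

```python
import types

from flask import Flask, jsonify, request

# Stand-in for our real helpers module, which defines bulk_upsert_by_constraint.
db_utils = types.SimpleNamespace(
    bulk_upsert_by_constraint=lambda rows: len(rows),  # pretend upsert
)

app = Flask(__name__)

@app.route("/api/readings", methods=["POST"])
def upsert_readings():
    rows = request.get_json()
    # The hallucinated name: this imports fine and only raises AttributeError
    # when the endpoint is hit - i.e. potentially in production.
    count = db_utils.bulk_upsert_by_condition(rows)
    return jsonify({"upserted": count})

# A pytest-style smoke test that exercises the route in CI, so the missing
# symbol fails the build (500 instead of 200) rather than a production request.
def test_upsert_readings_smoke():
    client = app.test_client()
    response = client.post("/api/readings", json=[{"hive_id": 1, "temp": 35.2}])
    assert response.status_code == 200
```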
We don’t have much legacy code, as BeeHero is still very much a startup, but I did notice that with older or niche technologies Cursor’s hallucinations ramped up quickly - specifically, our AWS CDK code got non-existent constructs, construct methods and CloudWatch metrics.
I’ve played around with different approaches to AI prompts, with varying results.
When I’m using ‘specific’ prompts, I’m telling the AI exactly what I want done and how - e.g. ‘Create a new test case for this API endpoint, authenticate using this function, use a payload with these characteristics and assert these checks’. Cursor usually executes these kinds of instructions to a T. The downside is that this isn’t very different from ‘standard’ code completion, and doesn’t save as much time and effort as it could.
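For context, this is roughly the shape of what a ‘specific’ prompt like the one above gives me back: a single, narrowly scoped pytest case that does exactly what was asked and nothing more. Everything here - the /api/hives endpoint, the client and auth_headers fixtures - is invented so the sketch stands alone; in practice the fixtures would come from our existing conftest.

```python
import pytest
from flask import Flask, jsonify, request

# A toy app and fixtures, only so the sketch runs standalone.
app = Flask(__name__)

@app.route("/api/hives", methods=["POST"])
def create_hive():
    return jsonify({"id": 1, **request.get_json()}), 201

@pytest.fixture
def client():
    return app.test_client()

@pytest.fixture
def auth_headers():
    return {"Authorization": "Bearer test-token"}  # stand-in for our auth helper

# The kind of single, tightly-scoped test a 'specific' prompt tends to produce.
def test_create_hive_returns_201(client, auth_headers):
    payload = {"name": "Hive 42", "apiary_id": 7}
    response = client.post("/api/hives", json=payload, headers=auth_headers)

    assert response.status_code == 201
    body = response.get_json()
    assert body["name"] == payload["name"]
    assert body["apiary_id"] == payload["apiary_id"]
```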
‘Descriptive’ prompts are more high-level but still provide a lot of detail, for example ‘Create new test cases for this API, and check authentication, valid and invalid payloads, and that the response matches the DB’. These kinds of prompts usually result in Cursor doing more than ‘specific’ prompts do, but on the other hand they require more attention to the generated code and to whether it actually fulfills what you had in mind.
‘Restrictive’ prompts can actually be combined with any of the others - the idea is to frame the boundaries, or the playing field, of what we want Cursor to do. For example, ‘Create new test cases for the API endpoint, but do not create additional fixtures, and maintain the same coding standards as the existing tests’. ‘Restrictive’ prompts are good when you want to allow Cursor a lot of leeway to work, but are concerned that it may alter code infrastructure or reusable code, or produce unusable code.
Finally, at times it’s worth trying a ‘high-level’ prompt and letting Cursor take as much load off our shoulders as possible. For example, ‘Test the API endpoint’, which results in Cursor producing multiple test cases for the various code flows it ‘understands’.
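By way of contrast with the ‘specific’ sketch above, a ‘high-level’ prompt typically comes back with something closer to this: one test per flow it inferred from the code. Again, the endpoint, the token and the tests are all hypothetical.

```python
import pytest
from flask import Flask, jsonify, request

# Hypothetical endpoint with a bit of auth and validation, so the sketch stands alone.
app = Flask(__name__)

@app.route("/api/hives", methods=["POST"])
def create_hive():
    if request.headers.get("Authorization") != "Bearer test-token":
        return jsonify({"error": "unauthorized"}), 401
    data = request.get_json(silent=True) or {}
    if "name" not in data:
        return jsonify({"error": "name is required"}), 400
    return jsonify({"id": 1, **data}), 201

@pytest.fixture
def client():
    return app.test_client()

HEADERS = {"Authorization": "Bearer test-token"}

# Roughly the spread of cases a high-level 'test the API endpoint' prompt produces.
def test_create_hive_requires_auth(client):
    assert client.post("/api/hives", json={"name": "A"}).status_code == 401

def test_create_hive_rejects_missing_name(client):
    assert client.post("/api/hives", json={}, headers=HEADERS).status_code == 400

def test_create_hive_happy_path(client):
    response = client.post("/api/hives", json={"name": "A"}, headers=HEADERS)
    assert response.status_code == 201
    assert response.get_json()["name"] == "A"
```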
Obviously, this is not a one-size-fits-all approach. I will usually start with a ‘descriptive’ or ‘high-level’ prompt, review the results, accept or reject as needed, and then refine with additional, more ‘restrictive’ or ‘specific’ prompts.
...but I’ll save it for future posts. Instead I’d like to close with a few nitpicks and oddities: