Coding with Intelligence - some ‘Cursory’ Thoughts

AI code editing is fast becoming the practice of the day for developers - here are some thoughts based on my experience with AI coding, and specifically the Cursor IDE.
Inbar Shani
Chief Software Architect

AI code editors have been rapidly gaining traction among developers, whether they use no-code tools such as base44, AI code editors like Cursor, or AI assistants like GitHub Copilot. I’ve been working with GitHub Copilot for the last year or so, and recently I’ve been coding more and more with Cursor - these are some of my thoughts on coding with AI, with an emphasis on Cursor.

For reference: our codebase is React (Web), React Native with Expo (Mobile), Python with Flask (API), SQLAlchemy ORM, and a PostgreSQL database. We manage our code in multiple Bitbucket repositories.

Not just code generation

Early on, Copilot proved very effective at automating code completion and generation, and lately, with Cursor and newer versions of Copilot, this functionality has only improved. However, I’ve found that AI can be very helpful in other coding tasks as well:

  • Searching for code - know the feeling of looking at a bit of UI someone else developed, wondering where the code that implements this specific dropdown lives? Well, wonder no more. Just ask the AI to find the code by its functionality, or even upload a screenshot of the UI, and voilà - you get a list of the components that implement it
  • Explaining code - once I find the code, it often takes some time and effort to understand the flow, the various arguments and parameters, the use of React hooks, Python-specific syntax, and so on. But why work hard? Just ask the AI to explain the code for you - you’ll get a summary of the main components, the flow, and the role of the various functions, and you can ask the AI to clarify further or provide more details
  • Assisting in design - now that we’ve found the code and understood how it works, we can get to implementing our new requirements. But hold your horses! How about some planning first? What are your priorities and guidelines going into a feature? Why not ask the AI? I’ve found that giving a brief description of what I’m about to do, and then asking the AI to come up with concerns and questions, produces a checklist of design questions that can guide my implementation. Interestingly, at times the AI was able to capture the design guidelines of my codebase and suggest implementing along those lines, while at other times the checklist was more of the ‘generic’ type, pointing to performance and maintainability concerns. Still, it can be very helpful for less-experienced engineers or for someone coming into a less familiar codebase

Codebases - new, evolving, established or legacy

A major difference I’ve observed in Cursor’s code generation results lies in the characteristics of the codebase itself.

For new projects, Cursor really excels, ramping up new functionality quickly - but not necessarily efficiently. Code and component reuse, efficient loops and memory usage, React rendering, and SQLAlchemy queries - all of these are often missed. You may wonder whether we still ‘need’ these practices (more on that later), but to the extent that we do, Cursor tends to generate less efficient code in new projects. This can also get worse over further iterations: prompting Cursor to fix an already-implemented feature, or to extend it, often results in new code that ignores the option of refactoring the existing code.
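To make the ‘less efficient’ point concrete, here is a minimal sketch of the classic N+1 query pattern I keep seeing in freshly generated SQLAlchemy code. The Hive/Sensor models are hypothetical stand-ins, not our actual schema - the point is the shape of the loop, not the domain:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()

class Hive(Base):
    __tablename__ = "hives"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    sensors = relationship("Sensor", back_populates="hive")

class Sensor(Base):
    __tablename__ = "sensors"
    id = Column(Integer, primary_key=True)
    hive_id = Column(Integer, ForeignKey("hives.id"))
    hive = relationship("Hive", back_populates="sensors")

engine = create_engine("sqlite://", echo=True)  # echo prints the emitted SQL
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Hive(name=f"hive-{i}", sensors=[Sensor(), Sensor()]) for i in range(3)])
    session.commit()

    # The shape AI-generated code tends to produce: one query for the hives,
    # then an extra lazy-load query per hive the moment .sensors is touched.
    for hive in session.query(Hive).all():
        print(hive.name, len(hive.sensors))

    session.expire_all()  # reset loaded state so the comparison is fair

    # The same report with eager loading: two queries total, regardless of N.
    for hive in session.query(Hive).options(selectinload(Hive.sensors)).all():
        print(hive.name, len(hive.sensors))
```

Nothing here is broken - it just doesn’t scale, and in my experience Cursor rarely flags it on its own.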

For evolving or established codebases, Cursor seems to be pretty good at drawing from existing code to do ‘more of the same’ (which is frequently what we need, for example when adding a RESTful API for a new type of resource). Where things sometimes take a turn is with hallucinations and with doing something new. I’ve seen hallucinations mostly in the form of the AI inventing classes, models, and functions that do not exist, just because they ‘match’ the codebase style. For example, we had a bulk_upsert_by_constraint function, and the AI created an API endpoint that used bulk_upsert_by_condition - a non-existent function. The danger here (specifically for interpreted languages like Python and JavaScript/TypeScript) is that the use of non-existent symbols will not be caught before the code is deployed, leading to mysterious failures in production. The other challenge is doing something different from the existing code - while Cursor can definitely be up to the task, it often generates code that is very different from what’s already there and doesn’t necessarily leverage the code infrastructure correctly: for example, using native Postgres queries instead of the ORM, or using fetch instead of the axios client in React.
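To illustrate why this is dangerous, here is a hedged sketch of how such a hallucination slips through in Python. The Flask route and the db_utils stand-in are hypothetical (bulk_upsert_by_constraint is the real function, bulk_upsert_by_condition the invented one); the key point is that the bad name is only resolved when the endpoint actually runs:

```python
import types

from flask import Flask, jsonify, request

# Stand-in for our DB utilities module: only bulk_upsert_by_constraint exists.
db_utils = types.SimpleNamespace(
    bulk_upsert_by_constraint=lambda rows: len(rows),
)

app = Flask(__name__)

@app.route("/api/readings", methods=["POST"])
def upsert_readings():
    rows = request.get_json()
    # Python resolves this attribute only when the route runs, so the invented
    # bulk_upsert_by_condition sails through import, deploy, and any test run
    # that never exercises this endpoint - then fails on the first real request.
    count = db_utils.bulk_upsert_by_condition(rows)
    return jsonify({"upserted": count})

if __name__ == "__main__":
    # Even a one-line smoke test that hits the route catches it before deploy.
    resp = app.test_client().post("/api/readings", json=[{"id": 1}])
    print(resp.status_code)  # 500, with AttributeError in the traceback
```

A type checker, a linter that verifies member access (e.g. pylint’s no-member check), or a smoke test that exercises every endpoint would likely have caught this before deploy - exactly the kind of safety net worth keeping as more of the code becomes AI-generated.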

We don’t have much legacy code, as BeeHero is still very much a startup, but I did notice that with older or niche technologies Cursor’s hallucinations ramped up quickly - specifically, our AWS CDK code got non-existent constructs, construct functions, and CloudWatch metrics.

Prompts - specific, descriptive, restrictive, or high-level?

I’ve played around with different approaches to AI prompts, with varying results.

When I’m using ‘specific’ prompts, I’m telling the AI exactly what I want done and how - e.g. ‘Create a new test case for this API endpoint, authenticate using this function, use a payload with these characteristics and assert these checks’. Cursor usually executes these kinds of instructions to a T. The downside is that this isn’t very different from ‘standard’ code completion, and doesn’t save as much time and effort as it could.
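For a sense of what that looks like in practice, a ‘specific’ prompt along those lines typically yields something like the pytest sketch below. The endpoint, payload, auth helper, and the toy Flask app (included so the snippet is self-contained) are all hypothetical stand-ins, not our actual API:

```python
import pytest
from flask import Flask, jsonify, request

def create_app():
    # Toy stand-in for the application under test.
    app = Flask(__name__)

    @app.post("/api/hives")
    def create_hive():
        if request.headers.get("Authorization") != "Bearer test-token":
            return jsonify({"error": "unauthorized"}), 401
        data = request.get_json()
        return jsonify({"id": 1, "name": data["name"]}), 201

    return app

@pytest.fixture
def client():
    return create_app().test_client()

def login_as(client, email):
    # 'authenticate using this function' - stand-in auth helper.
    return {"Authorization": "Bearer test-token"}

def test_create_hive_returns_201(client):
    headers = login_as(client, "test-user@example.com")
    # 'use a payload with these characteristics'
    payload = {"name": "Hive 42", "latitude": 32.1, "longitude": 34.8}
    resp = client.post("/api/hives", json=payload, headers=headers)
    # 'assert these checks'
    assert resp.status_code == 201
    assert resp.get_json()["name"] == "Hive 42"
```

Useful, but as noted - not far from what good code completion already gives you.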

‘Descriptive’ prompts are more high-level but still provide a lot of detail, for example ‘Create new test cases for this API, and check authentication, valid and invalid payloads, and that the response matches the DB’. These kinds of prompts usually result in Cursor doing more than with ‘specific’ prompts, but they also require more attention to the generated code and to whether it actually fulfills what you had in mind.

‘Restrictive’ prompts can be combined with any of the others - the idea is to frame the boundaries, the playing field, of what we want Cursor to do. For example, ‘create new test cases for the API endpoint, but do not create additional fixtures, and maintain the same coding standard as the existing tests’. ‘Restrictive’ prompts are good when you want to give Cursor a lot of leeway but are concerned that it may alter code infrastructure or reusable code, or produce unusable code.

Finally, at times it’s worth trying a ‘high-level’ prompt and letting Cursor take as much load off our shoulders as possible. For example, ‘test the API endpoint’ - which results in Cursor producing multiple test cases for the various code flows it ‘understands’.

Obviously, this is not a one-size-fits-all approach. I will usually start with a ‘descriptive’ or ‘high-level’ prompt, review the results, accept or reject as needed, and then refine with additional, more ‘restrictive’ or ‘specific’ prompts.

There’s a lot more to say...

...but I’ll save it for future posts. Instead, I’d like to close with a few nitpicks and oddities:

  • Cursor is not great at creating new code files or deciding how to organize the code structure. I frequently find it better to create the files myself and then ask Cursor to generate their content
  • I’m on the fence about the value of the ‘context’ feature in Cursor chat. I’m not sure the effort of providing context results in better code than simply mentioning the context in the prompt, or even letting Cursor figure it out by itself
  • Teamwork with Cursor is also changing - Cursor’s changes tend to be wider in scope and produce less-organized code than our manual changes, which changes the nature of pull requests and the review process. Is there a point in commenting on reusable code? After all, Cursor will be the one to change it in the future - if it doesn’t care about reusing code, why should we? And should we limit the scope of a change just so the reviewer won’t have a hard time going over 10+ files?
  • Cursor often parses an entire code file in order to make changes, and this is slow or fails outright for large files (e.g. some of our Python files are ~4K lines of code)
  • Cursor’s ‘Agent’ mode sometimes goes into a loop of generating a change, noticing issues (linter errors, non-existent references), trying to fix or revert, noticing more issues, and so on
  • And finally - Cursor needs an Internet connection… which doesn’t go too well with working on my train ride home…