You Can't Visit the Office. Here's How to Evaluate a Nearshore Dev Partner Anyway.

When you hire locally you lean on two signals without noticing them: you walk the office and you meet the team. Both vanish the moment you hire nearshore. What is left is a sales call, a deck, and a price. Most buyers compensate by defaulting to the price, because it is the only number on the table, and then spend the next six months discovering what the other three axes would have told them for free.

The fix is not more reference calls. References are curated and tell you about the project the agency wants you to hear about. The fix is to score the few things that are observable before you sign, weight them for your situation, and let the comparison be explicit instead of a feeling you back-fill with justification later. Four axes carry almost all of the predictive weight: execution velocity, case-study depth, tech-stack alignment, and communication maturity. Score each one to five. The shop that wins is rarely the one that was winning on price.

Radar chart comparing two agencies across four axes scored 1 to 5. Shop A is spiky — very high on execution velocity, near-zero on case-study depth. Shop B is smaller but even across all four axes. A balanced smaller polygon often beats a spiky larger one.

Axis 1: Execution velocity

Velocity is not how fast they say they move. It is whether they can name a real timeline on a real piece of work and defend the number. Ask them to walk through the last thing they shipped: what the scope was, what the estimate was, when it actually landed, and what moved in between. A shop with genuine velocity answers in specifics and is comfortable telling you where the estimate was wrong. A shop without it answers in adjectives. “We’re very agile” is a one out of five.

The reason this axis matters more for a nearshore partner than a local hire is that you cannot see the work happening. With an in-house team a stalled sprint is visible by Wednesday. With a remote partner you find out at the demo, two weeks later, when the thing you needed is not there. Velocity is the axis that protects you from that gap, so the question is really about predictability, not speed. A shop that ships in three weeks every time beats one that ships in two weeks or six depending on the month.

Axis 2: Case-study depth

This is the axis most agencies fail, and the data says so plainly. In an analysis of 8,223 social posts from 100 Latin American software agencies, case-study content ranged from 0.3 percent to 6.3 percent of an agency’s output. Most shops publish almost no detailed account of work they have actually delivered. They publish opinions about the industry instead, because an opinion costs nothing to produce and cannot be checked.

Depth means the case study contains a number you could be sued over. “We improved performance” is not depth. “We cut p95 latency on the checkout path from 1.8 seconds to 340 milliseconds by moving three synchronous calls behind a queue” is depth, because it commits to specifics that a real engagement produces and a fabricated one does not. When you ask for a case study and get a logo wall, score it a one. When you get a story with a metric, a constraint, and a thing that went wrong, score it a five. The presence of the thing that went wrong is the strongest single signal on this axis, because only people who were actually in the work know where it hurt.

Axis 3: Tech-stack alignment

Every shop will say yes to your stack. Alignment is about whether their real, demonstrated depth overlaps with what your project actually needs, which is a different question from what they will claim in a sales call. A fintech-heavy agency can probably write your React frontend, but if your problem is a high-traffic Drupal content platform with a complex permissions model, “we know PHP” is not the same as having scaled that specific thing. The same social dataset showed agencies cluster hard around trendy stacks, AI appeared in 396 posts, while proven enterprise tools like Drupal were essentially absent from public conversation despite continued enterprise demand. The narrative an agency presents and the work it has genuine depth in are frequently two different sets.

Score this axis by asking for depth in your specific problem area, not your broad language. If you run Drupal at scale, ask what breaks first under load and listen for whether the answer is theoretical or scarred. If you need an AI agent in production, ask about their hallucination rate and how they measure it, not whether they “do AI.” A five on this axis is a shop that has hit the exact wall you are about to hit and can describe the far side of it. A three is a competent generalist who will learn your problem on your budget. A one is enthusiasm.

Axis 4: Communication maturity

This is the axis that decides whether the other three ever reach you. A fast, deep, perfectly-aligned team that goes quiet for ten days is functionally a slow one, because you cannot act on work you cannot see. Communication maturity is the set of habits that keep a remote engagement legible: a fixed sync cadence, proactive flagging of blockers before they become slips, written decisions you can refer back to, and a named person who owns the relationship when something goes wrong at 6pm your time.

Test it before you sign by watching how they handle the sales process itself, because the sales process is the most communicative they will ever be. Did they follow up when they said they would? Did they answer the hard question directly or route around it? Did anything they sent you have a date and an owner on it, or was it all adjectives? The way a shop communicates while trying to win you is the ceiling, not the floor. It does not improve after the contract is signed.

The weighting is yours, not ours

The four axes are constant. The weights are not, and pretending there is one universal ranking is where most evaluation advice goes wrong. A ten-person startup that needs a feature shipped before a funding milestone should weight execution velocity above everything and can tolerate a thinner case-study record if the velocity signal is strong. A 250-person enterprise integrating a partner into a regulated, audited delivery process should weight communication maturity highest, because in that context an undocumented decision is a future liability, not an inconvenience.

Heatmap of recommended axis weights for three buyer profiles. The 10-person startup row is darkest on execution velocity (45%); the 250-person enterprise row is darkest on communication maturity (35%). The same framework produces different winners depending on who is buying.

Run the same scores through different weights and the winner changes. That is the point. A shop that scores 5 on velocity and 2 on communication is the right call for the startup and the wrong call for the enterprise, from identical data. When you write the weights down before you score, you stop the most common failure in vendor selection, which is choosing on one axis you happened to care about that week and rationalizing the rest afterward.

Why we framed it this way

We build this evaluation the same way when we are the ones being evaluated, which is the honest reason this post exists. CAMF Solutions runs a partner-network model: no large standing office, a small vetted specialist team assembled for each engagement, and a deliberate focus on the long-tail maintenance work that most pitch decks skip. That model scores differently on these axes than a 200-person shop does, and we would rather you score it openly than be surprised by it. We tend to be strong on alignment and communication and honest about where a giant body-shop would out-muscle us on raw headcount.

If you are evaluating an AI or custom-software partner this quarter, send us the one project you are least sure how to scope and the platform it runs on. We will send back how we would score ourselves against these four axes for that specific work, including where we would not be the right call. No sales follow-up.

Send a scoping request →

The deeper value of writing the axes down is not the score. It is that the act of scoring forces you to ask the four questions that a polished sales call is designed to keep you from asking. The agency that answers all four in specifics, including the answer that costs it the deal, is usually the one you want.