I Built a Weather Prediction Bot That Made $959 in Paper Trades (Here's How)

A commercial pilot built a Kalshi weather prediction bot using Python, NOAA forecasts, and Kelly Criterion position sizing. 137 paper trades, $959.27 profit, 98.3% of Monte Carlo simulations profitable. Here's exactly how it works and why it's not live yet.

I'm 46 years old. I have a commercial pilot certificate, 369.5 flight hours, and a very clear problem: I need about 1,130 more hours to hit the minimums that get you a real flying job. At $150 to $200 an hour for a decent aircraft rental, that's somewhere between $170,000 and $225,000 I need to generate from somewhere that isn't my day job managing a car electronics shop in Merced.

That number has been sitting on my desk for a while. I've been building toward it through a few different angles — an iOS app, KDP workbooks, some content work — all orchestrated through OpenClaw, my AI agent platform. But a few months ago I started thinking about prediction markets. Specifically, Kalshi. Specifically, the weather markets.

Here's where my head went: I'm a pilot. Weather is not a hobby topic for me. I've been reading METARs, TAFs, area forecasts, and prog charts for two years — the same skills I built while earning my commercial and instrument ratings. I know what NOAA's models are good at and where they fall apart. And every time a weather event rolls through, there are real money markets on Kalshi asking things like "Will the high temperature in Phoenix exceed 105°F today?" or "Will it rain more than 0.25 inches in Seattle this week?"

I thought: I probably have better-than-random intuition here. What if I built a bot to make those trades systematically and found out?

Six months later, the bot has 137 paper trades and $959.27 in simulated profit. Here's exactly how it works.

Why Weather Prediction Markets

Kalshi is a regulated prediction market exchange. You trade on yes/no questions about future events. Unlike sports betting or crypto, the underlying "asset" is something with actual data behind it — government forecasts, historical records, numerical weather models that have been refined for decades.

Weather markets appealed to me for a few specific reasons.

First, the edge is knowable. Temperature forecasts from NOAA and Open-Meteo aren't random. They're probabilistic outputs from physics-based models. If I can extract a probability estimate that's better calibrated than what the market is pricing, I have a real edge.

Second, the events resolve fast. A "will it hit 100°F in Phoenix today?" market resolves by end of day. That's very different from holding a position for weeks. Fast resolution means I learn quickly whether my model is working.

Third, as a pilot, I already had the domain knowledge. I didn't need to learn what a 500mb chart means or the difference between ensemble spread and deterministic output. That was already in my head from instrument training.

So I started building.

How the Bot Works

Data Sources

The bot pulls from two forecast APIs: NOAA's National Digital Forecast Database (NDFD) and Open-Meteo. NOAA is the authoritative US source. Open-Meteo provides ensemble model data with uncertainty spreads. Together they give me a probability distribution for temperature, precipitation, and wind at any US location.

For each market, the bot identifies the relevant location and variable, queries both APIs, and builds a forecast ensemble. The ensemble spread tells me how confident the models are. Wide spread means high uncertainty; tight spread means the models agree.
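A minimal sketch of that ensemble step, not the bot's actual code. It assumes the forecasts have already been fetched and reduced to one number per model member, and it treats "spread" as the sample standard deviation, which is my assumption. The function name is illustrative.

```python
import statistics

def ensemble_summary(member_values):
    """Return (mean, spread) for a list of forecast values, one per model member.

    Spread is taken as the sample standard deviation (an assumption);
    a min-max range or interquartile range would also be defensible.
    """
    if len(member_values) < 2:
        raise ValueError("need at least two forecasts to measure spread")
    return statistics.fmean(member_values), statistics.stdev(member_values)


# Hypothetical forecast highs (°F) for one city from three model members.
mean_f, spread_f = ensemble_summary([101.0, 103.0, 105.0])
```

A wider spread from this function means the members disagree more, which is the uncertainty signal described above.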

Probability Estimation

For bracket markets (things like "will the high exceed X?"), I use a Normal CDF approach. I take the ensemble mean as the expected value and the spread as the standard deviation, then calculate the probability that the outcome falls above or below the threshold.

So if the ensemble says the Phoenix high tomorrow has a mean of 103°F with a standard deviation of 4°F, the model-implied probability that it exceeds 105°F is about 31%, since 105 sits half a standard deviation above the mean. The gap between that probability and whatever price the market is offering is my edge estimate.
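Here is a small sketch of the Normal CDF calculation using only the standard library. The function names are illustrative, and it assumes forecast errors are roughly normal, which is a modeling choice rather than a fact about the weather.

```python
import math

def normal_cdf(x, mean, std_dev):
    """P(X <= x) for a normal distribution with the given mean and std dev."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    z = (x - mean) / (std_dev * math.sqrt(2))
    return 0.5 * (1.0 + math.erf(z))

def prob_exceeds(threshold, mean, std_dev):
    """P(X > threshold), for 'will the high exceed X?' style questions."""
    return 1.0 - normal_cdf(threshold, mean, std_dev)

def prob_in_bracket(low, high, mean, std_dev):
    """P(low < X <= high), for markets that ask about a temperature range."""
    return normal_cdf(high, mean, std_dev) - normal_cdf(low, mean, std_dev)


# The Phoenix example: mean 103°F, standard deviation 4°F, threshold 105°F.
p = prob_exceeds(105, 103, 4)  # roughly 0.31
```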

Strategies

The bot runs two strategies, and this is where the real lesson is.

CALIBRATION is the primary strategy. It looks for markets where my probability estimate and the market price disagree by a meaningful margin. If I think there's a 70% chance something happens and the YES contract is priced at 55 cents (implying 55%), that's a 15-point edge and I take the trade.
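The comparison can be sketched like this. The `min_edge` threshold of 10 points is a placeholder, since the post only says "a meaningful margin", and the sketch ignores fees and the bid/ask spread. Function names are illustrative.

```python
def edge_points(model_prob, yes_price_cents):
    """Model probability minus the market-implied probability of YES."""
    implied_prob = yes_price_cents / 100.0
    return model_prob - implied_prob

def pick_side(model_prob, yes_price_cents, min_edge=0.10):
    """Return 'YES', 'NO', or None if the disagreement is too small.

    min_edge is a hypothetical threshold, not a number from the bot.
    """
    edge = edge_points(model_prob, yes_price_cents)
    if edge >= min_edge:
        return "YES"
    if -edge >= min_edge:
        return "NO"
    return None


# The example from the text: 70% model estimate against a 55-cent YES price.
side = pick_side(0.70, 55)  # "YES", with an edge of about 0.15
```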

BOUNDARY_FADE was a second strategy I added, looking to fade markets priced near the boundary — extreme events, thin probability ranges. It seemed clever in theory. The data killed it. BOUNDARY_FADE averaged negative $19.38 per trade. It's disabled now.

Position Sizing: Kelly Criterion

This part matters more than most people think. I use a fractional Kelly Criterion to size each position. Kelly tells you the mathematically optimal fraction of your bankroll to bet given your edge and the odds. Full Kelly is aggressive enough to cause significant drawdowns, so I use a fraction of it.

The formula: f = (edge * odds) / (odds - 1), then I take a fraction of that (typically 25-50% Kelly) based on my confidence in the edge estimate. Position sizes are capped so no single trade risks more than a defined percentage of the bankroll.
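A sketch of that sizing logic, under one reading of the formula: edge is my probability minus the contract price, and odds is the decimal payout, 1 / price. With those definitions the formula reduces to the standard Kelly fraction for a binary contract, (p - price) / (1 - price). The 5% cap is a placeholder for the "defined percentage", and the names are illustrative.

```python
def kelly_fraction(model_prob, price):
    """Full-Kelly fraction for a binary contract that pays $1.

    Assumes edge = model_prob - price and odds = 1 / price (decimal odds).
    Returns 0.0 when there is no positive edge.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be between 0 and 1 (exclusive)")
    edge = model_prob - price
    odds = 1.0 / price
    return max(0.0, (edge * odds) / (odds - 1.0))

def position_size(bankroll, model_prob, price,
                  kelly_multiplier=0.25, max_risk_fraction=0.05):
    """Dollar amount to risk: fractional Kelly, then a hard per-trade cap.

    max_risk_fraction is a hypothetical cap, not a value from the bot.
    """
    fraction = kelly_fraction(model_prob, price) * kelly_multiplier
    return bankroll * min(fraction, max_risk_fraction)


# 70% estimate against a 55-cent price on a $1,000 bankroll.
full_kelly = kelly_fraction(0.70, 0.55)    # about 0.33
stake = position_size(1000.0, 0.70, 0.55)  # quarter Kelly is ~8.3%, capped at 5%
```

Full Kelly here would put a third of the bankroll on one trade, which is why both the fraction and the cap matter.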

Storm Mode

One detail I'm proud of: the bot has a storm mode. Normally it scans for new markets every 30 minutes. When there's a significant weather event in the forecast — a front moving through, a major precipitation event, an extreme heat warning — it shifts to scanning every 10 minutes. More markets open during events, they're more volatile, and the edge windows close faster.
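The switching logic is simple enough to sketch in a few lines. How the bot actually detects a "significant weather event" isn't covered here, so the input below is just a hypothetical list of active event labels.

```python
NORMAL_SCAN_MINUTES = 30
STORM_SCAN_MINUTES = 10

def scan_interval_minutes(active_events):
    """Pick the scan interval based on whether any significant event is active.

    active_events is a hypothetical collection of labels such as
    "extreme_heat_warning"; an empty collection means normal mode.
    """
    return STORM_SCAN_MINUTES if active_events else NORMAL_SCAN_MINUTES


interval = scan_interval_minutes(["extreme_heat_warning"])  # 10
```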

What the Numbers Say

After 137 paper trades on a $1,000 simulated bankroll:

Total paper profit: $959.27
Total trades: 137
Win rate: 52.7%
Win/loss ratio: 2.44x

Breaking it down by strategy:

Strategy         Trades   Profit      Avg/Trade
CALIBRATION      124      $907.79     +$7.32
BOUNDARY_FADE    13       −$251.94    −$19.38

CALIBRATION alone: 124 trades, $907.79 profit, $7.32 average per trade. Average win of $21.17 against an average loss of $8.68. That 2.44x win-to-loss ratio is doing a lot of heavy lifting alongside the 52.7% win rate.

BOUNDARY_FADE: 13 trades, -$251.94. Disabled immediately once I saw those numbers.

Monte Carlo Validation

I ran 10,000 Monte Carlo simulations using the CALIBRATION strategy's win rate, average win, average loss, and a realistic position size range. The results: 98.3% of runs finished profitable, with a median profit of $873.

That's not a guarantee of future results. Monte Carlo assumes the future looks like the past. But a 98.3% profitable rate across 10,000 runs is a signal worth taking seriously.
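For anyone who wants to run this kind of check, here is a simplified sketch. It treats every trade as an independent win or loss of a fixed average size, scaled by a random position-size multiplier. The `size_range` values are hypothetical, so the output won't match the 98.3% figure unless the sizing assumptions happen to line up.

```python
import random
import statistics

def simulate_profits(win_rate, avg_win, avg_loss, n_trades,
                     n_runs=10_000, size_range=(1.0, 1.0), seed=42):
    """Simulate n_runs trade sequences and return the final profit of each."""
    rng = random.Random(seed)
    profits = []
    for _ in range(n_runs):
        pnl = 0.0
        for _ in range(n_trades):
            scale = rng.uniform(*size_range)  # hypothetical position-size variation
            if rng.random() < win_rate:
                pnl += avg_win * scale
            else:
                pnl -= avg_loss * scale
        profits.append(pnl)
    return profits

def summarize(profits):
    """Return (share of runs that ended profitable, median profit)."""
    share_profitable = sum(1 for p in profits if p > 0) / len(profits)
    return share_profitable, statistics.median(profits)


# CALIBRATION inputs from the post; the size range is an assumption.
profits = simulate_profits(0.527, 21.17, 8.68, n_trades=124, size_range=(0.5, 1.5))
share_profitable, median_profit = summarize(profits)
```

Because the trades are drawn independently, this kind of simulation can't see regime changes or correlated losses, which is the "future looks like the past" caveat in code form.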

The edge isn't huge. $7.32 per trade on a $1,000 bankroll is not going to fund a Citation X. But it's real, it's consistent, and it compounds.

The Six Live Gates

Paper trading is paper trading. I'm not touching live money until I'm confident the edge is real and durable. So I built a gate system. Six criteria that have to pass simultaneously before the bot goes live.

Currently: 3 out of 6 gates passing. I need all six before I flip KALSHI_LIVE_MODE = True. I have a $100 live account sitting funded and ready. It stays there until the gates clear.
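The all-or-nothing check can be sketched as below. The gate names are placeholders, since the individual criteria aren't spelled out here; only the "all six must pass" rule and the `KALSHI_LIVE_MODE` flag come from the post.

```python
REQUIRED_GATES = 6

def live_mode_allowed(gate_results):
    """True only if exactly six gates were evaluated and every one passed.

    gate_results maps a gate name to a bool; the names are hypothetical.
    """
    return len(gate_results) == REQUIRED_GATES and all(gate_results.values())


# Placeholder state matching "3 out of 6 gates passing".
gates = {f"gate_{i}": i <= 3 for i in range(1, 7)}
KALSHI_LIVE_MODE = live_mode_allowed(gates)  # False until all six pass
```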

My best estimate for when that happens: around early April 2026, assuming the win rate and drawdown metrics continue improving. I'm not rushing it. Going live early is how you find out your edge wasn't real.

What I Got Wrong

BOUNDARY_FADE is the obvious one. I was pattern-matching to a trading concept that sounds good in a book and didn't do the work to validate it before trading it. By the time I had 13 trades of data, it had already cost me $252 in simulated profit. That's actually a cheap lesson in paper trading terms. In live trading it would have been real money.

The other thing I underestimated: how much the model spread matters. Early on I was using point forecasts from NOAA without accounting for forecast uncertainty. A mean temperature of 98°F means something very different when the ensemble spread is 2°F versus 8°F. I now weight my confidence in an edge by the spread. Tight spread, high confidence, bigger position. Wide spread, uncertain, smaller position or skip.
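One way to express that weighting is a multiplier applied to the position size. This is a sketch, not the bot's actual rule: the 2°F and 8°F cutoffs borrow the numbers from the example above as placeholders, and the linear ramp between them is my assumption.

```python
def spread_multiplier(spread_f, tight_f=2.0, wide_f=8.0):
    """Scale factor for position size based on ensemble spread in °F.

    Returns 1.0 at or below tight_f (full size), 0.0 at or above wide_f
    (skip the trade), and ramps linearly in between. Both cutoffs are
    hypothetical.
    """
    if spread_f <= tight_f:
        return 1.0
    if spread_f >= wide_f:
        return 0.0
    return (wide_f - spread_f) / (wide_f - tight_f)


multiplier = spread_multiplier(5.0)  # 0.5, halfway between tight and wide
```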

What's Next

The short answer: I wait for 6/6 gates to pass, then go live with $100 and let it run for 90 days. If the live results track the paper results within a reasonable margin, I scale up. If they don't, I go back to paper trading and figure out what changed.

I'm also looking at extending the model to precipitation and wind markets, not just temperature. Temperature is the cleanest signal because NOAA's temperature forecasts are very good. Precipitation is noisier. But there might be edge in specific market types where the noise works in my favor.

The longer game: if this generates a consistent $500 to $1,000 a month, that's two to five flight hours funded every month. Not enough to change my timeline dramatically on its own. But combined with the workbooks, the app revenue, and a couple of other projects, it moves the needle. I wrote about the full picture of how I'm using AI to build side income as a pilot — the Kalshi bot is one piece of a larger strategy. For the full menu of how pilots actually close the hour gap affordably, I also put together a guide to the cheapest ways to build flight hours in 2026.

369.5 hours and counting. The bot doesn't sleep. Neither does the compound interest.

✈️ Following the Full Journey

I track every income stream building toward my flying career on the Flight Funded tracker. Kalshi bot P&L, app revenue, KDP book sales, all of it. Live numbers, updated automatically.


Nick Rae is a commercial pilot (ASEL/AMEL/IR) with 369.5 hours, a car electronics shop manager, and a builder based in Merced, CA. He's building every income stream he can find to fund the remaining 1,130 flight hours between him and a real flying job. Follow along on X @nickrae or check the Flight Funded tracker for live numbers.