I am currently exploring the mathematical foundations of intelligence in large models and sequential decision-making. More specifically, I want to ground the concept of AI trust in mathematical interpretability rather than philosophy, and to study these questions through reinforcement learning methods.
I enjoy building small but complete research systems, writing technical notes, and turning vague ideas into runnable artifacts.
A core probe built on five hand-crafted features (token confidence, trajectory continuity, reflection count, novelty, and neuron width) predicts solution correctness from the reasoning trace alone.
This declaration calls for action to address the challenges posed by the use of artificial intelligence in mathematics research. It is the result of a community initiative and is endorsed by the International Mathematical Union (IMU).
The canonical pipeline is StandardScaler → TruncatedSVD → LogisticRegression:
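```python
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def svd_predict(X, y, n_components=50):
    # Standardize features, project to a low-rank latent space, then classify.
    scaler = StandardScaler()
    svd = TruncatedSVD(n_components=n_components)
    clf = LogisticRegression(max_iter=1000)
    X_scaled = scaler.fit_transform(X)
    X_latent = svd.fit_transform(X_scaled)
    clf.fit(X_latent, y)  # y is passed in explicitly: binary correctness labels
    # Positive-class probabilities for the same rows the model was fit on.
    return clf.predict_proba(X_latent)[:, 1]
```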
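The function above fits and scores on the same rows, so the probabilities it returns are in-sample. Below is a minimal sketch of a held-out variant, under stated assumptions: the name `svd_predict_heldout`, the 75/25 stratified split, the clamping of `n_components`, and the synthetic demo data are all illustrative and are not taken from the original pipeline.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def svd_predict_heldout(X, y, n_components=50, test_size=0.25, seed=0):
    """Hypothetical helper: fit scaler, SVD, and classifier on a training
    split only, then return held-out labels and positive-class probabilities."""
    X = np.asarray(X)
    y = np.asarray(y)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    # Assumption: clamp the rank so it stays below the number of features,
    # which TruncatedSVD requires for its solvers.
    k = max(1, min(n_components, X.shape[1] - 1))
    model = make_pipeline(
        StandardScaler(),
        TruncatedSVD(n_components=k, random_state=seed),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)  # all three steps see training rows only
    return y_test, model.predict_proba(X_test)[:, 1]


# Synthetic data for illustration only (not results from the source).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 20))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
y_true, p_hat = svd_predict_heldout(X_demo, y_demo, n_components=10)
```

Wrapping the three steps in one pipeline keeps the scaler and SVD from seeing test rows, so `p_hat` can be scored against `y_true` without leakage.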
| Domain | Method | AoA |
|---|---|---|
| Math (AIME, HMMT) | CoT features | 0.958 |
| Science (GPQA) | Confidence | 0.799 |
| Coding (LiveCodeBench) | CoT features | 0.434 |
ICML 2026 · AI4Math · Reviewer