AI Advancements: NSF and Nvidia invest $152M in open AI models; Delta Thailand forecasts double-digit growth
Key Takeaways
- NSF and Nvidia invest $152M in OMAI project for open AI models.
- Delta Thailand forecasts double-digit growth, with AI products at 50% of sales.
- Cohere's valuation hits $6.8B after $500M funding; Joelle Pineau appointed CAIO.
- Anthropic launches learning modes for Claude.ai and Claude Code to all users.
- Mistral AI launches Mistral Medium 3.1, a multimodal LLM for enterprises.
Top Stories
NSF, Nvidia invest $152M in open AI models for science.
The U.S. National Science Foundation and Nvidia Corp. are collaborating to fund the OMAI project with $152 million, led by the Allen Institute for AI, to develop open AI models for scientific research.
Delta Thailand forecasts double-digit growth driven by AI demand.
Delta Electronics (Thailand) Pcl anticipates continued double-digit sales growth, with AI-related products expected to account for 50% of sales by year-end, up from 42% in the latest quarter.
Cohere's valuation hits $6.8B after $500M funding; Pineau appointed CAIO.
Cohere, an AI startup, secured a $6.8 billion valuation following a $500 million funding round and appointed Joelle Pineau as Chief AI Officer.
Anthropic launches learning modes for Claude.ai and Claude Code.
On August 14, 2025, Anthropic expanded its learning modes, which guide users through step-by-step reasoning, from Claude for Education to all users of Claude.ai and Claude Code.
Mistral AI launches Mistral Medium 3.1 with multimodal intelligence.
On August 13, 2025, Mistral AI introduced Mistral Medium 3.1, an LLM with multimodal intelligence, enterprise readiness, and cost-efficiency, priced at $0.40 per million input tokens.
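At the published rate of $0.40 per million input tokens, estimating a workload's input cost is simple arithmetic; a minimal sketch (output-token pricing is not covered in the announcement and is omitted here):

```python
# Input-cost estimate for Mistral Medium 3.1, using the announced rate
# of $0.40 per million input tokens (output tokens priced separately).
INPUT_PRICE_PER_M = 0.40  # USD per 1M input tokens


def input_cost(tokens: int) -> float:
    """Return the USD cost of sending `tokens` input tokens."""
    return tokens / 1_000_000 * INPUT_PRICE_PER_M


print(input_cost(250_000))  # 0.1
```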
AI Breakthroughs
MetaKGRAG improves LLM reasoning with closed-loop refinement.
Researchers propose MetaKGRAG, a novel framework enhancing Large Language Models' reasoning by introducing a Perceive-Evaluate-Adjust cycle for path-aware refinement, outperforming existing KG-RAG frameworks.
AmbiGraph-Eval: LLMs struggle with ambiguous graph queries.
Researchers introduce AmbiGraph-Eval, a benchmark for evaluating Large Language Models on handling ambiguous graph queries, finding that even top models struggle.
Active Reading enhances LLM fact learning and recall significantly.
On August 13, 2025, a paper introduced Active Reading, a framework designed to improve the reliability of fact learning and recall in Large Language Models, achieving substantially higher knowledge absorption than vanilla finetuning.
PacifAIst: Benchmark reveals LLM safety alignment variations.
Researchers introduce PacifAIst, a benchmark to evaluate Large Language Models' behavioral alignment with human safety, with Google's Gemini 2.5 Flash achieving the highest Pacifism Score at 90.31%.
LLMs fine-tuned for Symbolic Regression outperform traditional methods.
Researchers propose fine-tuning Large Language Models for Symbolic Regression tasks, introducing SymbArena, a benchmark with 148,102 diverse equations, showing that SymbolicChat outperforms traditional numerical methods.
EffiEval enables efficient LLM benchmarking with reduced data.
Researchers present EffiEval, a training-free approach for efficient benchmarking of large language models, addressing data redundancy while maintaining high evaluation reliability.
FedShard accelerates federated data unlearning with fairness guarantees.
Researchers introduce FedShard, a federated unlearning algorithm that guarantees both efficiency fairness and performance fairness, unlearning data 1.3-6.2 times faster than retraining from scratch.
Multi-agent framework automates evaluation of mobile intelligent assistants.
Researchers propose an automated multi-modal evaluation framework for mobile intelligent assistants using large language models and multi-agent collaboration, showing effectiveness in predicting user satisfaction.
M3-Agent: Multimodal agent with long-term memory outperforms baselines.
On August 13, 2025, a paper introduced M3-Agent, a multimodal agent with long-term memory, processing visual and auditory inputs and outperforming baselines on M3-Bench.
HKT: Biologically inspired knowledge transfer for compact neural networks.
On August 13, 2025, a paper introduced HKT, a framework inspired by biological inheritance for transferring features from a larger, pretrained network to a smaller model, improving performance and compactness.
Task diversity shortens ICL plateau in language model training.
A paper published on arXiv on August 13, 2025, reveals that training language models on multiple diverse in-context learning tasks simultaneously shortens the loss plateaus.
UDA reduces preference bias in LLM evaluations by up to 63.4%.
On August 13, 2025, a paper introduced UDA, a framework designed to reduce preference bias in pairwise evaluations of Large Language Models, reducing inter-judge rating standard deviation by up to 63.4%.
NeuronTune modulates neurons for balanced safety-utility in LLMs.
On August 13, 2025, a paper introduced NeuronTune, a fine-grained framework that dynamically modulates sparse neurons to achieve simultaneous safety-utility optimization in LLMs.
AINL-Eval 2025 task detects AI-generated scientific abstracts in Russian.
The AINL-Eval 2025 Shared Task focuses on detecting AI-generated scientific abstracts in Russian, using a dataset of 52,305 samples and attracting 10 teams.
SLowED: Safe distillation maintains safety of Small Language Models.
Researchers propose SLowED, a safe distillation method to maintain the safety of Small Language Models during chain-of-thought distillation, consisting of Slow Tuning and Low-Entropy Masking.
ChatGPT, Gemini, Claude exhibit gender-based narrative biases.
A study explores gender-based narrative biases in stories generated by ChatGPT, Gemini, and Claude, revealing persistent biases in character descriptions, actions, and relationships.
Industry Watch
SuperOps, AWS launch AI agent marketplace for MSPs, IT teams.
SuperOps and AWS are launching an AI agent marketplace for MSPs and IT teams, allowing developers to publish and monetize AI agents for IT workflow simplification.
Ex-Twitter CEO Agrawal raises $30M for AI agent web search.
Ex-Twitter CEO Parag Agrawal launched Parallel Web Systems to build infrastructure for AI agents to search the web, raising $30 million from Khosla Ventures and others.
Multiverse launches SuperFly, ChickBrain AI models for IoT, devices.
On August 14, 2025, Multiverse Computing launched two new AI models: SuperFly, a 94-million-parameter model for IoT devices, and ChickBrain, a 3.2-billion-parameter model for devices such as MacBooks.
AI acceleration: ChatGPT growth, Anthropic scaling, Grok 4 accuracy.
On August 14, 2025, data points indicated accelerating AI adoption, including ChatGPT's rapid user growth, Anthropic's revenue scaling, and the Grok 4 model's ~18% accuracy on the HLE benchmark.
Real-World AI
GoViG: Autonomous navigation instruction from visual observations.
Researchers introduce GoViG, a new task for autonomously generating navigation instructions from egocentric visual observations, showing significant improvements on the R2R-Goal dataset.
Animate-X++: Universal character image animation framework based on DiT.
Researchers propose Animate-X++, a universal character image animation framework based on DiT, generating high-quality videos from a reference image and target pose sequence for various character types.
CGAD: Causal Graph Profiling for cyberattack detection in infrastructure.
On August 13, 2025, a paper introduced CGAD, a Causal Graph-based Anomaly Detection framework for reliable cyberattack detection in public infrastructure systems, achieving superior adaptability and accuracy.
Video-EM enhances long-form video understanding with episodic events.
Researchers introduce Video-EM, a framework for long-form video understanding, modeling keyframes as temporally ordered episodic events and achieving 4-9 percent performance gains over baselines.
Deep learning automates coronal brain tissue segmentation with high accuracy.
Researchers have developed a deep learning model to automate the segmentation of coronal brain tissue slabs from photographs, achieving a median Dice score over 0.98.
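The Dice score cited above is a standard overlap metric for segmentation, not code from the paper; a minimal sketch over sets of mask pixels:

```python
# Dice coefficient: 2|A ∩ B| / (|A| + |B|), the standard overlap metric
# for comparing a predicted segmentation mask against ground truth.
# Masks are represented here as sets of pixel coordinates.

def dice(pred: set, truth: set) -> float:
    """Return the Dice score between two segmentation masks (1.0 = perfect)."""
    if not pred and not truth:
        return 1.0  # both empty: conventionally a perfect match
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# Two masks sharing 2 of 3 pixels each overlap with Dice = 4/6 ≈ 0.667.
print(dice({(0, 0), (0, 1), (1, 0)}, {(0, 1), (1, 0), (1, 1)}))
```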
LLMs and Behavior Trees enable interpretable robot control.
Researchers propose a framework combining Large Language Models with Behavior Trees for interpretable robot control, enabling robots to interpret natural language instructions and execute actions with an average accuracy of 94%.
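The paper's own implementation is not shown here; as an illustration of the behavior-tree side of such a framework, below is a minimal sketch in which a hand-written tree stands in for one that an LLM would produce from a natural-language instruction (node and action names are hypothetical):

```python
from typing import Callable, List

# Minimal behavior-tree sketch: Sequence and Selector are the two classic
# composite nodes; Action wraps a primitive robot skill. In an LLM-driven
# framework, the model would emit a tree like `tree` below from an
# instruction such as "pick up the cup and place it on the table".

class Action:
    def __init__(self, name: str, fn: Callable[[], bool]):
        self.name, self.fn = name, fn

    def tick(self) -> bool:
        return self.fn()  # True = success, False = failure


class Sequence:
    """Succeeds only if every child succeeds, executed in order."""
    def __init__(self, children: List):
        self.children = children

    def tick(self) -> bool:
        return all(child.tick() for child in self.children)


class Selector:
    """Succeeds as soon as any child succeeds, trying them in order."""
    def __init__(self, children: List):
        self.children = children

    def tick(self) -> bool:
        return any(child.tick() for child in self.children)


# Hypothetical skills stubbed to always succeed for the sketch.
tree = Sequence([
    Action("locate_cup", lambda: True),
    Action("grasp_cup", lambda: True),
    Action("place_on_table", lambda: True),
])
print(tree.tick())  # True
```

The tree itself is the interpretable artifact: each node names a skill, so a failed run can be traced to the exact action that returned failure.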
VIPCGRL improves human-aligned procedural level generation via DRL.
Researchers propose VIPCGRL, a deep reinforcement learning framework for human-aligned procedural level generation, incorporating text, level, and sketches to enhance human-likeness.
AI identifies elephant poaching hotspots via satellite imagery.
A Computer Vision model using satellite imagery can help identify elephant poaching hotspots, enabling more effective deployment of resources without disturbing local species.
Run LLMs locally: Ollama app, self-hosting for performance, privacy.
Ollama's new app lets users run large language models locally through a graphical interface, while a companion self-hosting guide details the hardware and software requirements for faster performance and better privacy.
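Beyond the GUI, a locally running Ollama server also exposes an HTTP API, by default at http://localhost:11434; a minimal sketch, assuming a model such as llama3 has already been pulled:

```python
import json
import urllib.request

# Sketch of calling a local Ollama server's /api/generate endpoint.
# Assumes the default address and an already-pulled model; nothing here
# is specific to the app or guide mentioned above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local server and return the model's reply."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running server): generate("llama3", "Why run locally?")
print(build_request("llama3", "Why run models locally?"))
```

Because everything stays on localhost, neither prompts nor outputs leave the machine, which is the privacy argument for self-hosting.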
RL trains software engineering agents, doubling baseline accuracy.
Researchers developed a reinforcement learning framework for training long-context, multi-turn software engineering agents, achieving 39% Pass@1 accuracy on the SWE-bench Verified benchmark.
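Pass@1 here is the standard pass@k metric at k=1; the unbiased estimator common in the code-generation literature (not this paper's code) can be sketched as:

```python
from math import comb

# Unbiased pass@k estimator: given n sampled solutions per task, c of
# which pass the tests, estimate the probability that at least one of
# k randomly chosen samples passes. At k=1 this reduces to c/n.

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # fewer failures than draws: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# 4 passing samples out of 10 gives Pass@1 = 0.4 for that task.
print(pass_at_k(10, 4, 1))  # 0.4
```

A benchmark-level Pass@1 such as the 39% above is then the mean of this per-task value across all benchmark tasks.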
Noise-adapted neural operator enables robust non-line-of-sight imaging.
A paper published on arXiv on August 13, 2025, presents a noise-adapted neural operator for robust non-line-of-sight imaging, introducing a noise estimation module and a parameterized neural operator for rapid image reconstruction.