Researchers need more from AI than Wikipedia summarising. You need a tool that reads your citation library, remembers what's in it, cites correctly, and keeps your unpublished data offline. Sinun AI is built for exactly this: local open models (Viking, Qwen 3.5, DeepSeek V3.2) paired with a local RAG stack (SQLite + sqlite-vec).
Library management and semantic search
Drop thousands of PDFs, interview recordings and notes into a local folder. Sinun AI indexes them into a sqlite-vec database you can search semantically — without a single page leaving the network. Queries run in milliseconds, not seconds.
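The index-and-search loop can be sketched in plain Python. This is a minimal illustration only: it uses the stdlib sqlite3 module with a brute-force cosine scan and toy 3-dimensional vectors, where the real stack would use sqlite-vec's indexed virtual table and embeddings from a local model. The filenames and `search` helper are hypothetical.

```python
import json
import math
import sqlite3

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, source TEXT, embedding TEXT)")

# Toy 3-d embeddings; a real pipeline would embed text chunks with a local model.
docs = [
    ("interview_2024.txt", [0.9, 0.1, 0.0]),
    ("fieldnotes.md",      [0.1, 0.8, 0.2]),
    ("draft_ch3.pdf",      [0.8, 0.2, 0.1]),
]
for source, emb in docs:
    db.execute("INSERT INTO chunks (source, embedding) VALUES (?, ?)",
               (source, json.dumps(emb)))

def search(query_emb, k=2):
    """Return the k sources most similar to the query embedding."""
    rows = db.execute("SELECT source, embedding FROM chunks").fetchall()
    scored = [(cosine(query_emb, json.loads(e)), s) for s, e in rows]
    return [s for _, s in sorted(scored, reverse=True)[:k]]

print(search([1.0, 0.0, 0.0]))  # → ['interview_2024.txt', 'draft_ch3.pdf']
```

With sqlite-vec loaded, the brute-force scan is replaced by an indexed query over a `vec0` virtual table, which is what keeps lookups in the millisecond range even over large libraries.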
Safe handling of unpublished data
Unpublished data is a researcher's capital, and most university ethics boards forbid sending it to foreign commercial AI services. A local model runs on your laptop or workstation with no network connection at all — ethically defensible, technically easy.
Reproducibility: same model, same version, same result
ChatGPT outputs are not reproducible — versions change without notice. With a local open model (Viking 33B v1.0, Qwen 3.5 27B) you can pin the version, cite the exact weights in your paper, and repeat the analysis a year later.
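One simple way to make "cite the weights" concrete is to record a cryptographic fingerprint of the weights file alongside the model name and version. The sketch below is an illustrative helper, not part of Sinun AI; `weights_fingerprint` is a hypothetical name.

```python
import hashlib

def weights_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a model weights file, read in 1 MiB chunks.

    Citing this digest in a paper lets anyone verify they are re-running
    the analysis against byte-identical weights.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

A methods section can then state the model, the pinned version, and the digest, e.g. "Viking 33B v1.0, SHA-256 `<digest>`", making the run repeatable down to the exact file.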
Frequently asked questions
- Which model suits humanities research?
- For Finnish and Nordic sources, Viking and Poro are strongest. For multilingual literature, Qwen 3.5 and Mistral Small 4 work well. Fine-tuning on your own corpus significantly improves results.
- How much data fits in a local RAG?
- On an M-series Mac, 10k–100k PDFs easily. On a workstation (Strix Halo, DGX Spark), millions of pages. The bottleneck is usually disk, not memory.
Updated 2026-04-21