Content-Aware File System.
pip install intellifs
Note: intellifs only indexes plain text files, HTML, XML and PDF sources by default. To index other document types, install the full unstructured extras:
pip install "unstructured[all-docs]"
Refer to the unstructured installation documentation for finer control over supported document types.
Display help section
ifs
Usage: ifs COMMAND
Content-Aware File System.
╭─ Commands ───────────────────────────────────────────────────╮
│ embedder Default embedder. │
│ index Index a file or directory. │
│ search Perform semantic search in a directory. │
│ shell Interactive Shell. │
│ version Display application version. │
╰──────────────────────────────────────────────────────────────╯
╭─ Parameters ─────────────────────────────────────────────────╮
│ --help,-h Display this message and exit. │
╰──────────────────────────────────────────────────────────────╯
index command
ifs index --help
Usage: ifs index [ARGS]
Index a file or directory.
╭─ Arguments ──────────────────────────────────────────╮
│ * PATH Path to file or directory. [required] │
╰──────────────────────────────────────────────────────╯
Indexing a file.
ifs index ./Cyber.pdf
Indexing a directory.
ifs index ./test_docs
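Conceptually, indexing a directory means walking it recursively and picking out the files whose type is supported. The sketch below illustrates that step with pathlib. The helper `collect_indexable_files` and the `DEFAULT_EXTENSIONS` set are hypothetical, loosely based on the note about default document types; intellifs may detect file types differently (for example, by content rather than by suffix).

```python
from pathlib import Path

# Hypothetical default set, loosely based on "plain text, HTML, XML and PDF".
# Not taken from intellifs itself.
DEFAULT_EXTENSIONS = {".txt", ".md", ".html", ".htm", ".xml", ".pdf"}


def collect_indexable_files(root, extensions=DEFAULT_EXTENSIONS):
    """Return a sorted list of files under `root` whose suffix is in `extensions`.

    If `root` is a single file, return it alone when supported, else nothing.
    """
    root = Path(root)
    if root.is_file():
        return [root] if root.suffix.lower() in extensions else []
    # rglob("*") walks every subdirectory; suffixes are compared case-insensitively.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
```

For example, `collect_indexable_files("./test_docs")` would list the files an indexer could then embed one by one, skipping images and other unsupported types.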
search command
ifs search --help
Usage: ifs search [ARGS] [OPTIONS]
Perform semantic search in a directory.
╭─ Arguments ───────────────────────────────────────────────────────────╮
│ DIR Start search directory path. [default: /home/synacktra] │
╰───────────────────────────────────────────────────────────────────────╯
╭─ Parameters ──────────────────────────────────────────────────────────╮
│ * --query -q Search query string. [required] │
│ --max-results -k Maximum result count. [default: 5] │
│ --threshold -t Minimum filtering threshold value. │
│ --return -r Component to return. [choices: path,context] │
╰───────────────────────────────────────────────────────────────────────╯
Search in current directory
ifs search --query "How does intellifs work?"
Search in specific directory
ifs search path/to/directory --query "How does intellifs work?"
Limit the number of results
ifs search -q "How does intellifs work?" -k 8
Set a minimum score threshold to filter out weak matches
ifs search -q "How does intellifs work?" -t 0.5
Return a specific component of the results (by default, a JSON object mapping each path to its matched contexts)
ifs search -q "How does intellifs work?" -r path
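The `--return` modes can be pictured with a small sketch. It assumes each search hit is a (path, context, score) triple and that "context" means the matched text chunk; `format_results` and the sample hits are hypothetical, not intellifs code.

```python
import json

# Hypothetical search hits, already ranked: (file path, matched text chunk, score).
hits = [
    ("docs/intro.md", "intellifs embeds file contents", 0.82),
    ("docs/intro.md", "Search uses query embeddings", 0.74),
    ("notes/todo.txt", "Index the test_docs folder", 0.51),
]


def format_results(hits, component=None):
    """Mimic --return: 'path', 'context', or (default) a path -> contexts mapping."""
    if component == "path":
        # Unique paths, keeping the rank order of their first appearance.
        return list(dict.fromkeys(path for path, _, _ in hits))
    if component == "context":
        return [context for _, context, _ in hits]
    grouped = {}
    for path, context, _ in hits:
        grouped.setdefault(path, []).append(context)
    return grouped


print(json.dumps(format_results(hits), indent=2))
```

Grouping by path keeps the default output compact when several chunks of the same file match the query.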
embedder command
ifs embedder --help
Usage: ifs embedder [OPTIONS]
Default embedder.
╭─ Parameters ────────────────────────────────────╮
│ --select -s Select from available embedders. │
╰─────────────────────────────────────────────────╯
ifs embedder
{
"model": "BAAI/bge-small-en-v1.5",
"dim": 384,
"description": "Fast and Default English model",
"size_in_GB": 0.13
}
Select a different embedder interactively (selection uses https://github.com/synacktraa/minifzf):
ifs embedder --select
shell command
Starts an interactive shell.
https://github.com/synacktraa/intellifs/assets/91981716/cfc7894a-90d7-49f9-bac1-c5a76ddc0690
FileSystem
from intellifs import FileSystem
ifs = FileSystem()
By default, it uses the default embedder. You can also pass a different
Embedder instance.
from intellifs.embedder import Embedder
ifs = FileSystem(
embedder=Embedder(model="<model-name>", dim=<model-dimension>)
)
Use Embedder.available_models to list the supported models.
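The `dim` argument has to match the length of the vectors the model produces (384 for BAAI/bge-small-en-v1.5, per the `ifs embedder` output). The sketch below uses a hypothetical `ToyEmbedder`, a token-hashing stand-in rather than a real model or intellifs' `Embedder`, to show that contract together with a hypothetical fail-fast `check_dim` helper.

```python
import hashlib
import math


class ToyEmbedder:
    """Hypothetical stand-in for an embedding model (NOT intellifs' Embedder).

    Hashes each token into one of `dim` buckets and L2-normalizes the counts,
    so every text maps to a vector of exactly `dim` floats.
    """

    def __init__(self, dim: int):
        if dim <= 0:
            raise ValueError("dim must be positive")
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        return [v / norm for v in vec]


def check_dim(embedder, expected_dim: int) -> None:
    """Raise if the embedder's vectors don't match the declared dimension."""
    actual = len(embedder.embed("dimension probe"))
    if actual != expected_dim:
        raise ValueError(f"model produces {actual}-d vectors, expected {expected_dim}")
```

A vector store collection is created for one fixed vector size, so catching a mismatched `dim` before indexing is cheaper than discovering it on the first insert.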
index method
Indexing a file
from intellifs.indexables import File
ifs.index(File(__file__))
Indexing a directory
from intellifs.indexables import Directory
ifs.index(Directory('path/to/directory'))
is_indexed method
Verify whether a File or Directory has been indexed.
file = File(__file__)
ifs.is_indexed(file)
ifs.is_indexed(file.directory)
search method
Search in current directory
ifs.search(query="How does intellifs work?")
Search in specific directory
ifs.search(
directory=Directory('path/to/directory'),
query="How does intellifs work?"
)
Limit the number of results
ifs.search(query="How does intellifs work?", max_results=8)
Set a minimum score threshold to filter out weak matches
ifs.search(
query="How does intellifs work?", score_threshold=0.5
)
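The `max_results` and `score_threshold` parameters map naturally onto a top-k nearest-neighbour query with a score cutoff. The brute-force sketch below assumes cosine similarity as the score. The real embeddings live in Qdrant, which runs this search itself, so treat the hypothetical `semantic_search` helper as an illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_search(query_vec, index, max_results=5, score_threshold=None):
    """Rank (path, vector) pairs by similarity to `query_vec`.

    Hits scoring below `score_threshold` (when given) are dropped, then at most
    `max_results` of the best-scoring hits are returned as (path, score) pairs.
    """
    scored = [(path, cosine(query_vec, vec)) for path, vec in index]
    if score_threshold is not None:
        scored = [(p, s) for p, s in scored if s >= score_threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:max_results]
```

Raising the threshold trades recall for precision: fewer files come back, but each one is a closer match to the query.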
The FileSystem is a file system management tool for organizing and searching files and directories by their content. It represents file contents as embeddings, which enables semantic search. Here’s a breakdown of its core components and functionality:
Metadata and Index
FileSystem Class
The FileSystem class is the heart of the system, integrating various components to facilitate file indexing, searching, and management.
Upon initialization, the FileSystem prepares the environment for indexing and searching files and directories with the following steps:
Embedder Setup: An embedder is initialized to generate vector embeddings from file content. If a custom embedder is not provided, the system defaults to a pre-configured option suitable for general-purpose text embedding.
Local Storage Initialization: The system sets up local storage to cache embeddings and metadata. This involves:
A mapping file (map.json) within the cache directory that records the collection name associated with each base path.
Base Path Handling: The FileSystem handles base paths according to the file system structure of the host operating system.
On Windows, drive letters serve as base paths (e.g., C:, D:). This allows the system to manage files and directories on different drives separately.
On UNIX-like systems, top-level directories serve as base paths (e.g., /var, /home). This keeps indexing and searching consistent with UNIX-like directory hierarchies.
Collection Management: The system uses a local, persistent vector database, managed through qdrant_client, to store and retrieve embeddings and metadata.
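Base path handling and the map.json lookup can be sketched with pathlib and json. The helpers `base_path` and `collection_for`, and the `col-<hex>` naming scheme, are hypothetical; intellifs may derive base paths and name its collections differently.

```python
import json
import uuid
from pathlib import Path, PurePath, PureWindowsPath


def base_path(path: PurePath) -> str:
    """Return the base path an absolute path belongs to.

    Windows: the drive (e.g. 'C:'). POSIX: the top-level directory (e.g. '/home').
    """
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {path}")
    if isinstance(path, PureWindowsPath):
        return path.drive
    parts = path.parts  # e.g. ('/', 'home', 'me', 'doc.pdf')
    return "/" + parts[1] if len(parts) > 1 else "/"


def collection_for(base: str, map_file: Path) -> str:
    """Look up the collection name for `base` in a JSON map, creating one if missing."""
    mapping = json.loads(map_file.read_text()) if map_file.exists() else {}
    if base not in mapping:
        mapping[base] = f"col-{uuid.uuid4().hex[:12]}"  # hypothetical naming scheme
        map_file.write_text(json.dumps(mapping, indent=2))
    return mapping[base]
```

Keeping one collection per base path means a search under /home never has to scan vectors indexed from /var.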
The FileSystem supports semantic search within a specified directory or globally across all indexed files. The query is embedded and compared against the stored content embeddings to find the most relevant files.