企業におけるAIエージェントの作成: Box、MongoDB Atlas、OpenAI、LangChainを組み合わせてインテリジェントなドキュメント検索を実現する方法

組織では、さまざまなプラットフォームに散らばったドキュメントがあふれています。社内のドキュメントを検索するだけでなく、実際にその内容を理解し、会話形式のインテリジェントな応答を提供するAIエージェントを作成できるとしたらどうでしょうか。

このガイドでは、Box (ドキュメントの保存に使用)、MongoDB Atlas (ベクトル検索に使用)、OpenAI (言語理解に使用)、LangChain (オーケストレーションに使用) の機能を組み合わせた高性能なAIエージェントの作成について説明します。

今回作成する機能
テクノロジスタック
基盤の設定
環境の構成
ドキュメントの取り込みと処理
MongoDB Atlasのベクトル検索インフラストラクチャの作成
インテリジェントな検索ツールの作成

今回作成する機能

このチュートリアルが終わる頃には、以下を実行できるエンタープライズグレードのAIシステムを作成できます。

意味論的理解によって、Boxに保存されているドキュメントをインテリジェントに検索する
適切な情報源への帰属を伴う会話形式の応答を提供する
複数のインタラクションにわたってコンテキストを記憶する

今回のシステムでは、財務収益報告書を処理し、「技術系企業が直面している大きな課題は何ですか」などの複雑な質問に、情報源に裏付けられた詳細な応答で回答します。

テクノロジスタック

Box: インテリジェントコンテンツ管理

Boxは、安全なドキュメントリポジトリとして機能します。シンプルなファイルストレージとは異なり、Boxは、エンタープライズグレードのセキュリティ、バージョン管理、APIアクセスを提供するため、ビジネスアプリケーションに最適です。ドキュメントにシームレスかつ安全にアクセスするには、Boxのクライアント資格情報許可 (CCG) 認証を使用します。

MongoDB Atlas: ベクトル検索

MongoDB Atlasは、インテリジェントな検索エンジンとして機能します。そのベクトル検索機能を使用すると、キーワードだけでなく、意味に基づいてドキュメントを検索することができます。Atlasは、埋め込み類似度の複雑な計算を処理すると同時に、企業に必要な拡張性を提供します。

OpenAI: 言語理解

OpenAIのモデルは、(埋め込みによる) ドキュメントの理解と応答の生成の両方を強化します。検索用の埋め込みと応答用のGPT-4の組み合わせにより、コンテンツをしっかり理解するシステムを作成できます。

LangChain: オーケストレーションレイヤー

LangChainは、ドキュメント処理、エージェント作成、ワークフロー管理のためのツールを提供し、すべてを結び付けます。そのLangGraphフレームワークにより、高度なマルチステップ推論とツールの使用が可能になります。

基盤の設定

まず、すべてのサービスへの接続を確立します。この基盤により、AIエージェントはドキュメントへのアクセス、埋め込みの保存、インテリジェントな応答の生成が可能になります。

依存関係をインストール

最初に、新しいPythonプロジェクトを作成し、必要なパッケージをインストールします。

# Using uv for fast dependency management
uv sync

環境の構成

サービスの資格情報を使用して.envファイルを作成します。

# Box Enterprise Configuration
BOX_CLIENT_ID=your_box_client_id
BOX_CLIENT_SECRET=your_box_client_secret  
BOX_SUBJECT_ID=your_box_user_id

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key

# MongoDB Atlas Configuration
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/

Boxへの接続を確立

BoxのCCG認証により、企業のアプリケーションに最適な、安全なサーバー間通信が可能です。以下に、a_init_data.pyで接続を確立する方法を示します。

from box_sdk_gen import (
    BoxCCGAuth,
    CCGConfig,
    BoxClient,
    FileTokenStorage,
    BoxAPIError,
)

def get_box_client() -> BoxClient:
    """
    Initialize and return a Box client using the Box CCG Auth.
    """
    client_id = os.getenv("BOX_CLIENT_ID")
    client_secret = os.getenv("BOX_CLIENT_SECRET")
    user_id = os.getenv("BOX_SUBJECT_ID")

    # Create a BoxCCGConfig instance
    box_config = CCGConfig(
        client_id=client_id,
        client_secret=client_secret,
        user_id=user_id,
        token_storage=FileTokenStorage(".ccg.db"),
    )
    # Create a BoxCCGAuth instance
    box_auth = BoxCCGAuth(box_config)
    # Create a BoxClient instance
    return BoxClient(box_auth)

この方法により、アプリケーションは、ユーザーへのログインプロンプトなしで安全にBoxのリソースにアクセスできるようになります。これは企業の自動化ワークフローに不可欠です。

ドキュメントの取り込みと処理

次に、ドキュメントをBoxにアップロードして、LangChainで処理し、インテリジェントな検索のために準備するパイプラインを作成します。

ドキュメントをBoxにアップロード

Boxは、安全なドキュメントリポジトリとして機能します。以下に、upload_sample_data関数が適切なエラー処理を行いながらドキュメントのアップロードを処理する方法を示します。

from box_sdk_gen import (
    CreateFolderParent,
    UploadFileAttributes,
    UploadFileAttributesParentField,
)

def upload_sample_data(
    box_client: BoxClient,
    parent_folder_id: str = DEMO_FOLDER_PARENT_ID,
    local_folder_path: str = SAMPLE_DATA_LOCAL_PATH,
) -> str:
    try:
        box_folder = box_client.folders.create_folder(
            name=os.path.basename(local_folder_path),
            parent=CreateFolderParent(id=parent_folder_id),
        )
    except BoxAPIError as e:
        if e.response_info.body["status"] == 409:
            # Folder already exists, get its ID
            box_folder = box_client.folders.get_folder_by_id(
                e.response_info.body["context_info"]["conflicts"][0]["id"]
            )
    print(f"Created folder: {box_folder.name} ({box_folder.id})")
    # Upload files to the new folder
    local_folder_path = os.path.abspath(local_folder_path)
    for root, _, files in os.walk(local_folder_path):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            parent = UploadFileAttributesParentField(id=box_folder.id, type="folder")
            file_attributes = UploadFileAttributes(
                name=file_name,
                parent=parent,
            )
            with open(file_path, "rb") as file_stream:
                try:
                    box_file = box_client.uploads.upload_file(
                        attributes=file_attributes, file=file_stream
                    ).entries[0]
                    print(f"Uploaded file: {box_file.name} ({box_file.id})")
                except BoxAPIError as e:
                    if e.response_info.body["status"] == 409:
                        print(
                            f"File already exists: {file_name} ({e.response_info.context_info['conflicts']['id']})"
                        )
    return box_folder.id

LangChainを使用してドキュメントを処理

LangChainのBoxLoaderは、Boxとシームレスに統合でき、ドキュメントコンテンツを抽出して処理します。load_documents_from_box関数でこの統合を示します。

from langchain_box.document_loaders import BoxLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_documents_from_box(
    box_client: BoxClient,
    box_folder_id: str,
) -> List[Document]:
    auth_token = box_client.auth.retrieve_token().access_token
    loader = BoxLoader(
        box_developer_token=auth_token,
        box_folder_id=box_folder_id,  # type: ignore
    )
    data = loader.load()
    # Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
    return text_splitter.split_documents(data)

チャンク化戦略は非常に重要です。今回の構成では、200文字のチャンクと20文字のオーバーラップを使用しており、コンテキストを維持しながら正確な検索ができるように最適化されています。

MongoDB Atlasのベクトル検索インフラストラクチャの作成

MongoDB Atlasは、ベクトル埋め込み (意味論的意味を捉えた数値表現) を使用して、ドキュメントのチャンクを検索可能なナレッジベースに変換します。

ベクトルストレージの設定

create_vector_index関数は、ベクトル検索インフラストラクチャ全体を作成します。

from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings
from pymongo import MongoClient

def create_vector_index(docs: List[Document]) -> None:
    # drop collection if it exists
    client = MongoClient(MONGODB_URI)
    db = client["langchain_db"]
    if "earnings_reports" in db.list_collection_names():
        db["earnings_reports"].drop()
    # Initialize the MongoDB Atlas vector search index
    embedding_model = OpenAIEmbeddings()
    # Instantiate vector store
    vector_store = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string=MONGODB_URI,
        namespace="langchain_db.earnings_reports",
        embedding=embedding_model,
        index_name="vector_index",
    )
    # Add data to the vector store
    vector_store.add_documents(docs)
    # Use helper method to create the vector search index
    vector_store.create_vector_search_index(dimensions=1536)

また、完全一致検索用にファイルコンテンツ検索インデックスも作成します。

from langchain_mongodb.index import create_fulltext_search_index

def create_search_index() -> None:
    # Connect to your cluster
    client = MongoClient(MONGODB_URI)
    # Use helper method to create the search index
    create_fulltext_search_index(
        collection=client["langchain_db"]["earnings_reports"],
        field="text",
        index_name="search_index",
    )

ベクトル検索とは

ベクトル検索は、テキストを、意味論的意味を捉えた高次元の数値表現 (埋め込み) に変換することで機能します。「会社の課題」を検索すると、完全に一致する単語が存在しない場合でも、「ビジネス上の障害」や「会社の問題」に関するドキュメントが検索されます。

インテリジェントな検索ツールの作成

次に、AIエージェントがドキュメントをインテリジェントに検索するために使用できる専用ツールを作成します。

ベクトル検索ツール

このツールは、意味論的類似性に基づいてドキュメントを検索します。

from langchain.agents import tool
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings

@tool
def vector_search(user_query: str) -> str:
    """
    Retrieve information using vector search to answer a user query.
    """
    # Instantiate the vector store
    vector_store = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string=MONGODB_URI,
        namespace="langchain_db.earnings_reports",
        embedding=OpenAIEmbeddings(),
        index_name="vector_index",  # Name of the vector index
    )
    retriever = vector_store.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 5},  # Retrieve top 5 most similar documents
    )
    results = retriever.invoke(user_query)
    # Concatenate the results into a string
    context = "\n\n".join(
        [f"{doc.metadata['title']}: {doc.page_content}" for doc in results]
    )
    return context

ファイルコンテンツ検索ツール

このツールは、正確なクエリに対するテキストの完全一致を検索します。

from langchain_mongodb.retrievers.full_text_search import (
    MongoDBAtlasFullTextSearchRetriever,
)

@tool
def full_text_search(user_query: str) -> dict:
    """
    Retrieve movie plot content based on the provided title.
    """
    client = MongoClient(MONGODB_URI)
    collection = client["langchain_db"]["earnings_reports"]
    # Initialize the retriever
    retriever = MongoDBAtlasFullTextSearchRetriever(
        collection=collection,  # MongoDB Collection in Atlas
        search_field="text",  # Name of the field to search
        search_index_name="search_index",  # Name of the search index
        top_k=1,  # Number of top results to return
    )
    results = retriever.invoke(user_query)
    for doc in results:
        if doc:
            del doc.metadata["embedding"]
            doc.metadata["page_content"] = doc.page_content
            return doc.metadata
        else:
            return "Document not found"

LangChainエージェントの作成

このエージェントは、システムの中枢として機能し、使用すべきツールや、それらの結果を一貫性のある応答にどのようにまとめるかを決定します。

エージェントの構成

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.tools import BaseTool

def get_llm_with_tools():
    # Initialize the LLM
    llm = ChatOpenAI(model="gpt-4o")
    # Create a chat prompt template for the agent, which includes a system prompt and a placeholder for `messages`
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "You are a helpful AI agent."
                " You are provided with tools to answer questions about tech companies earnings."
                " Think step-by-step and use these tools to get the information required to answer the user query."
                " Do not re-run tools unless absolutely necessary."
                " If you are not able to get enough information using the tools, reply with I DON'T KNOW."
                " You have access to the following tools: {tool_names}."
            ),
            MessagesPlaceholder(variable_name="messages"),
        ]
    )
    tools = [vector_search, full_text_search]
    # Provide the tool names to the prompt
    prompt = prompt.partial(tool_names=", ".join([tool.name for tool in tools]))
    # Prepare the LLM by making the tools and prompt available to the model
    bind_tools = llm.bind_tools(tools)
    llm_with_tools = prompt | bind_tools
    return llm_with_tools

システムプロンプトでは、ツールの使用に関する明確なガイドラインを定義し、エージェントが情報源に裏付けられた信頼できる応答を提供することを保証します。

LangGraphワークフローの実装

LangGraphにより、高度なマルチステップ推論ワークフローが可能になります。d_langgraph.pyで、状態の管理を備えた会話型エージェント一式を実装しています。

グラフの状態管理

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

# Define the graph state
class GraphState(TypedDict):
    messages: Annotated[list, add_messages]

AgentノードとToolsノード

def agent(state: GraphState) -> Dict[str, List]:
    """
    Agent node
    Args:
        state (GraphState): Graph state
    Returns:
        Dict[str, List]: Updates to messages
    """
    llm_with_tools = get_llm_with_tools()
    # Get the messages from the graph `state`
    messages = state["messages"]
    # Invoke `llm_with_tools` with `messages`
    result = llm_with_tools.invoke(messages)
    # Write `result` to the `messages` attribute of the graph state
    return {"messages": [result]}

def tools_node(state: GraphState) -> Dict[str, List]:
    tools = get_tools()
    # Create a map of tool name to tool call
    tools_by_name = {tool.name: tool for tool in tools}
    result = []
    # Get the list of tool calls from messages
    tool_calls = state["messages"][-1].tool_calls
    # Iterate through `tool_calls`
    for tool_call in tool_calls:
        # Get the tool from `tools_by_name` using the `name` attribute of the `tool_call`
        tool = tools_by_name[tool_call["name"]]
        # Invoke the `tool` using the `args` attribute of the `tool_call`
        observation = tool.invoke(tool_call["args"])
        # Append the result of executing the tool to the `result` list as a ToolMessage
        result.append(ToolMessage(content=observation, tool_call_id=tool_call["id"]))
    # Write `result` to the `messages` attribute of the graph state
    return {"messages": result}

ワークフローのルーティングロジック

def route_tools(state: GraphState):
    """
    Uses a conditional_edge to route to the tools node if the last message
    has tool calls. Otherwise, route to the end.
    """
    # Get messages from graph state
    messages = state.get("messages", [])
    if len(messages) > 0:
        # Get the last AI message from messages
        ai_message = messages[-1]
    else:
        raise ValueError(f"No messages found in input state to tool_edge: {state}")
    # Check if the last message has tool calls
    if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0:
        return "tools"
    return END

MongoDBによる詳細なグラフアセンブリとメモリの追加

企業のアプリケーションは、会話全体のコンテキストを記憶する必要があります。今回の実装では、永続メモリにMongoDBのチェックポイントを使用します。

from langgraph.checkpoint.mongodb import MongoDBSaver

def get_compiled_graph() -> CompiledStateGraph:
    """
    Main function to execute the graph.
    """
    # Instantiate the graph
    graph = StateGraph(GraphState)
    # Add "agent" node using the `add_node` function
    graph.add_node("agent", agent)
    # Add "tools" node using the `add_node` function
    graph.add_node("tools", tools_node)
    # Add an edge from the START node to the `agent` node
    graph.add_edge(START, "agent")
    # Add a conditional edge from the `agent` node to the `tools` node
    graph.add_conditional_edges(
        "agent",
        route_tools,
        {"tools": "tools", END: END},
    )
    # Add an edge from the `tools` node to the `agent` node
    graph.add_edge("tools", "agent")
    # Initialize a MongoDB check_pointer
    client = MongoClient(MONGODB_URI)
    check_pointer = MongoDBSaver(client)
    # Instantiate the graph with the checkpointer
    return graph.compile(checkpointer=check_pointer)

エージェントを呼び出すメソッドの追加

def execute_graph(thread_id: str, user_input: str, app: CompiledStateGraph) -> None:
    config = {"configurable": {"thread_id": thread_id}}
    input = {
        "messages": [
            (
                "user",
                user_input,
            )
        ]
    }
    for output in app.stream(input, config):
        for key, value in output.items():
            print(f"Node {key}:")
            print(value)
    print("\n---FINAL ANSWER---")
    print(value["messages"][-1].content)

完成したシステムのテスト

企業向けのAIエージェントの動作を確認しましょう。今回のmain関数では、実際のシナリオを使用してシステムを実演します。

デモの実行

def main() -> None:
    app = get_compiled_graph()
    execute_graph("001", "What are the biggest challenges facing tech companies?", app)
    /* 技術系企業が直面している大きな課題は何ですか */
    execute_graph("001", "What earnings reports has a comment from Brett Iversen?", app)
    /* Brett Iversen氏のコメントがある収益報告書はどれですか */

サンプルデータの処理

このシステムでは、sample_data/Q4 Tech earnings-Demo/ディレクトリに含まれている以下の実際の財務ドキュメントを処理します。

Apple_analysis.docx
Tesla_analysis.docx
Microsoft_analysis.docx
Meta_analysis.docx
NVIDIA_analysis.docx

予想されるシステムの応答

「技術系企業が直面している大きな課題は何ですか」と質問すると、システムでは以下の処理が行われます。

エージェントによる推論: vector_searchツールの使用を決定する
ツールの実行: ドキュメントの埋め込みを検索する
結果の処理: 複数のレポートから関連するコンテンツを検出する
応答の生成: 包括的な回答を作成する

応答のサンプル:

---FINAL ANSWER---
The biggest challenges facing tech companies include:

1. **Shifting Preferences for Western Technology**: Companies like Apple 
face challenges related to changing consumer preferences, particularly 
in global markets.

2. **Complex Tech Stack Issues**: Companies such as Microsoft encounter 
challenges across every layer of their technology stack, impacting 
their operational efficiency and capacity for innovation.

3. **Scalability and Cost Efficiency for Inference at Scale**: NVIDIA 
and similar companies face challenges in providing the necessary 
throughput and maintaining cost efficiency to handle the increasing 
complexity of large-scale AI and data processing.

These challenges highlight the constantly evolving nature of the tech 
industry, requiring continuous adaptation and innovation to maintain 
competitiveness and growth.

---FINAL ANSWER---
Earnings reports from Microsoft include comments from Brett Iversen, 
who is the Vice President of Investor Relations.

(参考訳)

---最終的な回答---
技術系企業が直面している大きな課題は以下のとおりです。

1. **欧米の技術に対する嗜好の変化**: Appleのような企業は、特に世界市場において、
消費者の好みの変化に関連した課題に直面しています。

2. **複雑なテクノロジスタックの問題**: Microsoftのような企業は、テクノロジスタックのの
さまざまなレイヤーで課題に直面しており、業務効率やイノベーション能力に影響を受けています。

3. **大規模な推論のための拡張性とコスト効率**: NVIDIAや同様の企業では、ますます複雑に
なる大規模なAIやデータ処理に対応するために必要なスループットを提供し、コスト効率を
維持するという課題に直面しています。

上記の課題は、テクノロジ業界の絶えず進化する性質を浮き彫りにしており、競争力と成長を
維持するためには継続的な適応と革新が必要になります。

---最終的な回答---
投資家向け広報担当バイスプレジデントであるBrett Iversen氏のコメントは、Microsoftの
収益報告書に含まれています。

まとめ: 企業のAIへの取り組み

構造化されているかどうかにかかわらず、他のデータソースに対してこの仕組みを強化することで何ができるか想像してみてください。会社のどの分野の全体像も把握できるでしょう。

このシステムは、最新のAIによってどのように企業のドキュメント管理をストレージの問題からインテリジェンス資産に変えることができるかを示しています。ドキュメントはもはや単なるファイルではなく、インサイトの提供、質問への回答、意思決定の支援に利用できる、照会可能なナレッジベースです。

企業でAIを効果的に活用するためには、テクノロジだけでなく、これらのツールを組み合わせてセキュリティ、拡張性、ユーザーの信頼を維持しながら、実際のビジネスの問題を解決する方法を理解することが重要です。