Tutorials

How to use tg_json_parser

Overview

This module parses JSON files exported from Telegram Desktop and converts chat messages into a structured format for analysis. It has no external dependencies and uses only the Python standard library.

Getting Telegram chat data

1. Open the channel you want to download and click the three dots in the top right corner of the chat.
2. Choose "Export chat history".
3. Change the format from HTML to machine-readable JSON.
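
Before running the parser, it can help to confirm that the export really is the JSON variant. The sketch below is an illustration only: summarize_export is a hypothetical helper, not part of tg_json_parser, and the top-level "name" and "messages" keys are assumptions about the usual Telegram Desktop export layout. It writes a tiny stand-in export to a temporary directory and reads it back.

```python
import json
import tempfile
from pathlib import Path


def summarize_export(path):
    """Return the chat name and message count from a Telegram JSON export."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumed layout: chat name at the top level, messages in a list
    return data.get("name"), len(data.get("messages", []))


# Tiny stand-in for result.json (assumed structure, not real data)
sample = {
    "name": "Sample Chat",
    "type": "public_channel",
    "messages": [
        {"id": 1, "type": "message", "date": "2024-01-01T10:00:00", "text": "Hello"},
        {"id": 2, "type": "service", "date": "2024-01-01T10:05:00",
         "action": "join_group_by_link"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    export_path = Path(tmp) / "result.json"
    export_path.write_text(json.dumps(sample), encoding="utf-8")
    chat_name, n_messages = summarize_export(export_path)

print(chat_name, n_messages)
```

For a real export, pass the path of the result.json file produced by Telegram Desktop instead of the temporary file.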

Installation

Prerequisites


  • Python 3.8 or higher

  • Git (for cloning the repository)


Cloning the repository

git clone https://github.com/our_new_public_git/tg_json_parser.git #TODO change it to the real git 


No additional dependencies need to be installed.

Usage

Python Library

The TgJsonParser class processes Telegram JSON exports and converts them into structured CSV files suitable for analysis.

1. Loading JSON Data

from tg_json_parser import TgJsonParser

# Initialize the parser
parser = TgJsonParser()

# Load JSON export file (replace with your file path)
parser.load_json('data/raw/Sample_ChatExport/result.json')


The load_json() method reads and validates the Telegram export file. The file must have a .json extension and contain valid JSON data.
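
The two checks described above (file extension and valid JSON) can be sketched with the standard library alone. This is not the module's actual implementation: load_export is a hypothetical function, and raising ValueError is an assumption made for the illustration.

```python
import json
import tempfile
from pathlib import Path


def load_export(path):
    """Read a JSON export, rejecting wrong extensions and malformed JSON."""
    path = Path(path)
    if path.suffix.lower() != ".json":
        raise ValueError(f"Expected a .json file, got: {path.name}")
    with path.open(encoding="utf-8") as f:
        try:
            return json.load(f)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON in {path.name}: {exc}") from exc


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "result.json"
    good.write_text('{"name": "Sample Chat", "messages": []}', encoding="utf-8")
    bad = Path(tmp) / "broken.json"
    bad.write_text("{not valid json", encoding="utf-8")

    data = load_export(good)

    # Both a malformed file and a wrong extension should be rejected
    errors = []
    for candidate in (bad, Path(tmp) / "messages.html"):
        try:
            load_export(candidate)
        except ValueError as exc:
            errors.append(str(exc))

print(data["name"])
print(len(errors))
```

Checking the extension first means an HTML export is rejected before the file is opened at all.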

2. Parsing Messages

# Extract only content messages
parser.extract_messages()
# Access the structured data


The extract_messages() method processes the raw JSON data, extracts the chat name, and converts all messages to structured dictionaries. The results are available through the following attributes:


  • parser.messages - raw data as a list of dictionaries;

  • parser.content_messages - regular user messages converted to dictionaries with standardized keys such as msg_id, posting_time, sender, media_type, and content;

  • parser.member_actions - service messages (member joins, leaves, and other chat events) organized as structured data.

print(f"Raw message: {parser.messages[0]}")
print(f"Content message: {parser.content_messages[0]}")
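
To make the difference between the raw and the structured form concrete, here is a minimal sketch of the kind of conversion described above. It is an assumption-laden illustration, not the module's code: flatten_text and to_content_message are hypothetical helpers, the raw field names ("id", "from", "from_id", "date", "text", "type") reflect the usual Telegram Desktop export layout, and the mapping to the standardized keys is a guess.

```python
def flatten_text(text):
    """Join Telegram's mixed text format (str, or list of str/dict parts)."""
    if isinstance(text, str):
        return text
    parts = []
    for part in text:
        parts.append(part if isinstance(part, str) else part.get("text", ""))
    return "".join(parts)


def to_content_message(raw, chat_name):
    """Map a raw export message to standardized keys (assumed mapping)."""
    return {
        "msg_id": raw.get("id"),
        "sender": raw.get("from"),
        "sender_id": raw.get("from_id"),
        "posting_time": raw.get("date"),
        "content": flatten_text(raw.get("text", "")),
        "chat_name": chat_name,
    }


# Hand-written sample data, not taken from a real chat
raw_messages = [
    {"id": 1, "type": "message", "date": "2024-01-01T10:00:00",
     "from": "Alice", "from_id": "user1",
     "text": ["See ", {"type": "link", "text": "https://example.com"}]},
    {"id": 2, "type": "service", "date": "2024-01-01T10:05:00",
     "actor": "Bob", "action": "join_group_by_link"},
]

# Split on the "type" field: regular messages vs service events
content_messages = [to_content_message(m, "Sample Chat")
                    for m in raw_messages if m.get("type") == "message"]
member_actions = [m for m in raw_messages if m.get("type") == "service"]

print(content_messages[0]["content"])
print(len(member_actions))
```

The text field needs flattening because an exported message can store its text either as a plain string or as a list mixing strings and formatted parts.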


For more convenient analysis, the data can be converted to a pandas DataFrame. pandas is not part of the standard library, so it must be installed separately and imported.

import pandas as pd

# Convert the list of message dictionaries to a DataFrame
df = pd.DataFrame(parser.content_messages)


3. Saving to CSV

# Save to CSV files in specified directory
parser.save_chat(output_path='data/processed/Sample_Chat', save_actions=True)

This creates CSV files in the output directory:


  • content_messages.csv - All regular messages

  • member_actions.csv - Service messages (if save_actions=True was used)
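
For readers curious how a list of message dictionaries ends up as a CSV file, the following sketch shows one way to do it with the standard csv module. It is not the code behind save_chat(): save_rows is a hypothetical helper and the sample rows are invented.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(rows, csv_path):
    """Write a list of dictionaries to CSV, one column per key."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    # Collect column names in first-seen order across all rows
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {"msg_id": 1, "sender": "Alice", "content": "Hello"},
    {"msg_id": 2, "sender": "Bob", "content": "Hi, Alice", "file": "photo.jpg"},
]

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "Sample_Chat" / "content_messages.csv"
    save_rows(rows, out_file)
    lines = out_file.read_text(encoding="utf-8").splitlines()

print(lines[0])
```

Rows that lack a key (here the first row has no "file") get an empty cell, which keeps every row aligned with the header.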

Complete Example

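from tg_json_parser import TgJsonParser

# Initialize and process
parser = TgJsonParser()
parser.load_json('data/raw/Sample_ChatExport/result.json')
parser.extract_messages()
parser.save_chat(output_path='data/processed/Sample_Chat', save_actions=True)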


Command Line Interface

For quick processing without writing Python code:

python tg_json_parser.py input.json output_directory [--save_actions]

Arguments:


  • input.json - Path to Telegram JSON export file

  • output_directory - Directory where CSV files will be saved

  • --save_actions - Optional flag to include service messages
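
A command line with two positional arguments and one optional flag, as listed above, can be described with argparse. The sketch below only mirrors the documented interface; build_arg_parser and the attribute name input_json are hypothetical, and the script's real argument handling may differ.

```python
import argparse


def build_arg_parser():
    """Build a command-line parser mirroring the documented arguments."""
    arg_parser = argparse.ArgumentParser(
        description="Convert a Telegram JSON export to CSV files."
    )
    arg_parser.add_argument("input_json", help="Path to Telegram JSON export file")
    arg_parser.add_argument("output_directory", help="Directory for the CSV files")
    arg_parser.add_argument("--save_actions", action="store_true",
                            help="Also save service messages")
    return arg_parser


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_arg_parser().parse_args(
    ["data/raw/Sample_ChatExport/result.json",
     "data/processed/Sample_Chat",
     "--save_actions"]
)
print(args.input_json, args.output_directory, args.save_actions)
```

Because the flag uses store_true, omitting --save_actions leaves the value at False, which matches its description as optional.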


Examples:

# Basic export
python tg_json_parser.py data/raw/Sample_ChatExport/result.json data/processed/Sample_Chat
# Export including service messages
python tg_json_parser.py data/raw/Sample_ChatExport/result.json data/processed/Sample_Chat --save_actions

Output Structure

The parser creates CSV files with the following structure:

content_messages.csv:


  • msg_id - Unique message identifier

  • sender - Message sender name

  • sender_id - Sender's unique ID

  • posting_time - When message was sent

  • edited - When message was last edited

  • media_type - Type of content (text, photo, video_file, etc.)

  • content - Message text or media caption

  • file - Path to media file (if any)

  • reply_to_msg_id - ID of replied message

  • forwarded_from - Original sender for forwarded messages

  • reactions - Message reactions data

  • chat_name - Name of the chat/channel


member_actions.csv (when save_actions=True):


  • Service messages like member joins, leaves, and other chat events
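
Once the CSV files exist, they can be analyzed without any extra packages. The sketch below counts messages per sender and per media type using a small in-memory stand-in for content_messages.csv; the sample rows are invented and only a subset of the columns listed above is included.

```python
import csv
import io
from collections import Counter

# Stand-in for content_messages.csv with a subset of the documented columns
sample_csv = io.StringIO(
    "msg_id,sender,posting_time,media_type,content\n"
    "1,Alice,2024-01-01T10:00:00,text,Hello\n"
    "2,Bob,2024-01-01T10:01:00,photo,Look at this\n"
    "3,Alice,2024-01-01T10:02:00,text,Nice\n"
)

rows = list(csv.DictReader(sample_csv))
messages_per_sender = Counter(row["sender"] for row in rows)
media_types = Counter(row["media_type"] for row in rows)

print(messages_per_sender.most_common(1))
print(dict(media_types))
```

To run the same counts on real output, replace the StringIO object with a file opened via open(path, newline="", encoding="utf-8"), where path points to the generated content_messages.csv.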

CONTACT US
 INSTITUTE FOR EUROPEAN, RUSSIAN AND EURASIAN STUDIES 1957 E St NW Washington, DC 20052

1957 E St., NW, Suite 412,
Washington, DC 20052

russiaprogram@gwu.edu
+1 (202) 9946340
