3. Scrape the data using LDAScraper

To automate fetching and processing Lobbying Disclosure Act (LDA) data, you can use LDAScraper. Install it directly from GitHub with pip:

pip install git+https://github.com/Ru-Pro/lda_scraper.git

Import the scraper and provide the API key obtained from the LDA website:

from lda_scraper import LDAScraper
api_key = "your_api_key_here"
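A hard-coded key works for a quick test, but it is safer to read the key from an environment variable so it never ends up in version control. The following is a minimal sketch. The variable name `LDA_API_KEY` and the helper `load_api_key` are hypothetical choices made for this example; LDAScraper does not look for them on its own.

```python
import os


def load_api_key(var_name: str = "LDA_API_KEY") -> str:
    """Return the LDA API key stored in an environment variable.

    LDA_API_KEY is a hypothetical name chosen for this example;
    LDAScraper does not read it automatically.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your LDA API key."
        )
    return key


# Usage: run `export LDA_API_KEY=...` in your shell first, then:
# api_key = load_api_key()
```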

As an example, let's fetch all filings related to Russia. We target the filings endpoint and define filter parameters whose values should capture every relevant filing:

# For LD-1 (registrations) and LD-2 (quarterly activity reports) filings, including amendments
baseurl = 'https://lda.senate.gov/api/v1/filings/'

# The keys of the parameters are the fields in the database,
# and the values are lists of keywords to search for in those fields.
parameters = {
    'client_ppb_country': ['RU'],
    'affiliated_organization_country': ['RU'],
    'foreign_entity_country': ['RU'],
    'client_country': ['RU'],
    'filing_specific_lobbying_issues': ['russian',  # any of these keywords in the lobbying issues field
                                        'russia',
                                        'kremlin',
                                        'putin']
}

scraper = LDAScraper(
    baseurl=baseurl,
    api_key=api_key,
    parameters=parameters
)

scraper.scrape_all()   # fetch from API → saved as JSONL files
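This section does not document how `scrape_all()` works internally, but the general pattern for a paginated API is simple to sketch. The example below makes two assumptions: each (field, keyword) pair in `parameters` becomes its own request, and every response page is a JSON object with a `results` list and a `next` URL. `build_query_urls`, `fetch_to_jsonl`, and `fake_fetch` are hypothetical helpers, not LDAScraper functions. A fake fetch function stands in for the HTTP call so the sketch runs offline.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional
from urllib.parse import urlencode


def build_query_urls(baseurl: str, parameters: Dict[str, List[str]]) -> List[str]:
    """Expand {field: [values]} into one query URL per (field, value) pair."""
    return [
        f"{baseurl}?{urlencode({field: value})}"
        for field, values in parameters.items()
        for value in values
    ]


def fetch_to_jsonl(url: str, fetch_page: Callable[[str], dict], out_path: Path) -> int:
    """Follow 'next' links starting at url and write each record as one JSON line.

    Assumes every page is a dict with a 'results' list and a 'next' URL that is
    None on the last page. Returns the number of records written.
    """
    count = 0
    next_url: Optional[str] = url
    with out_path.open("w", encoding="utf-8") as fh:
        while next_url:
            page = fetch_page(next_url)
            for record in page.get("results", []):
                fh.write(json.dumps(record) + "\n")
                count += 1
            next_url = page.get("next")
    return count


# Offline stand-in for an HTTP client: two queries, the first spanning two pages.
BASE = "https://lda.senate.gov/api/v1/filings/"
FAKE_RESPONSES = {
    f"{BASE}?client_country=RU": {"results": [{"filing_uuid": "a"}], "next": "page-2"},
    "page-2": {"results": [{"filing_uuid": "b"}], "next": None},
    f"{BASE}?filing_specific_lobbying_issues=kremlin": {
        "results": [{"filing_uuid": "a"}],
        "next": None,
    },
}


def fake_fetch(url: str) -> dict:
    return FAKE_RESPONSES[url]


urls = build_query_urls(
    BASE, {"client_country": ["RU"], "filing_specific_lobbying_issues": ["kremlin"]}
)
out_dir = Path(tempfile.mkdtemp())
counts = [
    fetch_to_jsonl(u, fake_fetch, out_dir / f"query_{i}.jsonl")
    for i, u in enumerate(urls)
]
print(counts)  # [2, 1]
```

With a real HTTP client, `fetch_page` would send each request along with your API key. Check the LDA API documentation for the authentication header and rate limits it expects. Filing `a` is returned by both queries, which is why results from overlapping searches need de-duplicating when they are combined.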

Parse all the JSONL files into a single DataFrame, filling in missing values:

scraper.parse_all()

# After parsing, the data is in scraper.disclosers_df
scraper.disclosers_df.head()
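Conceptually, the parsing step merges many JSONL files into one table. The minimal pandas sketch below shows that idea. Several details are assumptions made for illustration: the `filing_uuid` de-duplication key (needed because the same filing can match several of the search fields above), the fill value, and the made-up records. `combine_jsonl` is a hypothetical helper, not part of LDAScraper.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def combine_jsonl(folder: Path, key: str = "filing_uuid", fill_value: str = "") -> pd.DataFrame:
    """Hypothetical helper: merge every *.jsonl file in `folder` into one DataFrame."""
    frames = [pd.read_json(path, lines=True) for path in sorted(folder.glob("*.jsonl"))]
    if not frames:
        return pd.DataFrame()
    df = pd.concat(frames, ignore_index=True)
    # The same filing can be returned by several queries, so keep only the first copy.
    if key in df.columns:
        df = df.drop_duplicates(subset=key).reset_index(drop=True)
    # Replace every missing value with fill_value (numeric columns may need a different choice).
    return df.fillna(fill_value)


# Usage with two small, made-up JSONL files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "query_1.jsonl").write_text(
        json.dumps({"filing_uuid": "a", "client_name": "X"}) + "\n"
        + json.dumps({"filing_uuid": "b"}) + "\n",
        encoding="utf-8",
    )
    (folder / "query_2.jsonl").write_text(
        json.dumps({"filing_uuid": "a", "client_name": "X"}) + "\n"
        + json.dumps({"filing_uuid": "c", "client_name": "Y"}) + "\n",
        encoding="utf-8",
    )
    combined = combine_jsonl(folder)

print(combined)
```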

Save the parsed DataFrame as CSV, JSONL, or Parquet files in processed_data_folder:

scraper.save_csv()
scraper.save_json()
scraper.save_parquet()
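If you filter or reshape `scraper.disclosers_df` yourself and want the same three output formats, plain pandas writers cover them. This is a sketch, not what the `save_*` methods necessarily do. The folder and file names are hypothetical. `to_parquet` also needs the pyarrow or fastparquet package, so the sketch skips Parquet when neither is installed.

```python
import tempfile
from pathlib import Path
from typing import List

import pandas as pd


def save_all_formats(df: pd.DataFrame, folder: Path, stem: str = "filings") -> List[Path]:
    """Write df as CSV, JSONL and, if an engine is available, Parquet; return the paths written."""
    folder.mkdir(parents=True, exist_ok=True)
    written = []

    csv_path = folder / f"{stem}.csv"
    df.to_csv(csv_path, index=False)
    written.append(csv_path)

    # orient="records" with lines=True writes one JSON object per line (JSONL).
    jsonl_path = folder / f"{stem}.jsonl"
    df.to_json(jsonl_path, orient="records", lines=True)
    written.append(jsonl_path)

    parquet_path = folder / f"{stem}.parquet"
    try:
        df.to_parquet(parquet_path, index=False)
        written.append(parquet_path)
    except ImportError:
        # Neither pyarrow nor fastparquet is installed, so no Parquet file is written.
        pass
    return written


# Usage with a tiny made-up DataFrame and a hypothetical output folder.
with tempfile.TemporaryDirectory() as tmp:
    out_folder = Path(tmp) / "processed_data"
    df = pd.DataFrame({"filing_uuid": ["a", "b"], "client_name": ["X", "Y"]})
    names = [p.name for p in save_all_formats(df, out_folder)]
    roundtrip = pd.read_csv(out_folder / "filings.csv")

print(names)
```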