Introduction to Pytheas

This article presents Pytheas and explains how you can access it.

Available here:

A tool from CorText: gathering data on YouTube

Pytheas is a simple web interface for downloading data from YouTube. The project aims to let users gather YouTube data easily and to give them more control over how they use it. Data is exported as a JSON file in a format similar to that of the YouTube Data API v3. You can then import the data into CorText for further text analysis.
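To give an idea of what working with such an export could look like, here is a minimal Python sketch. The exact structure of a Pytheas file is not described in this article, so the sample below is hypothetical: it only assumes a layout close to a "videos" response of the YouTube Data API v3 (an "items" list whose entries carry "id", "snippet" and "statistics"). The function name flatten_videos is ours.

```python
import json

# Hypothetical sample: a real Pytheas export may use different keys.
SAMPLE_EXPORT = """
{
  "items": [
    {"id": "abc123",
     "snippet": {"title": "First video", "channelTitle": "Some channel",
                 "publishedAt": "2024-01-15T10:00:00Z"},
     "statistics": {"viewCount": "1200", "commentCount": "34"}},
    {"id": "def456",
     "snippet": {"title": "Second video", "channelTitle": "Other channel",
                 "publishedAt": "2024-02-01T08:30:00Z"}}
  ]
}
"""


def flatten_videos(export_text):
    """Turn an API-style JSON export into a list of flat dictionaries."""
    data = json.loads(export_text)
    rows = []
    for item in data.get("items", []):
        snippet = item.get("snippet", {})
        stats = item.get("statistics", {})
        rows.append({
            "video_id": item.get("id"),
            "title": snippet.get("title"),
            "channel": snippet.get("channelTitle"),
            "published_at": snippet.get("publishedAt"),
            # The API returns counters as strings; a missing counter becomes None.
            "views": int(stats["viewCount"]) if "viewCount" in stats else None,
        })
    return rows


rows = flatten_videos(SAMPLE_EXPORT)
for row in rows:
    print(row["video_id"], row["title"], row["views"])
```

Flat rows like these are easier to inspect or convert to a table before importing the data elsewhere.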

A usage tutorial will follow later; for now, a quick overview looks like this:

YouTube as a social media platform: videos, comments and captions

Like other social platforms, YouTube generates a huge amount of data. Its particularity is that content sharing is centered on video. A lot of metadata is produced, and we assume that it is possible to start an analysis from it. YouTube organizes its data in a structured way, so we can retrieve it by following its documentation.

For social scientists, the main resources of interest are:

  • videos
  • comments
  • captions

Finally, we can retrieve the data and metadata associated with:

  • a textual search query
  • a custom curated list of videos
  • the list of videos from a channel
  • the list of videos from a playlist
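As an illustration, the four entry points above roughly correspond to four requests to the YouTube Data API v3. The sketch below only builds the request URLs and sends nothing. The helper names are ours, "YOUR_API_KEY" is a placeholder, and we do not claim that Pytheas issues exactly these calls.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ROOT = "https://www.googleapis.com/youtube/v3"


def build_url(resource, api_key, **params):
    """Build a request URL for one API resource (no request is sent)."""
    query = dict(params, key=api_key)
    return f"{API_ROOT}/{resource}?{urlencode(query)}"


def search_videos_url(api_key, query, max_results=25):
    # Textual search query, restricted to videos.
    return build_url("search", api_key, part="snippet", type="video",
                     q=query, maxResults=max_results)


def videos_url(api_key, video_ids):
    # Custom curated list: video ids are passed as a comma-separated string.
    return build_url("videos", api_key, part="snippet,statistics",
                     id=",".join(video_ids))


def channel_uploads_url(api_key, channel_id):
    # A channel exposes its videos through an "uploads" playlist, whose id
    # is found under contentDetails.relatedPlaylists.uploads in the response.
    return build_url("channels", api_key, part="contentDetails", id=channel_id)


def playlist_items_url(api_key, playlist_id, max_results=50):
    # Videos of a playlist (also used for a channel's uploads playlist).
    return build_url("playlistItems", api_key, part="snippet,contentDetails",
                     playlistId=playlist_id, maxResults=max_results)


url = search_videos_url("YOUR_API_KEY", "climate change", max_results=10)
print(url)
# Decode the query string again to check what would be sent.
print(parse_qs(urlsplit(url).query))
```

Comments are retrieved in a similar way through the "commentThreads" resource, with a videoId parameter.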

There is no doubt that video content itself is rich in information, but the resources needed to analyze it can quickly become huge. We therefore assume in this project that YouTube data can serve both as a starting point for understanding video content and as a way to discover information about how this social media is used, information that only a complementary metadata approach, applied to various datasets, makes accessible.

Using the YouTube Data API v3

To this end, YouTube provides an API, supported by the Google ecosystem, named “YouTube Data API v3” (v3 because earlier versions of the API preceded it). This API is free and gives you access to YouTube data, structured according to the logic and concepts of its engineers.
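One example of this structure: list responses are paginated. Each response carries an "items" list and, when more results exist, a "nextPageToken" to send back as the pageToken parameter of the next request. Below is a minimal sketch of that loop. The function fetch_page is a stand-in that you would replace with a real HTTP call, and the fake pages are invented for the example.

```python
def collect_items(fetch_page, max_pages=5):
    """Follow nextPageToken links and gather all items.

    fetch_page(page_token) must return one decoded API response (a dict);
    page_token is None for the first page.
    """
    items = []
    token = None
    for _ in range(max_pages):
        response = fetch_page(token)
        items.extend(response.get("items", []))
        token = response.get("nextPageToken")
        if token is None:
            # No further page announced: stop here.
            break
    return items


# Invented responses standing in for the API (assumption for the demo).
FAKE_PAGES = {
    None: {"items": [{"id": "v1"}, {"id": "v2"}], "nextPageToken": "p2"},
    "p2": {"items": [{"id": "v3"}]},
}


def fake_fetch(page_token):
    return FAKE_PAGES[page_token]


all_items = collect_items(fake_fetch)
print([item["id"] for item in all_items])
```

The max_pages cap is a design choice of this sketch: it keeps a large query from silently consuming the whole daily quota.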

It is important to note that this approach is not perfect, mainly because we depend on a technical infrastructure and on the organizations that support it, which are themselves tied to many power issues and social demands. Still, using and deciphering this resource offered by Google is a good starting point.

An API key from Google

Note that to access Pytheas you need to follow these steps to activate your access:

  1. Go to the Google developers console and activate “YouTube Data API v3”:

  2. Then you will want to “create new identifiers”. Keep in mind that the key will be linked to your own account, meaning that Google lets you access this data up to a maximum daily quota.

  3. Choose “API key” directly (the other parameters are not required for our purposes).

  4. Finally, Google will generate an “API key”: a string of characters that you can copy and paste directly into Pytheas.
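Because the key comes with a daily limit, it can help to estimate the cost of a collection before launching it. The sketch below relies on figures we take from Google's quota documentation at the time of writing and which should be checked before use: a default of 10,000 units per day, 100 units per search request, and 1 unit per request on the other listed resources. The collection plan itself is invented.

```python
# Assumed values (to verify in Google's quota documentation).
DAILY_QUOTA = 10_000
COST_PER_CALL = {
    "search": 100,
    "videos": 1,
    "playlistItems": 1,
    "commentThreads": 1,
}


def quota_used(calls):
    """Total units consumed by a plan given as {resource: number_of_calls}."""
    return sum(COST_PER_CALL[name] * count for name, count in calls.items())


def remaining_quota(calls, daily_quota=DAILY_QUOTA):
    """Units left for the day after running the plan (negative if over)."""
    return daily_quota - quota_used(calls)


# Invented plan: 20 search pages, 50 video lookups, 500 comment pages.
plan = {"search": 20, "videos": 50, "commentThreads": 500}
print(quota_used(plan), remaining_quota(plan))
```

Under these assumptions, search requests dominate the budget, so starting from a known channel or playlist is much cheaper than starting from a search query.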


Next time we will present a use case of this interface, with a description of the YouTube data and what you can observe with it. In the meantime, you are always welcome to ask us about your particular context.

Access Pytheas here:

Cortext team conferencing during LISIS annual meeting 2024