API

google_translate.GoogleTranslator

google_translate.UserAgentSelector

google_translate.ProxySelector

google_translate.cache.Cache


Note

Valid output types are: text, dict, json
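The three output types can be illustrated with a small stand-alone helper. This is only a sketch: format_output and the layout chosen for each type are assumptions made for illustration, not the library's implementation. Only the three type names and the ValueError come from this page.

```python
import json

VALID_OUTPUT_TYPES = ("text", "dict", "json")  # the three names listed in the note


def format_output(results, output="text"):
    """Render a {word: result} mapping in one of the three output types.

    Hypothetical helper: the exact layout the library produces for each
    output type is not documented here, so this layout is an assumption.
    """
    if output not in VALID_OUTPUT_TYPES:
        raise ValueError("invalid output type: %r" % (output,))
    if output == "dict":
        return dict(results)
    if output == "json":
        return json.dumps(results, sort_keys=True)
    # "text": one result per line
    return "\n".join(str(value) for value in results.values())
```

Rejecting an unknown output type with ValueError matches the exception that the methods below are documented to raise, although the conditions that trigger it in the library are not spelled out here.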

Note

Source and destination language can be either in the short format (e.g. “en”) or in long format (e.g. “english”). Source language can also take the ‘auto’ value.
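As a rough illustration of the two language formats, the sketch below maps either form to the short code. The LANGUAGES table and the normalize_language helper are hypothetical and cover only three languages; the library's own language table is not reproduced on this page.

```python
# Tiny illustrative table (assumption); a real table would be much larger.
LANGUAGES = {"en": "english", "de": "german", "el": "greek"}


def normalize_language(lang, allow_auto=False):
    """Return the short code for a language given in short or long format."""
    value = lang.strip().lower()
    if value == "auto":
        # 'auto' is documented only for the source language.
        if allow_auto:
            return "auto"
        raise ValueError("'auto' is only valid for the source language")
    if value in LANGUAGES:
        return value
    for code, name in LANGUAGES.items():
        if name == value:
            return code
    raise ValueError("unknown language: %r" % (lang,))
```

With this sketch, normalize_language("english") and normalize_language("en") both give "en", and "auto" is accepted only when allow_auto is True.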

Info dictionary structure example:

{
    "original_text": "word",
    "translation": "translated-word",
    "romanization": "romanized-word",
    "has_typo": False,
    "src_lang": "en",
    "extra": {
        "verbs": {
            "verb1": ["trans1"],
            "verb2": ["trans1", "trans2", "trans3", "trans4"]
        },
        "nouns": {
            "nouns1": ["trans1", "trans2", "trans3"]
        },
        "adjectives": {
            "adjective1": ["trans1", "trans2"]
        },
        "adverbs": {
            "adverb1": ["trans1"],
            "adverb2": ["trans2", "trans3"]
        },
        "prepositions": {
            "preposition1": ["trans1"]
        }
    },
    "match": 1.0
}
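To show how this structure can be consumed, the following sketch collects the main translation and every translation listed under "extra". The sample dictionary is a shortened copy of the example above, and all_translations is a hypothetical helper, not part of the library.

```python
# Shortened copy of the example structure above.
info = {
    "original_text": "word",
    "translation": "translated-word",
    "romanization": "romanized-word",
    "has_typo": False,
    "src_lang": "en",
    "extra": {
        "verbs": {"verb1": ["trans1"], "verb2": ["trans1", "trans2"]},
        "nouns": {"noun1": ["trans3"]},
    },
    "match": 1.0,
}


def all_translations(info_dict):
    """Return the main translation followed by the extra ones, without duplicates."""
    found = [info_dict["translation"]]
    extra = info_dict.get("extra", {})
    for part_of_speech in sorted(extra):           # e.g. "nouns", "verbs"
        for word in sorted(extra[part_of_speech]):  # e.g. "verb1", "verb2"
            for translation in extra[part_of_speech][word]:
                if translation not in found:
                    found.append(translation)
    return found


translations = all_translations(info)
```

Sorting the keys keeps the result order stable regardless of how the dictionary was built.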

Main classes

class google_translate.GoogleTranslator(proxy_selector=None, ua_selector=None, simulate=False, https=True, timeout=10.0, retries=5, wait_time=1.0, random_wait=False, encoding=u'UTF-8')

Uses the Google Translate API to provide several features.

GoogleTranslator currently provides four features:

  • word translation
  • source language detection
  • romanization
  • typo detection
Parameters:
  • proxy_selector (ProxySelector) – Object used to pick a proxy.
  • ua_selector (UserAgentSelector) – Object used to pick a user-agent.
  • simulate (boolean) – When True no real requests will be sent.
  • https (boolean) – Enable-disable HTTPS.
  • timeout (float) – Socket timeout in seconds.
  • retries (int) – Maximum number of attempts before giving up.
  • wait_time (float) – Time in seconds to wait between requests.
  • random_wait (boolean) – When True GoogleTranslator will wait a random number of seconds instead of using wait_time.
  • encoding (string) – Encoding to use during data encode-decode.
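The interaction of retries, wait_time and random_wait can be pictured with the stand-alone sketch below. It is not the library's code: the helper names, the choice to wait between failed attempts, the [0, wait_time] range for the random delay, and returning None after the last attempt are all assumptions.

```python
import random
import time


def next_wait(wait_time=1.0, random_wait=False, rng=random):
    """Pick the delay before the next attempt."""
    # Assumption: the random delay is drawn from [0, wait_time]; the
    # documentation does not state the actual range.
    if random_wait:
        return rng.uniform(0.0, wait_time)
    return wait_time


def fetch_with_retries(request, retries=5, wait_time=1.0,
                       random_wait=False, sleep=time.sleep):
    """Call request() up to `retries` times, waiting between failed attempts."""
    for attempt in range(1, retries + 1):
        try:
            return request()
        except IOError:
            if attempt < retries:
                sleep(next_wait(wait_time, random_wait))
    return None  # assumption: give up quietly after the last attempt
```

Passing the sleep function as a parameter keeps the sketch easy to exercise without real delays.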
add_header(header)

Add HTTP header to user specific headers.

Parameters: header (tuple) – Tuple that contains two values: the header name and the header value (e.g. (‘Host’, ‘test.com’)).

Note

User specific headers overwrite the default headers.

Raises: ValueError
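The override rule in the note above can be sketched as a simple merge in which user headers replace defaults of the same name. DEFAULT_HEADERS, validate_header and merge_headers are hypothetical names, the default values are invented for the example, and treating a malformed tuple as the cause of the ValueError is an assumption.

```python
# Hypothetical defaults, for illustration only.
DEFAULT_HEADERS = [("User-Agent", "default-agent"), ("Accept", "*/*")]


def validate_header(header):
    """Accept only a (name, value) tuple, mirroring the documented parameter."""
    if not isinstance(header, tuple) or len(header) != 2:
        raise ValueError("header must be a (name, value) tuple")
    return header


def merge_headers(defaults, user_headers):
    """Return one mapping in which user headers overwrite the defaults."""
    merged = dict(defaults)
    for name, value in (validate_header(h) for h in user_headers):
        merged[name] = value  # exact-name match; case handling is not modeled
    return merged


headers = merge_headers(DEFAULT_HEADERS, [("User-Agent", "Mozilla9000")])
```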
detect(word, output=u'text')

Detect the source language of the given word(s).

Parameters:
  • word (string - list<string>) – Word(s) to process.
  • output (string) – Output return type. See Output Types for a list of valid output types (default: text).
Returns:

String with the source language name-code.

Raises:

ValueError

get_info_dict(word, dst_lang, src_lang=u'auto', output=u'text')

Returns the information dictionary for the given word.

For a list with available dictionary keys, see Info Dict.

Parameters:
  • word (string - list<string>) – Word(s) to process.
  • dst_lang (string) – Destination language to use. See Language Format for valid language formats.
  • src_lang (string) – Source language to use (default: auto).
  • output (string) – Output return type. See Output Types for a list of valid output types (default: text).
Raises:

ValueError

romanize(word, src_lang=u'auto', output=u'text')

Get the romanization of the given word(s).

Parameters:
  • word (string - list<string>) – Word(s) to process.
  • src_lang (string) – Source language of the given word(s). See Language Format for valid language formats (default: auto).
  • output (string) – Output return type. See Output Types for a list of valid output types (default: text).
Returns:

String with the romanized word.

Raises:

ValueError

translate(word, dst_lang, src_lang=u'auto', additional=False, output=u'text')

Translate the given word(s) from src_lang to dst_lang.

Parameters:
  • word (string - list<string>) – Word(s) to translate.
  • dst_lang (string) – Language to translate the given word(s) into. See Language Format for valid language formats.
  • src_lang (string) – Source language of the given word(s) (default: auto).
  • additional (boolean) – When True translate will return additional translations.
  • output (string) – Output return type. See Output Types for a list of valid output types (default: text).
Returns:

If additional is True: dictionary with the additional translations.

If additional is False: string with the translated word.

Raises:

ValueError

word_exists(word, lang=u'en', output=u'text')

Check if the given word(s) exist in language.

Parameters:
  • word (string - list<string>) – Word(s) to check.
  • lang (string) – Language to check. See Language Format for valid language formats (default: en).
  • output (string) – Output return type. See Output Types for a list of valid output types (default: text).
Returns:

True if the word exists else False.

Raises:

ValueError

Other classes

class google_translate.UserAgentSelector(user_agent=None, user_agent_file=None, http_mode=False, single_ua=False)

Select a user-agent based on some criteria.

UserAgentSelector supports three basic modes: a single user-agent given by the user, multiple user-agents from a file (one per line), and multiple user-agents loaded from the handy list of ‘techblog.willshouse.com’[1]. The modes can be enabled all together or in any combination. Note that the user-defined user-agent always overrides the other user-agents (from file or HTTP). By default, when multiple user-agents are used, the get_useragent() method returns one at random; if single_ua is set to True, UserAgentSelector picks a user-agent during initialization and sticks with it.

Examples

Use single user-agent:

>>> from google_translate import UserAgentSelector
>>> ua_selector = UserAgentSelector("Mozilla9000")

>>> ua_selector.get_useragent()

Use multiple user-agents from HTTP:

>>> ua_selector = UserAgentSelector(http_mode=True)

>>> ua_selector.get_useragent()
Parameters:
  • user_agent (string) – User defined user-agent.
  • user_agent_file (string) – Absolute path to file that contains multiple user-agent entries, one per line.
  • http_mode (boolean) – When True UserAgentSelector will try to retrieve a list of user-agents from the place that HTTP_URL points to.
  • single_ua (boolean) – When True UserAgentSelector will pick a single user-agent and stick with it, even when multiple user-agents are defined (from file or HTTP).
get_useragent()

Returns a user-agent to the user.
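The selection rules described for this class can be sketched with a small stand-in. SimpleUASelector is hypothetical and is not the library class: it takes an in-memory pool instead of reading a file or a URL, and returning None when nothing is configured is an assumption.

```python
import random


class SimpleUASelector(object):
    """Hypothetical stand-in that mimics the documented selection rules."""

    def __init__(self, user_agent=None, pool=(), single_ua=False, rng=random):
        self._user_agent = user_agent
        self._pool = list(pool)
        self._rng = rng
        self._fixed = None
        if single_ua and self._pool:
            # Pick once during initialization and keep that choice.
            self._fixed = self._rng.choice(self._pool)

    def get_useragent(self):
        if self._user_agent is not None:
            return self._user_agent            # user-defined value always wins
        if self._fixed is not None:
            return self._fixed                 # single_ua mode
        if self._pool:
            return self._rng.choice(self._pool)  # random pick on each call
        return None                            # assumption: nothing configured


selector = SimpleUASelector(pool=["ua-one", "ua-two"], single_ua=True)
first_pick = selector.get_useragent()
```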

class google_translate.ProxySelector(proxy=None, proxy_file=None, prevent_fallback=False, random_selection=False)

Select a proxy based on some criteria.

ProxySelector supports two proxy modes: a single proxy given by the user, or multiple proxies from a file. Both can be used together, in which case the user-defined proxy overrides the proxies from the file. With multiple proxies, ProxySelector either follows the given sequence or picks one at random. ProxySelector can detect duplicate entries and invalid IP addresses (currently only IPv4 is supported). Finally, the user can remove non-working proxies.

Examples

Use single proxy (not very handy):

>>> from google_translate import ProxySelector
>>> proxy_selector = ProxySelector("127.0.0.1:8080")

>>> proxy = proxy_selector.get_proxy()

Use multiple proxies from file:

>>> proxy_selector = ProxySelector(proxy_file="my_proxies")

>>> proxy = proxy_selector.get_proxy()
Parameters:
  • proxy (string) – User defined proxy.
  • proxy_file (string) – Absolute path to file that contains multiple proxy entries, one per line.
  • prevent_fallback (boolean) – When True ProxySelector will always return a proxy even if the proxy does not work (good to avoid making a request without one).
  • random_selection (boolean) – When True ProxySelector will pick a proxy at random instead of following the normal sequence mode.
get_proxy()

Returns a proxy back to the user.

static is_valid_proxy(proxy)

Static method to check if the given proxy is valid.
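The exact rules applied by is_valid_proxy are not listed on this page. As a minimal sketch, assuming the “ip:port” format with an IPv4 address mentioned elsewhere in this section, a check could look like the following; is_valid_ipv4_proxy and unique_valid are hypothetical names.

```python
import re

# Four groups of 1-3 digits, a colon, then a 1-5 digit port (ASCII digits only).
_PROXY_RE = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})", re.ASCII)


def is_valid_ipv4_proxy(proxy):
    """Return True if proxy looks like 'a.b.c.d:port' with values in range."""
    match = _PROXY_RE.fullmatch(proxy.strip())
    if match is None:
        return False
    octets = [int(group) for group in match.groups()[:4]]
    port = int(match.group(5))
    return all(octet <= 255 for octet in octets) and 0 < port <= 65535


def unique_valid(proxies):
    """Drop invalid entries and duplicates while keeping the original order."""
    kept = []
    for proxy in (entry.strip() for entry in proxies):
        if is_valid_ipv4_proxy(proxy) and proxy not in kept:
            kept.append(proxy)
    return kept
```

unique_valid shows how duplicate and invalid entries from a proxy file could be filtered in one pass.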

remove_proxy(proxy)

Removes the given proxy from the ProxySelector.

Parameters: proxy (string) – Proxy string in the format “ip:port”.
Returns: True on success else False.
class google_translate.cache.Cache(max_size, valid_period)

Store objects for a period of time.

Cache-like object that uses a dictionary to store multiple objects for a period of time. This cache can also be stored in a JSON file for later use.

Examples

Simple use case:

>>> from google_translate.cache import Cache

>>> cache = Cache(100, 3600.0) # Store items for 1 hour
>>> cache.add('key', 'value')
>>> value = cache.get('key')

>>> cache.store('mycache')     # Store our cache in 'mycache' file

>>> new_cache = Cache(200, 300.0)
>>> new_cache.load('mycache')  # Load items from 'mycache' file
>>> new_cache.remove_old()     # Remove all the old items
Parameters:
  • max_size (int) – Maximum number of items that the cache can store. The cache automatically removes the oldest item when it reaches the max_size.
  • valid_period (float) – Time in seconds that the cache items are valid. This value can be changed after the object initialization.
add(key, obj)

Add new item to the cache.

Adds the key-obj combination to the cache. If the cache reaches the max_size then the oldest item is automatically removed.

Parameters:
  • key (hashable type) – Key under which the obj will be stored.
  • obj (object) – Object to store in the cache.
get(key)

Get item from the cache.

Returns the item corresponding to the given key, or None if the key does not exist. Note that an item whose timestamp is too old also yields None, even though its key exists.

Parameters: key (hashable type) – Key to search for.
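The behaviour described for add and get (evict the oldest item at max_size, treat stale items as missing) can be sketched as follows. TinyCache is a hypothetical stand-in rather than the library class. The injectable clock, the strict "older than valid_period" comparison and the requirement that max_size is at least 1 are assumptions.

```python
import time


class TinyCache(object):
    """Hypothetical sketch of a size-bounded cache with item expiry."""

    def __init__(self, max_size, valid_period, clock=time.time):
        self.max_size = max_size          # assumed to be >= 1
        self.valid_period = valid_period  # seconds
        self._clock = clock
        self._items = {}                  # key -> [obj, timestamp]

    def get_oldest(self):
        if not self._items:
            return None
        return min(self._items, key=lambda key: self._items[key][1])

    def add(self, key, obj):
        # Make room by dropping the oldest item when the cache is full.
        if key not in self._items and len(self._items) >= self.max_size:
            del self._items[self.get_oldest()]
        self._items[key] = [obj, self._clock()]

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        obj, timestamp = entry
        if self._clock() - timestamp > self.valid_period:
            return None                   # key exists but the item is stale
        return obj
```

Injecting the clock makes the expiry rule easy to check without waiting for real time to pass.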
get_oldest()

Returns the oldest key in the cache if one exists else None.

has(key)

Returns True if the key is in the cache else False.

has_space()

Returns True if the cache has not reached the max_size else False.

items()

Returns a list of (key, [obj, timestamp]) pairs.

load(filename)

Load the cache content from the given filename.

Returns: True on success else False.

Note

The cache stops adding new items when it reaches the max_size.

remove_old()

Remove old items from the cache.

Removes all the items with invalid timestamp from the cache and returns the number of the items removed.

store(filename)

Store the cache to the given filename.

Returns: True on success else False.
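A JSON round trip in the spirit of store and load can be sketched as below. The on-disk layout (a single JSON object mapping keys to [obj, timestamp]) and the helper names are assumptions, since the file format is not described on this page. Only the True/False result of storing and the rule that loading stops at max_size come from the text above.

```python
import json
import os
import tempfile


def store_items(items, filename):
    """Write a {key: [obj, timestamp]} mapping as JSON; True on success else False."""
    try:
        with open(filename, "w") as handle:
            json.dump(items, handle)
    except (IOError, OSError, TypeError, ValueError):
        return False
    return True


def load_items(filename, max_size):
    """Read items back, stopping once max_size items have been added.

    Returns the loaded mapping, or None when the file cannot be read.
    """
    try:
        with open(filename) as handle:
            stored = json.load(handle)
    except (IOError, OSError, ValueError):
        return None
    if not isinstance(stored, dict):
        return None
    loaded = {}
    for key, entry in stored.items():
        if len(loaded) >= max_size:
            break  # stop adding new items at max_size
        loaded[key] = entry
    return loaded


# Usage: round-trip three items through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "mycache")
items = {"a": ["x", 1.0], "b": ["y", 2.0], "c": ["z", 3.0]}
stored_ok = store_items(items, path)
reloaded = load_items(path, max_size=2)  # only two items fit
```

Because JSON object keys are always strings, this layout only round-trips string keys faithfully.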

Footnotes

[1] <https://techblog.willshouse.com/2012/01/03/most-common-user-agents/>