Welcome to MyGene.py’s documentation!
MyGene.Info provides simple-to-use REST web services to query and retrieve gene annotation data. It is designed with an emphasis on simplicity and performance. mygene is an easy-to-use Python wrapper for accessing MyGene.Info services.
Optional dependencies
Installation
- Option 1
  pip install mygene
- Option 2
  download/extract the source code and run:
  python setup.py install
- Option 3
  install the latest code directly from the repository:
  pip install -e hg+https://bitbucket.org/newgene/mygene#egg=mygene
Version history
API
- mygene.alwayslist(value)
If the input value is not a list or tuple, return it as a single-value list.
Example:
>>> x = 'abc'
>>> for xx in alwayslist(x):
...     print(xx)
>>> x = ['abc', 'def']
>>> for xx in alwayslist(x):
...     print(xx)
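The behaviour described above can be sketched in a few lines. This is an illustration inferred from the description only, not the library's actual source; the name alwayslist_sketch is hypothetical, and returning list/tuple inputs unchanged is an assumption.

```python
def alwayslist_sketch(value):
    """Wrap a non-list, non-tuple value in a single-item list.

    Assumption: list and tuple inputs are returned unchanged.
    """
    if isinstance(value, (list, tuple)):
        return value
    return [value]


# A single string is wrapped, so the loop runs exactly once.
for item in alwayslist_sketch('abc'):
    print(item)

# A list is passed through, so the loop runs once per element.
for item in alwayslist_sketch(['abc', 'def']):
    print(item)
```

Wrapping a lone string matters because iterating over 'abc' directly would yield its characters rather than the value itself.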
- class mygene.MyGeneInfo(url='http://mygene.info/v2')
This is the client for MyGene.info web services. Example:
>>> mg = MyGeneInfo()
- findgenes(id_li, **kwargs)
Deprecated since version 2.0.0.
Use querymany() instead. findgenes() is kept only as an alias of the querymany() method.
- getgene(geneid, fields='symbol, name, taxid, entrezgene', **kwargs)
Return the gene object for the given geneid. This is a wrapper for the GET query of the “/gene/<geneid>” service.
Parameters:
- geneid – an Entrez or Ensembl gene id; an Entrez gene id can be either a string or an integer
- fields – fields to return, a list or a comma-separated string. If fields="all", all available fields are returned
- species – optionally, you can pass comma-separated species names or taxonomy ids
- email – optionally, pass your email to help us to track usage
- filter – alias for the fields parameter
Returns: a gene object as a dictionary
Ref: http://mygene.info/doc/annotation_service.html for available fields, extra kwargs and more.
Example:
>>> mg.getgene(1017, email='abc@example.com')
>>> mg.getgene('1017', fields='symbol,name,entrezgene,refseq')
>>> mg.getgene('1017', fields='symbol,name,entrezgene,refseq.rna')
>>> mg.getgene('1017', fields=['symbol', 'name', 'pathway.kegg'])
>>> mg.getgene('ENSG00000123374', fields='all')
Hint
The field names supported by the fields parameter can be found in any full gene object (returned when fields="all"). Field names also support dot notation for nested data structures, e.g. you can pass “refseq.rna” or “pathway.kegg”.
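Dot notation maps naturally onto nested dictionaries, so the same dotted names can be used to read values back out of a returned gene object. The helper below is a minimal sketch: get_nested is a hypothetical name (not part of mygene), and the sample gene dictionary uses made-up placeholder values rather than real annotation data.

```python
def get_nested(gene, dotted_field, default=None):
    """Follow a dotted field name such as 'refseq.rna' through nested dicts."""
    current = gene
    for key in dotted_field.split('.'):
        # Stop early if the path leaves the dict structure or a key is absent.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Placeholder gene object, shaped like a nested dictionary (values are invented).
sample_gene = {
    'symbol': 'EXAMPLE1',
    'refseq': {'rna': ['RNA_PLACEHOLDER_1', 'RNA_PLACEHOLDER_2']},
}

rna_ids = get_nested(sample_gene, 'refseq.rna')
kegg = get_nested(sample_gene, 'pathway.kegg')  # field absent, so default is used
print(rna_ids)
print(kegg)
```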
- getgenes(geneids, fields='symbol, name, taxid, entrezgene', **kwargs)
Return the list of gene objects for the given list of geneids. This is a wrapper for the POST query of the “/gene” service.
Parameters:
- geneids – a list or comma-separated string of Entrez/Ensembl gene ids
- fields – fields to return, a list or a comma-separated string. If fields="all", all available fields are returned
- species – optionally, you can pass comma-separated species names or taxonomy ids
- email – optionally, pass your email to help us to track usage
- filter – alias for fields
- as_dataframe – if True, return object as DataFrame (requires Pandas).
- df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns: a list of gene objects or a pandas DataFrame object (when as_dataframe is True)
Ref: http://mygene.info/doc/annotation_service.html for available fields, extra kwargs and more.
Example:
>>> mg.getgenes([1017, '1018','ENSG00000148795'], email='abc@example.com')
>>> mg.getgenes([1017, '1018','ENSG00000148795'], fields="entrezgene,uniprot")
>>> mg.getgenes([1017, '1018','ENSG00000148795'], fields="all")
>>> mg.getgenes([1017, '1018','ENSG00000148795'], as_dataframe=True)
Hint
A list of more than 1000 input ids is sent to the backend web service in batches (1000 at a time), and the results are then concatenated. From the user's side, it is exactly the same as passing a shorter list, and you don’t need to worry about saturating our backend servers.
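The batching described above can be pictured with the following sketch. It is not the client's actual implementation: iter_batches, fetch_in_batches and fake_fetch are hypothetical names, and fake_fetch stands in for the real web service call so the example runs without network access.

```python
def iter_batches(ids, batch_size=1000):
    """Yield consecutive slices of ids, each with at most batch_size items."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def fetch_in_batches(ids, fetch_batch, batch_size=1000):
    """Call fetch_batch once per batch and concatenate the results in order."""
    results = []
    for batch in iter_batches(ids, batch_size):
        results.extend(fetch_batch(batch))
    return results


def fake_fetch(batch):
    """Stand-in for a backend request: returns one placeholder record per id."""
    return [{'query': str(gene_id)} for gene_id in batch]


ids = list(range(2500))
batch_sizes = [len(batch) for batch in iter_batches(ids)]
results = fetch_in_batches(ids, fake_fetch)
print(batch_sizes)
print(len(results))
```

Because the per-batch results are appended in input order, the caller sees one flat list regardless of how many requests were made.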
- query(q, **kwargs)
Return the query result. This is a wrapper for the GET query of the “/query?q=<query>” service.
Parameters:
- q – a query string; the detailed query syntax is described in the query service documentation (see Ref below)
- fields – fields to return, a list or a comma-separated string. If fields="all", all available fields are returned
- species – optionally, you can pass comma-separated species names or taxonomy ids. Default: human,mouse,rat.
- size – the maximum number of results to return (with a cap of 1000 at the moment). Default: 10.
- skip – the number of results to skip. Default: 0.
- sort – the fields to sort on. Prefix a field with “-” for descending order; otherwise, ascending order is used. Default: sort by matching scores in descending order.
- entrezonly – if True, return only matching Entrez genes; otherwise, also include matching Ensembl-only genes (those that have no matching Entrez gene).
- email – optionally, pass your email to help us to track usage
- as_dataframe – if True, return object as DataFrame (requires Pandas).
- df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns: a dictionary with returned gene hits or a pandas DataFrame object (when as_dataframe is True)
Ref: http://mygene.info/doc/query_service.html for available fields, extra kwargs and more.
Example:
>>> mg.query('cdk2')
>>> mg.query('reporter:1000_at')
>>> mg.query('symbol:cdk2', species='human')
>>> mg.query('symbol:cdk*', species=10090, size=5, as_dataframe=True)
>>> mg.query('chrX:151073054-151383976', species=9606)
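Since size is capped and skip sets the starting offset, the two parameters can be combined to page through a result set larger than one request allows. The sketch below is hedged: iter_all_hits and fake_query are hypothetical names, and the 'hits' and 'total' keys of the returned dictionary are an assumption about the response layout, which this page does not spell out. fake_query replaces the real service so the example runs offline.

```python
def iter_all_hits(query_func, q, page_size=1000, **kwargs):
    """Yield every hit for q by advancing skip one page at a time."""
    skip = 0
    while True:
        page = query_func(q, size=page_size, skip=skip, **kwargs)
        hits = page.get('hits', [])  # assumed key holding the gene hits
        if not hits:
            break
        for hit in hits:
            yield hit
        skip += len(hits)
        total = page.get('total')  # assumed key holding the overall hit count
        if total is not None and skip >= total:
            break


# Offline stand-in for a query function with size and skip parameters.
FAKE_HITS = [{'symbol': 'GENE%d' % i} for i in range(25)]


def fake_query(q, size=10, skip=0, **kwargs):
    return {'total': len(FAKE_HITS), 'hits': FAKE_HITS[skip:skip + size]}


all_hits = list(iter_all_hits(fake_query, 'symbol:gene*', page_size=10))
print(len(all_hits))
```

With a real client, the same helper could be called as iter_all_hits(mg.query, 'symbol:cdk*', species='human'), provided the response layout matches the assumption above.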
- querymany(qterms, scopes=None, **kwargs)
Return the batch query result. This is a wrapper for the POST query of the “/query” service.
Parameters:
- qterms – a list of query terms, or a string of comma-separated query terms.
- scopes – type or types of identifiers, either a list or a comma-separated string of fields specifying the type of the input qterms, e.g. “entrezgene”, “entrezgene,symbol”, [“ensemblgene”, “symbol”]. Refer to http://mygene.info/doc/query_service.html#available_fields for the full list of fields.
- fields – fields to return, a list or a comma-separated string. If fields="all", all available fields are returned
- species – optionally, you can pass comma-separated species names or taxonomy ids. Default: human,mouse,rat.
- entrezonly – if True, return only matching Entrez genes; otherwise, also include matching Ensembl-only genes (those that have no matching Entrez gene).
- returnall – if True, return a dict of all related data, including duplicate and missing qterms
- verbose – if True (default), print out information about duplicate and missing qterms
- email – optionally, pass your email to help us to track usage
- as_dataframe – if True, return object as DataFrame (requires Pandas).
- df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns: a list of gene objects or a pandas DataFrame object (when as_dataframe is True)
Ref: http://mygene.info/doc/query_service.html for available fields, extra kwargs and more.
Example:
>>> mg.querymany(['DDX26B', 'CCDC83'], scopes='symbol', species=9606)
>>> mg.querymany(['1255_g_at', '1294_at', '1316_at', '1320_at'], scopes='reporter')
>>> mg.querymany(['NM_003466', 'CDK2', 695, '1320_at', 'Q08345'],
...              scopes='refseq,symbol,entrezgene,reporter,uniprot', species='human')
>>> mg.querymany(['1255_g_at', '1294_at', '1316_at', '1320_at'], scopes='reporter',
...              fields='ensembl.gene,symbol', as_dataframe=True)
Hint
querymany() is perfect for doing id mappings.
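As a rough illustration of id mapping, the list returned by querymany() can be folded into a dictionary keyed by the input term. This is a sketch under assumptions: build_id_map is a hypothetical helper, each result is assumed to carry the input term under a 'query' key, and unmatched terms are assumed to be flagged with 'notfound'. The sample records are invented placeholders, not real query output.

```python
def build_id_map(results, target_field):
    """Map each input term to the list of target_field values found for it."""
    mapping = {}
    for hit in results:
        if hit.get('notfound'):
            continue  # assumed marker for terms with no match
        if target_field in hit:
            # One term may match several genes, so values are collected in a list.
            mapping.setdefault(hit['query'], []).append(hit[target_field])
    return mapping


# Placeholder records shaped like a batch query result (values are invented).
sample_results = [
    {'query': 'TERM_A', 'symbol': 'SYMBOL_A'},
    {'query': 'TERM_B', 'symbol': 'SYMBOL_B1'},
    {'query': 'TERM_B', 'symbol': 'SYMBOL_B2'},
    {'query': 'TERM_C', 'notfound': True},
]

id_map = build_id_map(sample_results, 'symbol')
print(id_map)
```

Keeping a list per term makes one-to-many matches explicit instead of silently keeping only the last hit.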
Hint
Just like getgenes(), passing a large list of ids (>1000) to querymany() is perfectly fine.