Metadata-Version: 2.1
Name: datar
Version: 0.5.2
Summary: Port of dplyr and other related R packages in python, using pipda.
Home-page: https://github.com/pwwang/datar
License: MIT
Author: pwwang
Author-email: pwwang@pwwang.com
Requires-Python: >=3.7.1,<4.0.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: diot (>=0.1.1,<0.2.0)
Requires-Dist: pandas (>=1.2,<2.0)
Requires-Dist: pipda (>=0.4.5,<0.5.0)
Requires-Dist: python-slugify (>=5.0.2,<6.0.0)
Requires-Dist: scipy (>=1.6,<2.0)
Requires-Dist: wcwidth (>=0.2,<0.3)
Project-URL: Repository, https://github.com/pwwang/datar
Description-Content-Type: text/markdown

# datar

Port of [dplyr][2] and other related R packages in python, using [pipda][3].

<!-- badges -->
[![Pypi][6]][7] [![Github][8]][9] ![Building][10] [![Docs and API][11]][5] [![Codacy][12]][13] [![Codacy coverage][14]][13]

[Documentation][5] | [Reference Maps][15] | [Notebook Examples][16] | [API][17] | [Blog][18]

<img width="30%" style="margin: 10px 10px 10px 30px" align="right" src="logo.png">

Unlike other similar packages in python that just mimic the piping syntax, `datar` follows the API designs from the original packages as much as possible, and is tested thoroughly with the cases from the original packages. So that minimal effort is needed for those who are familar with those R packages to transition to python.


## Installtion

```shell
pip install -U datar
# to make sure dependencies to be up-to-date
# pip install -U varname pipda datar
```

`datar` requires python 3.7.1+ and is backended by `pandas (1.2+)`.

## Example usage

```python
from datar import f
from datar.dplyr import mutate, filter, if_else
from datar.tibble import tibble
# or
# from datar.all import f, mutate, filter, if_else, tibble

df = tibble(
    x=range(4),
    y=['zero', 'one', 'two', 'three']
)
df >> mutate(z=f.x)
"""# output
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       1
2       2      two       2
3       3    three       3
"""

df >> mutate(z=if_else(f.x>1, 1, 0))
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       0
2       2      two       1
3       3    three       1
"""

df >> filter(f.x>1)
"""# output:
        x        y
  <int64> <object>
0       2      two
1       3    three
"""

df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter(f.z==1)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       2      two       1
1       3    three       1
"""
```

```python
# works with plotnine
# example grabbed from https://github.com/has2k1/plydata
import numpy
from datar.base import sin, pi
from plotnine import ggplot, aes, geom_line, theme_classic

df = tibble(x=numpy.linspace(0, 2*pi, 500))
(df >>
  mutate(y=sin(f.x), sign=if_else(f.y>=0, "positive", "negative")) >>
  ggplot(aes(x='x', y='y')) +
  theme_classic() +
  geom_line(aes(color='sign'), size=1.2))
```

![example](./example.png)

```python
# easy to integrate with other libraries
# for example: klib
import klib
from pipda import register_verb
from datar.datasets import iris
from datar.dplyr import pull

dist_plot = register_verb(func=klib.dist_plot)
iris >> pull(f.Sepal_Length) >> dist_plot()
```

![example](./example2.png)

## CLI interface

See [datar-cli][19]

Example:
```shell
❯ datar import table2 | datar head
       country    year        type      count
      <object> <int64>    <object>    <int64>
0  Afghanistan    1999       cases        745
1  Afghanistan    1999  population   19987071
2  Afghanistan    2000       cases       2666
3  Afghanistan    2000  population   20595360
4       Brazil    1999       cases      37737
5       Brazil    1999  population  172006362
```

```shell
❯ datar import table2 | \
    datar mutate --count "if_else(f.year==1999, f.count*2, f.count)"
        country    year        type       count
       <object> <int64>    <object>     <int64>
0   Afghanistan    1999       cases        1490
1   Afghanistan    1999  population    39974142
2   Afghanistan    2000       cases        2666
3   Afghanistan    2000  population    20595360
4        Brazil    1999       cases       75474
5        Brazil    1999  population   344012724
6        Brazil    2000       cases       80488
7        Brazil    2000  population   174504898
8         China    1999       cases      424516
9         China    1999  population  2545830544
10        China    2000       cases      213766
11        China    2000  population  1280428583
```

[1]: https://tidyr.tidyverse.org/index.html
[2]: https://dplyr.tidyverse.org/index.html
[3]: https://github.com/pwwang/pipda
[4]: https://tibble.tidyverse.org/index.html
[5]: https://pwwang.github.io/datar/
[6]: https://img.shields.io/pypi/v/datar?style=flat-square
[7]: https://pypi.org/project/datar/
[8]: https://img.shields.io/github/v/tag/pwwang/datar?style=flat-square
[9]: https://github.com/pwwang/datar
[10]: https://img.shields.io/github/workflow/status/pwwang/datar/Build%20and%20Deploy?style=flat-square
[11]: https://img.shields.io/github/workflow/status/pwwang/datar/Build%20Docs?label=Docs&style=flat-square
[12]: https://img.shields.io/codacy/grade/3d9bdff4d7a34bdfb9cd9e254184cb35?style=flat-square
[13]: https://app.codacy.com/gh/pwwang/datar
[14]: https://img.shields.io/codacy/coverage/3d9bdff4d7a34bdfb9cd9e254184cb35?style=flat-square
[15]: https://pwwang.github.io/datar/reference-maps/ALL/
[16]: https://pwwang.github.io/datar/notebooks/across/
[17]: https://pwwang.github.io/datar/api/datar/
[18]: https://pwwang.github.io/datar-blog
[19]: https://github.com/pwwang/datar-cli

