Metadata-Version: 2.1
Name: datup
Version: 0.0.2
Summary: The version of this library and document is V 0.0.2
Home-page: https://datup.ai
Author: Cristhian Plazas Ortega
Author-email: cristhianpo@datup.ai
License: UNKNOWN
Description: # datup                                                   
        
        
        This document describes version 0.0.2 of the library.
        The library has 1 class and 3 methods.
        
        ## How does it work?
            import datup as dt
        
        ## To instantiate the class
            job = dt.DataIO("aws_acces_key_id","aws_secret_access_key","datalake")
        
        ## Can I test my updates?
        Yes. Use the file _testing.ipynb to test your changes. For modularity, always initialize
        the variables you use.
        
        ## Class DataIO:
            A group of methods for data I/O against an AWS S3 datalake
            
            Parameters
            ----------
            aws_acces_key_id : str
                The class must be initialized with aws_acces_key_id
            aws_secret_access_key : str
                The class must be initialized with aws_secret_access_key
            datalake : str
                The class must be initialized with the name of the datalake
            prefix_s3 : str, default "s3://"
                The S3 prefix used to denote an S3 address
            local_path : str, default "/tmp/"
                The local path that boto3 uses as a temporary folder when uploading to the S3 bucket.
                
            
            Methods
            -------
            download_csv(self,stage=None,filename=None,datecols=False,sep=",",encoding="ISO-8859-1",infer_datetime_format=True,low_memory=False,indexcol=None,ts_csv=False,freq=None)
                
                    Return a DataFrame downloaded from the specified datalake.
            
                    This method takes the AWS credentials from the DataIO instance and uses them to
                    download the required data.
            
                    Parameters
                    ----------
                    stage : str, default None
                        The path of folders inside the datalake that leads to the file to download
                    filename : str, default None
                        The name of the file to download, without the .csv suffix
                    datecols: bool or list of int or names or list of lists or dict, default False
                        Taken from the pandas read_csv parse_dates description.
                        The behavior is as follows:
                            * boolean. If True -> try parsing the index.
                            * list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
                            * list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
                            * dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'
                            If a column or index cannot be represented as an array of datetimes, say because of an unparseable value
                            or a mixture of timezones, the column or index will be returned unaltered as an object data type. For
                            non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with
                            a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with
                            utc=True. See Parsing a CSV with mixed timezones for more.
                            Note: A fast-path exists for iso8601-formatted dates.
                    sep: str, default ","
                        Taken from the pandas read_csv sep description.
                        Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python
                        parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s
                        builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+'
                        will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note
                        that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
                    encoding: str, default "ISO-8859-1"
                        Taken from the pandas read_csv encoding description.
                        Encoding to use for UTF when reading/writing (ex. "utf-8").
                    infer_datetime_format: bool, default True
                        Taken from the pandas read_csv infer_datetime_format description.
                        If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the
                        columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can
                        increase the parsing speed by 5-10x.
                    low_memory: bool, default False
                        Taken from the pandas read_csv low_memory description.
                        Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed
                        type inference. To ensure no mixed types either set False, or specify the type with the dtype parameter.
                        Note that the entire file is read into a single DataFrame regardless, use the chunksize or iterator
                        parameter to return the data in chunks. (Only valid with C parser).
                    indexcol : int, str, sequence of int / str, or False, default None
                        Taken from the pandas read_csv index_col description.
                        Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a
                        sequence of int / str is given, a MultiIndex is used. 
                        Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when
                        you have a malformed file with delimiters at the end of each line.
                    ts_csv : bool, default False
                        If True, enables the ts_csv download. In that case, indexcol and freq must both be
                        set to a value other than None.
                    freq : str, default None
                        The frequency of the data, for example "W-MON". Must be set when ts_csv is True.
                        
                    Returns
                    -------
                    DataFrame
                        The downloaded data, returned as a two-dimensional data structure
            
                    Examples
                    --------
                    >>> object.download_csv(stage='stage',filename='filename')  # doctest: +SKIP
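        
        Several parameters above are documented as taken from pandas read_csv. As a rough sketch of
        what they control, the same options can be tried locally on an in-memory CSV. This assumes
        that datecols maps to parse_dates and indexcol maps to index_col, and that freq can be read
        as a pandas frequency alias; none of this is confirmed by this document. The sample data and
        the names raw, df and weekly are made up for illustration.
        
        ```python
        import io
        
        import pandas as pd
        
        # Made-up sample data, semicolon-separated like the download_csvm example.
        raw = "date;units\n2020-01-06;10\n2020-01-13;12\n2020-01-27;9\n"
        
        # Assumed mapping: datecols -> parse_dates, indexcol -> index_col.
        df = pd.read_csv(
            io.StringIO(raw),
            sep=";",
            parse_dates=["date"],
            index_col="date",
            low_memory=False,
        )
        
        # One possible reading of freq="W-MON": reindex to weekly Mondays.
        # A week with no row in the sample (2020-01-20) comes back as NaN.
        weekly = df.asfreq("W-MON")
        print(weekly)
        ```
        
        Reading from a local buffer like this is a quick way to check a sep or datecols choice
        before pointing the call at a real file in the datalake.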
            
            download_csvm(self,uris)
        
                Return a list of DataFrames downloaded from the specified datalake.
        
                This method takes the AWS credentials from the DataIO instance and uses them to
                download the required data through the download_csv method.
        
                Parameters
                ----------
                uris : dict
                    A dictionary with all the parameters that the download_csv method needs for each file
                
                Returns
                -------
                list of DataFrame
                    The downloaded DataFrames
                list of str
                    The names of the DataFrames, as strings
                
                Examples
                --------
                >>> uris = {
                ...     "uri_1": {
                ...         "stage": "stage",
                ...         "filename": "filename",
                ...         "datecols": False,
                ...         "sep": ";",
                ...         "encoding": "ISO-8859-1"
                ...     },
                ...     "uri_2": {
                ...         "stage": "stage",
                ...         "filename": "filename",
                ...         "datecols": False,
                ...         "sep": ";",
                ...         "encoding": "ISO-8859-1"
                ...     }
                ... }
                >>> df, df_names = object.download_csvm(uris=uris)  # doctest: +SKIP
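        
        The shape of the uris dictionary suggests one download_csv call per entry. The sketch below
        shows that pattern without AWS access. It assumes that each inner dictionary is passed as
        keyword arguments and that the returned names are the dictionary keys; the document does not
        state either. fake_download_csv and download_many are hypothetical names, not part of datup.
        
        ```python
        def fake_download_csv(stage=None, filename=None, **options):
            # Hypothetical stand-in for DataIO.download_csv. It returns a label
            # instead of a DataFrame so the sketch runs without AWS access.
            return f"{stage}/{filename}.csv"
        
        
        def download_many(uris, download=fake_download_csv):
            # Call the downloader once per entry, passing the entry as keyword arguments.
            frames = [download(**params) for params in uris.values()]
            # Assumption: the returned names are the keys of the uris dictionary.
            names = list(uris.keys())
            return frames, names
        
        
        uris = {
            "uri_1": {"stage": "raw", "filename": "sales", "sep": ";"},
            "uri_2": {"stage": "raw", "filename": "stock", "sep": ";"},
        }
        frames, names = download_many(uris)
        print(names)
        print(frames)
        ```
        
        Keeping the two lists in the same order lets a caller pair each name with its DataFrame
        using zip(names, frames).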
        
            upload_csv(self,df,stage=None,filename=None,index=False,header=True,date_format="%Y-%m-%d",ts_csv=False)
        
                Return the URI where the DataFrame was uploaded in CSV format.
        
                This method takes the AWS credentials from the DataIO instance and uses them to
                upload the required data.
        
                Parameters
                ----------
                df : DataFrame
                    The DataFrame to upload to the S3 datalake
                stage : str, default None
                    The path of folders inside the datalake that leads to where the file is uploaded
                filename : str, default None
                    The name of the file to upload, without the .csv suffix
                index : bool, default False
                    Taken from the pandas to_csv index description.
                    Write row names (index).
                header : bool or list of str, default True
                    Taken from the pandas to_csv header description.
                    Write out the column names. If a list of strings is given it is assumed to be aliases for
                    the column names.
                date_format : str, default %Y-%m-%d
                    Taken from the pandas to_csv date_format description.
                    Format string for datetime objects.
                ts_csv : bool, default False
                    If True, enables the ts_csv upload. In that case, index must be set to a value other than False.
                
                Returns
                -------
                str
                    The URI where the DataFrame was uploaded in S3
                        
                Examples
                --------
                >>> object.upload_csv(df,stage="stage",filename="filename")  # doctest: +SKIP
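        
        upload_csv returns a URI, and the class parameters name an S3 prefix, a datalake, a stage and
        a filename without the csv suffix. One plausible way these pieces combine is sketched below.
        The layout prefix + datalake + stage + filename.csv is an assumption, and build_uri is a
        hypothetical helper, not a datup function.
        
        ```python
        def build_uri(datalake, stage, filename, prefix_s3="s3://"):
            # Hypothetical helper: join the datalake, stage and file name into one URI.
            # Stray slashes are stripped so "stage/" and "/stage" give the same result.
            parts = (stage, filename + ".csv")
            key = "/".join(part.strip("/") for part in parts if part)
            return f"{prefix_s3}{datalake}/{key}"
        
        
        uri = build_uri("datalake", "output/forecast", "weekly")
        print(uri)
        ```
        
        If the real layout differs, only the body of the helper needs to change; callers still pass
        the same four values.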
            
                    
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
