Predicting Bitcoin price with BigQuery ML

In the previous blog post I described how to fetch a Bitcoin data set and prepare it for AutoML Tables to create a (highly accurate) machine learning model to predict the Bitcoin price.

This blog post describes a complementary (or alternative) approach – how to use BigQuery ML to create a (simpler) regression model using SQL-like syntax. The advantages of this approach are that training is very rapid and that batch-based predictions are very easy to run, also with SQL-like syntax. The blog assumes that you already have a table with data in BigQuery (see the previous blog post for how to do that).

1. Test the query that is the core part of the model – training data

Most of the data needed is already in the table, but in addition we need to create the label to predict using the LEAD() function, and since BigQuery ML requires non-NULL data (i.e. no NaN values), the label is set to 0 with IFNULL.

Figure 1 – BigQuery SQL query generating training data
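The exact query from the screenshot isn't reproduced here, but a minimal sketch of what it could look like – assuming the raw data lives in a table called bitcoindata.rawdata with time and close columns (these names are assumptions) – run from Python with the BigQuery client (the SQL can also be pasted directly into the BigQuery UI):

```python
# Hedged sketch - table and column names (bitcoindata.rawdata, time, close) are
# assumptions, not the exact names from the original screenshot.
from google.cloud import bigquery

client = bigquery.Client(project="predicting")  # project id used in this blog series

training_data_query = """
SELECT
  *,
  -- label = close price of the NEXT hour; IFNULL replaces the NULL that LEAD()
  -- produces for the last row, since BigQuery ML does not accept NULL labels
  IFNULL(LEAD(close, 1) OVER (ORDER BY time), 0) AS label
FROM
  `predicting.bitcoindata.rawdata`
"""

print(client.query(training_data_query).to_dataframe().head())
```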

2. Generate the regression model for predicting Bitcoin price
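The original CREATE MODEL statement isn't reproduced here; a hedged sketch (model, dataset and column names are assumptions) of what it could look like, reusing the training-data query from Figure 1:

```python
# Hedged sketch - linear regression in BigQuery ML; names are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="predicting")

create_model_query = """
CREATE OR REPLACE MODEL `bitcoindata.btc_linear_reg`
OPTIONS(model_type='linear_reg') AS
SELECT
  *,
  IFNULL(LEAD(close, 1) OVER (ORDER BY time), 0) AS label
FROM
  `predicting.bitcoindata.rawdata`
"""

client.query(create_model_query).result()  # training runs inside BigQuery
```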

3. Evaluate model – get predicted_label as new value
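Hedged sketches of the evaluation and prediction queries – ML.PREDICT adds a predicted_label column since the label column is named label; all other names are assumptions:

```python
# Hedged sketch - evaluate the model and run batch predictions; names are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="predicting")

labelled_data = """
SELECT *, IFNULL(LEAD(close, 1) OVER (ORDER BY time), 0) AS label
FROM `predicting.bitcoindata.rawdata`
"""

evaluate_query = f"SELECT * FROM ML.EVALUATE(MODEL `bitcoindata.btc_linear_reg`, ({labelled_data}))"
print(client.query(evaluate_query).to_dataframe())  # error metrics for the regression

predict_query = """
SELECT time, close, predicted_label
FROM ML.PREDICT(MODEL `bitcoindata.btc_linear_reg`,
  (SELECT * FROM `predicting.bitcoindata.rawdata`))
"""
print(client.query(predict_query).to_dataframe().head())
```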

Conclusion

I have shown how to use BigQuery ML regression on a Bitcoin dataset to predict the Bitcoin price. Given how easy this is to use even at large scale (e.g. several hundred billion rows), it can be a good starting point when doing predictions on tabular data.

Best regards,

Amund Tveit

Continue Reading

Predicting Bitcoin Price with AutoML Tables

Applying Artificial Intelligence (AI) frequently requires a surprising amount of (tedious) manual work. Tools for automation can make AI available to more people and help rapidly solve many more important challenges. This blog post tests such a tool – AutoML Tables.

Figure 1 – Bitcoin price prediction – is it going up, down or sideways?

This blog post generates a data set from an API and applies automated AI – AutoML Tables for regression to predict numbers – in this case the Bitcoin closing price for the next hour based on data from the current hour.

1. Introduction

AutoML Tables can be used on tabular data (e.g. from databases or spreadsheets) for either classification (e.g. classify whether a sword is made of Valyrian steel or not – as shown in Figure 2a) or regression (predict a particular number, e.g. the reach of a Scorpion cannon aiming at dragons – as shown in Figure 2b).

Figure 2a- Classification Example – Is Aria’s Needle Sword of Valyrian Steel?
Figure 2b – Regression Example – reach of Euron’s arrow

2. Choosing and Building Dataset

However, I didn’t find data sets that I could use for Valyrian steel classification or Scorpion arrow reach regression (let me know if such data exists), so instead I found a free API for Bitcoin-related data over time. I assume Bitcoin is completely unrelated to Valyrian steel and Scorpions (I might be wrong about that, given that Valyrian steel furnaces might compete with Bitcoin for energy – perhaps a potential confounding variable explaining a relationship between the prices of Valyrian swords and Bitcoin?).

Scientific selection of Bitcoin data API: since I am not an expert in cryptocurrency I just searched for free bitcoin api (or something in that direction) and found/selected cryptocompare.com.

2.1 Python code to fetch and prepare API data

Materials & Methods

I used a Colab notebook (colab.research.google.com) to fetch and prepare the API data, in combination with the AutoML web UI and a few Google Cloud command-line tools (gsutil and gcloud). I also used BigQuery for storing results, and AutoML stored some evaluation-related output in BigQuery.

Imports and authentication (Google Cloud)
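The original cell isn't reproduced here; a minimal sketch of the imports and Google Cloud authentication in Colab could look like this:

```python
# Minimal sketch of the Colab setup cell (the original cell may differ)
import json

import pandas as pd
import requests

# Authenticate against Google Cloud from inside the Colab notebook
from google.colab import auth
auth.authenticate_user()
```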

Method to fetch Bitcoin related trade data
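A hedged sketch of such a method, assuming cryptocompare's histohour endpoint (the exact endpoint and parameters in the original notebook may differ):

```python
import requests


def fetch_btc_trade_data(limit=2000, to_timestamp=None):
    """Fetch hourly BTC/USD OHLCV trade data from cryptocompare (assumed endpoint)."""
    url = "https://min-api.cryptocompare.com/data/histohour"
    params = {"fsym": "BTC", "tsym": "USD", "limit": limit}
    if to_timestamp is not None:
        params["toTs"] = to_timestamp  # used to page backwards in time
    response = requests.get(url, params=params)
    response.raise_for_status()
    return response.json()["Data"]  # list of dicts: time, open, high, low, close, volumefrom, volumeto
```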

Method to fetch Bitcoin related social & activity data

(code duplication isn’t wrong – or is it? – leave refactoring of this and previous method as an exercise for the reader)

Method to combine the 2 types of API data

Method for fetching and preprocessing data from API

2.2 Python Code to prepare Bitcoin data for BigQuery and AutoML Tables

Actually fetch some results (16000 hours = 1.82 years)

Write as 1 json per line to a file
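A sketch of writing the fetched rows (e.g. the 16000 hourly records) as newline-delimited JSON, which is the format the BigQuery load job expects; variable and file names are assumptions:

```python
import json

# rows: the list of per-hour dicts produced by the (assumed) fetch/combine methods above
with open("bitcoin_data.json", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")  # one JSON object per line (NDJSON)
```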

Set the active Google Cloud project (a leading ! in Colab runs a shell command)

Creating a Google Cloud storage bucket to store data

Create a BigQuery schema based on the API data fetched

Note: bigquery-schema-generator was a nice tool, but I had to change INTEGER to FLOAT in the generated schema in addition to preparing the data (see the Perl one-liner).
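The original used a Perl one-liner; an equivalent Python sketch of the INTEGER-to-FLOAT rewrite of the generated schema (the schema file name is an assumption):

```python
import json

# Rewrite INTEGER fields to FLOAT in the schema produced by bigquery-schema-generator
with open("bitcoin_schema.json") as f:
    schema = json.load(f)

for field in schema:
    if field.get("type") == "INTEGER":
        field["type"] = "FLOAT"

with open("bitcoin_schema.json", "w") as f:
    json.dump(schema, f, indent=2)
```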

Generate (or fetch an existing) BigQuery dataset & create a BigQuery table

Note: I used the project id ‘predicting’; replace it with yours – see the bq command further down.

Load the API data into the (new) BigQuery table
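The blog post used the bq command-line tool for this; a roughly equivalent hedged sketch with the BigQuery Python client (dataset, table and file names are assumptions, and a recent google-cloud-bigquery version is assumed):

```python
from google.cloud import bigquery

client = bigquery.Client(project="predicting")
client.create_dataset("bitcoindata", exists_ok=True)  # create (or reuse) the dataset

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    schema=client.schema_from_json("bitcoin_schema.json"),  # the FLOAT-patched schema
)
with open("bitcoin_data.json", "rb") as f:
    load_job = client.load_table_from_file(
        f, "predicting.bitcoindata.rawdata", job_config=job_config)
load_job.result()  # wait for the load job to finish
```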

Check that the table exists and query it

Figure 3 – output from a select query against the Bitcoin data in BigQuery

We have input (x) features, but no target (y) to predict(!)

A column to predict can be created by adding a new, time-shifted column: for the row at time t=0 the label is a value from t=1 – the feature we want to predict is the Bitcoin close price for the next hour (i.e. not exactly quant/high-frequency trading, but a more soothing once-per-hour experience; if it works out ok it can be automated – for the risk-taking?). This can be generated either in BigQuery with a SELECT and the LEAD() function, or with a Python Pandas DataFrame shift – both approaches are shown underneath.
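Hedged sketches of both approaches (the NEXTCLOSE column name follows the next step; table and DataFrame names are assumptions):

```python
# Approach 1: Pandas - shift close one step up so each row gets the next hour's close
df["NEXTCLOSE"] = df["close"].shift(-1)
df = df.dropna(subset=["NEXTCLOSE"])  # the last row has no "next hour"

# Approach 2: BigQuery - the same thing with LEAD() over the time ordering
nextclose_query = """
SELECT *, LEAD(close, 1) OVER (ORDER BY time) AS NEXTCLOSE
FROM `predicting.bitcoindata.rawdata`
"""
```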

Prepare final data with NEXTCLOSE column (as csv) for AutoML and copy to Google Cloud bucket
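A sketch of this final step using the Cloud Storage Python client (the original used gsutil; bucket and file names are assumptions):

```python
from google.cloud import storage

df.to_csv("bitcoin_automl.csv", index=False)  # final table including NEXTCLOSE

bucket = storage.Client(project="predicting").bucket("bitcoindata-automl")
bucket.blob("bitcoin_automl.csv").upload_from_filename("bitcoin_automl.csv")
```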

3. AutoML prediction

Now the data is ready for AutoML. (Note that the BigQuery step could have been avoided in this case, but it is also an alternative route since AutoML can import directly from BigQuery.) Underneath you can see an example of a created dataset in the AutoML Console.

Figure 4 – AutoML Console – with an example data set named bitcoindata

Creating a new dataset

Figure 5 – create new AutoML Tables dataset

Importing data from Google Cloud Bucket

Figure 6 – Import data to AutoML from Google Cloud Storage
Figure 7 – Importing data

Set target variable (NEXTCLOSE) and look at statistics of features

Figure 8 – select target column and data split (train/validation/testing)
Figure 9 – inspect correlation with target variable

Train Model with AutoML

Figure 10 – select budget for resources to generate model

Look at core metrics regarding accuracy

Figure 11 – Metrics from training (MAE, RMSE, R^2, MAPE) and important features

Deploy and Use the AutoML model – Batch Mode

Figure 12 – Batch based prediction
Figure 13 – online prediction

Conclusion

I have shown an example of using AutoML – the main part was about getting data from an API and preparing it for use (section 2), then using it in AutoML to train a model and look into the evaluation. The model aims to predict the next hour's Bitcoin closing price based on data from the current hour, but it can probably be extended in several ways – how would you extend it?

Best regards,

Amund Tveit

DISCLAIMER: this blog only represents my PERSONAL opinions and views

Continue Reading

Thoughts on AI replacing coders by 2040

This blog post looks into methods and technologies that can potentially lead to the replacement of coders in the future; some are of a futuristic nature, but some are more “low-hanging” wrt automation of (parts of) coding.

The background for this blog post is that researchers Jay Jay Billings, Alexander J. McCaskey, Geoffroy Vallee, and Greg Watson at Oak Ridge National Laboratory wrote a paper: Will humans even write code in 2040 and what would that mean for extreme heterogeneity in computing?

Also related to this is (Tesla AI director) Andrej Karpathy’s article Software 2.0, where he looks back at how AI (primarily Deep Learning) has replaced custom methods for e.g. image and speech recognition and machine translation (++), and generalizes how Deep Learning can further replace a lot of other software in the years to come (note: exemplified by Google’s recent paper The Case for Learned Index Structures).

1. FACT: Programming environments (IDEs) have barely changed the last 30 years

One of the primary purposes of programming is to provide efficient automation; however, programming itself is still a highly manual and labour-intensive process. Except for refactoring, the difference between modern IDEs and e.g. Turbo Pascal in 1989 is surprisingly small (Turbo Pascal came out more than 30 years ago and improved gradually towards 1989).

Turbo Pascal 5.5 in 1989
Eclipse in 2017

2. FACT: For (close to) all FUNCTIONS written there exists one or several TESTS for it

For any method already written (or to be written) in any (of the most popular) languages currently used in programming there already exists a test for it – in the same language or in a similar language (e.g. a C# test for a Java function). The obvious (big) data source for this is all private and public repositories in Github (100M pull requests merged so far in 2017).

So why are most developers still writing unit tests instead of having an IDE/service find and translate the appropriate tests to the functions they write? (e.g. something along the lines of IntelliTest)

3. FACT: as in 2 – For (close to) all TESTS there exists a (set of) FUNCTIONS they test

Assume e.g. Test Driven Development (TDD) – where you roughly write the (test of the new) API first – and then alternate between writing (just enough) code to fulfill the API.

This seems to have 2 potential ways of being increasingly automated on the function-writing side:

  1. Search for a set of functions that matches a set of tests – instead of writing the functions, just write the tests
  2. Automatically generate the code (fragments) to fulfill the test, this can potentially be done in many ways, e.g.
      1. Brute force with a lot of computers (e.g. a few thousand CPUs in the cloud should be more than capable of quickly generating and selecting the best of maybe up to 30-50 increments needed per test-writing iteration; this resource could be shared by a large set of programmers). See also the science of brute force.
      2. Using sophisticated AI methods with GPUs – e.g.  Program Synthesis with Deep Learning and earlier AI methods such as variants of John Koza’s Genetic Programming.
      3. Quantum Computer – e.g. Logic Synthesis with Quantum Computing (see also Quantum Development Kit)

4. FACT: (Many?) Data Structures can now be replaced by AI (e.g. Bloom Filters & B-Trees)

Google – with Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean and Neoklis Polyzotis – published an interesting paper, The Case for Learned Index Structures, where they showed that traditional and popular data structures such as the B-Tree, Hash Index and Bloom Filter can with advantage be replaced by learned index structures. This is, from my perspective, pretty groundbreaking research, and it will be interesting to see what the use cases can be towards program synthesis (e.g. for other categories of data structures and also logic operating on data structures). Key result figures from their paper:

 

5. FACT: Formal Methods are starting to work and can be used to support automation of code

Amazon has used formal methods since 2011 for Amazon Web Services, e.g. to ensure the quality of AWS S3 – see Use of Formal Methods at Amazon for an overview. Facebook is using formal methods for mobile application quality assurance, see Moving Fast with Software Verification. Formal methods have also been used to validate various blockchain technologies (potentially Bitcoin), see e.g. A Temporal Blockchain: A Formal Analysis, Blockchain Protocol Analysis and Security Engineering 2017, Validation of Decentralised Smart Contracts Through Game Theory and Formal Methods

A few years back I participated in a research project – Envisage (Engineering Virtualized Services) – where formal methods were used to prove that the TimSort algorithm was buggy in a lot of languages (see the Hacker News story below). Formal methods can potentially be used together with code-generating methods to ensure that what is generated is correct.

Conclusion

I have presented various technologies that might play a role in automating coding going forward. A potentially interesting direction on the generative side for coding is to use sequence-to-sequence deep learning in combination with GANs for synthesis, see e.g. Text2Action: Generative Adversarial Synthesis from Language to Action – for program synthesis this looks like a promising direction.

Best regards,

Amund Tveit

 

Continue Reading

New Publications in Deep Learning Publication Navigator

A long overdue update of new publications in Deep Learning Publication Navigator (ai.amundtveit.com) – for now the easiest way to discover new publications is probably to compare the screenshots (number of papers per category) in the before and after update screenshots below.

Examples of keywords (from publication title) with (several) new Deep Learning publications are:

  1. 3D
  2. Acoustic
  3. Active learning
  4. Adaptive
  5. Adversarial (123 new papers since last update, due to significant activity in GAN Research)
  6. Alzheimer’s (22 new papers related to a disease that cost more than a quarter trillion US$ annually to treat in the USA)
  7. Anomaly detection
  8. Autoencoders
  9. Bayesian
  10. Biomedical
  11. Chinese
  12. Clinical
  13. Collaborative filtering (e.g. for recommender systems)
  14. Dataset
  15. EEG (electric brain signals)
  16. Ensemble
  17. +++++ (many more!)

If you have feature ideas or other requests for Deep Learning Publication Navigator, feel free to reach out.

Best regards,

Amund Tveit

After update (with new papers):

Before update (without new papers):

 

Continue Reading

Serverless Thrift APIs in Python on AWS Lambda

I recently wrote a blog post called Serverless Thrift APIs in Python with AWS Lambda on the Zedge corporate blog, underneath is a reposting of it:

This blog post shows a basic example of a Serverless Thrift API with Python for AWS Lambda and AWS API Gateway.

1. Serverless Computing for Thrift APIs?

Serverless computing – also called Cloud Functions or Functions as a Service (FaaS) – is an interesting type of cloud service due to its simplicity. An interpretation of serverless computing is that you (with relatively low effort):

  1. Deploy only the function needing to do the work
  2. Only pay per request to the function
    1. With the notable exception of other cloud resources used, e.g. storage
  3. Get some security setup automation/support (e.g. SSL and API keys)
  4. Get support for request throttling (e.g. QPS) and quotas (e.g. per month)
  5. Get (reasonably) low latency – when the serverless function is kept warm
  6. Get support for easily setting up caching
  7. Get support for setting up custom domain name
  8. Lower direct (cloud costs) and indirect (management) costs?

These are characteristics that in my mind make Serverless computing an interesting infrastructure to develop and deploy Thrift APIs (or other types of APIs) for.
Perhaps over time even Serverless will be preferred over (more complex) container (Kubernetes/Docker) or virtual machine based (IaaS) or PaaS solutions for APIs?

2. Example Cloud Vendors providing Serverless Services

  1. AWS Lambda in combination with AWS API Gateway
  2. Google Cloud Functions
  3. IBM Bluemix Openwhisk (Apache Openwhisk)
  4. Microsoft Azure Functions

Since Python is a key language in my team, for this initial test I chose the AWS option, also because I am most familiar with AWS and the open source tooling for AWS was best wrt Python (runner-up was Microsoft Azure Functions).

3. Thrift (over HTTPS) on AWS Lambda and API Gateway with Python

This shows an example of the (classic) Apache Thrift tutorial Calculator API running on AWS Lambda and API Gateway. The service requires 2 thrift files:

  1. tutorial.thrift
  2. shared.thrift

3.1 Development Environment and Tools

The tool used for deployment in this blog post is Zappa. I recommend using Zappa together with Docker for Python 3.6 as described in this blog post, with a slight change to the Dockerfile if you want to build and compile the Apache Thrift Python library yourself; here is the altered Dockerfile. There haven't been official releases of Apache Thrift since 0.10.0 (January 6th, 2017), and there have been important improvements related to its Python support since the last release – in particular the fix for supporting recursive thrift structs in Python.

a. Dockerfile – for creating a Zappashell (same as the Lambda runtime) and building Thrift

After building this Dockerfile (see the command at the top of the file) and adding zappashell to your .bash_profile like this (source: the above-mentioned blog post)

You can start your serverless deployment environment with the command zappashell (inside a new, empty directory on your host platform, e.g. a Mac); this gives something like this – with an empty directory.

Install virtualenv and create/activate an environment (assuming you installed thrift as shown in the Dockerfile above)

Use thrift to generate python code for tutorial.thrift and shared.thrift

Convert the gen-py package into a python library (for convenient packaging) with a setup.py file as below (change version according to your wants)

setup.py
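The original setup.py isn't reproduced here; a minimal sketch of packaging the thrift-generated gen-py code (the package name and version are assumptions):

```python
# Minimal sketch of a setup.py for the thrift-generated gen-py packages
from setuptools import setup, find_packages

setup(
    name="calculator-thrift-gen",          # assumed package name
    version="0.1.0",                       # change version according to your wants
    packages=find_packages(where="gen-py"),
    package_dir={"": "gen-py"},
)
```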

Copy the generated thrift library – note: thrift itself, not the tutorial code – (ref thriftSomeVersion.tar.gz generated by python setup.py sdist in the Dockerfile) to the same directory and add it to requirements.txt

requirements.txt should look something like this:

Run pip install -r requirements.txt

Create app.py that has code for calculator thrift
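The original app.py isn't reproduced here; a hedged sketch of what a Flask-based WSGI app (the style of app Zappa deploys) speaking binary Thrift over HTTP for the tutorial Calculator service could look like – handler details are simplified assumptions:

```python
# Hedged sketch of app.py: a minimal WSGI (Flask) app that speaks binary Thrift
# over HTTP POST for the tutorial Calculator service. The real app.py may differ.
from flask import Flask, request, Response
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport

from tutorial import Calculator
from tutorial.ttypes import Operation, InvalidOperation

app = Flask(__name__)


class CalculatorHandler:
    """Simplified handler - only ping/add/calculate are sketched here."""

    def ping(self):
        print("ping()")

    def add(self, num1, num2):
        return num1 + num2

    def calculate(self, logid, work):
        if work.op == Operation.ADD:
            return work.num1 + work.num2
        raise InvalidOperation(whatOp=work.op, why="only ADD is sketched here")

    def zip(self):
        print("zip()")


processor = Calculator.Processor(CalculatorHandler())


@app.route("/", methods=["POST"])
def thrift_endpoint():
    # Deserialize the request body, run the processor, serialize the reply
    itrans = TTransport.TMemoryBuffer(request.get_data())
    otrans = TTransport.TMemoryBuffer()
    processor.process(TBinaryProtocol.TBinaryProtocol(itrans),
                      TBinaryProtocol.TBinaryProtocol(otrans))
    return Response(otrans.getvalue(), mimetype="application/x-thrift")
```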

Create .aws directory with files:

credentials

config

Run zappa init and answer the questions; it should look something like the image below:

You should now be able to deploy the API with

You can test the deployed API with the following client; remember to change the https address to the address that the deploy gave you.
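A hedged sketch of such a Thrift HTTP test client (the URL and API key are placeholders; the x-api-key header is commented out until you have created one in the steps below):

```python
# Hedged sketch of a Thrift-over-HTTP client for the deployed calculator API
from thrift.protocol import TBinaryProtocol
from thrift.transport import THttpClient

from tutorial import Calculator

# Replace with the https address that `zappa deploy` printed for you
transport = THttpClient.THttpClient(
    "https://your-api-id.execute-api.eu-central-1.amazonaws.com/dev")
# Uncomment once you have created an API key (see the steps below)
# transport.setCustomHeaders({"x-api-key": "YOUR-API-KEY"})

client = Calculator.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

client.ping()
print("1 + 1 =", client.add(1, 1))

transport.close()
```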

But wait, something is missing: this API is reachable by anyone. Let us add an API key (and update the client with the x-api-key). This can be done through the AWS Console (and perhaps with Zappa itself through automation soon?) with the following steps:

Go to Amazon API Gateway Console and click on the generated API (perhaps named task-beta due to the Docker file path and the selected stage during zappa init)

Create a Usage Plan and associate it with the API (e.g. task-beta), then create an API Key (on the left side menu) and attach the API Key to the Usage Plan

Do a zappa update dev and uncomment/update the transport.setCustomHeaders call with the x-api-key in the Python client above to get authentication and throttling in place.

4. Conclusion

I have shown an example of getting a Thrift API running on serverless infrastructure; the process can relatively easily be automated, and once the API is initially created it takes very little effort to update it (e.g. through continuous deployment).

A final note on round-trip time performance: based on a few rough tests it looks like the round-trip time for calls to the API is around 300-400 milliseconds (with the test client based in Trondheim, Norway, accessing API Gateway and AWS Lambda in Germany), which is quite good. I believe that with an AWS Route53 routing policy one could have automatic selection of the closest AWS API Gateway/Lambda to get the lowest latency (note that one of the selections in zappa init was to deploy globally, but the default was one availability zone).

I personally believe that serverless computing has a strong future ahead wrt API development, and I look forward to what cloud vendors' software engineers/product managers add of new features; my wish list is:

  1. Strong Python support
  2. Built-in Thrift support and service discovery, as well as support for other RPC systems, e.g. gRPC, messagepack,++
  3. Improved software tooling for automation (e.g. simplified SSL/domain name setup/maintenance handling – get deep partnerships with letsencrypt.com for SSL certificates?)
  4. Increase caching support at various levels

Best regards,

Amund Tveit

Continue Reading

Creative AI on the iPhone (with GAN) and Dynamic Loading of CoreML models

Zedge summer interns developed a very cool app using ARKit and CoreML (on iOS 11). As part of their journey they published 2 blog posts on the Zedge corporate web site related to:

  1. How to develop and run Generative Adversarial Networks (GAN) for Creative AI on the iPhone using Apple’s CoreML tools – check out their blog post about it.
  2. Deep Learning models (e.g. for GAN) can take a lot of space on a mobile device (tens of Megabytes to perhaps even Gigabytes); in order to keep the initial app download size relatively low it can be useful to dynamically load only the models you need. Check out their blog post about various approaches for hot-swapping CoreML models.

Best regards,

Amund Tveit 

Continue Reading

Convolutional Neural Networks for Self-Driving Cars

This blog post contains my notes from project 3 in term 1 of the Udacity Nanodegree in Self-Driving Cars. The project is about developing and training a convolutional neural network on camera input (3 different camera angles) from a simulated car.

Best regards,

Amund Tveit

1. Modelling Convolutional Neural Network for Self Driving Car

I used the NVIDIA Autopilot Deep Learning model for self-driving as inspiration (ref: the paper “End to End Learning for Self-Driving Cars” – https://arxiv.org/abs/1604.07316 and an implementation of it: https://github.com/0bserver07/Nvidia-Autopilot-Keras), but made some changes to it (a sketch of the resulting model follows the list below):

  1. Added normalization in the model itself (ref Lambda(lambda x: x/255.0 - 0.5, input_shape=img_input_shape)), since it is likely to be faster than doing it in pure Python.
  2. Added Max Pooling after the first convolution layer, i.e. making the model a more “traditional” conv.net wrt being capable of detecting low-level features such as edges (similar to classic networks such as LeNet).
  3. Added Batch Normalization in early layers to be more robust wrt different learning rates.
  4. Used he_normal initialization (truncated normal distribution), since this type of initialization has mattered a lot in my earlier TensorFlow experiments.
  5. Used an L2 regularizer (ref: “rule of thumb” – https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization-How-does-it-solve-the-problem-of-overfitting-Which-regularizer-to-use-and-when ).
  6. Made the model (much) smaller by reducing the fully connected layers (I got problems running a larger model on a 1070 card, but in retrospect it was not the model size but my misunderstandings of Keras 2 that caused this trouble).
  7. Used selu (ref: the paper “Self-Normalizing Neural Networks” https://arxiv.org/abs/1706.02515) instead of relu as the rectifier function in the later (fully connected) layers – since previous experience (with traffic sign classification and TensorFlow) showed that using selu gave faster convergence (though not a better final result).
  8. Used dropout in later layers to avoid overfitting.
  9. Used L1 regularization on the final layer, since I’ve seen that it is good for regression problems (better than L2).

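A hedged Keras 2 sketch of a model along those lines – layer sizes, filter counts and the input shape are assumptions, not the exact values from the original notebook:

```python
# Hedged sketch (Keras 2) of the modified NVIDIA-style model described above;
# exact layer sizes/filters are assumptions, not the original notebook values.
from keras.models import Sequential
from keras.layers import (Lambda, Cropping2D, Conv2D, MaxPooling2D,
                          BatchNormalization, Flatten, Dense, Dropout)
from keras.regularizers import l1, l2

img_input_shape = (160, 320, 3)  # simulator camera images (assumed)

model = Sequential()
# 1. normalization inside the model
model.add(Lambda(lambda x: x / 255.0 - 0.5, input_shape=img_input_shape))
# crop away sky and car hood (also done inside the model)
model.add(Cropping2D(cropping=((70, 25), (0, 0))))
# 2. + 3. first conv layer followed by max pooling and batch normalization
model.add(Conv2D(24, (5, 5), activation="relu", kernel_initializer="he_normal",
                 kernel_regularizer=l2(0.001)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(36, (5, 5), strides=(2, 2), activation="relu",
                 kernel_initializer="he_normal", kernel_regularizer=l2(0.001)))
model.add(Conv2D(48, (3, 3), strides=(2, 2), activation="relu",
                 kernel_initializer="he_normal"))
model.add(Conv2D(64, (3, 3), activation="relu", kernel_initializer="he_normal"))
model.add(Flatten())
# 6. smaller fully connected layers, 7. selu activations, 8. dropout
model.add(Dense(64, activation="selu", kernel_initializer="he_normal"))
model.add(Dropout(0.5))
model.add(Dense(16, activation="selu", kernel_initializer="he_normal"))
model.add(Dropout(0.5))
# 9. single steering-angle output with L1 regularization
model.add(Dense(1, kernel_regularizer=l1(0.001)))

model.compile(optimizer="adam", loss="mse")
```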
Image of Model

Detailed model

2. Attempts to reduce overfitting in the model

The model contains dropout layers in order to reduce overfitting (ref dropout_1 and dropout_2 in figure above and train_car_to_drive.ipynb).

Partially related: I also used balancing of the data sets in the generator; see sample_weight in the generator function and the snippet below.

The model was tested by running it through the simulator and ensuring that the vehicle could stay on the track. See modelthatworked.mp4 file in this github repository.

3. Model parameter tuning

The model used an adam optimizer, so the learning rate was not tuned manually.

4. Appropriate training data

I used the training data that was provided as part of the project, and in addition added two runs of data to avoid problems (e.g. a curve without a lane line on the right side – until the bridge started – and also a separate training set driving on the bridge). Data is available at https://amundtveit.com/DATA0.tgz.

Model Architecture and Training Strategy

1. Solution Design Approach

The overall strategy for deriving a model architecture was to use a conv.net. I first tried the previous one I used for traffic sign detection based on LeNet, but it didn't work (probably because the input images were too big), and then started with the NVIDIA model (see above for details about the changes to it).

In order to gauge how well the model was working, I split my image and steering angle data into a training and a validation set. The primary finding was that the numerical performance of the models I tried was not a good predictor of how well they would perform on actual driving in the simulator. Perhaps overfitting could be good for this task (i.e. memorize the track), but I attempted to get a correctly trained model without overfitting (ref. dropout/selu and batch normalization). There were many failed runs before the car actually could drive around the first track.

 

2. Creation of the Training Set & Training Process

I re-drove and captured training data for the sections that were problematic (as mentioned, the curve without lane lines on the right, the bridge, and the part just before the bridge). Regarding center-driving I didn't get much success adding data for that, but perhaps my rebalancing (ref. the generator output above) actually was counter-productive?

For each example line in the training data I generated 6 variants (for data augmentation), i.e. a flipped image (along the center vertical axis) plus the 3 different cameras (left, center and right) with adjustments for the steering angle.

After the collection process, I had 10485 lines in driving_log.csv, i.e. number of data points = 62430 (6*10485). Preprocessing was used to flip images, convert images to numpy arrays and also (as part of the Keras model) to scale values. Cropping of the image was also done as part of the model. I finally randomly shuffled the data set and put 20% of the data into a validation set; see the generator for details. Examples of images (before cropping inside the model) are shown below:

Example of center camera image

center Image

Example of flipped center camera image

flippedcenter Image

Example of left camera image

left Image

Example of right camera image

right Image

generator
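The original generator isn't reproduced here; a hedged sketch of a Keras-style generator doing the 6-variant augmentation (3 cameras + flips) with a steering correction and per-sample weights could look like this – paths, the correction value and the weighting scheme are assumptions:

```python
# Hedged sketch of a data generator: 3 cameras + horizontal flips = 6 variants per csv line.
# The steering correction (0.2) and the sample-weighting scheme are assumptions.
import random

import cv2
import numpy as np


def generator(csv_lines, batch_size=32, correction=0.2):
    while True:
        random.shuffle(csv_lines)
        images, angles, weights = [], [], []
        for line in csv_lines:
            center_angle = float(line[3])
            # center, left and right camera images with adjusted steering angles
            for camera, adjust in ((0, 0.0), (1, correction), (2, -correction)):
                image = cv2.cvtColor(cv2.imread(line[camera].strip()), cv2.COLOR_BGR2RGB)
                angle = center_angle + adjust
                for img, ang in ((image, angle), (np.fliplr(image), -angle)):  # + flipped variant
                    images.append(img)
                    angles.append(ang)
                    # weight non-straight examples higher to counter the
                    # over-representation of ~0.0 steering angles (assumed scheme)
                    weights.append(1.0 + abs(ang))
                    if len(images) == batch_size:
                        yield np.array(images), np.array(angles), np.array(weights)
                        images, angles, weights = [], [], []
```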

I used this training data for training the model. The validation set helped determine whether the model was over- or underfitting. The ideal number of epochs was 5, as evidenced by the quick flattening of the loss and validation loss (to around 0.03); in earlier runs the validation loss increased above the training loss when using more epochs. I used an adam optimizer so that manually tuning the learning rate wasn't necessary.

3. Challenges

Challenges along the way: I found this to be a very hard task, since the model loss and validation loss weren't good predictors of actual driving performance, and I also had cases where adding more nice driving data (at the center and far from the edges) actually gave worse results and made the car drive off the road. Other challenges were Keras 2 related – the differing semantics of parameters in Keras 1 and Keras 2 fooled me a bit when using Keras 2, ref steps_per_epoch. I also had issues with the progress bar not working in Keras 2 in Jupyter notebook, so I had to use the 3rd party library https://pypi.python.org/pypi/keras-tqdm/2.0.1

Continue Reading

Deep Learning in Energy Production

This blog post has recent publications about the use of Deep Learning in an energy production context (wind, gas and oil), e.g. wind power prediction, turbine risk assessment, reservoir discovery and price forecasting.

Best regards,

Amund Tveit

Wind

Year  Title Author
2017 Short-term Wind Energy Prediction Algorithm Based on SAGA-DBNs  W Fei, WU Zhong
2017 Wind Power Prediction using Deep Neural Network based Meta Regression and Transfer Learning  AS Qureshi, A Khan, A Zameer, A Usman
2017 Wind Turbine Failure Risk Assessment Model Based on DBN  C Fei, F Zhongguang
2017 The optimization of wind power interval forecast  X Yu, H Zang
2016 Deep Learning for Wind Speed Forecasting in Northeastern Region of Brazil  AT Sergio, TB Ludermir
2016 A very short term wind power prediction approach based on Multilayer Restricted Boltzmann Machine  X Peng, L Xiong, J Wen, Y Xu, W Fan, S Feng, B Wang
2016 Short-term prediction of wind power based on deep Long Short-Term Memory  Q Xiaoyun, K Xiaoning, Z Chao, J Shuai, M Xiuda
2016 Deep belief network based deterministic and probabilistic wind speed forecasting approach  HZ Wang, GB Wang, GQ Li, JC Peng, YT Liu
2016 A hybrid wind power prediction method  Y Tao, H Chen
2016 Deep learning based ensemble approach for probabilistic wind power forecasting  H Wang, G Li, G Wang, J Peng, H Jiang, Y Liu
2016 A hybrid wind power forecasting model based on data mining and wavelets analysis  R Azimi, M Ghofrani, M Ghayekhloo
2016 ELM Based Representational Learning for Fault Diagnosis of Wind Turbine Equipment  Z Yang, X Wang, PK Wong, J Zhong
2015 Deep Neural Networks for Wind Energy Prediction  D Díaz, A Torres, JR Dorronsoro
2015 Predictive Deep Boltzmann Machine for Multiperiod Wind Speed Forecasting  CY Zhang, CLP Chen, M Gan, L Chen
2015 Resilient Propagation for Multivariate Wind Power Prediction  J Stubbemann, NA Treiber, O Kramer
2015 Transfer learning for short-term wind speed prediction with deep neural networks  Q Hu, R Zhang, Y Zhou
2014 Wind Power Prediction and Pattern Feature Based on Deep Learning Method  Y Tao, H Chen, C Qiu

Gas

Year  Title Author
2017   Sample Document–Inversion Of The Permeability Of A Tight Gas Reservoir With The Combination Of A Deep Boltzmann Kernel …  L Zhu, C Zhang, Y Wei, X Zhou, Y Huang, C Zhang
2017   Deep Learning: Chance and Challenge for Deep Gas Reservoir Identification  C Junxing, W Shikai
2016   Finite-sensor fault-diagnosis simulation study of gas turbine engine using information entropy and deep belief networks  D Feng, M Xiao, Y Liu, H Song, Z Yang, Z Hu
2015   On Accurate and Reliable Anomaly Detection for Gas Turbine Combustors: A Deep Learning Approach  W Yan, L Yu
2015   A Review of Datasets and Load Forecasting Techniques for Smart Natural Gas and Water Grids: Analysis and Experiments.  M Fagiani, S Squartini, L Gabrielli, S Spinsante
2015   Short-term load forecasting for smart water and gas grids: A comparative evaluation  M Fagiani, S Squartini, R Bonfigli, F Piazza
2015   The early-warning model of equipment chain in gas pipeline based on DNN-HMM  J Qiu, W Liang, X Yu, M Zhang, L Zhang

Oil

Year  Title Author
2017   Development of a New Correlation for Bubble Point Pressure in Oil Reservoirs Using Artificial Intelligent Technique  S Elkatatny, M Mahmoud
2017   A deep learning ensemble approach for crude oil price forecasting  Y Zhao, J Li, L Yu
2016   Automatic Detection and Classification of Oil Tanks in Optical Satellite Images Based on Convolutional Neural Network  Q Wang, J Zhang, X Hu, Y Wang
2015   A Hierarchical Oil Tank Detector With Deep Surrounding Features for High-Resolution Optical Satellite Imagery  L Zhang, Z Shi, J Wu
Continue Reading

Lane Finding (on Roads) for Self Driving Cars with OpenCV

This blog post describes a (basic) approach for how to potentially use OpenCV for lane finding for self-driving cars (i.e. finding the yellow and white stripes along the road) – I did this as one of the projects in term 1 of Udacity's self-driving car nanodegree (highly recommended online education!).

Disclaimer: the approach presented in this blog post is way too simple to use for an actual self-driving car, but it was a good way (for me) to learn more about (non-deep-learning-based) computer vision and the lane finding problem.

See github.com/atveit/LaneFindingForSelfDrivingCars for more details about the approach (python code)

Best regards,

Amund Tveit

Lane Finding (On Roads) for Self Driving Cars with OpenCV

1. First I selected the region of interest (with hand-made vertices)

2. Converted the image to grayscale

3. Extracted likely white lane information from the grayscale image.

Used 220 as limit (255 is 100% white, but 220 is close enough)

4. Extracted likely yellow lane information from the (colorized) region of interest image.

RGB for Yellow is [255,255,0] but found [220,220,30] to be close enough

5. Converted the yellow lane information image to grayscale

6. Combined the likely yellow and white lane grayscale images into a new grayscale image (using max value)

7. Did a gaussian blur (with kernel size 3) followed by canny edge detection

Gaussian blur smooths out the image using convolution; this is to reduce false signals to the (Canny) edge detector.

8. Did a Hough (transform) image creation; I also modified the draw_lines function (see the GitHub link above) by calculating the average slope and intercept (i.e. fitting y = a*x + b for all the Hough lines to find a and b, and then averaging over them).

For more information about Hough Transform, check out this hough transformation tutorial.

(side note: I believe it could perhaps have been smarter to use Hough line center points instead of Hough lines, since their directions sometimes seem a bit unstable, and then use the average of slopes between center points instead)

9. Used weighted image addition to overlay the Hough image with the lane detection on top of the original image
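A condensed, hedged sketch of the pipeline described in steps 1-9 (the thresholds follow the text above; region-of-interest vertices, the yellow range and the Hough parameters are assumptions, and the line-averaging from step 8 is omitted – see the GitHub repo for the real code):

```python
# Hedged sketch of the lane-finding pipeline (steps 1-9); assumes an RGB input image.
import cv2
import numpy as np


def find_lanes(image):
    height, width = image.shape[:2]

    # 1. region of interest with hand-made vertices (assumed polygon)
    vertices = np.array([[(50, height), (width // 2 - 50, height // 2 + 50),
                          (width // 2 + 50, height // 2 + 50), (width - 50, height)]],
                        dtype=np.int32)
    mask = np.zeros_like(image)
    cv2.fillPoly(mask, vertices, (255, 255, 255))
    roi = cv2.bitwise_and(image, mask)

    # 2.-3. grayscale and likely-white pixels (>= 220)
    gray = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY)
    white = cv2.inRange(gray, 220, 255)

    # 4.-5. likely-yellow pixels (around the [220, 220, 30] RGB value; range is an assumption)
    yellow = cv2.inRange(roi, np.array([200, 200, 0]), np.array([255, 255, 80]))

    # 6. combine the white and yellow masks (max value)
    lanes = np.maximum(white, yellow)

    # 7. gaussian blur (kernel size 3) followed by canny edge detection
    edges = cv2.Canny(cv2.GaussianBlur(lanes, (3, 3), 0), 50, 150)

    # 8. hough transform to get line segments (parameters are assumptions)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=20,
                            minLineLength=20, maxLineGap=100)
    line_image = np.zeros_like(image)
    if lines is not None:
        for x1, y1, x2, y2 in lines.reshape(-1, 4):
            cv2.line(line_image, (x1, y1), (x2, y2), (255, 0, 0), 10)

    # 9. weighted overlay of the detected lines on top of the original image
    return cv2.addWeighted(image, 0.8, line_image, 1.0, 0.0)
```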

Continue Reading